.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_model_card.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_model_card.py:


scikit-learn model cards
------------------------

This guide demonstrates how to use this package to create a model card for a
scikit-learn compatible model and save it.

.. GENERATED FROM PYTHON SOURCE LINES 10-13

Imports
=======
First, we import everything required for the rest of this document.

.. GENERATED FROM PYTHON SOURCE LINES 13-36

.. code-block:: Python

    import pickle
    from pathlib import Path
    from tempfile import mkdtemp, mkstemp

    import pandas as pd
    import sklearn
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.experimental import enable_halving_search_cv  # noqa
    from sklearn.inspection import permutation_importance
    from sklearn.metrics import (
        ConfusionMatrixDisplay,
        accuracy_score,
        classification_report,
        confusion_matrix,
        f1_score,
    )
    from sklearn.model_selection import HalvingGridSearchCV, train_test_split

    from skops import hub_utils
    from skops.card import Card, metadata_from_config


.. GENERATED FROM PYTHON SOURCE LINES 37-40

Data
====
We load the breast cancer dataset from scikit-learn.

.. GENERATED FROM PYTHON SOURCE LINES 40-48

.. code-block:: Python

    X, y = load_breast_cancer(as_frame=True, return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )
    print("X's summary: ", X.describe())
    print("y's summary: ", y.describe())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    X's summary:         mean radius  mean texture  ...  worst symmetry  worst fractal dimension
    count   569.000000    569.000000  ...      569.000000               569.000000
    mean     14.127292     19.289649  ...        0.290076                 0.083946
    std       3.524049      4.301036  ...        0.061867                 0.018061
    min       6.981000      9.710000  ...        0.156500                 0.055040
    25%      11.700000     16.170000  ...        0.250400                 0.071460
    50%      13.370000     18.840000  ...        0.282200                 0.080040
    75%      15.780000     21.800000  ...        0.317900                 0.092080
    max      28.110000     39.280000  ...        0.663800                 0.207500

    [8 rows x 30 columns]
    y's summary:  count    569.000000
    mean       0.627417
    std        0.483918
    min        0.000000
    25%        0.000000
    50%        1.000000
    75%        1.000000
    max        1.000000
    Name: target, dtype: float64


.. GENERATED FROM PYTHON SOURCE LINES 49-54

Train a Model
=============
Using the above data, we train a model. To select the model, we use
:class:`~sklearn.model_selection.HalvingGridSearchCV` with a parameter grid
over :class:`~sklearn.ensemble.HistGradientBoostingClassifier`.

.. GENERATED FROM PYTHON SOURCE LINES 54-69

.. code-block:: Python

    param_grid = {
        "max_leaf_nodes": [5, 10, 15],
        "max_depth": [2, 5, 10],
    }

    model = HalvingGridSearchCV(
        estimator=HistGradientBoostingClassifier(),
        param_grid=param_grid,
        random_state=42,
        n_jobs=-1,
    ).fit(X_train, y_train)
    model.score(X_test, y_test)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0.9590643274853801


.. GENERATED FROM PYTHON SOURCE LINES 70-73

Initialize a repository to save our files in
============================================
We will now initialize a repository and save our model in it.

.. GENERATED FROM PYTHON SOURCE LINES 73-88

.. code-block:: Python

    _, pkl_name = mkstemp(prefix="skops-", suffix=".pkl")

    with open(pkl_name, mode="bw") as f:
        pickle.dump(model, file=f)

    local_repo = mkdtemp(prefix="skops-")

    hub_utils.init(
        model=pkl_name,
        requirements=[f"scikit-learn={sklearn.__version__}"],
        dst=local_repo,
        task="tabular-classification",
        data=X_test,
    )


.. GENERATED FROM PYTHON SOURCE LINES 89-95

Create a model card
====================
We now create a model card and populate its metadata with the information
already provided in ``config.json``, which was itself created by the call to
:func:`.hub_utils.init` above. We will see below how to populate the model
card with more useful information.

.. GENERATED FROM PYTHON SOURCE LINES 95-98

.. code-block:: Python

    model_card = Card(model, metadata=metadata_from_config(Path(local_repo)))


.. GENERATED FROM PYTHON SOURCE LINES 99-104

Add more information
====================
So far, the model card does not tell viewers much about the model. Therefore,
we add more information, such as a description of the model and its license.

.. GENERATED FROM PYTHON SOURCE LINES 104-123

.. code-block:: Python

    model_card.metadata.license = "mit"
    limitations = "This model is not ready to be used in production."
    model_description = (
        "This is a `HistGradientBoostingClassifier` model trained on breast cancer "
        "dataset. It's trained with `HalvingGridSearchCV`, with parameter grids on "
        "`max_leaf_nodes` and `max_depth`."
    )
    model_card_authors = "skops_user"
    citation_bibtex = "**BibTeX**\n\n```\n@inproceedings{...,year={2020}}\n```"
    model_card.add(
        **{  # type: ignore
            "Citation": citation_bibtex,
            "Model Card Authors": model_card_authors,
            "Model description": model_description,
            "Model description/Intended uses & limitations": limitations,
        }
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Card(
      model=HalvingGridSearchCV(estimator=Hist...es': [5, 10, 15]}, random_state=42),
      metadata.library_name=sklearn,
      metadata.license=mit,
      metadata.tags=['sklearn', 'skops', 'tabular-classification'],
      metadata.model_format=pickle,
      metadata.model_file=skops-6liqqqbb.pkl,
      metadata.widget=[{...}],
      Model description=This is a `HistGradientBoost..._leaf_nodes` and `max_depth`.,
      Model description/Intended uses & limitations=This model is ... in production.,
      Model description/Training Procedure/Hyperparameters=TableSection(36x2),
      Model description/Training Procedure/Model Plot=