.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_model_card.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_model_card.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_model_card.py:


scikit-learn model cards
------------------------

This guide demonstrates how to use skops to create a model card for a
scikit-learn compatible model and save it.

.. GENERATED FROM PYTHON SOURCE LINES 10-13

Imports
=======
First we will import everything required for the rest of this document.

.. GENERATED FROM PYTHON SOURCE LINES 13-36

.. code-block:: Python


    import pickle
    from pathlib import Path
    from tempfile import mkdtemp, mkstemp

    import pandas as pd
    import sklearn
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.experimental import enable_halving_search_cv  # noqa
    from sklearn.inspection import permutation_importance
    from sklearn.metrics import (
        ConfusionMatrixDisplay,
        accuracy_score,
        classification_report,
        confusion_matrix,
        f1_score,
    )
    from sklearn.model_selection import HalvingGridSearchCV, train_test_split

    from skops import hub_utils
    from skops.card import Card, metadata_from_config


.. GENERATED FROM PYTHON SOURCE LINES 37-40

Data
====
We load the breast cancer dataset from scikit-learn.

.. GENERATED FROM PYTHON SOURCE LINES 40-48

.. code-block:: Python


    X, y = load_breast_cancer(as_frame=True, return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )
    print("X's summary: ", X.describe())
    print("y's summary: ", y.describe())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    X's summary:         mean radius  mean texture  ...  worst symmetry  worst fractal dimension
    count   569.000000    569.000000  ...      569.000000               569.000000
    mean     14.127292     19.289649  ...        0.290076                 0.083946
    std       3.524049      4.301036  ...        0.061867                 0.018061
    min       6.981000      9.710000  ...        0.156500                 0.055040
    25%      11.700000     16.170000  ...        0.250400                 0.071460
    50%      13.370000     18.840000  ...        0.282200                 0.080040
    75%      15.780000     21.800000  ...        0.317900                 0.092080
    max      28.110000     39.280000  ...        0.663800                 0.207500

    [8 rows x 30 columns]
    y's summary:  count    569.000000
    mean       0.627417
    std        0.483918
    min        0.000000
    25%        0.000000
    50%        1.000000
    75%        1.000000
    max        1.000000
    Name: target, dtype: float64
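
The mean of ``y`` in the summary above (about 0.63) is the share of samples in
class 1, so the two classes are mildly imbalanced. The following sketch is not
part of the original example: it counts the samples per class and shows an
optional, stratified variant of the split. The names ``class_counts``,
``y_train_strat`` and ``y_test_strat`` are introduced here for illustration
only.

.. code-block:: Python

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(as_frame=True, return_X_y=True)

    # Target encoding in this dataset: 0 = malignant, 1 = benign.
    class_counts = y.value_counts().sort_index()
    print(class_counts)

    # Assumption: stratifying on y is an optional variation, not what the
    # example above does. It keeps the class ratio similar in both splits.
    _, _, y_train_strat, y_test_strat = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )
    print(y_train_strat.mean(), y_test_strat.mean())

Whether stratification matters depends on the dataset; with a moderate
imbalance like this one, the effect on the split is likely small.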


.. GENERATED FROM PYTHON SOURCE LINES 49-54

Train a Model
=============
Using the above data, we train a model. To select the model, we use
:class:`~sklearn.model_selection.HalvingGridSearchCV` with a parameter grid
over :class:`~sklearn.ensemble.HistGradientBoostingClassifier`.

.. GENERATED FROM PYTHON SOURCE LINES 54-69

.. code-block:: Python


    param_grid = {
        "max_leaf_nodes": [5, 10, 15],
        "max_depth": [2, 5, 10],
    }

    model = HalvingGridSearchCV(
        estimator=HistGradientBoostingClassifier(),
        param_grid=param_grid,
        random_state=42,
        n_jobs=-1,
    ).fit(X_train, y_train)
    model.score(X_test, y_test)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.9590643274853801
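
To give a rough intuition for successive halving, the sketch below computes a
simplified schedule in plain Python: each iteration keeps about one third of
the candidates and triples the resources given to each of them. This is an
illustration under stated assumptions, not scikit-learn's implementation; the
function ``halving_schedule`` and the starting budget of 40 samples are
hypothetical, and the real schedule also depends on settings such as
``min_resources`` and the cross-validation splitter.

.. code-block:: Python

    from itertools import product


    def halving_schedule(n_candidates, min_resources, max_resources, factor=3):
        """Return (n_candidates, n_resources) pairs for a simplified schedule."""
        schedule = []
        resources = min_resources
        while True:
            schedule.append((n_candidates, min(resources, max_resources)))
            if n_candidates <= 1 or resources >= max_resources:
                break
            # Keep roughly the best 1/factor of the candidates (ceiling division)
            n_candidates = -(-n_candidates // factor)
            # Give each surviving candidate `factor` times more samples
            resources *= factor
        return schedule


    param_grid = {"max_leaf_nodes": [5, 10, 15], "max_depth": [2, 5, 10]}
    n_grid_points = len(list(product(*param_grid.values())))

    # 398 is the number of training samples after the 70/30 split;
    # min_resources=40 is a made-up value for illustration.
    schedule = halving_schedule(n_grid_points, min_resources=40, max_resources=398)
    print(n_grid_points, schedule)

The point of the scheme is that most of the nine grid points are only ever
evaluated on a small subset of the data, which makes the search cheaper than
an exhaustive grid search on the full training set.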


.. GENERATED FROM PYTHON SOURCE LINES 70-73

Initialize a repository to save our files in
============================================
We now pickle the model and initialize a local repository that contains it.

.. GENERATED FROM PYTHON SOURCE LINES 73-88

.. code-block:: Python

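    # mkstemp returns an open file descriptor together with the path; reuse the
    # descriptor so that it is closed when the ``with`` block exits.
    fd, pkl_name = mkstemp(prefix="skops-", suffix=".pkl")

    with open(fd, mode="bw") as f:
        pickle.dump(model, file=f)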

    local_repo = mkdtemp(prefix="skops-")

    hub_utils.init(
        model=pkl_name,
        requirements=[f"scikit-learn={sklearn.__version__}"],
        dst=local_repo,
        task="tabular-classification",
        data=X_test,
    )
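
Before a pickle file is placed in a repository, it can be worth checking that
it loads back into an equivalent model. The following is a minimal sketch of
such a round-trip check, not part of the original example. It fits a small
stand-alone classifier so that it runs on its own, and the names ``clf``,
``restored`` and ``same_predictions`` are introduced here for illustration.

.. code-block:: Python

    import os
    import pickle
    from tempfile import mkstemp

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import HistGradientBoostingClassifier

    X, y = load_breast_cancer(as_frame=True, return_X_y=True)
    clf = HistGradientBoostingClassifier(max_iter=10, random_state=0).fit(X, y)

    # mkstemp returns an open file descriptor and the path of the new file
    fd, pkl_path = mkstemp(prefix="skops-", suffix=".pkl")
    with open(fd, mode="wb") as f:
        pickle.dump(clf, f)

    # Only unpickle files from sources you trust: loading a pickle can
    # execute arbitrary code.
    with open(pkl_path, mode="rb") as f:
        restored = pickle.load(f)

    same_predictions = bool((restored.predict(X) == clf.predict(X)).all())
    print(same_predictions)

    os.remove(pkl_path)  # clean up the temporary file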


.. GENERATED FROM PYTHON SOURCE LINES 89-95

Create a model card
====================
We now create a model card and populate its metadata from ``config.json``,
which was created by the call to :func:`.hub_utils.init` above. The following
sections show how to populate the model card with more useful information.

.. GENERATED FROM PYTHON SOURCE LINES 95-98

.. code-block:: Python


    model_card = Card(model, metadata=metadata_from_config(Path(local_repo)))


.. GENERATED FROM PYTHON SOURCE LINES 99-104

Add more information
====================
So far, the model card does not tell viewers a lot about the model. Therefore,
we add more information about the model, like a description and what its
license is.

.. GENERATED FROM PYTHON SOURCE LINES 104-123

.. code-block:: Python


    model_card.metadata.license = "mit"
    limitations = "This model is not ready to be used in production."
    model_description = (
        "This is a `HistGradientBoostingClassifier` model trained on the breast "
        "cancer dataset. It's trained with `HalvingGridSearchCV`, with parameter "
        "grids on `max_leaf_nodes` and `max_depth`."
    )
    model_card_authors = "skops_user"
    citation_bibtex = "**BibTeX**\n\n```\n@inproceedings{...,year={2020}}\n```"
    model_card.add(
        **{  # type: ignore
            "Citation": citation_bibtex,
            "Model Card Authors": model_card_authors,
            "Model description": model_description,
            "Model description/Intended uses & limitations": limitations,
        }
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Card(
      model=HalvingGridSearchCV(estimator=Hist...es': [5, 10, 15]}, random_state=42),
      metadata.library_name=sklearn,
      metadata.license=mit,
      metadata.tags=['sklearn', 'skops', 'tabular-classification'],
      metadata.model_format=pickle,
      metadata.model_file=skops-6liqqqbb.pkl,
      metadata.widget=[{...}],
      Model description=This is a `HistGradientBoost..._leaf_nodes` and `max_depth`.,
      Model description/Intended uses & limitations=This model is ... in production.,
      Model description/Training Procedure/Hyperparameters=TableSection(36x2),
      Model description/Training Procedure/Model Plot=<style>#sk-co...v></div></div>,
      Model Card Authors=skops_user,
      Citation=**BibTeX** ``` @inproceedings{...,year={2020}} ```,
    )
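
Section names that contain a ``/``, such as
``"Model description/Intended uses & limitations"``, denote a subsection of the
section named before the slash. The plain-Python sketch below only illustrates
this naming scheme by turning such names into a nested dictionary; it is not
how skops stores sections internally, and ``nest_sections`` is a hypothetical
helper.

.. code-block:: Python

    def nest_sections(flat_sections):
        """Turn {"A/B": text} style names into a nested tree of sections."""
        tree = {}
        for path, content in flat_sections.items():
            *parents, leaf = path.split("/")
            level = tree
            # Walk down the parent sections, creating empty ones if needed
            for name in parents:
                entry = level.setdefault(name, {"content": None, "subsections": {}})
                level = entry["subsections"]
            entry = level.setdefault(leaf, {"content": None, "subsections": {}})
            entry["content"] = content
        return tree


    sections = {
        "Model description": "Short description.",
        "Model description/Intended uses & limitations": "Not for production.",
        "Citation": "BibTeX entry.",
    }
    tree = nest_sections(sections)
    print(list(tree))
    print(list(tree["Model description"]["subsections"]))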


.. GENERATED FROM PYTHON SOURCE LINES 124-136

Add plots, metrics, and tables to our model card
================================================
To better understand the model's performance, we evaluate it on several
metrics and add the results to the model card. In this example, we calculate
the accuracy and the F1 score with scikit-learn and add them to the model card
by calling :meth:`.Card.add_metrics`. We can also add matplotlib figures to the
model card, e.g. a plot of the confusion matrix: we create the plot with
scikit-learn, save it locally, and then add it using :meth:`.Card.add_plot`.
Permutation importances are added with
:meth:`.Card.add_permutation_importances`. Finally, we can add useful tables to
the model card, e.g. the results of the grid search and the classification
report, using :meth:`.Card.add_table`.

.. GENERATED FROM PYTHON SOURCE LINES 136-184

.. code-block:: Python


    y_pred = model.predict(X_test)
    eval_descr = (
        "The model is evaluated on test data using accuracy and F1-score with "
        "micro average."
    )
    model_card.add(**{"Model description/Evaluation Results": eval_descr})  # type: ignore

    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average="micro")
    model_card.add_metrics(**{"accuracy": accuracy, "f1 score": f1})

    cm = confusion_matrix(y_test, y_pred, labels=model.classes_)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=model.classes_)
    disp.plot()

    disp.figure_.savefig(Path(local_repo) / "confusion_matrix.png")
    model_card.add_plot(
        **{"Model description/Evaluation Results/Confusion Matrix": "confusion_matrix.png"}
    )

    importances = permutation_importance(model, X_test, y_test, n_repeats=10)
    model_card.add_permutation_importances(
        importances,
        X_test.columns,
        plot_file="importance.png",
        plot_name="Permutation Importance",
    )

    cv_results = model.cv_results_
    clf_report = classification_report(
        y_test, y_pred, output_dict=True, target_names=["malignant", "benign"]
    )
    # The classification report has to be transformed into a DataFrame first to have
    # the correct format. This requires removing the "accuracy", which was added
    # above anyway.
    del clf_report["accuracy"]
    clf_report = pd.DataFrame(clf_report).T.reset_index()
    model_card.add_table(
        folded=True,
        **{
            "Model description/Evaluation Results/Hyperparameter search results": (
                cv_results
            ),
            "Model description/Evaluation Results/Classification report": clf_report,
        },
    )


.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /auto_examples/images/sphx_glr_plot_model_card_001.png
         :alt: plot model card
         :srcset: /auto_examples/images/sphx_glr_plot_model_card_001.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/images/sphx_glr_plot_model_card_002.png
         :alt: Permutation Importance
         :srcset: /auto_examples/images/sphx_glr_plot_model_card_002.png
         :class: sphx-glr-multi-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Card(
      model=HalvingGridSearchCV(estimator=Hist...es': [5, 10, 15]}, random_state=42),
      metadata.library_name=sklearn,
      metadata.license=mit,
      metadata.tags=['sklearn', 'skops', 'tabular-classification'],
      metadata.model_format=pickle,
      metadata.model_file=skops-6liqqqbb.pkl,
      metadata.widget=[{...}],
      Model description=This is a `HistGradientBoost..._leaf_nodes` and `max_depth`.,
      Model description/Intended uses & limitations=This model is ... in production.,
      Model description/Training Procedure/Hyperparameters=TableSection(36x2),
      Model description/Training Procedure/Model Plot=<style>#sk-co...v></div></div>,
      Model description/Evaluation Results=TableSection(2x2),
      Model description/Evaluation Results/Confusion Matrix=PlotSectio...matrix.png),
      Model description/Evaluation Results/Model description/Evaluation Results/Hyperparameter search results=...,
      Model description/Evaluation Results/Model description/Evaluation Results/Classification report=...,
      Model Card Authors=skops_user,
      Citation=**BibTeX** ``` @inproceedings{...,year={2020}} ```,
      Permutation Importance=PlotSection(importance.png),
    )
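
The value reported as F1 score depends on the ``average`` argument. With
``average="micro"``, as used in the call to ``f1_score`` above, the score of a
single-label classification problem equals the accuracy, whereas
``average="macro"`` takes the unweighted mean of the per-class F1 scores. The
toy labels in the sketch below are made up for illustration and are unrelated
to the breast cancer data.

.. code-block:: Python

    from sklearn.metrics import accuracy_score, f1_score

    # Made-up labels: four of the six predictions are correct
    y_true = [0, 0, 1, 1, 1, 1]
    y_hat = [0, 1, 1, 1, 1, 0]

    acc = accuracy_score(y_true, y_hat)
    f1_micro = f1_score(y_true, y_hat, average="micro")
    f1_macro = f1_score(y_true, y_hat, average="macro")
    print(acc, f1_micro, f1_macro)

Here the per-class F1 scores are 0.5 for class 0 and 0.75 for class 1, so the
macro average is 0.625, while the micro average matches the accuracy of 4/6.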


.. GENERATED FROM PYTHON SOURCE LINES 185-188

Save model card
===============
We can simply save our model card by providing a path to :meth:`.Card.save`.

.. GENERATED FROM PYTHON SOURCE LINES 188-190

.. code-block:: Python


    model_card.save(Path(local_repo) / "README.md")




.. _sphx_glr_download_auto_examples_plot_model_card.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_model_card.ipynb <plot_model_card.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_model_card.py <plot_model_card.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_model_card.zip <plot_model_card.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_