.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_tabular_regression.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_tabular_regression.py:


Tabular Regression with scikit-learn
-------------------------------------

This example shows how you can create a Hugging Face Hub compatible repo for a
tabular regression task using scikit-learn. We also show how you can generate
a model card for the model and the task at hand.

.. GENERATED FROM PYTHON SOURCE LINES 11-14

Imports
=======
First we will import everything required for the rest of this document.

.. GENERATED FROM PYTHON SOURCE LINES 14-30

.. code-block:: Python


    from pathlib import Path
    from tempfile import mkdtemp, mkstemp

    import matplotlib.pyplot as plt
    import sklearn
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    import skops.io as sio
    from skops import card, hub_utils


.. GENERATED FROM PYTHON SOURCE LINES 31-34

Data
====
We will use the diabetes dataset from scikit-learn.

.. GENERATED FROM PYTHON SOURCE LINES 34-40

.. code-block:: Python


    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )


.. GENERATED FROM PYTHON SOURCE LINES 41-46

Train a Model
=============
Before fitting, we standardize the features with ``StandardScaler`` in a
pipeline. We then fit a linear regression model on the scaler's output.

.. GENERATED FROM PYTHON SOURCE LINES 46-55

.. code-block:: Python


    model = Pipeline(
        [
            ("scaler", StandardScaler()),
            ("linear_regression", LinearRegression()),
        ]
    )

    model.fit(X_train, y_train)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Pipeline(steps=[('scaler', StandardScaler()),
                    ('linear_regression', LinearRegression())])
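
To make the scaling step concrete, the following is a minimal pure-Python
sketch of what standardization does. The column statistics are learned from
the training rows only and then reused for new rows, which is the behavior the
pipeline gives us at prediction time. The helper names ``fit_scaler`` and
``transform`` and the toy data are hypothetical and exist only for
illustration; this is not scikit-learn's implementation.

.. code-block:: Python

    from statistics import fmean, pstdev


    def fit_scaler(rows):
        """Return one (mean, std) pair per feature column of ``rows``."""
        stats = []
        for column in zip(*rows):
            mean = fmean(column)
            std = pstdev(column)  # population standard deviation (ddof=0)
            # Assumption: leave constant columns unscaled to avoid dividing by zero.
            stats.append((mean, std if std > 0 else 1.0))
        return stats


    def transform(rows, stats):
        """Standardize ``rows`` with previously learned column statistics."""
        return [
            [(value - mean) / std for value, (mean, std) in zip(row, stats)]
            for row in rows
        ]


    # Toy data (hypothetical): three training rows and one unseen row.
    train_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    new_rows = [[2.0, 40.0]]

    stats = fit_scaler(train_rows)
    train_scaled = transform(train_rows, stats)
    new_scaled = transform(new_rows, stats)

    print(train_scaled)
    print(new_scaled)

Because the statistics come from the training rows only, the unseen row is
scaled with the same mean and standard deviation. The fitted pipeline does the
same thing inside ``model.predict`` before the linear regression step runs.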


.. GENERATED FROM PYTHON SOURCE LINES 56-59

Inference
=========
Let's see if the model works.

.. GENERATED FROM PYTHON SOURCE LINES 59-62

.. code-block:: Python


    y_pred = model.predict(X_test[:5])
    print(y_pred)


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [139.5475584 179.51720835 134.03875572 291.41702925 123.78965872]


.. GENERATED FROM PYTHON SOURCE LINES 63-66

Initialize a repository to save our files in
============================================
We will now initialize a repository and save our model.

.. GENERATED FROM PYTHON SOURCE LINES 66-85

.. code-block:: Python


    _, pkl_name = mkstemp(prefix="skops-", suffix=".pkl")
    with open(pkl_name, mode="bw") as f:
        sio.dump(model, file=f)

    local_repo = mkdtemp(prefix="skops-")

    hub_utils.init(
        model=pkl_name,
        requirements=[f"scikit-learn={sklearn.__version__}"],
        dst=local_repo,
        task="tabular-regression",
        data=X_test,
    )

    if "__file__" in locals():  # __file__ is not defined during the docs build
        # Add this script itself to the files to be uploaded for reproducibility
        hub_utils.add_files(__file__, dst=local_repo)


.. GENERATED FROM PYTHON SOURCE LINES 86-92

Create a model card
===================
We now create a model card and populate its metadata with the information
already provided in ``config.json``, which is itself created by the call to
:func:`.hub_utils.init` above. We will see below how we can populate the model
card with useful information.

.. GENERATED FROM PYTHON SOURCE LINES 92-95

.. code-block:: Python


    model_card = card.Card(model, metadata=card.metadata_from_config(Path(local_repo)))


.. GENERATED FROM PYTHON SOURCE LINES 96-101

Add more information
====================
So far, the model card does not tell viewers a lot about the model. Therefore,
we add more information about the model, like a description and what its
license is.

.. GENERATED FROM PYTHON SOURCE LINES 101-125

.. code-block:: Python


    model_card.metadata.license = "mit"
    limitations = (
        "This model is made for educational purposes and is not ready to be used in"
        " production."
    )
    model_description = (
        "This is a Linear Regression model trained on the diabetes dataset. This model"
        " could be used to predict the progression of diabetes. This model is pretty"
        " limited and should just be used as an example of how to use `skops` and"
        " Hugging Face Hub."
    )
    model_card_authors = "skops_user, lazarust"
    citation_bibtex = "bibtex\n@inproceedings{...,year={2022}}"
    model_card.add(
        folded=False,
        **{
            "Model Card Authors": model_card_authors,
            "Intended uses & limitations": limitations,
            "Citation": citation_bibtex,
            "Model description": model_description,
            "Model description/Intended uses & limitations": limitations,
        },
    )


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Card(
      model=Pipeline(steps=[('scaler', Standar..._regression', LinearRegression())]),
      metadata.library_name=sklearn,
      metadata.license=mit,
      metadata.tags=['sklearn', 'skops', 'tabular-regression'],
      metadata.model_format=pickle,
      metadata.model_file=skops-f_cwjjqe.pkl,
      metadata.widget=[{...}],
      Model description=This is a Linear Regression ...`skops` and Hugging Face Hub.,
      Model description/Intended uses & limitations=This model is ... in production.,
      Model description/Training Procedure/Hyperparameters=TableSection(13x2),
      Model description/Training Procedure/Model Plot=