scikit-learn models on Hugging Face Hub
This guide demonstrates how you can use this package to create a Hugging Face Hub model repository based on a scikit-learn compatible model, and how to fetch scikit-learn compatible models from the Hub and run them locally.
Imports
First we will import everything required for the rest of this document.
import json
import os
import pickle
from pathlib import Path
from tempfile import mkdtemp, mkstemp
from uuid import uuid4
import sklearn
from huggingface_hub import HfApi
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.experimental import enable_halving_search_cv # noqa
from sklearn.model_selection import HalvingGridSearchCV, train_test_split
from skops import card, hub_utils
Data
Then we load the breast cancer dataset and split it into training and test sets to train and evaluate our model.
X, y = load_breast_cancer(as_frame=True, return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print("X's summary: ", X.describe())
print("y's summary: ", y.describe())
X's summary: mean radius mean texture ... worst symmetry worst fractal dimension
count 569.000000 569.000000 ... 569.000000 569.000000
mean 14.127292 19.289649 ... 0.290076 0.083946
std 3.524049 4.301036 ... 0.061867 0.018061
min 6.981000 9.710000 ... 0.156500 0.055040
25% 11.700000 16.170000 ... 0.250400 0.071460
50% 13.370000 18.840000 ... 0.282200 0.080040
75% 15.780000 21.800000 ... 0.317900 0.092080
max 28.110000 39.280000 ... 0.663800 0.207500
[8 rows x 30 columns]
y's summary: count 569.000000
mean 0.627417
std 0.483918
min 0.000000
25% 0.000000
50% 1.000000
75% 1.000000
max 1.000000
Name: target, dtype: float64
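The numbers in these summaries can be checked by hand. Below is a minimal stdlib sketch. It assumes that a float `test_size` is rounded up and the training set gets the remaining samples, which is how current scikit-learn versions behave. Under that assumption, 569 samples with `test_size=0.3` give 171 test and 398 training samples. Because the target is binary, its mean of 0.627417 corresponds to 357 samples of class 1.

```python
import math

n_samples = 569  # rows in the dataset, per the "count" line above
test_size = 0.3

# Assumption: the test split is rounded up and the training split gets the rest.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)  # 398 171

# The target is 0/1, so its mean is the fraction of samples in class 1.
target_mean = 0.627417
n_positive = round(target_mean * n_samples)
print(n_positive, n_samples - n_positive)  # 357 212
print(round(n_positive / n_samples, 6))  # 0.627417
```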
Train a Model
Using the above data, we train a model. To select the model, we use HalvingGridSearchCV with a parameter grid over HistGradientBoostingClassifier.
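HalvingGridSearchCV (successive halving) first evaluates every grid candidate on a small budget of training samples. It then keeps only the best fraction of candidates and repeats with more samples. The sketch below illustrates that idea in plain Python and is a simplification, not scikit-learn's implementation. `toy_score` is a hypothetical stand-in for cross-validation, and `min_resources=20` is an arbitrary choice. Only `factor=3` mirrors scikit-learn's default.

```python
import itertools
import math

param_grid = {
    "max_leaf_nodes": [5, 10, 15],
    "max_depth": [2, 5, 10],
}

# Expand the grid into every combination: 3 x 3 = 9 candidates.
keys = list(param_grid)
candidates = [dict(zip(keys, values)) for values in itertools.product(*param_grid.values())]


def toy_score(params, n_resources):
    """Hypothetical stand-in for a cross-validated score (not scikit-learn's)."""
    return params["max_leaf_nodes"] / 15 + params["max_depth"] / 100 + n_resources / 10_000


def successive_halving(candidates, factor=3, min_resources=20):
    """Keep the best 1/factor of candidates per round; survivors get factor x more resources."""
    n_resources = min_resources
    schedule = []  # (number of candidates, resources per candidate) for each round
    while len(candidates) > 1:
        schedule.append((len(candidates), n_resources))
        ranked = sorted(candidates, key=lambda p: toy_score(p, n_resources), reverse=True)
        candidates = ranked[: math.ceil(len(candidates) / factor)]
        n_resources *= factor
    schedule.append((len(candidates), n_resources))
    return candidates[0], schedule


best, schedule = successive_halving(candidates)
print(schedule)  # [(9, 20), (3, 60), (1, 180)]
print(best)  # {'max_leaf_nodes': 15, 'max_depth': 10}
```

The point of the design is that weak candidates are discarded after only a cheap evaluation, so most of the sample budget goes to the few promising ones.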
param_grid = {
    "max_leaf_nodes": [5, 10, 15],
    "max_depth": [2, 5, 10],
}
model = HalvingGridSearchCV(
    estimator=HistGradientBoostingClassifier(),
    param_grid=param_grid,
    random_state=42,
    n_jobs=-1,
).fit(X_train, y_train)
model.score(X_test, y_test)
0.9590643274853801
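No `scoring` argument is set here, so `score` uses the best estimator's own `score` method, which for a classifier returns mean accuracy. The test set from the 30% split has 171 samples, so the reported value corresponds to 164 correct predictions. A quick check:

```python
n_test = 171  # size of the 30% test split of 569 samples
reported_score = 0.9590643274853801

# Accuracy = correct predictions / test samples; recover the integer count.
n_correct = round(reported_score * n_test)
print(n_correct, n_test - n_correct)  # 164 7
print(abs(n_correct / n_test - reported_score) < 1e-12)  # True
```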
Initialize a Model Repo
We now initialize a model repository locally, and push it to the hub. For that, we need to first store the model as a pickle file and pass it to the hub tools.
# The file name is not significant; here we choose to save it with a `pkl`
# extension. mkstemp() returns an already-open file descriptor, so we write
# through it instead of leaving it open.
fd, pkl_name = mkstemp(prefix="skops-", suffix=".pkl")
with os.fdopen(fd, mode="wb") as f:
    pickle.dump(model, file=f)
local_repo = mkdtemp(prefix="skops-")
hub_utils.init(
    model=pkl_name,
    requirements=[f"scikit-learn={sklearn.__version__}"],
    dst=local_repo,
    task="tabular-classification",
    data=X_test,
)
if "__file__" in locals():  # __file__ is not defined during the docs build
    # Add this script itself to the files to be uploaded for reproducibility
    hub_utils.add_files(__file__, dst=local_repo)
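The file passed to init() is an ordinary pickle, so it can be read back with `pickle.load`. The minimal stdlib sketch below shows the same save-and-reload round trip. To keep it self-contained, a plain dict stands in for the fitted model (an assumption). Keep in mind that unpickling can execute arbitrary code, so only load pickles from sources you trust.

```python
import os
import pickle
from tempfile import mkstemp

# Hypothetical stand-in for the fitted estimator.
fake_model = {"estimator": "HistGradientBoostingClassifier", "max_depth": 5}

fd, path = mkstemp(prefix="skops-", suffix=".pkl")
with os.fdopen(fd, mode="wb") as f:  # reuse the descriptor mkstemp opened
    pickle.dump(fake_model, f)

# Only unpickle files you trust: loading a pickle can run arbitrary code.
with open(path, mode="rb") as f:
    restored = pickle.load(f)

os.remove(path)
print(restored == fake_model)  # True
```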
We can now see the contents of the created local repo:
print(os.listdir(local_repo))
['config.json', 'skops-1ynjmmau.pkl']
Model Card
We will now create a model card and save it. For more information about how to create a good model card, refer to the model card example. The following code uses metadata_from_config(), which creates a minimal metadata object to be included in the metadata section of the model card. The configuration used by this method is stored in the config.json file, which is created by the call to init().
model_card = card.Card(model, metadata=card.metadata_from_config(Path(local_repo)))
model_card.save(Path(local_repo) / "README.md")
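Hugging Face model cards store their metadata as YAML front matter between two `---` lines at the top of README.md. The sketch below writes such a header by hand, only to show the file layout. The keys used (`library_name`, `tags`) and their values are assumptions and not necessarily what metadata_from_config() emits. To avoid a YAML dependency, the hypothetical `render_front_matter` helper only handles flat strings and lists of strings.

```python
def render_front_matter(metadata):
    """Render a flat dict of strings / lists of strings as YAML front matter."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


# Hypothetical metadata; the task matches the one passed to init() above.
metadata = {
    "library_name": "sklearn",
    "tags": ["sklearn", "tabular-classification"],
}
readme = render_front_matter(metadata) + "\n# Model description\n"
print(readme)
```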
Push to Hub
And finally, we can push the model to the hub. This requires a user access token which you can get under https://huggingface.co/settings/tokens
# you can put your own token here, or set it as an environment variable before
# running this script.
token = os.environ["HF_HUB_TOKEN"]
repo_name = f"hf_hub_example-{uuid4()}"
user_name = HfApi().whoami(token=token)["name"]
repo_id = f"{user_name}/{repo_name}"
print(f"Creating and pushing to repo: {repo_id}")
Creating and pushing to repo: skops-ci/hf_hub_example-72984c1f-22a7-46f1-bc36-2ce4bb5b9032
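Indexing `os.environ` directly raises a bare `KeyError` when `HF_HUB_TOKEN` is unset. Below is a small sketch of a friendlier lookup. The helper name `get_hub_token` is hypothetical, and `HF_HUB_TOKEN` is simply the variable name this example uses.

```python
import os


def get_hub_token(var_name="HF_HUB_TOKEN"):
    """Return the token stored in var_name, or fail with an actionable message."""
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to a Hugging Face user access "
            "token (https://huggingface.co/settings/tokens) before running this script."
        )
    return token


# Usage, as a drop-in for the lookup above:
# token = get_hub_token()
```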
Now we can push our files to the repo. The following function creates the remote repository if it doesn't exist; this is controlled via the create_remote argument. Note that here we're setting private=True, which means only people with the right permissions can see the model. Set private=False to make it visible to the public.
hub_utils.push(
    repo_id=repo_id,
    source=local_repo,
    token=token,
    commit_message="pushing files to the repo from the example!",
    create_remote=True,
    private=True,
)
Once uploaded, other users can download and use the model, unless the repo is private, as it is here; a private repo can only be downloaded by users with access to it, using their token. Given a repository's name, here's how to download it:
repo_copy = mkdtemp(prefix="skops-")
hub_utils.download(repo_id=repo_id, dst=repo_copy, token=token)
print(os.listdir(repo_copy))
['README.md', 'config.json', '.gitattributes', 'skops-1ynjmmau.pkl']
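To confirm that the download matches what was pushed, one option is to compare file hashes between the local repo and the downloaded copy. Below is a stdlib sketch using hypothetical helper names. It only compares files present in both folders, because the downloaded copy also contains files added on the Hub side, such as .gitattributes.

```python
import hashlib
from pathlib import Path


def file_digests(folder):
    """Map each top-level file name in folder to the SHA-256 of its contents."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in Path(folder).iterdir()
        if path.is_file()
    }


def mismatched_files(local_dir, downloaded_dir):
    """Sorted names of files present in both folders whose contents differ."""
    local, downloaded = file_digests(local_dir), file_digests(downloaded_dir)
    shared = local.keys() & downloaded.keys()
    return sorted(name for name in shared if local[name] != downloaded[name])


# Usage with the folders from this guide; an empty list means the shared files match:
# print(mismatched_files(local_repo, repo_copy))
```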
You can also get the requirements of this repository:
print(hub_utils.get_requirements(path=repo_copy))
['scikit-learn=1.3.0']
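The requirement is the string written at init() time. It uses a single `=` because that is how this example built it, whereas pip's requirement syntax uses `==` for an exact pin. The minimal sketch below accepts either form and checks the pin against an installed version string. The version is passed in explicitly here; in practice you might use `sklearn.__version__`, and the value 1.4.2 below is only a hypothetical example. The helper names are also hypothetical.

```python
def parse_pin(requirement):
    """Split an exact pin 'name=version' or 'name==version' into (name, version)."""
    name, sep, version = requirement.partition("=")
    version = version.lstrip("=").strip()
    if not sep or not version or name.endswith(("<", ">", "!", "~")):
        raise ValueError(f"not an exact pin: {requirement!r}")
    return name.strip(), version


def pin_matches(requirement, installed_version):
    """True if the pinned version equals the installed one."""
    _, pinned = parse_pin(requirement)
    return pinned == installed_version


print(parse_pin("scikit-learn=1.3.0"))  # ('scikit-learn', '1.3.0')
print(parse_pin("scikit-learn==1.3.0"))  # ('scikit-learn', '1.3.0')
print(pin_matches("scikit-learn=1.3.0", "1.4.2"))  # False
```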
As well as the complete configuration of the project:
print(json.dumps(hub_utils.get_config(path=repo_copy), indent=2))
{
"sklearn": {
"columns": [
"mean radius",
"mean texture",
"mean perimeter",
"mean area",
"mean smoothness",
"mean compactness",
"mean concavity",
"mean concave points",
"mean symmetry",
"mean fractal dimension",
"radius error",
"texture error",
"perimeter error",
"area error",
"smoothness error",
"compactness error",
"concavity error",
"concave points error",
"symmetry error",
"fractal dimension error",
"worst radius",
"worst texture",
"worst perimeter",
"worst area",
"worst smoothness",
"worst compactness",
"worst concavity",
"worst concave points",
"worst symmetry",
"worst fractal dimension"
],
"environment": [
"scikit-learn=1.3.0"
],
"example_input": {
"area error": [
30.29,
96.05,
48.31
],
"compactness error": [
0.01911,
0.01652,
0.01484
],
"concave points error": [
0.01037,
0.0137,
0.01093
],
"concavity error": [
0.02701,
0.02269,
0.02813
],
"fractal dimension error": [
0.003586,
0.001698,
0.002461
],
"mean area": [
481.9,
1130.0,
748.9
],
"mean compactness": [
0.1058,
0.1029,
0.1223
],
"mean concave points": [
0.03821,
0.07951,
0.08087
],
"mean concavity": [
0.08005,
0.108,
0.1466
],
"mean fractal dimension": [
0.06373,
0.05461,
0.05796
],
"mean perimeter": [
81.09,
123.6,
101.7
],
"mean radius": [
12.47,
18.94,
15.46
],
"mean smoothness": [
0.09965,
0.09009,
0.1092
],
"mean symmetry": [
0.1925,
0.1582,
0.1931
],
"mean texture": [
18.6,
21.31,
19.48
],
"perimeter error": [
2.497,
5.486,
3.094
],
"radius error": [
0.3961,
0.7888,
0.4743
],
"smoothness error": [
0.006953,
0.004444,
0.00624
],
"symmetry error": [
0.01782,
0.01386,
0.01397
],
"texture error": [
1.044,
0.7975,
0.7859
],
"worst area": [
677.9,
1866.0,
1156.0
],
"worst compactness": [
0.2378,
0.2336,
0.2394
],
"worst concave points": [
0.1015,
0.1789,
0.1514
],
"worst concavity": [
0.2671,
0.2687,
0.3791
],
"worst fractal dimension": [
0.0875,
0.06589,
0.08019
],
"worst perimeter": [
96.05,
165.9,
124.9
],
"worst radius": [
14.97,
24.86,
19.26
],
"worst smoothness": [
0.1426,
0.1193,
0.1546
],
"worst symmetry": [
0.3014,
0.2551,
0.2837
],
"worst texture": [
24.64,
26.58,
26.0
]
},
"model": {
"file": "skops-1ynjmmau.pkl"
},
"model_format": "pickle",
"task": "tabular-classification",
"use_intelex": false
}
}
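Since config.json is plain JSON, the fields shown above can also be read with the standard library alone. For example, you can look up which file holds the model and which columns an input row must have. The sketch below works on a trimmed copy of the config above, with most columns and the example input omitted.

```python
import json

# Trimmed copy of the config printed above.
config_text = """
{
  "sklearn": {
    "columns": ["mean radius", "mean texture", "worst fractal dimension"],
    "environment": ["scikit-learn=1.3.0"],
    "model": {"file": "skops-1ynjmmau.pkl"},
    "model_format": "pickle",
    "task": "tabular-classification",
    "use_intelex": false
  }
}
"""

sklearn_config = json.loads(config_text)["sklearn"]
model_file = sklearn_config["model"]["file"]
print(model_file, sklearn_config["model_format"], sklearn_config["task"])
# skops-1ynjmmau.pkl pickle tabular-classification

# A quick input check: which expected columns is a given row missing?
row = {"mean radius": 12.47, "mean texture": 18.6}
missing = [col for col in sklearn_config["columns"] if col not in row]
print(missing)  # ['worst fractal dimension']
```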
Now you can check the contents of the repository under your user.
Update Requirements
If you update your environment and the versions of your requirements change, you can update the requirements in your repo by calling update_env, which automatically detects the versions installed in the current environment and updates the requirements accordingly.
hub_utils.update_env(path=local_repo, requirements=["scikit-learn"])
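Conceptually, updating the environment means rewriting the pinned versions in the `environment` list of config.json. The sketch below does that on a config dict, with the installed versions passed in explicitly. It is a hypothetical illustration, not how update_env is implemented, and the version 1.4.2 is made up. Because update_env is called on the local folder (`path=local_repo`), the updated config presumably still has to be pushed again, for example with hub_utils.push as above, before it shows up on the Hub.

```python
import copy


def update_environment(config, installed_versions):
    """Return a copy of config whose environment pins use installed_versions.

    installed_versions maps package names to version strings, e.g. what
    importlib.metadata.version() reports for each package.
    """
    updated = copy.deepcopy(config)
    # Keep the "name=version" format this repo already uses.
    updated["sklearn"]["environment"] = [
        f"{name}={version}" for name, version in installed_versions.items()
    ]
    return updated


old_config = {"sklearn": {"environment": ["scikit-learn=1.3.0"], "task": "tabular-classification"}}
new_config = update_environment(old_config, {"scikit-learn": "1.4.2"})
print(new_config["sklearn"]["environment"])  # ['scikit-learn=1.4.2']
print(old_config["sklearn"]["environment"])  # ['scikit-learn=1.3.0']
```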
Delete Repository
At the end, you can also delete the repository you created using HfApi().delete_repo. For more information, please refer to the documentation of the huggingface_hub library.
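Assuming `repo_id` and `token` are still defined from the steps above, cleaning up could look like the sketch below. `delete_repo` is a method of huggingface_hub's `HfApi`, but the wrapper function is hypothetical. Deletion is permanent, so double-check the repo id before calling it.

```python
def delete_example_repo(repo_id, token):
    """Permanently delete repo_id from the Hugging Face Hub."""
    # Imported here so this sketch can be defined even where huggingface_hub is absent.
    from huggingface_hub import HfApi

    HfApi().delete_repo(repo_id=repo_id, token=token)


# Usage, with the names defined earlier in this guide:
# delete_example_repo(repo_id, token)
```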
Total running time of the script: ( 0 minutes 7.628 seconds)