How to save a trained model from a Vertex AI custom training job to Cloud Storage?

The following Python code generates a trained model called wine_mdl:

from sklearn import datasets, svm
import logging

logging.info('Loading a Scikit-learn toy dataset')
wine_df=datasets.load_wine(as_frame=True)

logging.info('Model training')
wine_mdl=svm.SVC(probability=True)
wine_mdl.fit(X=wine_df.data, y=wine_df.target)

# logging.info('Saving the trained model')

I want to run a custom job on Vertex AI Training:

  • without a managed dataset
  • with a pre-built scikit-learn container
  • with code packaged according to the Google Cloud documentation
  • with a Cloud Storage output directory, gs://my-bucket-for-vertexai/my-output-directory, set in the Vertex AI Training container configuration

Which line(s) of code could I add to the Python code above to export the trained model to that Cloud Storage directory?

There are 2 answers

Nestor On

Can you try this? You will need to install the Vertex AI SDK for Python (the google-cloud-aiplatform package).

To upload a model:

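from google.cloud import aiplatform

# Note: the serving image below is a TensorFlow one; a scikit-learn model
# needs a scikit-learn prediction image instead.
model = aiplatform.Model.upload(
    display_name='my-model',
    artifact_uri="gs://python/to/my/model/dir",
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-2:latest",
)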
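Note that Model.upload registers a model that is already in Cloud Storage; it does not write the artifact there itself. The sketch below adapts the call to a scikit-learn model. The helper names artifact_dir and upload_sklearn_model are hypothetical, and the serving image tag is an assumption that should be checked against the list of pre-built prediction containers for the scikit-learn version used in training. The call itself needs Google Cloud credentials and a configured project.

```python
import posixpath

# Assumption: a pre-built scikit-learn prediction image. Pick the tag that
# matches the scikit-learn version used for training.
SKLEARN_SERVING_IMAGE = (
    "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
)


def artifact_dir(model_file_uri):
    """Return the gs:// directory that contains the given model file URI."""
    if not model_file_uri.startswith("gs://"):
        raise ValueError("expected a gs:// URI, got: " + model_file_uri)
    return posixpath.dirname(model_file_uri)


def upload_sklearn_model(display_name, model_file_uri):
    """Register an exported scikit-learn model (hypothetical helper)."""
    # Imported lazily so that artifact_dir can be used without the SDK.
    from google.cloud import aiplatform

    return aiplatform.Model.upload(
        display_name=display_name,
        # artifact_uri must be the directory holding model.pkl or
        # model.joblib, not the file itself.
        artifact_uri=artifact_dir(model_file_uri),
        serving_container_image_uri=SKLEARN_SERVING_IMAGE,
    )
```

The point of artifact_dir is that artifact_uri refers to the directory containing the model file, so passing the file's own URI is an easy mistake to make.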
Othmane El Omri On

You should write your model to the location given by the AIP_MODEL_DIR environment variable. When you define an output directory for the job, Vertex AI sets this variable to the model/ subdirectory under it. Here's an example:

import os
import pickle

from google.cloud import storage
from sklearn.ensemble import RandomForestClassifier

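# data and target stand for your training features and labels
# (in the question's code: wine_df.data and wine_df.target)
classifier = RandomForestClassifier()
classifier.fit(data, target)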

artifact_filename = 'model.pkl'

# Save model artifact to local filesystem (doesn't persist)
local_path = artifact_filename
with open(local_path, 'wb') as model_file:
    pickle.dump(classifier, model_file)

# Upload model artifact to Cloud Storage
model_directory = os.environ['AIP_MODEL_DIR']
storage_path = os.path.join(model_directory, artifact_filename)
blob = storage.blob.Blob.from_string(storage_path, client=storage.Client())
blob.upload_from_filename(local_path)

Make sure to give the artifact file the right name: model.pkl or model.joblib is expected if you're using a pre-built scikit-learn container for prediction.
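Applied to the question's script, the same idea can be wrapped in a small function. This is a minimal sketch, not the documented method: export_model is a hypothetical helper name, and the local-directory branch is an addition so the script can also be run outside Vertex AI. The Cloud Storage branch uses the same google-cloud-storage calls as the example above.

```python
import os
import pickle
import tempfile


def export_model(model, model_dir, filename="model.pkl"):
    """Pickle `model` into `model_dir` (a local path or a gs:// URI).

    Returns the path or URI of the written artifact.
    """
    if model_dir.startswith("gs://"):
        # Imported lazily so the local branch works without the GCS client.
        from google.cloud import storage

        destination = model_dir.rstrip("/") + "/" + filename
        with tempfile.TemporaryDirectory() as tmp_dir:
            # Write to a temporary local file first, then upload it.
            local_path = os.path.join(tmp_dir, filename)
            with open(local_path, "wb") as model_file:
                pickle.dump(model, model_file)
            blob = storage.blob.Blob.from_string(
                destination, client=storage.Client()
            )
            blob.upload_from_filename(local_path)
        return destination

    # Local directory: handy when testing the script outside Vertex AI.
    os.makedirs(model_dir, exist_ok=True)
    destination = os.path.join(model_dir, filename)
    with open(destination, "wb") as model_file:
        pickle.dump(model, model_file)
    return destination
```

In the question's code, the commented-out "Saving the trained model" step would then become export_model(wine_mdl, os.environ['AIP_MODEL_DIR']), placed after the call to wine_mdl.fit.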