I am trying to train a spaCy model by turning my training code into a Vertex AI Pipeline component. My current code is:
from typing import NamedTuple

from kfp.v2.dsl import component  # assumes the KFP SDK 1.8 `kfp.v2` namespace


@component(
    packages_to_install=[
        "setuptools",
        "wheel",
        "spacy[cuda113,transformers,lookups]",
    ],
    base_image="gcr.io/deeplearning-platform-release/base-cu113",
    output_component_file="train.yaml",
)
def train(train_name: str, dev_name: str) -> NamedTuple("output", [("model_path", str)]):
    """
    Trains a spaCy model.

    Parameters:
    ----------
    train_name : Name of the spaCy "train" set, used for model training.
    dev_name : Name of the spaCy "dev" set, used for evaluation during training.

    Returns:
    -------
    output : Destination path of the saved model.
    """
    # Imports live inside the function because the component runs in its own container.
    import spacy
    import subprocess

    spacy.require_gpu()  # <=== IMAGE FAILS TO BE COMPILED HERE

    # NOTE: The remaining code has already been tested and proven to be functional.
    # It has been edited since the project is private.

    # Presets for training
    subprocess.run(["python", "-m", "spacy", "init", "fill-config",
                    "gcs/secret_path_to_config/base_config.cfg", "config.cfg"])

    # Training model
    location = "gcs/secret_model_destination_path/TestModel"
    subprocess.run(["python", "-m", "spacy", "train", "config.cfg",
                    "--output", location,
                    "--paths.train", "gcs/secret_bucket/secret_path/{}.spacy".format(train_name),
                    "--paths.dev", "gcs/secret_bucket/secret_path/{}.spacy".format(dev_name),
                    "--gpu-id", "0"])

    return (location,)
The Vertex AI logs display the following as the main cause of the failure:
The libraries are installed successfully, and yet I suspect that some library or setting is still missing (as I know from experience); however, I don't know how to make it compatible with Python-based Vertex AI components. By the way, using a GPU is mandatory for my code.
Any ideas?
After some trial and error, I think I have figured out what my code was missing. The `train` component definition was actually correct (with some minor tweaks relative to what was originally posted); however, the pipeline was missing the GPU definition. I will first include a dummy example, which trains a NER model using spaCy and orchestrates everything via a Vertex AI Pipeline:

Now, the explanation. According to Google:
Therefore, when the `train` component gets compiled, it fails because "it was not seeing any GPU available as resource"; the same link, however, lists all the available settings for both CPU and GPU. In my case, as you can see, I set the `train` component to run on ONE (1) `NVIDIA_TESLA_T4` GPU card, and I also increased my CPU memory to 32 GB. With these modifications, the resulting pipeline looks as follows:

And as you can see, it compiles successfully, and it also trains (and eventually produces) a functional spaCy model. From here, you can tweak this code to fit your own needs.
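For readers who want the resource settings in one place, here is a minimal sketch of how one T4 GPU and 32 GB of memory could be requested for a single pipeline step. It is an illustration, not the exact code from my pipeline. The helper name `with_gpu` is hypothetical, and the method names assume the task object returned by a component call in the KFP SDK 1.8 (`kfp.v2`); as far as I know, newer KFP 2.x releases use `set_accelerator_type` and `set_accelerator_limit` instead, so check the version you have installed.

```python
# Assumption: `task` is what a component call returns inside a @pipeline
# function when using the KFP SDK 1.8 (`kfp.v2`) API.
GKE_ACCELERATOR_LABEL = "cloud.google.com/gke-accelerator"


def with_gpu(task, accelerator="NVIDIA_TESLA_T4", gpu_count=1, memory="32G"):
    """Request memory and GPU resources for one pipeline step (hypothetical helper)."""
    # Raise the memory available to the step.
    task.set_memory_limit(memory)
    # Tell Vertex AI which accelerator type the step needs.
    task.add_node_selector_constraint(GKE_ACCELERATOR_LABEL, accelerator)
    # Tell Vertex AI how many of those accelerators to attach.
    task.set_gpu_limit(gpu_count)
    return task


# Usage inside a @pipeline function (argument names are placeholders):
#     train_task = with_gpu(train(train_name=train_name, dev_name=dev_name))
```

Without such a request the step is scheduled on a CPU-only machine, which is consistent with `spacy.require_gpu()` failing even though the CUDA-enabled packages installed correctly.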
I hope this helps anyone who might be interested.
Thank you.