I am trying to use the images found here to deploy a VM to GCP's Compute Engine with a GPU enabled. I have successfully created a VM from a publicly available NVIDIA image (e.g. nvidia-gpu-cloud-image-2022061 from the nvidia-ngc-public project), but the VM prompts to install drivers when it starts. So I have to SSH into the VM and answer 'y' to the prompt, after which it installs the GPU drivers.
My issue is that I need to automate this GPU driver installation process so that I can cleanly and deterministically (fixed driver version) create these images with drivers installed via CI/CD pipelines. What is the best way to achieve this automation? I would like to avoid creating my own base image and installing all the drivers/dependencies if possible.
I have created a VM with this image using the following command:
gcloud compute instances create $INSTANCE_NAME --project=$PROJECT --zone=$ZONE --machine-type=n1-standard-16 --maintenance-policy=TERMINATE --network-interface=network-tier=PREMIUM,subnet=default --service-account=my-service-account@$PROJECT.iam.gserviceaccount.com --scopes=https://www.googleapis.com/auth/cloud-platform --accelerator=count=1,type=nvidia-tesla-t4 --image=nvidia-gpu-cloud-image-2022061 --image-project=nvidia-ngc-public --boot-disk-size=200 --boot-disk-type=pd-standard --no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --reservation-affinity=any --no-restart-on-failure
I have then SSH'd into the VM and answered yes to the prompt.
I have then saved the image for future use with: gcloud compute images create --source-disk $INSTANCE_NAME
How can I automate this cleanly?
You can use scripts to automate the installation process. To review these scripts, see the GitHub repository: https://github.com/GoogleCloudPlatform/compute-gpu-installation
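One way to wire such a script into a CI/CD job is sketched below. Treat it as an untested outline under stated assumptions, not a verified recipe. The function names (run, write_startup_script, wait_until_stopped, bake_image) and the INSTALLER_URL idea are made up for illustration. It assumes the installer you pick from the repository is a Python script that can run unattended as a startup script; check the repository README for the exact file and any arguments it needs, and pin the URL to a specific commit if you want a fixed driver version. Whether this also silences the NGC image's own driver prompt is something you would have to confirm. By default the script only prints the gcloud commands (DRY_RUN=1).

```shell
#!/usr/bin/env bash
# Sketch: bake a GPU image without an interactive SSH session.
# All function and variable names here are hypothetical.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands (default), 0 = execute them

# Print a command; execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Write a startup script to file $1 that downloads the installer from URL $2.
# Assumption: the installer is a Python script that runs without prompts and
# may reboot the VM; the startup script then runs again on the next boot.
write_startup_script() {
  local out="$1" url="$2"
  cat > "$out" <<EOF
#!/bin/bash
# Install the driver if it is not working yet.
if ! nvidia-smi > /dev/null 2>&1; then
  curl -fsSL "${url}" -o /opt/gpu_installer
  python3 /opt/gpu_installer
fi
# Once the driver responds, power off so the pipeline can snapshot the disk.
nvidia-smi > /dev/null 2>&1 && shutdown -h now
EOF
}

# Poll until the instance reports TERMINATED (printed once in dry-run mode).
wait_until_stopped() {
  local name="$1" zone="$2"
  if [ "$DRY_RUN" != "0" ]; then
    run gcloud compute instances describe "$name" --zone="$zone" --format='value(status)'
    return 0
  fi
  until [ "$(gcloud compute instances describe "$name" --zone="$zone" \
             --format='value(status)')" = "TERMINATED" ]; do
    sleep 30
  done
}

# Create the builder VM with the startup script, wait, then save its disk.
bake_image() {
  local name="$1" zone="$2" image="$3" script="$4"
  run gcloud compute instances create "$name" --zone="$zone" \
    --machine-type=n1-standard-16 --maintenance-policy=TERMINATE \
    --accelerator=count=1,type=nvidia-tesla-t4 \
    --image=nvidia-gpu-cloud-image-2022061 --image-project=nvidia-ngc-public \
    --boot-disk-size=200 \
    --metadata-from-file=startup-script="$script"
  wait_until_stopped "$name" "$zone"
  run gcloud compute images create "$image" \
    --source-disk="$name" --source-disk-zone="$zone"
}
```

In a pipeline you would append something like `write_startup_script startup.sh "$INSTALLER_URL"` followed by `bake_image "$INSTANCE_NAME" "$ZONE" my-gpu-image startup.sh` to the end of the script, and run it with DRY_RUN=0 once the printed commands look right. Powering the VM off from the startup script gives the pipeline a simple completion signal and means the disk is not in use when the image is created.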