Google Cloud VM: CUDA not available anymore

I'm working in Google Cloud and I have created a virtual machine with the following specs:

  • Machine: a2-highgpu-1g
  • CPU platform: Intel Cascade Lake
  • GPU: 1 x NVIDIA A100 40GB

I use this machine to train and test different RNN models, and it was working fine until last Friday (8 September 2023). Today, my models are suddenly unable to use the GPU anymore. If I run

torch.cuda.is_available()

the result is False. Could someone give me some hints about what could have happened since the last usage to make the GPU unavailable? Thanks.

Edit: I used it up to Friday, but over the weekend I kept the VM running without using it. Maybe they restricted my account because I was occupying a machine without using it?

Edit 2: I noticed that the command lshw -class display returns:

  *-display UNCLAIMED       
       description: 3D controller
       product: GA100 [A100 SXM4 40GB]
       vendor: NVIDIA Corporation
       physical id: 4
       bus info: pci@0000:00:04.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: msix pm bus_master cap_list
       configuration: latency=0
       resources: iomemory:200-1ff iomemory:300-2ff memory:80000000-80ffffff memory:2000000000-2fffffffff memory:3000000000-3001ffffff

Searching the internet, I found that "display UNCLAIMED" means that I do not have the proper driver. Is this right? Should I manually upgrade the driver on a Google Cloud VM?
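
For what it's worth, one way to cross-check the "no driver" reading is to look at whether the NVIDIA kernel module is loaded at all. The following is only a minimal sketch, assuming a Linux guest where /proc/modules is readable and the module is named nvidia; the helper names nvidia_module_loaded and driver_status are made up for illustration and are not part of any library.

```python
import shutil
from pathlib import Path


def nvidia_module_loaded(modules_text: str) -> bool:
    """Return True if a kernel module named exactly 'nvidia' is listed.

    `modules_text` is expected to be in /proc/modules format, where the
    module name is the first whitespace-separated field of each line.
    """
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "nvidia":
            return True
    return False


def driver_status(proc_modules: str = "/proc/modules") -> dict:
    """Collect a few hints about the NVIDIA driver state on a Linux guest."""
    path = Path(proc_modules)
    # A missing file is treated as "no modules listed" rather than an error.
    modules_text = path.read_text() if path.exists() else ""
    return {
        "kernel_module_loaded": nvidia_module_loaded(modules_text),
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for key, value in driver_status().items():
        print(f"{key}: {value}")
```

If kernel_module_loaded comes back False while lshw still lists the A100, that would be consistent with the device being present but no driver bound to it.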

Thanks again

There is 1 answer

Dion V

Yes, you can try manually downloading the CUDA toolkit and following the pre-installation steps in the instructions provided. I am attaching the documentation for your reference.
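
After reinstalling, it may help to confirm what PyTorch actually sees. This is a rough sketch rather than an official procedure: cuda_report is a hypothetical helper name, and it assumes only the standard torch.cuda calls. It also runs when torch is not installed, in which case it just reports that.

```python
import importlib.util


def cuda_report() -> dict:
    """Summarise what PyTorch can see; works even if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    import torch

    available = torch.cuda.is_available()
    # Only query device names when CUDA is usable, to avoid runtime errors.
    devices = (
        [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
        if available
        else []
    )
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If cuda_available is still False but built_for_cuda shows a version, the PyTorch build itself is probably fine and the problem is more likely on the driver side.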