I had TensorFlow 2.2 working with Python 3.7.4 on Windows 10 Enterprise 64-bit yesterday, including GPU support. This morning, the same system no longer sees the GPU. I have uninstalled and reinstalled CUDA and the other requirements listed in the TensorFlow docs, but it still refuses to work.
PC specs: i7 CPU 3.70GHz, 64GB RAM, NVIDIA GeForce GTX 780 Ti video card installed (driver 26.21.14.4122).
https://www.tensorflow.org/install/gpu says TensorFlow requires NVIDIA CUDA Toolkit 10.1 specifically (not 10.0, not 10.2).
Naturally, that version refuses to install on my PC. These components fail during install:
- Visual Studio Integration
- Nsight Systems
- Nsight Compute
So I installed 10.2 instead, which installs properly, but TensorFlow doesn't work with it (which is not a surprise, given the TensorFlow docs).
What's installed:
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 441.22       Driver Version: 441.22       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 780 Ti WDDM  | 00000000:01:00.0 N/A |                  N/A |
| 27%   41C    P8    N/A /  N/A |    458MiB /  3072MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:32:27_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.2, V10.2.89
I know the nvcc output of 10.2.89 is not what I need, but 10.1 simply won't install, so I don't know what I can do. Is this a common problem? Is there a diagnostic I can run to make sure the card did not die? Should I downgrade my version of TensorFlow? Should I abandon this environment altogether? If so, what is a stable environment for learning ML?
Below is how I got it working with TensorFlow 2.2.0 on Windows 10 and Python 3.7 (64-bit). Thanks again to Yahya for the gentle nudge towards this solution.
1. Uninstall every bit of NVIDIA software.
2. Install CUDA Toolkit 10.1. I did the Express Install of package cuda_10.1.243_win10_network.exe. No other CUDA 10.1 installer I tried installed correctly.
3. Install cuDNN 7.6. Extract all files from cudnn-10.1-windows10-x64-v7.6.5.32 into the CUDA directory structure (i.e. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1).
4. Add these directories to your PATH environment variable (assuming that you did not alter the install path during installation):
5. Reboot so that the updated PATH takes effect.
6. Uninstall all TensorFlow variants via pip.
7. Install TensorFlow 2.2 via pip.
Then you can run the code below in bash to confirm that TensorFlow is able to access your video card.
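A minimal sketch of such a check, written as a Python script that can be launched from bash (for example as `python check_gpu.py`; the file name is hypothetical). It does two things: it looks for a few CUDA and cuDNN DLLs on PATH, then asks TensorFlow which GPUs it can see via `tf.config.list_physical_devices("GPU")`. The helper names `find_on_path`, `visible_gpus`, and `REQUIRED_DLLS` are made up for illustration, and the DLL file names are an assumption based on what TensorFlow 2.2 reports loading for CUDA 10.1 and cuDNN 7.6 in its startup log, so adjust them for other versions.

```python
import os
from pathlib import Path

# Assumption: DLL names as reported in TensorFlow 2.2's startup log on Windows
# with CUDA 10.1 and cuDNN 7.6. Other versions use different file names.
REQUIRED_DLLS = ["cudart64_101.dll", "cublas64_10.dll", "cudnn64_7.dll"]


def find_on_path(filename, path_value=None):
    """Return the first PATH directory that contains filename, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for entry in path_value.split(os.pathsep):
        # Skip empty entries, which appear when PATH has doubled separators.
        if entry and (Path(entry) / filename).is_file():
            return entry
    return None


def visible_gpus():
    """Return the GPU device names TensorFlow can see.

    Returns None if TensorFlow cannot be imported at all.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def main():
    # Step 1: confirm the CUDA/cuDNN libraries are reachable through PATH.
    for dll in REQUIRED_DLLS:
        location = find_on_path(dll)
        print(f"{dll}: {location if location else 'NOT FOUND on PATH'}")

    # Step 2: ask TensorFlow which GPUs it detected.
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this Python environment.")
    elif gpus:
        print("TensorFlow sees these GPUs:", gpus)
    else:
        print("TensorFlow is installed but sees no GPU.")


if __name__ == "__main__":
    main()
```

If a DLL is reported as not found, the PATH entries are the first thing to recheck. If every DLL is found but the GPU list is still empty, the messages TensorFlow prints while importing usually name the library it failed to load.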