I am having problems executing a simple TensorFlow model that worked well yesterday. I suspect the whole problem relates to this error:
Blas GEMM launch failed
In the console it says:
tensorflow/core/common_runtime/gpu/gpu_util.cc:343] CPU->GPU Memcpy failed
My impression is that this may relate to my CUDA installation, based on this question:
TensorFlow: Blas GEMM launch failed
However, I can't see how to run the simpleCUBLAS example. I am completely new to CUDA.
I have four 1080 Ti GPUs (Ubuntu 16.04, TensorFlow 1.3.0), and I have not found any zombie processes taking up GPU memory. Any help is greatly appreciated.
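For anyone repeating the zombie-process check, here is a minimal sketch of one way to list the processes currently holding GPU memory. It assumes `nvidia-smi` is on the PATH; the helper names `parse_compute_apps` and `list_gpu_processes` are made up for illustration and are not part of any library.

```python
import subprocess

# nvidia-smi query for compute processes, in CSV without header or units.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV rows into (pid, process_name, used_mib) tuples."""
    rows = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split the PID off the left and the memory off the right, so a comma
        # inside the process name cannot shift the other two fields.
        pid_text, rest = line.split(",", 1)
        name, mem_text = rest.rsplit(",", 1)
        try:
            used_mib = int(mem_text.strip())
        except ValueError:
            # Some drivers report a placeholder instead of a number.
            used_mib = None
        rows.append((int(pid_text.strip()), name.strip(), used_mib))
    return rows


def list_gpu_processes():
    """Return the parsed process list, or [] if nvidia-smi cannot be run."""
    try:
        out = subprocess.check_output(QUERY, universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_compute_apps(out)


if __name__ == "__main__":
    for pid, name, used_mib in list_gpu_processes():
        print(pid, name, used_mib)
```

An empty result only means no compute process is registered on the GPUs; it does not rule out a broken CUDA or CUBLAS setup.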
So I found the answer after days of going mad. I first ran this:
to check my CUBLAS installation. It returned CUBLAS INITIALIZATION FAILED!!!
So next I did this (based on advice):
And it worked. I hope this saves someone else the trouble. It seems easy once you see it.
It is also worth mentioning that this problem occasionally threw this error:
This was cryptic. Everybody suggested it was a memory issue and, sure enough, my GPUs were hogged by Python while my TF model was initializing. But it was the CUBLAS error that led me to the solution.
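On the memory hogging: by default, TensorFlow 1.x reserves memory on every GPU it can see. A rough sketch of one way to keep a Python process off the other cards while debugging is to set `CUDA_VISIBLE_DEVICES` before TensorFlow is imported. The helper name `restrict_visible_gpus` is hypothetical, and this only limits which GPUs the process touches; it is not the fix for the CUBLAS failure described above.

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Expose only the given GPU indices to CUDA in this process.

    Must be called before any library initializes CUDA, so in practice
    before `import tensorflow`.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Assumption for illustration: debug on the first GPU only.
restrict_visible_gpus([0])

# import tensorflow as tf  # import only after the variable has been set
```

With one GPU visible, a failure that persists is less likely to be caused by another process competing for the same card.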