I am trying to run some basic transfer learning code using VGG16. I am using Ubuntu 16.04, TensorFlow 1.3 and Keras, and I have 4 1080ti GPUs.
When I get to these lines of code:

from keras.preprocessing.image import ImageDataGenerator
from keras import applications

datagen = ImageDataGenerator(rescale=1. / 255)
model = applications.VGG16(include_top=False, weights='imagenet')
The output of nvidia-smi shows this:
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     14241    G   /usr/lib/xorg/Xorg                             256MiB |
|    0     14884    G   compiz                                         155MiB |
|    0     16497    C   /home/simon/anaconda3/bin/python             10267MiB |
|    1     16497    C   /home/simon/anaconda3/bin/python             10611MiB |
|    2     16497    C   /home/simon/anaconda3/bin/python             10611MiB |
|    3     16497    C   /home/simon/anaconda3/bin/python             10611MiB |
+-----------------------------------------------------------------------------+
Then the output in the terminal is:
2017-09-02 15:59:15.946927: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-09-02 15:59:15.946960: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-09-02 15:59:15.946973: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
And my Jupyter notebook kernel dies.
Clearly this is a memory issue, but I don't understand why my GPUs are suddenly taken up by this bit of code. I should add that this problem only began in the last 24 hours; all of this code was running fine a day ago. There are many answers to similar problems here, but they all refer to other instances of TF running (and suggest shutting them down). In my case, this is the only TF application running (or any other application, for that matter).
That CHECK could fail for reasons other than ShouldIncludeWinogradNonfusedAlgo(). For example, if the cudnnSupport instance failed to be created, the CHECK would also fail. I'd suggest you post a more detailed issue on GitHub and I can take a look. But updating the CUDA driver and then reinstalling cuDNN is the first thing to try, basically to make sure that the CUDA and cuDNN environment has not changed recently. Also, a minimal reproducer is preferred if possible. Thank you!
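One more note: the nvidia-smi output by itself is expected, since by default TensorFlow reserves almost all free memory on every visible GPU as soon as a session is created, and the cuDNN handle can then fail to initialize if no headroom is left. While you check the environment, a common workaround worth trying is to let TensorFlow grow its GPU memory on demand. Here is a minimal sketch, assuming TensorFlow 1.x with the Keras TensorFlow backend (the variable name vgg_features is just for illustration):

import tensorflow as tf
from keras import applications
from keras import backend as K

# Grow GPU memory on demand instead of reserving (almost) all of it
# up front on every visible GPU.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Optionally cap the fraction of each GPU's memory the process may use:
# config.gpu_options.per_process_gpu_memory_fraction = 0.4

# Make Keras build everything in this session.
K.set_session(tf.Session(config=config))

# Building the VGG16 feature extractor should no longer exhaust the GPUs.
vgg_features = applications.VGG16(include_top=False, weights='imagenet')

Restricting the process to a single card by setting the CUDA_VISIBLE_DEVICES environment variable before importing TensorFlow can also help rule out a multi-GPU allocation problem.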