NVIDIA Docker - initialization error: nvml error: driver not loaded


I'm a complete newcomer to Docker, so the following questions might be a bit naive, but I'm stuck and I need help.

I'm trying to reproduce some results in research. The authors just released code along with a specification of how to build a Docker image to reproduce their results. The relevant bit is copied below:

[Image: the authors' Docker build instructions; contents not recoverable here]

I believe I installed Docker correctly:

$ docker --version
Docker version 19.03.13, build 4484c46d9d
$ sudo docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

However, when I try checking that my nvidia-docker installation was successful, I get the following error:

$ sudo docker run --gpus all --rm nvidia/cuda:10.1-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded\\\\n\\\"\"": unknown.

It looks like the key error is:

nvidia-container-cli: initialization error: nvml error: driver not loaded

I don't have a GPU locally and I'm finding conflicting information on whether CUDA needs to be installed before NVIDIA Docker. For instance, this NVIDIA moderator says "A proper nvidia docker plugin installation starts with a proper CUDA install on the base machine."

My questions are the following:

  1. Can I install NVIDIA Docker without having CUDA installed?

  2. If so, what is the source of this error and how do I fix it?

  3. If not, how do I create this Docker image to reproduce the results?


There are 2 answers

Answer by anemyte (best answer):
  1. Can I install NVIDIA Docker without having CUDA installed?

Yes, you can. The readme states that nvidia-docker requires only the NVIDIA GPU driver and the Docker engine to be installed on the host:

Note that you do not need to install the CUDA Toolkit on the host system, but the NVIDIA driver needs to be installed

  2. If so, what is the source of this error and how do I fix it?

The error means the NVIDIA kernel driver is not loaded on the host. Either you don't have a GPU locally, or the GPU is not an NVIDIA one, or something went wrong when you installed the drivers. If you have a CUDA-capable GPU, I recommend following NVIDIA's guide to install the drivers. If you don't have a GPU locally, you can still build an image with CUDA and then move it to a machine that has a GPU.
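A quick way to tell which case applies is to check whether the driver is loaded on the host before involving Docker at all. The following is a minimal sketch, not part of the original answer. It assumes that a loaded NVIDIA kernel driver exposes /proc/driver/nvidia/version; the function name driver_status is made up for illustration.

```shell
#!/bin/sh
# Report whether the NVIDIA kernel driver appears to be loaded on this host.
# Assumption: a loaded driver exposes /proc/driver/nvidia/version.
# driver_status is a hypothetical helper, not part of any NVIDIA tool.
driver_status() {
    # $1: path to the driver's version file (defaults to the usual location)
    version_file="${1:-/proc/driver/nvidia/version}"
    if [ -r "$version_file" ]; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}

status="$(driver_status)"
echo "NVIDIA driver: $status"
if [ "$status" = "loaded" ]; then
    # The driver is present, so the command from the question is worth retrying.
    echo "Try: sudo docker run --gpus all --rm nvidia/cuda:10.1-base nvidia-smi"
else
    # Without the driver, --gpus will fail with the nvml error from the question.
    echo "Install the NVIDIA driver on the host, or use a machine with an NVIDIA GPU, before passing --gpus."
fi
```

On a machine without an NVIDIA GPU this should report "not loaded", which matches the "driver not loaded" message from nvidia-container-cli.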

  3. If not, how do I create this Docker image to reproduce the results?

The problem is that even if you manage to remove CUDA from the Docker image, the software inside it still requires CUDA. In that case fixing the Dockerfile seems unnecessary to me: you can skip Docker entirely and start modifying the code so that it runs on the CPU.

Answer by lukbinx:

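I think you need to set this in the Dockerfile before your build steps, so that no GPU is requested while the image is built:

ENV NVIDIA_VISIBLE_DEVICES=void

then run your build steps:

RUN your work

and finally make the GPUs visible again for when the container runs:

ENV NVIDIA_VISIBLE_DEVICES=all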