Unable to Run NVIDIA GPU-Enabled Docker Containers Inside an LXC Container


Question:

I am facing an issue when trying to run Docker containers that require GPU access inside an LXC container. Standard Docker containers run fine, but as soon as I request the NVIDIA GPU by adding --gpus=all or --runtime=nvidia to docker run, the container fails to start.
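
For reference, a minimal reproduction (assuming the stock nvidia/cuda base image matching my CUDA version; any image that requests the GPU fails the same way):

docker run --rm --gpus=all nvidia/cuda:11.4.3-base-ubuntu20.04 nvidia-smi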

The error message I receive is:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown.

Environment:

  • Bare-metal host running Proxmox VE 8 (based on Debian 12) with the NVIDIA driver installed (confirmed working with nvidia-smi; see the exact check below)
  • The LXC container also has the NVIDIA driver working (nvidia-smi runs successfully inside it)
  • NVIDIA Quadro K420 (a low-budget card, since this is my testing machine)
  • Driver Version: 470.199.02
  • CUDA Version: 11.4
  • The LXC container's configuration includes device pass-through for NVIDIA devices.
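
For completeness, this is the driver check I ran both on the host and inside the LXC container (the values match the list above):

nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
# Quadro K420, 470.199.02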

LXC Config:

# Allow cgroup access
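# (195 = NVIDIA GPU/control devices, 226 = DRM; 235, 239, 243 and 511 are the
# dynamically assigned majors for nvidia-uvm and related devices on my host.
# They can be checked with: ls -l /dev/nvidia* /dev/dri)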
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 235:* rwm
lxc.cgroup2.devices.allow: c 511:* rwm
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 239:* rwm
lxc.cgroup2.devices.allow: c 243:* rwm

# Pass through device files
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
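
As a sanity check (which passes for me), the passed-through nodes show up inside the container with the expected major numbers (e.g. major 195 for /dev/nvidia0 and /dev/nvidiactl):

ls -l /dev/nvidia* /dev/dri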

What I've Tried:

  • Checked that both Docker and the NVIDIA driver are installed and working individually.
  • Made sure the LXC container is running in privileged mode (see the check below) and that non-GPU Docker containers work.
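
The privileged-mode check was simply inspecting the container's Proxmox config (container ID 101 here is a placeholder for mine):

grep unprivileged /etc/pve/lxc/101.conf
# prints nothing, i.e. no "unprivileged: 1" line, so the container is privileged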

I am looking for any guidance on how to debug this issue and successfully run GPU-enabled Docker containers within an LXC container.
