YARN distributed shell with GPUs: nvidia-smi output not shown


I have a multi-node Hadoop/YARN cluster on Ubuntu 22.04, and I have added GPU resources to it following the Hadoop instructions here: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/UsingGpus.html

When I run this command:

yarn jar hadoop-yarn-applications-distributedshell.jar \
  -jar hadoop-yarn-applications-distributedshell.jar \
  -shell_command /usr/bin/nvidia-smi \
  -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=2 \
  -num_containers 2

the application is reported as successful, but no nvidia-smi output appears. What could be causing this?
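
One thing that may be worth checking first: as far as I understand, the distributed shell client only prints the application status, while whatever the shell command writes goes to each container's stdout log. A minimal sketch for pulling that log, assuming log aggregation is enabled on the cluster; the function name container_stdout, the YARN_CMD variable, and the application ID are placeholders made up for illustration:

```shell
# Sketch: fetch what each container wrote to stdout for a finished application.
# Assumptions: log aggregation is enabled; the application ID below is a placeholder.
container_stdout() {
  app_id="$1"
  # ${YARN_CMD:-yarn} lets the command be swapped out (e.g. "echo yarn") for a dry run.
  ${YARN_CMD:-yarn} logs -applicationId "$app_id" -log_files stdout
}

# Dry run: prints the command instead of contacting the cluster.
YARN_CMD="echo yarn" container_stdout application_1700000000000_0001
```

Dropping the YARN_CMD override and passing the real application ID (printed by the client when the job is submitted) would run the actual yarn logs command.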

I'm expecting to see something like this after running the application in YARN:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 0000:04:00.0     Off |                    0 |
| N/A   30C    P0    24W / 250W |      0MiB / 12193MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 0000:82:00.0     Off |                    0 |
| N/A   34C    P0    25W / 250W |      0MiB / 12193MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

