I have a multi-node Hadoop/YARN cluster on Ubuntu 22.04, and I have added GPU resources to the cluster following the Hadoop instructions here: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/UsingGpus.html
When I ran this command:

yarn jar hadoop-yarn-applications-distributedshell.jar \
  -jar hadoop-yarn-applications-distributedshell.jar \
  -shell_command /usr/bin/nvidia-smi \
  -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=2 \
  -num_containers 2

the application is reported as successful, but no nvidia-smi output appears. What could be causing this issue?
I'm expecting to get something like this after running the application in YARN:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 0000:04:00.0     Off |                    0 |
| N/A   30C    P0    24W / 250W |      0MiB / 12193MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 0000:82:00.0     Off |                    0 |
| N/A   34C    P0    25W / 250W |      0MiB / 12193MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+