I have a CloudFormation template that I created to spin up an EC2 instance with the dependencies needed to use GPU hardware inside a Docker container. The dependencies are installed by a bash script in UserData. The main ones are: 1) NVIDIA drivers, 2) Docker, and 3) nvidia-docker2.
The first two dependencies install as expected and, once the instance has been running for a few minutes, can be verified with 1) nvidia-smi and 2) docker --version. The third dependency, however, consistently fails to install.
For reference, here are the relevant parts of my UserData bash script:
# install gpu stuff
apt-get install linux-headers-$(uname -r)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/7fa2af80.pub
echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | tee /etc/apt/sources.list.d/cuda.list
apt-get update
apt-get -y install cuda-drivers
# install docker on system
curl https://get.docker.com | sh
systemctl start docker && systemctl enable docker
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | tee /etc/apt/sources.list.d/nvidia-docker.list
apt-get -y install nvidia-docker2 > /var/log/mason
# add nvidia runtime stuff
# echo "{ \"runtimes\": { \"nvidia\": { \"path\": \"/usr/bin/nvidia-container-runtime\", \"runtimeArgs\": [] } } }" >> /etc/docker/daemon.json
systemctl restart docker
I have tried redirecting the stdout of apt-get -y install nvidia-docker2 to a log file, but the log only shows:
Reading package lists...
Building dependency tree...
Reading state information...
and it seems to be stuck there.
Other potentially helpful bits:
- AMI: ubuntu 18.04 image
I will also note that I am able to SSH into the instance and run apt-get -y install nvidia-docker2 from the terminal without a hitch (and without any user prompts).
Can anyone help me figure out how to troubleshoot this issue, or does anyone see any potential problems in what I have shared above? Redirecting stdout to a file is about the only trick I know for debugging an issue like this. Please let me know if I can update or edit this post to make the issue easier to debug.
Based on the comments.
The issue was caused by not refreshing Ubuntu's package lists after adding the nvidia-docker repository. The solution was to run apt-get update after adding the repo and before installing nvidia-docker2.
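As a rough illustration of that ordering, here is a minimal sketch of the nvidia-docker2 step with apt-get update placed between adding the repo and installing the package. The run helper, the DRY_RUN and DISTRIBUTION variables, and the install_nvidia_docker function name are assumptions added for illustration and are not part of the original script; the URLs and commands are the ones from the question.

```shell
# Sketch only: run(), DRY_RUN, DISTRIBUTION and install_nvidia_docker are
# hypothetical names, not part of the original UserData script.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_nvidia_docker() {
    # DISTRIBUTION is an optional override; by default the value is derived
    # from /etc/os-release as in the original script (e.g. ubuntu18.04).
    distribution="${DISTRIBUTION:-$(. /etc/os-release; echo "$ID$VERSION_ID")}"
    repo_url="https://nvidia.github.io/nvidia-docker"

    # Order of operations:
    #   1. register the signing key and the apt source for nvidia-docker
    #   2. refresh the package lists so apt can see the new repository
    #   3. install nvidia-docker2, then restart Docker
    run sh -c "curl -s -L $repo_url/gpgkey | apt-key add -" &&
    run sh -c "curl -s -L $repo_url/$distribution/nvidia-docker.list | tee /etc/apt/sources.list.d/nvidia-docker.list" &&
    run apt-get update &&
    run apt-get -y install nvidia-docker2 &&
    run systemctl restart docker
}

# From UserData this would be invoked as: DRY_RUN=0 install_nvidia_docker
```

Running the function with the default DRY_RUN=1 only prints the commands, which makes it easy to check the order before baking it into the template. It may also help to redirect stderr along with stdout (2>&1) when logging the install step, since apt typically reports errors such as a missing package on stderr rather than stdout.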