Failed to install ROCm on Ubuntu 20.04


I would like to set up an AMD Radeon GPU for deep learning on Ubuntu. The main libraries for my work are Keras and PyTorch. I followed the ROCm installation guide here exactly, but the third step, the command sudo apt install rocm-dkms, failed with the following error messages.

Setting up dkms (2.8.1-5ubuntu1) ...
Setting up hip-rocclr (4.0.20496.5685.40000-23) ...
Setting up rock-dkms (1:4.0-23) ...
Loading new amdgpu-4.0-23 DKMS files...
Building for 5.8.0-41-generic
Building for architecture x86_64
Building initial module for 5.8.0-41-generic
Error! Bad return status for module build on kernel: 5.8.0-41-generic (x86_64)
Consult /var/lib/dkms/amdgpu/4.0-23/build/make.log for more information.
dpkg: error processing package rock-dkms (--configure):
 installed rock-dkms package post-installation script subprocess returned error exit status 10
Setting up g++-9 (9.3.0-17ubuntu1~20.04) ...
Setting up g++ (4:9.3.0-1ubuntu2) ...
update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode
Setting up build-essential (12.8ubuntu1.1) ...
dpkg: dependency problems prevent configuration of rocm-dkms:
 rocm-dkms depends on rock-dkms; however:
  Package rock-dkms is not configured yet.

dpkg: error processing package rocm-dkms (--configure):
 dependency problems - leaving unconfigured
Setting up gcc-multilib (4:9.3.0-1ubuntu2) ...
No apport report written because the error message indicates its a followup error from a previous failure.
Setting up g++-9-multilib (9.3.0-17ubuntu1~20.04) ...
Setting up g++-multilib (4:9.3.0-1ubuntu2) ...
Processing triggers for sgml-base (1.29.1) ...
Setting up x11proto-dev (2019.2-1ubuntu1) ...
Setting up libxau-dev:amd64 (1:1.0.9-0ubuntu1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
Processing triggers for man-db (2.9.1-1) ...
Setting up libxdmcp-dev:amd64 (1:1.1.3-0ubuntu1) ...
Setting up x11proto-core-dev (2019.2-1ubuntu1) ...
Setting up libxcb1-dev:amd64 (1.14-2) ...
Setting up libx11-dev:amd64 (2:1.6.9-2ubuntu1.1) ...
Setting up libglx-dev:amd64 (1.3.2-1~ubuntu0.20.04.1) ...
Setting up libgl-dev:amd64 (1.3.2-1~ubuntu0.20.04.1) ...
Setting up mesa-common-dev:amd64 (20.2.6-0ubuntu0.20.04.1) ...
Setting up rocm-opencl-dev (3.6Beta-17-g875c1f8-rocm-rel-4.0-23) ...
Setting up rocm-clang-ocl (0.5.0.64-rocm-rel-4.0-23-50fb51a) ...
Setting up rocm-utils (4.0.0.40000-23) ...
Setting up rocm-dev (4.0.0.40000-23) ...
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
Errors were encountered while processing:
 rock-dkms
 rocm-dkms
E: Sub-process /usr/bin/dpkg returned an error code (1)

My kernel version is 5.8.0-41-generic, my graphics card is a Gigabyte Radeon RX 6900 XT, and my CPU is an AMD Ryzen 9 3900XT. I tried several solutions suggested in previous posts, but none of them solved the problem. How can I fix this?

2

There are 2 answers

1
Anthrac1t3 On BEST ANSWER

I've been having the same issue as well. The only way I found to fix it is to roll back to the 5.6.0-1042-oem kernel. The AMD drivers don't seem to support any kernel past this one.

Edit: This is also a way to get the amdgpu-pro drivers to install without a problem.

WARNING: I'm writing all this after the fact, so I might have missed a step along the way. Please be very careful, especially when removing kernels and when working in your boot directory. If you're uncomfortable with the idea of wrecking your system, you can instead set GRUB's default selection, which is a lot safer than removing an initramfs.
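For the safer route of setting GRUB's default selection, something like the sketch below could work. It assumes Ubuntu's stock menu titles ("Advanced options for Ubuntu" and "Ubuntu, with Linux <version>"); grub_default_for is a hypothetical helper name, not part of GRUB, and the titles on your machine may differ, so check /boot/grub/grub.cfg before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: build a GRUB_DEFAULT value for a given kernel version.
# Assumption: Ubuntu's stock submenu and entry titles; verify against grub.cfg.
grub_default_for() {
    printf 'Advanced options for Ubuntu>Ubuntu, with Linux %s\n' "$1"
}

# Print the line to place in /etc/default/grub (this script changes nothing).
printf 'GRUB_DEFAULT="%s"\n' "$(grub_default_for 5.6.0-1042-oem)"
```

After putting the printed line into /etc/default/grub by hand, run sudo update-grub so the change takes effect on the next boot.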

Here's how I got ROCm working:

sudo apt install linux-image-5.6.0-1042-oem linux-headers-5.6.0-1042-oem && reboot

Make sure you boot into the 5.6 kernel by choosing it under the Ubuntu advanced options in GRUB.

sudo apt remove linux-image-5.8.0-41-generic linux-headers-5.8.0-41-generic && sudo apt autoremove && reboot

Again, you'll have to reboot into 5.6 through the advanced options. (Hold the Shift key after the BIOS finishes loading to get the Ubuntu Advanced Options menu.) Once you're back in, it's a good idea to mark your kernel headers and image as held back, because a kernel update will most likely break ROCm.

sudo apt-mark hold linux-image-generic linux-headers-generic

Now we're going to try and flush out the 5.8 kernel. Start by flushing out the temporary files.

sudo rm -rv ${TMPDIR:-/var/tmp}/mkinitramfs-*

Now list all of the kernels installed.

dpkg -l | tail -n +6 | grep -E 'linux-image-[0-9]+'

And try to remove the 5.8 kernel. Do this for any kernel you have above the 5.6 one we installed.

sudo update-initramfs -d -k 5.8.0-41-generic
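To work out which installed kernels sit above the 5.6 one, a small filter like the following may help. This is only a sketch: kernels_newer_than is a hypothetical helper, it relies on the version ordering of GNU sort -V, and the sample versions are illustrative rather than taken from a real system (apart from the two named in this thread).

```shell
#!/bin/sh
# Hypothetical helper: read kernel versions on stdin and print those that
# sort strictly after the version given as $1 (GNU `sort -V` ordering).
kernels_newer_than() {
    keep="$1"
    while IFS= read -r ver; do
        # Skip blank lines and the kernel we want to keep.
        if [ -z "$ver" ] || [ "$ver" = "$keep" ]; then
            continue
        fi
        newest=$(printf '%s\n%s\n' "$keep" "$ver" | sort -V | tail -n 1)
        if [ "$newest" = "$ver" ]; then
            printf '%s\n' "$ver"
        fi
    done
    return 0
}

# Illustrative input; prints only 5.8.0-41-generic.
printf '%s\n' 5.4.0-65-generic 5.6.0-1042-oem 5.8.0-41-generic |
    kernels_newer_than 5.6.0-1042-oem
```

On a real system the input would be the version part of the linux-image package names from the dpkg -l listing, and each printed version is a candidate for the update-initramfs -d -k command.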

Now the kernel image (vmlinuz), System.map, and config files are still present in the boot directory, so we need to clear those out to get GRUB working properly again.

cd /boot/
sudo rm vmlinuz-5.8.0-41-generic System.map-5.8.0-41-generic config-5.8.0-41-generic

Now you should finally be ready to update GRUB:

sudo update-grub && reboot

Now when you load back in, you should be able to install ROCm:

sudo apt install rocm-dkms
0
shaswat.dharaiya On

As per the official notes in this link, the AMD ROCm platform is designed to support Ubuntu 20.04.1 (kernels 5.4 and 5.6-oem) and 18.04.5 (kernel 5.4).

So kernel version 5.8 is not supported. Downgrading is an option, but instead of rushing into that, you can simply boot into an older kernel version.

Try the following steps:

  1. Restart your computer.
  2. Wait for the GRUB menu to open (how to open the GRUB menu: link).
  3. Select Advanced options for Ubuntu.
  4. Select an alternate kernel from the list shown.
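Once the system is up, you could confirm which kernel you actually booted before retrying the install. The sketch below is an assumption on my part: is_supported_kernel is a hypothetical helper whose patterns come only from the 5.4 and 5.6-oem series quoted above, not from any official compatibility check.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel release string falls in one of the
# series quoted from the release notes (5.4.x, or 5.6.x with the -oem suffix).
is_supported_kernel() {
    case "$1" in
        5.4.*)     return 0 ;;
        5.6.*-oem) return 0 ;;
        *)         return 1 ;;
    esac
}

running=$(uname -r)
if is_supported_kernel "$running"; then
    echo "$running: in a kernel series the notes list as supported"
else
    echo "$running: not in a listed series; the rock-dkms build may fail"
fi
```

For example, 5.6.0-1042-oem passes this check, while the 5.8.0-41-generic kernel from the question does not.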