Docker with NVIDIA GPU support on Ubuntu 24


Verify that you have a CUDA-capable GPU

lspci | grep -i nvidia

If you have an NVIDIA GPU, you should see something like this:

01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
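If you want to script this check rather than read the output by eye, a minimal sketch follows. The function names `has_nvidia_gpu` and `report_gpu` are made up for illustration; they only wrap the same case-insensitive `grep` used above.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci` output on stdin and succeeds
# if any line mentions NVIDIA (case-insensitive).
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Hypothetical helper: prints a one-line verdict for the lspci text on stdin.
report_gpu() {
    if has_nvidia_gpu; then
        echo "NVIDIA GPU detected"
    else
        echo "no NVIDIA GPU found"
    fi
}
```

Typical use would be `lspci | report_gpu`. Note that a match only shows the card is on the PCI bus, not that a driver is loaded.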

Clean up leftover Docker installations

sudo snap remove --purge docker # removes docker without making snapshots
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get purge docker-ce docker-ce-cli containerd.io

Remove any existing nvidia-container-toolkit

sudo apt-get remove nvidia-container-toolkit

Install nvidia drivers

sudo apt install nvidia-driver-550-server

Install docker using the convenience script

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

Add your user to the docker group so you can run docker without sudo

sudo groupadd -f docker # -f: do not fail if the group already exists
sudo usermod -aG docker $USER
newgrp docker

Verify that you can run docker without sudo

docker run hello-world
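If `docker run hello-world` still reports a permission error, it may help to confirm that the current shell actually carries the docker group. The sketch below assumes a helper named `in_docker_group` (a made-up name) that inspects a space-separated group list such as the one printed by `id -nG`.

```shell
#!/bin/sh
# Hypothetical helper: takes a space-separated group list (for example
# the output of `id -nG`) and succeeds only if "docker" is one of them.
in_docker_group() {
    for g in $1; do
        [ "$g" = "docker" ] && return 0
    done
    return 1
}
```

For example: `in_docker_group "$(id -nG)" && echo "docker group active"`. Running `id -nG` with no user name reports the groups of the current session, so it reflects whether `newgrp docker` (or a fresh login) has taken effect.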

Install nvidia-container-toolkit

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install nvidia-container-toolkit

Configure docker to use nvidia-container-runtime

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
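To confirm that the configure step registered the runtime, you can look for an "nvidia" entry in the daemon config. This is a rough sketch: `has_nvidia_runtime` is a hypothetical helper, it does a plain text match rather than a real JSON parse, and the path `/etc/docker/daemon.json` in the usage note is assumed to be the file `nvidia-ctk` wrote on your system.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given daemon.json is readable and
# contains a key named "nvidia" (text match only, not a JSON parser).
has_nvidia_runtime() {
    [ -r "$1" ] && grep -q '"nvidia"[[:space:]]*:' "$1"
}
```

For example: `has_nvidia_runtime /etc/docker/daemon.json && echo "nvidia runtime configured"`.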

Configure the nvidia runtime for rootless docker

nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
sudo systemctl restart docker

REBOOT

sudo reboot

Verify that you can run docker with the nvidia runtime

docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
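If the verification command fails outright, one quick thing to rule out is a host tool that never got installed. The sketch below uses a hypothetical helper, `missing_cmds`, that lists which of the named commands are not on PATH.

```shell
#!/bin/sh
# Hypothetical helper: prints each named command that is not found on
# PATH, one per line, and returns non-zero if any are missing.
missing_cmds() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return "$status"
}
```

For example: `missing_cmds docker nvidia-smi nvidia-ctk || echo "install the tools listed above first"`.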

Troubleshooting

Failed to initialize NVML: Unknown Error

Disable cgroups in nvidia-container-runtime by editing its config file (root is required):

sudo nano /etc/nvidia-container-runtime/config.toml

# change the following line
no-cgroups = true

Restart docker

sudo systemctl restart docker
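The manual edit in this troubleshooting step can also be scripted. The following is a sketch, not an official tool: `set_no_cgroups` is a made-up filter that rewrites the `no-cgroups` line (commented out or not) to the value passed as its argument. It leaves the file unchanged if no such line exists.

```shell
#!/bin/sh
# Hypothetical filter: reads config.toml on stdin and writes it to stdout
# with any `no-cgroups = ...` line (optionally prefixed by '#') replaced
# by `no-cgroups = <value>`, where <value> is $1 ("true" or "false").
# Leading indentation is preserved.
set_no_cgroups() {
    sed -E "s/^([[:space:]]*)#?[[:space:]]*no-cgroups[[:space:]]*=.*/\1no-cgroups = $1/"
}
```

One way to use it, matching the value shown above, is to write to a temporary file first and then copy it into place: `set_no_cgroups true < /etc/nvidia-container-runtime/config.toml > /tmp/config.toml && sudo cp /tmp/config.toml /etc/nvidia-container-runtime/config.toml`. Restart docker afterwards, as in the manual steps.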