Running podman-compose with NVIDIA GPU on Gentoo (systemd)

Running podman-compose with NVIDIA GPU support on Gentoo is a powerful way to manage containerized applications while leveraging GPU acceleration for tasks like machine learning, video transcoding, or graphics rendering. However, setting this up requires careful configuration of both the GPU driver and the container stack to ensure seamless integration. This guide walks you through enabling GPU support in podman-compose on Gentoo, step by step, so you can fully utilize your system's capabilities.

In no event, unless required by applicable law or agreed to in writing will I be liable to you for damages, including any general, special, incidental, or consequential damages arising out of the use or inability to use the information, commands, scripts and snippets provided here (including but not limited to loss of data or data being rendered inaccurate, or losses sustained by you or third parties, or a failure of the command/script/snippets to operate with any other programs), even if such holder or other party has been advised of the possibility of such damages.

Writing articles like this one requires time and resources. If you found it helpful or even if you didn't, I'd love to hear from you—whether you have feedback, suggestions, or spotted any bugs or typos. Your input would mean the world to me! You can reach out using the email address listed in the imprint.

 

Installation

make.conf

Configure your system for usage with an NVIDIA GPU in /etc/portage/make.conf:

VIDEO_CARDS="nvidia"
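After changing VIDEO_CARDS, packages affected by the new setting usually need to be rebuilt. A typical invocation (adjust to your workflow) is:

```shell
# Rebuild all installed packages whose USE/VIDEO_CARDS configuration changed
sudo emerge --ask --changed-use --deep @world
```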

Accept keywords

ACCEPT_KEYWORDS in Gentoo specifies whether the package manager may install testing versions of a package or only stable ones.

Configure the following keywords as needed in /etc/portage/package.accept_keywords. Whether you need them depends on the current state of the repository and on whether you prefer stable packages or the bleeding edge.

app-containers/nvidia-container-toolkit
app-containers/podman-compose
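If you opt for the testing branch, the entries could look like this (the ~amd64 keyword is an assumption; use the keyword matching your architecture):

```
app-containers/nvidia-container-toolkit ~amd64
app-containers/podman-compose ~amd64
```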

USE flags

USE flags in Gentoo are keywords that define support for features and dependencies. They let you configure how Portage builds and installs packages.

At the time of writing, these are the recommended USE flags for the most recent version of x11-drivers/nvidia-drivers. Edit your /etc/portage/package.use file accordingly:

*/*                        nvidia
x11-drivers/nvidia-drivers kernel-open

Emerging the packages

Compile and install the following packages:

sudo emerge -a \
  x11-drivers/nvidia-drivers \
  app-containers/nvidia-container-toolkit \
  app-containers/podman \
  app-containers/podman-compose

nvtop is a good tool to check your GPUs. NVIDIA's CUDA images for podman are also quite useful for testing whether everything works as intended.
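As a smoke test (once the CDI setup from the next section is in place), you can run nvidia-smi inside one of NVIDIA's CUDA images. The image tag below is only an example; pick a current one:

```shell
# Should print the same GPU table as running nvidia-smi on the host
podman run --rm --device nvidia.com/gpu=all \
  docker.io/nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```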

nvidia-container-toolkit

Container Device Interface (CDI) is a specification for container runtimes like podman. It standardizes how the container runtime accesses devices such as the GPU.

If you list the NVIDIA devices in /dev directly after the install, you will see something similar to this (the output varies with your GPU count):

> ls -l /dev/nv*
/dev/nvidia0
/dev/nvidia1
/dev/nvidia2
/dev/nvidia3
/dev/nvidia4 
/dev/nvidia5
/dev/nvidia6
/dev/nvidia7 
/dev/nvidiactl
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
/dev/nvram

If you reboot and list the devices again, you will find that most of them are gone. This is a problem, because these CDI devices are needed for the nvidia-container-toolkit to function with the container runtime (podman).

Directly after the install, the setup script runs the necessary command to generate the CDI specification, but the result does not persist across reboots, so we need to make sure it is regenerated on boot.
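For reference, the one-off generation and a check of the resulting CDI devices look like this:

```shell
# Generate the CDI specification for the installed NVIDIA devices
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# List the device names the runtime can now resolve
# (e.g. nvidia.com/gpu=0 ... nvidia.com/gpu=all)
nvidia-ctk cdi list
```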

I opted to set up a systemd service unit for this:

sudo nvim /etc/systemd/system/nvidia-cdi.service

Add the following service unit:

[Unit]
Description=Generate NVIDIA CDI configuration

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-ctk cdi generate --output="/etc/cdi/nvidia.yaml"

[Install]
WantedBy=multi-user.target

Enable the service and start the generation:

sudo systemctl daemon-reload
sudo systemctl enable --now nvidia-cdi.service

After a restart you will see that the necessary devices were created.

podman-compose

podman-compose.yaml

The next step varies depending on the service/container you want to provide, so I will give a basic example.

First, log in as the user that should run the container (depending on whether you go rootless or root). I will go rootless here, using a dummy user called dummy.

mkdir -p ~/pods/podA
nvim ~/pods/podA/podman-compose.yaml

Next, define the service and allow it access to the GPU:

---
services:
  containerA:
    image: some/image
    restart: unless-stopped
    networks:
      - backend
    volumes:
      - volume:/some/path

    ############################# RELEVANT START
    devices:
      - nvidia.com/gpu=all
    ############################# RELEVANT STOP
...

The relevant part is listed under devices. Notice the difference from a corresponding docker-compose.yaml file, in which you would provide something along these lines (also see the docker docs):

---
services:
  containerA:
    image: some/image
    restart: unless-stopped
    networks:
      - backend
    volumes:
      - volume:/some/path

    ############################# RELEVANT START
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 8
              capabilities: [gpu]
    ############################# RELEVANT STOP
...
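With CDI you are not limited to all: individual GPUs can be addressed by index. The device names below are the ones nvidia-ctk cdi list produces; the indices assume a multi-GPU host:

```yaml
    devices:
      - nvidia.com/gpu=0
      - nvidia.com/gpu=1
```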

NVIDIA GPU in rootless container

To use an NVIDIA GPU in a rootless container on Gentoo, we need an extra step. Edit the file ~/.config/containers/containers.conf and add the following setting to enable sharing the user's groups inside containers (from the Gentoo Wiki):

[containers]
annotations=["run.oci.keep_original_groups=1",]

If you want more background information, check Red Hat's documentation and man crun.

Automatically start container on boot

Setting up systemd units for user

To automatically start the pod (containers) on boot, we first need to set up the podman-compose unit files:

sudo podman-compose systemd -a create-unit

You can check whether the service podman-compose@.service now exists for your user after reloading the systemd daemon:

systemctl --user daemon-reload
systemctl --user list-unit-files | grep podman-compose

Register compose stack for user

Next we register the compose stack for the user. Go to your project directory:

cd ~/pods/podA

Now register the compose stack with podman-compose:

podman-compose systemd -a register

You can check how it was set up and if it was successful:

> ls ~/.config/containers/compose/projects/podA.env

> bat ~/.config/containers/compose/projects/podA.env
───────┬──────────────────────────────────────────────────────────────────
       │ File: /home/dummy/.config/containers/compose/projects/podA.env
───────┼──────────────────────────────────────────────────────────────────
   1   │ COMPOSE_PROJECT_DIR=/home/dummy/pods/podA
   2   │ COMPOSE_FILE=podman-compose.yaml
   3   │ COMPOSE_PATH_SEPARATOR=:
   4   │ COMPOSE_PROJECT_NAME=podA
───────┴──────────────────────────────────────────────────────────────────

Customize the service

Now you could already manage your pod with systemd:

systemctl --user daemon-reload
systemctl --user start podman-compose@podA.service
systemctl --user stop podman-compose@podA.service

But to prevent a possible race condition at boot, we want to make sure that the service only starts after the CDI specification has been generated by the service we created above. So we will modify the systemd service unit with an override:

systemctl --user edit podman-compose@podA.service

We will make the following override:

### Editing /home/dummy/.config/systemd/user/podman-compose@podA.service.d/override.conf
### Anything between here and the comment below will become the contents of the drop-in file

[Unit]
After=nvidia-cdi.service

### Edits below this comment will be discarded


### /etc/xdg/systemd/user/podman-compose@.service
# # /etc/systemd/user/podman-compose@.service
# 
# [Unit]
# Description=%i rootless pod (podman-compose)
# 
# [Service]
# Type=simple
# EnvironmentFile=%h/.config/containers/compose/projects/%i.env
# ExecStartPre=-/usr/lib/python-exec/python3.12/podman-compose up --no-start
# ExecStartPre=/usr/bin/podman pod start pod_%i
# ExecStart=/usr/lib/python-exec/python3.12/podman-compose wait
# ExecStop=/usr/bin/podman pod stop pod_%i
# 
# [Install]
# WantedBy=default.target

This makes sure that the service only starts after nvidia-cdi.service has run. If you want, you can also edit nvidia-cdi.service and add a Before= line under the Description, but this should not be necessary.
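If you prefer not to use the interactive systemctl edit, the same drop-in can be created manually (the paths assume the podA project from above):

```shell
# Create the drop-in directory for the templated user unit instance
mkdir -p ~/.config/systemd/user/podman-compose@podA.service.d

# Write the ordering override and reload the user manager
cat > ~/.config/systemd/user/podman-compose@podA.service.d/override.conf <<'EOF'
[Unit]
After=nvidia-cdi.service
EOF
systemctl --user daemon-reload
```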

So we now have the system-wide service unit that generates the CDI devices on startup. Once that has finished, our user service unit starts the pod (the containers) rootless.

Enable lingering

Because we want the containers to start when the computer boots, even before the user has logged in, we enable lingering for our dummy user:

loginctl enable-linger dummy

You can check if it is enabled by issuing:

loginctl list-users

Enable the user service

Now we can enable the user service and test our setup:

systemctl --user daemon-reload
systemctl --user enable --now 
systemctl --user status 

After a restart, you can check the status of your pod in one of the following ways:

podman pod ls

podman pod stats 'pod_podA'

podman pod logs --tail=10 -f 'pod_podA'

cd ~/pods/podA &&\
  podman-compose ps