Introduction
In this step-by-step guide we will replace the out-of-the-box nouveau driver on RHEL9 with the Nvidia drivers. We will also install the Nvidia CUDA Toolkit and the Nvidia Container Toolkit.
GPU and Driver Inspection
First we need to make sure that our Nvidia GPU is recognized by Red Hat Enterprise Linux 9 (RHEL9).
# lspci -nn | grep -i nvidia
b6:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [A40] [10de:2235] (rev a1)
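The bracketed pair at the end of that line is the PCI vendor and device ID (10de is Nvidia's vendor ID). As a minimal sketch, the helper below pulls those IDs out of the lspci -nn output so they can be looked up or used in a script. The function name pci_ids is made up for this example and is not part of pciutils.

```shell
# pci_ids: read `lspci -nn` output on stdin and print each NVIDIA
# vendor:device pair (vendor ID 10de), one per line.
# Hypothetical helper name; it is not part of pciutils.
pci_ids() {
    grep -o '\[10de:[0-9a-f]\{4\}]' | sed 's/^\[//; s/]$//'
}

# Usage on a real system:
#   lspci -nn | pci_ids
```

On the machine used in this guide this should print the ID shown in the line above.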
Using the command below we can see that we are currently using the open-source nouveau driver rather than Nvidia's proprietary one.
# lspci | grep ' NVIDIA ' | cut -d" " -f 1 | xargs -I {} lspci -v -s {}
b6:00.0 3D controller: NVIDIA Corporation GA102GL [A40] (rev a1)
Subsystem: NVIDIA Corporation Device 145a
Flags: bus master, fast devsel, latency 0, IRQ 32, NUMA node 0
Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
Memory at 38d000000000 (64-bit, prefetchable) [size=64G]
Memory at 38f040000000 (64-bit, prefetchable) [size=32M]
Capabilities: [60] Power Management version 3
Capabilities: [68] Null
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [b4] Vendor Specific Information: Len=14 <?>
Capabilities: [c8] MSI-X: Enable- Count=6 Masked-
Capabilities: [100] Virtual Channel
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Capabilities: [bb0] Physical Resizable BAR
Capabilities: [bcc] Single Root I/O Virtualization (SR-IOV)
Capabilities: [c14] Alternative Routing-ID Interpretation (ARI)
Capabilities: [c1c] Physical Layer 16.0 GT/s <?>
Capabilities: [d00] Lane Margining at the Receiver <?>
Capabilities: [e00] Data Link Feature <?>
Kernel driver in use: nouveau
Kernel modules: nouveau
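Most of that output is noise if all we want is the name of the bound driver. The sketch below filters it down to that one field. The helper name bound_driver is hypothetical, and the usage line assumes the -k and -d options of lspci (show kernel drivers, filter by vendor ID).

```shell
# bound_driver: read `lspci -v` or `lspci -k` output on stdin and print
# the value of every "Kernel driver in use" field, one per line.
# Hypothetical helper name, used here for illustration only.
bound_driver() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Usage on a real system (10de is the NVIDIA PCI vendor ID):
#   lspci -k -d 10de: | bound_driver
```

Before the driver swap this should print nouveau, and after it nvidia.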
Configuring Repositories for the Nvidia Driver Install
First we need to enable the RHEL9 CodeReady Builder repo. Note we are running these commands as root.
# subscription-manager repos --enable codeready-builder-for-rhel-9-$(uname -i)-rpms
Next we will need to install and configure the EPEL repo.
# dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
Now we install the ELRepo project repo. This provides nvidia-detect, which we will use later.
# dnf -y install https://www.elrepo.org/elrepo-release-9.el9.elrepo.noarch.rpm
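Before moving on it can be worth confirming that all three repos are actually enabled. Below is a minimal sketch of such a check. The helper name missing_repos is made up, and the repo IDs epel and elrepo in the usage line are assumptions about what the two release packages configure, so compare them against your own dnf repolist output.

```shell
# missing_repos: read enabled repo IDs on stdin (first column of each line)
# and print every expected ID passed as an argument that is NOT enabled.
# Hypothetical helper name, used here for illustration only.
missing_repos() {
    enabled=$(awk '{ print $1 }')
    for id in "$@"; do
        printf '%s\n' "$enabled" | grep -Fxq -- "$id" || printf '%s\n' "$id"
    done
}

# Usage on a real system (the IDs "epel" and "elrepo" are assumptions):
#   dnf -q repolist --enabled | missing_repos \
#       "codeready-builder-for-rhel-9-$(uname -i)-rpms" epel elrepo
```

No output means every repo passed as an argument is enabled.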
Prerequisites for Nvidia Driver Install
Now we need to install dependencies and build tools.
# dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc make dkms acpid libglvnd-glx libglvnd-opengl libglvnd-devel pkgconfig
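The command above pins kernel-devel to the running kernel, which matters because modules are built against the headers of the kernel they will be loaded into. As a small sketch, the check below confirms that a matching kernel-devel package is installed. The helper name devel_matches_kernel is hypothetical.

```shell
# devel_matches_kernel: read `rpm -q kernel-devel` output on stdin and
# succeed only if one line is exactly kernel-devel-<release>, where
# <release> is the first argument (normally the output of `uname -r`).
# Hypothetical helper name, used here for illustration only.
devel_matches_kernel() {
    grep -Fxq -- "kernel-devel-$1"
}

# Usage on a real system:
#   rpm -q kernel-devel | devel_matches_kernel "$(uname -r)" \
#       && echo "kernel-devel matches the running kernel"
```

If the check fails after a kernel update, rebooting into the new kernel and rerunning the dnf command above is the likely fix.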
Install Nvidia Drivers
Install nvidia-detect from the ELRepo project repo.
# dnf -y install nvidia-detect
Now install the Nvidia Drivers.
# dnf -y install $(nvidia-detect)
Now reboot.
Confirming Nvidia Driver Installation
Now let's run the command below one more time.
# lspci | grep ' NVIDIA ' | cut -d" " -f 1 | xargs -I {} lspci -v -s {}
b6:00.0 3D controller: NVIDIA Corporation GA102GL [A40] (rev a1)
Subsystem: NVIDIA Corporation Device 145a
Flags: bus master, fast devsel, latency 0, IRQ 32, NUMA node 0
Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
Memory at 38d000000000 (64-bit, prefetchable) [size=64G]
Memory at 38f040000000 (64-bit, prefetchable) [size=32M]
Capabilities: [60] Power Management version 3
Capabilities: [68] Null
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [b4] Vendor Specific Information: Len=14 <?>
Capabilities: [c8] MSI-X: Enable- Count=6 Masked-
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Capabilities: [bb0] Physical Resizable BAR
Capabilities: [bcc] Single Root I/O Virtualization (SR-IOV)
Capabilities: [c14] Alternative Routing-ID Interpretation (ARI)
Capabilities: [c1c] Physical Layer 16.0 GT/s <?>
Capabilities: [d00] Lane Margining at the Receiver <?>
Capabilities: [e00] Data Link Feature <?>
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
As you can see in the output above, the kernel is now using the Nvidia driver. The nouveau module is still listed under "Kernel modules", but that is fine, as it is not loaded. We can confirm this with the command below.
# lsmod | grep nouveau
The command above should print nothing, while the command below should list the loaded nvidia modules.
# lsmod | grep nvidia
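The two checks can be folded into one. The sketch below reads lsmod output and reports a single verdict; it matches the module name in the first column exactly, so nvidia_drm alone does not count as nvidia. The helper name check_modules is made up for this example.

```shell
# check_modules: read `lsmod` output on stdin and verify that the nvidia
# module is loaded and the nouveau module is not. Prints a one-line
# verdict and returns non-zero on failure.
# Hypothetical helper name, used here for illustration only.
check_modules() {
    awk '
        $1 == "nouveau" { nouveau = 1 }
        $1 == "nvidia"  { nvidia = 1 }
        END {
            if (nouveau) { print "nouveau is still loaded"; exit 1 }
            if (!nvidia) { print "nvidia is not loaded"; exit 1 }
            print "ok"
        }'
}

# Usage on a real system:
#   lsmod | check_modules
```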
Configure Nvidia Persistenced
Start and enable nvidia-persistenced.service. This enables persistence mode, which keeps the driver state loaded even when no client is using the GPU, so the Nvidia device state does not go "stale".
# systemctl enable nvidia-persistenced.service
# systemctl start nvidia-persistenced.service
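To confirm that persistence mode actually took effect, we can ask nvidia-smi for the per-GPU value. This is a sketch under a couple of assumptions: that the persistence_mode query field prints Enabled or Disabled on your driver version, and that one line is printed per GPU. The helper name all_persistent is hypothetical.

```shell
# all_persistent: read one persistence-mode value per line on stdin and
# succeed only if there is at least one line and every line is "Enabled".
# Hypothetical helper name, used here for illustration only.
all_persistent() {
    awk '{ n++; if ($0 != "Enabled") bad = 1 }
         END { if (n == 0 || bad) exit 1 }'
}

# Usage on a real system (assumes the persistence_mode query field):
#   nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | all_persistent \
#       && echo "persistence mode is on for every GPU"
```

As a side note, systemctl enable --now nvidia-persistenced.service does the enable and start in a single command.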
Installing the Nvidia CUDA Toolkit
We will now follow the official guide to install the Nvidia CUDA Toolkit. That guide calls for three repos to be enabled. Two of them should already be enabled by default, and we enabled the third above, but I will list all three here for the sake of documentation.
# subscription-manager repos --enable=rhel-9-for-x86_64-appstream-rpms
# subscription-manager repos --enable=rhel-9-for-x86_64-baseos-rpms
# subscription-manager repos --enable=codeready-builder-for-rhel-9-x86_64-rpms
Now we install the Nvidia repo for the CUDA toolkit.
# dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
Now install the CUDA toolkit as shown below.
# dnf -y install cuda-toolkit
Confirm that the toolkit is installed and note the version.
# rpm -qa cuda-toolkit
cuda-toolkit-12.8.1-1.x86_64
Add the following to your .bashrc. If you intend to run or install anything as root, you may want to add it to root's .bashrc as well. Note that the CUDA version in the path should match the one that you installed above.
export PATH=/usr/local/cuda-12.8/bin:$PATH
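Because .bashrc is read again by every nested interactive shell, a plain export like the one above can add the same directory to PATH more than once. As a minimal sketch, the variant below only prepends the directory when it is not already there. The function name prepend_path is made up, and the 12.8 in the path is the version from this guide, so adjust it to yours.

```shell
# prepend_path: put a directory at the front of PATH unless it is already
# somewhere in PATH. Hypothetical helper name, for illustration only.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;     # not present, prepend it
    esac
}

# The version in this path is the one installed in this guide.
prepend_path /usr/local/cuda-12.8/bin
export PATH
```

Sourcing this snippet repeatedly leaves PATH unchanged after the first time.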
Now test nvcc as shown below.
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
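If you want to check in a script that the nvcc on your PATH belongs to the release you expect, the release number can be pulled out of that output. This is only a sketch and assumes the "release X.Y," wording shown above. The helper name nvcc_release is hypothetical.

```shell
# nvcc_release: read `nvcc --version` output on stdin and print the
# major.minor release number from the "release X.Y," line.
# Hypothetical helper name, used here for illustration only.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Usage on a real system:
#   nvcc --version | nvcc_release
```

The result should match the version directory used in the PATH entry.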
Installing the Nvidia Container Toolkit
Next we will install the Nvidia Container Toolkit, which allows users to run GPU-accelerated containerized applications.
A bit about Container Management in RHEL 9
The default container packages in RHEL 9 are as follows.
- Podman – daemonless container engine for running and managing containers
- Buildah – tool for building OCI (Open Container Initiative) container images
- Skopeo – tool for managing container images and repos
- CRIU – tool to create and save running container checkpoints to disk
- Udica – tool for managing SELinux policies for containers
Installation of the toolkit
We will follow the instructions in the official Nvidia Container Toolkit installation guide (see References).
First we configure the repo.
# curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
Then install via dnf.
# dnf install -y nvidia-container-toolkit
Configuring the Container Toolkit for Podman
Generate the CDI specification file using the command below.
# nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
Now let's check the names of the generated device(s).
# nvidia-ctk cdi list
INFO[0000] Found 3 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=GPU-7e880be2-891c-72e3-9515-0fd51240e7f4
nvidia.com/gpu=all
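Those names are what we pass to podman with --device. As a minimal sketch, the helper below extracts just the device names from the listing, and the commented usage shows a test container run along the lines of Nvidia's CDI documentation. The helper name cdi_devices is made up, and the UBI 9 image is an assumption; any image should do, since nvidia-smi is provided to the container by the CDI spec.

```shell
# cdi_devices: read `nvidia-ctk cdi list` output on stdin and print only
# the CDI device names (nvidia.com/gpu=...), one per line.
# Hypothetical helper name, used here for illustration only.
cdi_devices() {
    grep -o 'nvidia\.com/gpu=[^[:space:]]*'
}

# Usage on a real system:
#   nvidia-ctk cdi list | cdi_devices
#
# Test run with all GPUs (the image name is an assumption):
#   podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable \
#       registry.access.redhat.com/ubi9/ubi nvidia-smi -L
```

If the container run lists your GPU, the driver, the toolkit and the CDI spec are all working together.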
References
- https://medium.com/@blackhorseya/step-by-step-guide-to-installing-nvidia-drivers-on-rhel-9-1107e0cd641d
- https://access.redhat.com/discussions/227d2101-b4e3-490a-aa1c-601c407ec038
- https://darryldias.me/2022/install-nvidia-drivers-on-rhel-9/
- https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
- https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#network-repo-installation-for-rhel-rocky
- https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html