How to Reinstall or Downgrade NVIDIA Drivers on Bare Metal and Passthrough GPU Systems

NVIDIA drivers are required for GPU acceleration on Bare Metal and Passthrough GPU instances, but version conflicts or failed upgrades can cause performance degradation, instability, or NVML errors. When this happens, reverting to a known-good version or reinstalling the current driver may be necessary to restore normal GPU behavior.

Note
This walkthrough applies to both Bare Metal GPU servers and Passthrough GPU virtual machines created using GPU-enabled operating system images.
Bare Metal: Drivers run directly on dedicated physical hardware.
Passthrough: Drivers operate inside a virtual machine where the host assigns a physical GPU. These setups do not require Fabric Manager because initialization and GPU communication occur within the guest driver, while NVSwitch is controlled by the host.
If you install from a base OS, follow NVIDIA’s official instructions for your hardware. This guide does not apply to vGPU systems, where driver versions are managed by the hypervisor. For more information about vGPU, refer to How to Manage vGPU on Cloud GPU Instances.

Use this guide to fully remove, reinstall, or downgrade NVIDIA drivers on Bare Metal and Passthrough GPU environments to keep GPU workloads optimized.

Prerequisites

Before proceeding, ensure that you:

  • Have access to a GPU-enabled Bare Metal or Passthrough instance using a non-root account with sudo permissions.
  • Confirm that no GPU-dependent workloads are running while you modify the driver installation.

Install DKMS Package

The NVIDIA driver depends on the Dynamic Kernel Module Support (DKMS) framework to automatically rebuild its kernel modules whenever the system kernel is updated. This ensures the driver continues to function after kernel patches.

Update the package index.

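$ sudo apt update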

Install the DKMS package.

$ sudo apt install -y dkms

Check the DKMS version.
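
$ dkms --version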

The presence of a version number confirms that DKMS is installed correctly.
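
Once the NVIDIA driver is installed later in this guide, you can also run dkms status to confirm that the nvidia kernel module is registered with DKMS and built against the running kernel:

$ dkms status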

Remove Existing NVIDIA Drivers

Before reinstalling or downgrading, you must remove all current NVIDIA packages, including CUDA, to avoid conflicts with new installations.

Remove CUDA, cuBLAS, and Nsight packages.

$ sudo apt-get --assume-yes --purge remove "*cublas*" "cuda*" "nsight*"

Remove all NVIDIA driver packages and related libraries.

$ sudo apt-get --assume-yes --purge remove "*nvidia*"

Reboot the system to unload remaining driver components.
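
$ sudo reboot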

Configure the Official NVIDIA Repository

To install the correct NVIDIA drivers, you must enable the official NVIDIA repository. This provides access to specific driver versions and ensures consistency with official releases.

Set the Ubuntu version variable. The following command stores the release number without the dot, for example 2204 for Ubuntu 22.04.

$ UBUNTU_VERSION=$(lsb_release -rs | sed -e 's/\.//')

Download the NVIDIA keyring package.

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb

Install the keyring file.

$ sudo dpkg -i cuda-keyring_1.1-1_all.deb

Download the repository signing key.

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-archive-keyring.gpg

Move the key into the keyring directory.

$ sudo mv cuda-archive-keyring.gpg /usr/share/keyrings/cuda-archive-keyring.gpg

Add the CUDA repository.

$ echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/ /" | sudo tee /etc/apt/sources.list.d/cuda-ubuntu${UBUNTU_VERSION}-x86_64.list

Update the package index again.
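
$ sudo apt update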

Install Latest NVIDIA Drivers

The correct driver packages vary depending on GPU generation. The instructions below apply to both Bare Metal and Passthrough GPU systems since both provide direct GPU access.

B200 and Newer GPUs (Fabric Manager Not Required)

Install the NVIDIA open drivers, the CUDA toolkit, and the NVLink support libraries.

$ sudo apt install --assume-yes nvidia-open cuda-toolkit nvlink5

Install the NVIDIA container runtime and supporting components.

$ sudo apt install --assume-yes nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1

Reboot to load the new drivers.
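
$ sudo reboot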

Verify installation using nvidia-smi.
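
$ nvidia-smi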

H100 and Older GPUs (Fabric Manager Required)

These GPUs use NVLink and NVSwitch technologies, requiring Fabric Manager to enable full interconnect functionality.

Install CUDA drivers, Fabric Manager, and the CUDA toolkit.

$ sudo apt install --assume-yes cuda-drivers-fabricmanager cuda-toolkit

Install the NVIDIA container runtime and supporting components.

$ sudo apt install --assume-yes nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1

Reboot the system.
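
$ sudo reboot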

Check driver status.
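
$ nvidia-smi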

Enable and launch the Fabric Manager service.

$ sudo systemctl enable --now nvidia-fabricmanager

Check Fabric Manager status.

$ sudo systemctl status nvidia-fabricmanager

Output:

● nvidia-fabricmanager.service – NVIDIA fabric manager service
Loaded: loaded (/lib/systemd/system/nvidia-fabricmanager.service; enabled; vendor preset: enabled)
Active: active (running)
Main PID: 4811 (nv-fabricmanage)

Install Specific Versions of NVIDIA Drivers

NVIDIA offers multiple driver branches. Each GPU generation requires a minimum supported driver version:

  • HGX-2 / HGX A100 → minimum 450.xx
  • HGX H100 → minimum 525.xx
  • HGX B200 / B100 → minimum 570.xx

Append the version number to the package name to install a specific branch:

  • nvidia-open-570: Installs open 570 driver branch.
  • cuda-drivers-550: Proprietary 550 branch for H100 GPUs.
  • cuda-toolkit-12-8: Installs CUDA toolkit version 12.8.
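
The available branches change over time, so it can help to check which versioned metapackages the configured repositories currently provide. One way to do this is to list package names by prefix:

$ apt-cache pkgnames cuda-drivers | sort
$ apt-cache pkgnames nvidia-open | sort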

Note: NVIDIA does not maintain versioned branches for the container toolkit. To install a specific version:

$ sudo apt install nvidia-container-toolkit=VERSION
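
The version strings that apt accepts depend on the repositories configured on your system. You can list the candidates before installing, for example:

$ apt-cache madison nvidia-container-toolkit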

Example Installation Using 570 Drivers with CUDA 12.8

B200 and Newer GPUs

Install the open 570 branch drivers, CUDA toolkit 12.8, and NVLink support.

$ sudo apt install --assume-yes nvidia-open-570 cuda-toolkit-12-8 nvlink5-570

Install the container runtime.

$ sudo apt install --assume-yes nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1

Reboot to activate drivers.
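
$ sudo reboot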

Check the GPU status.
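
$ nvidia-smi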

H100 and Older GPUs

Install the 570 driver branch with Fabric Manager and CUDA toolkit 12.8.

$ sudo apt install --assume-yes cuda-drivers-fabricmanager-570 cuda-toolkit-12-8

Install the container runtime.

$ sudo apt install --assume-yes nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1

Reboot the system.
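
$ sudo reboot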

Verify that drivers are active.
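
$ nvidia-smi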

Enable and start Fabric Manager.

$ sudo systemctl enable --now nvidia-fabricmanager

Check Fabric Manager status.

$ sudo systemctl status nvidia-fabricmanager

Output:

● nvidia-fabricmanager.service – NVIDIA fabric manager service
Loaded: loaded
Active: active (running)
Main PID: 4811 (nv-fabricmanage)

Conclusion

You have now successfully reinstalled or downgraded NVIDIA drivers on Bare Metal or Passthrough GPU environments and confirmed that the correct driver version is operational. For configurations requiring Fabric Manager, you ensured proper NVLink and NVSwitch functionality. With the drivers, CUDA toolkit, and container runtime in place, your system is ready for demanding GPU-based workloads and containerized GPU deployments.

