How to Reinstall or Downgrade NVIDIA Drivers on Bare Metal and Passthrough GPU Systems
NVIDIA drivers are required to enable GPU acceleration on Bare Metal machines, but issues such as version conflicts or failed upgrades can lead to performance degradation, instability, or NVML-related faults. When this happens, reverting to a reliable version or reinstalling the current driver may be necessary to restore normal GPU behavior.
Note
This walkthrough applies to both Bare Metal GPU servers and Passthrough GPU virtual machines created using GPU-enabled operating system images.
Bare Metal: Drivers run directly on dedicated physical hardware.
Passthrough: Drivers operate inside a virtual machine where the host assigns a physical GPU. These setups do not require Fabric Manager because initialization and GPU communication occur within the guest driver, while NVSwitch is controlled by the host.
If you are installing on a base OS, follow NVIDIA’s official instructions for your hardware. This guide does not apply to vGPU systems, where driver versions are managed by the hypervisor. For more information about vGPU, refer to How to Manage vGPU on Cloud GPU Instances.
Use this guide to fully remove, reinstall, or downgrade NVIDIA drivers on Bare Metal and Passthrough GPU environments to keep GPU workloads optimized.
Prerequisites
Before proceeding, ensure that:
- You have access to a GPU-enabled Bare Metal or Passthrough instance as a non-root user with sudo privileges.
- No GPU-dependent workloads are running while you modify the driver installation (see the check below).
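If a driver is currently installed, running nvidia-smi is a quick way to confirm this; the Processes table at the bottom of its output should list no active workloads before you continue.
$ nvidia-smi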
Install DKMS Package
The NVIDIA driver relies on the Dynamic Kernel Module Support (DKMS) framework to automatically rebuild its kernel modules whenever the system kernel is updated. This keeps the driver working after kernel patches.
Update the package index.
$ sudo apt update
Install the DKMS package.
$ sudo apt install -y dkms
Check the DKMS version.
$ dkms --version
The presence of a version number confirms that DKMS is installed correctly.
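Optionally, list the modules that DKMS currently manages. After you reinstall the driver later in this guide, an nvidia entry should appear in this output.
$ dkms status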
Remove Existing NVIDIA Drivers
Before reinstalling or downgrading, you must remove all current NVIDIA packages, including CUDA, to avoid conflicts with new installations.
Remove CUDA, cuBLAS, and Nsight packages.
$ sudo apt-get --assume-yes --purge remove "*cublas*" "cuda*" "nsight*"
Remove all NVIDIA driver packages and related libraries.
$ sudo apt-get --assume-yes --purge remove "*nvidia*"
Reboot the system to unload remaining driver components.
$ sudo reboot
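After the reboot, you can confirm that the removal succeeded; both commands below should produce no output when no NVIDIA packages or kernel modules remain.
$ dpkg -l | grep -i nvidia
$ lsmod | grep nvidia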
Configure the Official NVIDIA Repository
To install the correct NVIDIA drivers, enable the official NVIDIA repository. This provides access to specific driver versions and keeps your packages consistent with official releases.
Set the Ubuntu version.
$ UBUNTU_VERSION=$(lsb_release -rs | sed -e 's/\.//')
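Confirm that the variable is set. On Ubuntu 22.04, for example, this prints 2204.
$ echo $UBUNTU_VERSION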
Download the NVIDIA keyring package.
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
Install the keyring file.
$ sudo dpkg -i cuda-keyring_1.1-1_all.deb
Download the repository signing key.
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-archive-keyring.gpg
Move the key into the keyring directory.
$ sudo mv cuda-archive-keyring.gpg /usr/share/keyrings/cuda-archive-keyring.gpg
Add the CUDA repository.
$ echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/ /" | sudo tee /etc/apt/sources.list.d/cuda-ubuntu${UBUNTU_VERSION}-x86_64.list
Update the package index again.
$ sudo apt update
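Optionally, list the driver branches the repository provides before choosing one to install.
$ apt-cache search --names-only nvidia-open
$ apt-cache search --names-only cuda-drivers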
Install Latest NVIDIA Drivers
The correct driver packages vary depending on GPU generation. The instructions below apply to both Bare Metal and Passthrough GPU systems since both provide direct GPU access.
B200 and Newer GPUs (Fabric Manager Not Required)
These GPUs use the NVIDIA open kernel modules and do not require Fabric Manager.
Install the open drivers, CUDA toolkit, and NVLink support libraries.
$ sudo apt install --assume-yes nvidia-open cuda-toolkit nvlink5
Install the NVIDIA container runtime and supporting components.
$ sudo apt install --assume-yes nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
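If you use Docker, register the NVIDIA runtime with it and restart the Docker service. This step assumes Docker is already installed and applies equally to the other installation paths in this guide.
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker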
Reboot to load the new drivers.
$ sudo reboot
Verify installation using nvidia-smi.
$ nvidia-smi
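To confirm the installed driver branch in a script-friendly format, you can also query it directly.
$ nvidia-smi --query-gpu=name,driver_version --format=csv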
H100 and Older GPUs (Fabric Manager Required)
These GPUs use NVLink and NVSwitch technologies, requiring Fabric Manager to enable full interconnect functionality.
Install CUDA drivers, Fabric Manager, and the CUDA toolkit.
$ sudo apt install --assume-yes cuda-drivers-fabricmanager cuda-toolkit
Install the NVIDIA container runtime and supporting components.
$ sudo apt install --assume-yes nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
Reboot the system.
$ sudo reboot
Check driver status.
$ nvidia-smi
Enable and launch the Fabric Manager service.
$ sudo systemctl enable --now nvidia-fabricmanager
Check Fabric Manager status.
$ sudo systemctl status nvidia-fabricmanager
Output:
● nvidia-fabricmanager.service - NVIDIA fabric manager service
Loaded: loaded (/lib/systemd/system/nvidia-fabricmanager.service; enabled; vendor preset: enabled)
Active: active (running)
Main PID: 4811 (nv-fabricmanage)
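Optionally, inspect the NVLink/NVSwitch topology. With Fabric Manager running, GPU-to-GPU links should appear as NV# entries rather than errors.
$ nvidia-smi topo -m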
Install Specific Versions of NVIDIA Drivers
NVIDIA offers multiple driver branches. Each GPU generation requires a minimum supported driver version:
- HGX-2 / HGX A100 → minimum 450.xx
- HGX H100 → minimum 525.xx
- HGX B200 / B100 → minimum 570.xx
Attach the version number to the package name to install a specific branch:
- nvidia-open-570: Installs the open 570 driver branch.
- cuda-drivers-550: Installs the proprietary 550 branch for H100 GPUs.
- cuda-toolkit-12-8: Installs CUDA toolkit version 12.8.
Note: NVIDIA does not maintain versioned branches for the container toolkit. To install a specific version:
$ sudo apt install nvidia-container-toolkit=VERSION
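To see which container toolkit versions are available to substitute for VERSION, list the candidates first.
$ apt-cache madison nvidia-container-toolkit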
Example Installation Using 570 Drivers with CUDA 12.8
B200 and Newer GPUs
This example installs the open 570 driver branch with CUDA 12.8 and NVLink support.
Install the driver, toolkit, and NVLink libraries.
$ sudo apt install --assume-yes nvidia-open-570 cuda-toolkit-12-8 nvlink5-570
Install the container runtime.
$ sudo apt install --assume-yes nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
Reboot to activate drivers.
$ sudo reboot
Check the GPU status.
$ nvidia-smi
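As an additional check, the loaded kernel module reports its own version, which should match the 570 branch you installed.
$ cat /proc/driver/nvidia/version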
H100 and Older GPUs
This example installs the 570 driver branch with Fabric Manager and CUDA 12.8.
Install the drivers, Fabric Manager, and toolkit.
$ sudo apt install --assume-yes cuda-drivers-fabricmanager-570 cuda-toolkit-12-8
Install the container runtime.
$ sudo apt install --assume-yes nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
Reboot the system.
$ sudo reboot
Verify that drivers are active.
$ nvidia-smi
Enable and start Fabric Manager.
$ sudo systemctl enable --now nvidia-fabricmanager
Check Fabric Manager status.
$ sudo systemctl status nvidia-fabricmanager
Output:
● nvidia-fabricmanager.service - NVIDIA fabric manager service
Loaded: loaded
Active: active (running)
Main PID: 4811 (nv-fabricmanage)
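To keep a pinned branch from being replaced during routine updates, you can place the installed packages on hold. The package names below match this guide's 570 and CUDA 12.8 example; adjust them to the packages you installed (for example, nvidia-open-570 and nvlink5-570 on B200 systems).
$ sudo apt-mark hold cuda-drivers-fabricmanager-570 cuda-toolkit-12-8
$ apt-mark showhold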
Conclusion
You have now successfully reinstalled or downgraded NVIDIA drivers on Bare Metal or Passthrough GPU environments and confirmed that the correct driver version is operational. For configurations requiring Fabric Manager, you ensured proper NVLink and NVSwitch functionality. With the drivers, CUDA toolkit, and container runtime in place, your system is ready for demanding GPU-based workloads and containerized GPU deployments.


