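- Driver installation
- 1. Update the system
- 2. NVIDIA GPU installation
- Check whether the system has an NVIDIA GPU installed
- 2.1 First, update the DNF package repository cache with the following command:
- 2.2 Install the dependencies and build tools needed to compile the NVIDIA kernel modules
- 2.3 Add the official NVIDIA CUDA package repository on CentOS Stream 9
- 2.4 Install the latest NVIDIA GPU driver on CentOS Stream 9
- 2.5 For the changes to take effect, restart the computer with the following command:
- 2.6 Test
- 2. cuda-toolkit installation
- 2.1 Installation
- 2.2 Environment configuration
- Test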
Driver installation
Reference: Installing NVIDIA drivers on CentOS Stream 9
1. Update the system
First, make sure your system is up to date:
sudo dnf update -y
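2. NVIDIA GPU installation
Check whether the system has an NVIDIA GPU installed
You can check whether your computer has an NVIDIA GPU installed with the following command: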
lspci | egrep 'VGA|3D'
As you can see, my computer has an NVIDIA GeForce RTX 3060 GPU installed. Yours may have a different NVIDIA GPU.
[root@cheng ~]# lspci | egrep 'VGA|3D'
06:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
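For use in a setup script, the same check can be wrapped in a small function. This is a minimal sketch that assumes `lspci` output in the format shown above; the name `has_nvidia_gpu` is hypothetical and not part of any tool.

```shell
# has_nvidia_gpu (hypothetical helper): reads `lspci` output on stdin and
# succeeds if any VGA or 3D controller line mentions NVIDIA.
has_nvidia_gpu() {
    grep -E 'VGA|3D' | grep -qi 'nvidia'
}

# Usage on a real machine:
#   lspci | has_nvidia_gpu && echo "NVIDIA GPU found"
```

Reading from stdin rather than calling `lspci` inside the function keeps it easy to try against saved output.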
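By default, CentOS Stream 9 uses the open-source Nouveau GPU driver rather than the proprietary NVIDIA GPU driver. Once the proprietary NVIDIA GPU driver is installed, you will see the nvidia modules loaded instead of nouveau.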
lsmod | grep nouveau
lsmod | grep nvidia
[root@cheng ~]# lsmod | grep nouveau
lsmod | grep nvidia
nvidia_drm 143360 0
nvidia_modeset 1421312 1 nvidia_drm
nvidia_uvm 3899392 0
nvidia 70721536 2 nvidia_uvm,nvidia_modeset
video 77824 1 nvidia_modeset
drm_kms_helper 274432 2 nvidia_drm
drm 782336 4 drm_kms_helper,nvidia,nvidia_drm
[root@cheng ~]# lsmod | grep nvidia
lsmod | grep nouveau
nvidia_drm 143360 0
nvidia_modeset 1421312 1 nvidia_drm
nvidia_uvm 3899392 0
nvidia 70721536 2 nvidia_uvm,nvidia_modeset
video 77824 1 nvidia_modeset
drm_kms_helper 274432 2 nvidia_drm
drm 782336 4 drm_kms_helper,nvidia,nvidia_drm
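The two `lsmod` checks can be combined into one answer. The sketch below assumes `lsmod` output in the format shown above (module name in the first column); `active_gpu_driver` is a hypothetical helper name.

```shell
# active_gpu_driver (hypothetical helper): reads `lsmod` output on stdin and
# prints which GPU kernel driver is loaded: "nvidia", "nouveau", or "none".
active_gpu_driver() {
    awk '
        $1 == "nvidia"  { nvidia = 1 }
        $1 == "nouveau" { nouveau = 1 }
        END {
            if (nvidia) { print "nvidia" }
            else if (nouveau) { print "nouveau" }
            else { print "none" }
        }'
}

# Usage on a real machine:
#   lsmod | active_gpu_driver
```

For the output shown above, where the nvidia modules are listed and nouveau is not, this would print `nvidia`.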
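Disable Secure Boot from the BIOS
For the NVIDIA GPU driver to work on CentOS Stream 9, you must disable Secure Boot in the motherboard's BIOS if the motherboard uses UEFI firmware to boot the operating system.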
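Before rebooting into the firmware setup, the current Secure Boot state can be read from the running system. This sketch assumes the `mokutil` package is installed (it may not be on a minimal install) and relies on the `SecureBoot enabled` / `SecureBoot disabled` lines that `mokutil --sb-state` prints; `secure_boot_state` is a hypothetical helper name.

```shell
# secure_boot_state (hypothetical helper): reads the output of
# `mokutil --sb-state` on stdin and prints "enabled", "disabled", or "unknown".
secure_boot_state() {
    state=$(cat)
    case "$state" in
        *"SecureBoot enabled"*)  echo "enabled" ;;
        *"SecureBoot disabled"*) echo "disabled" ;;
        *)                       echo "unknown" ;;
    esac
}

# Usage on a real machine:
#   mokutil --sb-state 2>&1 | secure_boot_state
```

A result of `unknown` typically means the system did not boot through UEFI, in which case there is no Secure Boot setting to change.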
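Enable the EPEL repository on CentOS Stream 9
To install the NVIDIA GPU driver on CentOS Stream 9, you need the build tools and dependency libraries required to compile the NVIDIA kernel modules. Some of them are only available in the CentOS Stream 9 EPEL repository.
In this section, I will show you how to enable the EPEL repository on CentOS Stream 9.
2.1 First, update the DNF package repository cache with the following command: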
sudo dnf makecache
Enable the official CentOS Stream 9 CRB package repository with the following command:
sudo dnf config-manager --set-enabled crb
Install the epel-release and epel-next-release packages with the following command:
sudo dnf install epel-release epel-next-release
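To confirm the installation, press Y and then press Enter.
To confirm the GPG key, press Y and then press Enter.
The epel-release and epel-next-release packages should now be installed and the EPEL repository enabled.
For the changes to take effect, update the DNF package repository cache with the following command: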
sudo dnf makecache
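To confirm that the repositories from this step are active, the enabled repo list can be compared against the ones you expect. This is a sketch only: the repo ids `crb`, `epel`, and `epel-next` are what I would expect these packages to provide, so treat them as assumptions and check them against your own `dnf repolist` output. `missing_repos` is a hypothetical helper name.

```shell
# missing_repos (hypothetical helper): reads `dnf repolist --enabled` output on
# stdin and prints each required repo id (given as arguments) that is absent.
missing_repos() {
    # First column is the repo id; skip the "repo id  repo name" header line.
    enabled=$(awk '$1 != "repo" { print $1 }')
    for repo in "$@"; do
        printf '%s\n' "$enabled" | grep -qxF -e "$repo" || echo "$repo"
    done
}

# Usage on a real machine (repo ids are assumptions):
#   dnf repolist --enabled | missing_repos crb epel epel-next
```

No output means every listed repository is enabled.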
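2.2 Install the dependencies and build tools needed to compile the NVIDIA kernel modules
To install the build tools and dependency libraries needed to compile the NVIDIA kernel modules, run the following command: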
sudo dnf install kernel-headers-$(uname -r) kernel-devel-$(uname -r) tar bzip2 make automake gcc gcc-c++ pciutils elfutils-libelf-devel libglvnd-opengl libglvnd-glx libglvnd-devel acpid pkgconfig dkms
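To confirm the installation, press Y and then press Enter.
The required packages are downloaded from the internet. This takes a while to complete.
After the packages are downloaded, you will be asked to confirm the GPG key of the official CentOS package repository.
To confirm the GPG key, press Y and then press Enter.
To confirm the GPG key of the EPEL repository, press Y and then press Enter.
The installation should continue.
At this point, the dependency libraries and build tools needed to compile the NVIDIA kernel modules should be installed.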
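The install command pins kernel-headers and kernel-devel to the running kernel via `$(uname -r)`, because DKMS builds the module against the kernel that is actually booted. If the earlier `dnf update` installed a newer kernel and the machine has not been rebooted yet, the two can differ. The sketch below checks for a match; `kernel_devel_matches` is a hypothetical helper name.

```shell
# kernel_devel_matches (hypothetical helper): succeeds if the kernel-devel
# package for the given kernel release appears in the package list on stdin.
#   $1    kernel release, as printed by `uname -r`
#   stdin output of `rpm -q kernel-devel` (one package per line)
kernel_devel_matches() {
    grep -qxF -e "kernel-devel-$1"
}

# Usage on a real machine:
#   rpm -q kernel-devel | kernel_devel_matches "$(uname -r)" \
#       || echo "kernel-devel does not match the running kernel"
```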
2.3 Add the official NVIDIA CUDA package repository on CentOS Stream 9
To add the official NVIDIA CUDA package repository on CentOS Stream 9, run the following command:
sudo dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(uname -i)/cuda-rhel9.repo
For the changes to take effect, update the DNF package repository cache with the following command:
sudo dnf makecache
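The repository URL embeds the machine architecture through `$(uname -i)`, which can print `unknown` on some systems. A small sketch that builds the same URL from an explicit architecture, falling back to `uname -m`, is shown here. `cuda_repo_url` is a hypothetical helper name, and only the `x86_64` path is confirmed by this article; other architectures may use a different directory name in NVIDIA's repository.

```shell
# cuda_repo_url (hypothetical helper): prints the CUDA repository file URL for
# the given architecture, defaulting to the output of `uname -m`.
cuda_repo_url() {
    arch=${1:-$(uname -m)}
    printf 'http://developer.download.nvidia.com/compute/cuda/repos/rhel9/%s/cuda-rhel9.repo\n' "$arch"
}

# Usage on a real machine:
#   sudo dnf config-manager --add-repo "$(cuda_repo_url x86_64)"
```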
2.4 Install the latest NVIDIA GPU driver on CentOS Stream 9
To install the latest version of the NVIDIA GPU driver on CentOS Stream 9, run the following command:
sudo dnf module install nvidia-driver:latest-dkms
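To confirm the installation, press Y and then press Enter.
All NVIDIA GPU driver packages and their dependencies are downloaded from the internet. This takes a while to complete.
After the packages are downloaded, you will be asked to confirm the GPG key of the official NVIDIA package repository. Press Y and then press Enter to confirm the GPG key.
The installation should continue. It takes a while to complete.
I hit the following error at this step: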
Last metadata expiration check: 0:05:51 ago on Fri 11 Apr 2025 03:30:46 PM CST.
Error:
 Problem 1: package nvidia-driver-libs-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires egl-wayland(x86-64) >= 1.1.13.1-3, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package egl-wayland-1.1.13.1-3.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  - package egl-wayland-1.1.19~20250313gitf1fd514-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
 Problem 2: package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-driver-libs(x86-64) = 3:570.124.06, but none of the providers can be installed
  - package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-glvkspirv.so.570.124.06()(64bit), but none of the providers can be installed
  - package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-gpucomp.so.570.124.06()(64bit), but none of the providers can be installed
  - package nvidia-driver-libs-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires egl-wayland(x86-64) >= 1.1.13.1-3, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package egl-wayland-1.1.13.1-3.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  - package egl-wayland-1.1.19~20250313gitf1fd514-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
 Problem 3: package xorg-x11-nvidia-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-glcore.so.570.124.06()(64bit), but none of the providers can be installed
  - package xorg-x11-nvidia-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-tls.so.570.124.06()(64bit), but none of the providers can be installed
  - package nvidia-xconfig-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires xorg-x11-nvidia(x86-64) >= 3:570.124.06, but none of the providers can be installed
  - package nvidia-driver-libs-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires egl-wayland(x86-64) >= 1.1.13.1-3, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package egl-wayland-1.1.13.1-3.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  - package egl-wayland-1.1.19~20250313gitf1fd514-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
 Problem 4: package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-driver-libs(x86-64) = 3:570.124.06, but none of the providers can be installed
  - package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-glvkspirv.so.570.124.06()(64bit), but none of the providers can be installed
  - package nvidia-driver-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires libnvidia-gpucomp.so.570.124.06()(64bit), but none of the providers can be installed
  - package nvidia-settings-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires nvidia-driver(x86-64) = 3:570.124.06, but none of the providers can be installed
  - package nvidia-driver-libs-3:570.124.06-1.el9.x86_64 from cuda-rhel9-x86_64 requires egl-wayland(x86-64) >= 1.1.13.1-3, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package egl-wayland-1.1.13.1-3.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
  - package egl-wayland-1.1.19~20250313gitf1fd514-1.el9.x86_64 from cuda-rhel9-x86_64 is filtered out by modular filtering
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
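None of the fixes suggested by the large language models I asked worked. In the end I noticed the hint in parentheses at the end of the error log; changing the command to the following made the installation succeed: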
sudo dnf module install nvidia-driver:latest-dkms --skip-broken
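Since `--skip-broken` lets DNF leave out packages it cannot resolve, it can help to know which packages were the blockers so you can check afterwards whether they were installed. The sketch below pulls the package names marked "filtered out by modular filtering" from a saved copy of the error log. `filtered_packages` is a hypothetical helper name, and the `grep -o` option it uses is a GNU/BSD extension rather than POSIX.

```shell
# filtered_packages (hypothetical helper): reads a DNF error log on stdin and
# prints the unique package names reported as hidden by modular filtering.
filtered_packages() {
    grep -oE 'package [^ ]+ from [^ ]+ is filtered out by modular filtering' \
        | awk '{ print $2 }' \
        | sort -u
}

# Usage, assuming the error output was saved to a file named dnf-error.log:
#   filtered_packages < dnf-error.log
```

For the log above, this should list only the two egl-wayland builds, which suggests the egl-wayland dependency is the single root cause behind all four reported problems.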
2.5 For the changes to take effect, restart the computer with the following command:
sudo reboot
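Check that the NVIDIA driver is installed correctly
After the computer boots, you should see that the proprietary NVIDIA GPU driver is in use instead of the open-source Nouveau GPU driver.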
lsmod | grep nvidia
lsmod | grep nouveau
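You should also find the NVIDIA X Server Settings application in the CentOS Stream 9 application menu. Click it.
The NVIDIA X Server Settings application should run without any errors and show detailed information about your installed NVIDIA GPU.
2.6 Test
You should also be able to run the NVIDIA command-line programs, such as nvidia-smi.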
[root@cheng ~]# nvidia-smi
Sun Dec 22 14:37:55 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:06:00.0 Off |                  N/A |
| 31%   23C    P8              6W /  170W |      18MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
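For scripted checks, `nvidia-smi` can also print selected fields as CSV through its `--query-gpu` and `--format` options, which is easier to parse than the table. The sketch below formats one such CSV line; `gpu_summary` is a hypothetical helper name, and the sample values in the usage comment are taken from the table above.

```shell
# gpu_summary (hypothetical helper): reads CSV lines produced by
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
# on stdin and prints each GPU as labelled fields.
gpu_summary() {
    awk -F', ' '{ printf "GPU: %s\nDriver: %s\nMemory: %s\n", $1, $2, $3 }'
}

# Usage on a real machine:
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader | gpu_summary
# For a line such as "NVIDIA GeForce RTX 3060, 565.57.01, 12288 MiB" it prints
# the three values on separate labelled lines.
```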
2. cuda-toolkit installation
2.1 Installation
See the official site: CUDA Toolkit 12.8 Update 1 Downloads
2.2 Environment configuration
Global configuration, effective for all users:
[chenfeng@iZ2ze8ss1mj33afx13mulcZ temp]$ sudo vim /etc/profile
Append the following to the end of the file:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Then restart the terminal or run source /etc/profile.
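If you prefer to script this step instead of editing the file in vim, the lines can be appended only when they are not already present, so that re-running the setup does not duplicate them. This is a minimal sketch: `append_once` is a hypothetical helper name, and the drop-in file `/etc/profile.d/cuda.sh` in the usage comment is an assumption offered as an alternative to modifying /etc/profile directly.

```shell
# append_once (hypothetical helper): appends a line to a file only if that
# exact line is not already in it. Creates the file if it does not exist.
#   $1 target file
#   $2 line to append
append_once() {
    file=$1
    line=$2
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root; the file name cuda.sh is an assumption):
#   append_once /etc/profile.d/cuda.sh 'export PATH=/usr/local/cuda/bin:$PATH'
#   append_once /etc/profile.d/cuda.sh 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH'
```

The single quotes matter: they keep `$PATH` and `$LD_LIBRARY_PATH` unexpanded so they are evaluated at login time rather than when the line is written.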
Test
nvcc --version
[chenfeng@iZ2ze8ss1mj33afx13mulcZ temp]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
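To use the toolkit version in a script, the release number can be extracted from the `release X.Y,` part of this output. A minimal sketch, assuming the output format shown above; `nvcc_release` is a hypothetical helper name.

```shell
# nvcc_release (hypothetical helper): reads `nvcc --version` output on stdin
# and prints the toolkit release number from the "release X.Y," line.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage on a real machine:
#   nvcc --version | nvcc_release
```

The "CUDA Version" shown by nvidia-smi is the highest CUDA version the installed driver supports, so it is worth checking that the toolkit release printed here does not exceed it.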