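Preface: on an Ubuntu system, the NVIDIA driver stopped working after a reboot and the GPUs became unusable. After some searching online I found a fix that worked; the solution is below.
- Check the GPUs with nvidia-smi; the driver failure message: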
(base) root@node:~# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
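A quick way to see whether the running kernel is the problem is to check if any NVIDIA module has been built for it. The following is only a minimal sketch: `has_nvidia_module` is a hypothetical helper name, and it assumes modules live under `/lib/modules/<kernel release>/` (the usual location on Ubuntu) and that GNU `find` is available.

```shell
#!/usr/bin/env bash
# Sketch: report whether an NVIDIA kernel module exists for the running kernel.
# Assumption: modules are installed under /lib/modules/<kernel release>/.

# has_nvidia_module KERNEL_RELEASE [MODULES_ROOT]   (hypothetical helper)
# Succeeds if any nvidia*.ko* file exists under MODULES_ROOT/KERNEL_RELEASE.
has_nvidia_module() {
    local kernel="$1"
    local root="${2:-/lib/modules}"
    [ -d "$root/$kernel" ] || return 1
    # -print -quit stops at the first match; grep -q turns "found a file" into an exit status.
    find "$root/$kernel" -name 'nvidia*.ko*' -print -quit 2>/dev/null | grep -q .
}

kernel="$(uname -r)"
if has_nvidia_module "$kernel"; then
    echo "nvidia module found for running kernel $kernel"
else
    echo "no nvidia module found for running kernel $kernel"
fi
```

If no module is found for the kernel reported by `uname -r`, the driver was most likely built for an older kernel only.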
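- Cause of the driver failure:
The system kernel was upgraded, so the kernel modules built for the original driver no longer match the running kernel.
- Solution:
Reinstalling the driver is not recommended. Instead, repair it with DKMS (Dynamic Kernel Module Support), which maintains drivers that live outside the kernel tree and automatically builds new modules when the kernel version changes.
1. Install dkms with apt-get install dkms: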
(base) root@node:~# apt-get install dkms
2. Check the installed driver version with ls /usr/src | grep nvidia:
(base) root@node:~# ls /usr/src |grep nvidia
nvidia-550.90.07
3. Repair the driver with dkms:
(base) root@node:~# dkms install -m nvidia -v 550.90.07
4. Check that the driver works with nvidia-smi:
(base) root@node:~# nvidia-smi
Fri Jul 12 06:00:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A800 80GB PCIe Off | 00000000:4B:00.0 Off | 0 |
| N/A 41C P0 68W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A800 80GB PCIe Off | 00000000:65:00.0 Off | 0 |
| N/A 43C P0 68W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A800 80GB PCIe Off | 00000000:B1:00.0 Off | 0 |
| N/A 42C P0 71W / 300W | 1MiB / 81920MiB | 3% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A800 80GB PCIe Off | 00000000:E3:00.0 Off | 0 |
| N/A 48C P0 74W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
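Steps 2 and 3 above can be combined so the version number does not have to be copied by hand. This is a hedged sketch, not a tested tool: `nvidia_src_version` is a hypothetical helper name, it assumes the driver source directory is named `nvidia-<version>` as in the listing above, and it relies on GNU `sort -V`. It only prints the dkms command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: find the NVIDIA driver version under /usr/src and print the matching dkms command.
# Assumption: the driver source directory is named nvidia-<version>, e.g. nvidia-550.90.07.

# nvidia_src_version [SRC_ROOT]   (hypothetical helper)
# Prints the version part of the highest-versioned nvidia-<version> entry, or nothing if none exists.
nvidia_src_version() {
    local root="${1:-/usr/src}"
    ls "$root" 2>/dev/null \
        | sed -n 's/^nvidia-\([0-9][0-9.]*\)$/\1/p' \
        | sort -V \
        | tail -n 1
}

version="$(nvidia_src_version)"
if [ -n "$version" ]; then
    # Dry run: this only prints the command. Remove "echo" and run as root to rebuild.
    # -k names the target kernel explicitly; here it is the running kernel.
    echo dkms install -m nvidia -v "$version" -k "$(uname -r)"
else
    echo "no nvidia-<version> directory found under /usr/src"
fi
```

Printing the command first makes it easy to confirm the detected version against the output of ls /usr/src before anything is rebuilt.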
References:
https://blog.csdn.net/trainingVIP/article/details/137789875