List the installed drivers
[root@localhost:~] esxcli software vib list
Name                           Version                               Vendor  Acceptance Level  Install Date  Platforms
-----------------------------  ------------------------------------  ------  ----------------  ------------  ---------
NVD-VMware_ESXi_8.0.0_Driver   525.147.01-1OEM.800.1.0.20613240      NVD     VMwareAccepted    2025-07-29    host
nvdgpumgmtdaemon               525.147.01-1OEM.700.1.0.15843807      NVD     VMwareAccepted    2025-07-29    host
atlantic                       1.0.3.0-13vmw.803.0.0.24022510        VMW     VMwareCertified   2025-06-14    host
bcm-mpi3                       8.8.1.0.0.0-1vmw.803.0.0.24022510     VMW     VMwareCertified   2025-06-14    host
bnxtnet                        226.0.21.0-31vmw.803.0.0.24022510     VMW     VMwareCertified   2025-06-14    host
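On a real host the full VIB list is long, so it can help to narrow it to the NVIDIA entries before deciding what to remove. A minimal sketch, assuming the NVIDIA packages carry the NVD vendor tag as in the listing above; nvidia_vibs is a hypothetical helper name, not an esxcli feature:

```shell
# Hypothetical helper: print the Name of every row whose Vendor column is NVD.
# Assumes the whitespace-separated layout shown above (Name, Version, Vendor, ...),
# so the vendor is field 3.
nvidia_vibs() {
    awk '$3 == "NVD" { print $1 }'
}

# On the ESXi host (skipped where esxcli is not available):
if command -v esxcli >/dev/null 2>&1; then
    esxcli software vib list | nvidia_vibs
fi
```

The printed names are the values that would be passed to --vibname in the removal step.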
Uninstall the GRID driver
[root@localhost:~] esxcli software vib remove --vibname=nvdgpumgmtdaemon
Removal Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   VIBs Installed:
   VIBs Removed: NVD_bootbank_nvdgpumgmtdaemon_525.147.01-1OEM.700.1.0.15843807
   VIBs Skipped:
   Reboot Required: true
   DPU Results:
[root@localhost:~]
[root@localhost:~] esxcli software vib remove --vibname=NVD-VMware_ESXi_8.0.0_Driver
Removal Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   VIBs Installed:
   VIBs Removed: NVD_bootbank_NVD-VMware_ESXi_8.0.0_Driver_525.147.01-1OEM.800.1.0.20613240
   VIBs Skipped:
   Reboot Required: true
   DPU Results:
[root@localhost:~]
Install the GRID driver
[root@localhost:/vmfs/volumes/684d9ba7-bb44fab2-ee38-f4939ff4b132/Drivers] esxcli software component apply -d /vmfs/volumes/datastore1/Drivers/NVD-VGPU-800_535.230.02-1OEM.800.1.0.20613240_24481118.zip
Installation Result
   Message: Operation finished successfully.
   Components Installed: NVD-VGPU-800_535.230.02-1OEM.800.1.0.20613240
   Components Removed:
   Components Skipped:
   Reboot Required: false
   DPU Results:
[root@localhost:/vmfs/volumes/684d9ba7-bb44fab2-ee38-f4939ff4b132/Drivers]
[root@localhost:/vmfs/volumes/684d9ba7-bb44fab2-ee38-f4939ff4b132/Drivers] esxcli software component apply -d /vmfs/volumes/datastore1/Drivers/nvd-gpu-mgmt-daemon_535.230.02-0.0.0000_24467933.zip
Installation Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   Components Installed: nvd-gpu-mgmt-daemon_535.230.02-0.0.0000
   Components Removed:
   Components Skipped:
   Reboot Required: true
   DPU Results:
[root@localhost:/vmfs/volumes/684d9ba7-bb44fab2-ee38-f4939ff4b132/Drivers]
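Both removals and the management daemon install report "Reboot Required: true", so the host needs a restart before the new driver is verified. A small sketch that checks that field in captured esxcli output; reboot_required is a hypothetical helper, and it assumes only the "Reboot Required:" line that esxcli prints in the results above:

```shell
# Hypothetical helper: exit 0 if the esxcli result text on stdin
# contains "Reboot Required: true", non-zero otherwise.
reboot_required() {
    grep -q 'Reboot Required: true'
}

# Example on the host (same kind of invocation as above; the bundle name is a placeholder):
#   esxcli software component apply -d /vmfs/volumes/datastore1/Drivers/<bundle>.zip \
#       | reboot_required && echo "reboot the host before running nvidia-smi"
```

Checking the field rather than the message text keeps the test valid for both the "Removal Result" and "Installation Result" variants.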
Verify the driver
[root@localhost:~] nvidia-smi
Mon Aug 18 15:56:21 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.230.02             Driver Version: 535.230.02   CUDA Version: N/A      |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P4                       On  | 00000000:18:00.0 Off |                  Off |
| N/A   36C    P8              10W /  75W |     32MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A2                      On  | 00000000:3B:00.0 Off |                    0 |
|  0%   54C    P8               9W /  60W |      0MiB / 15356MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A2                      On  | 00000000:86:00.0 Off |                    0 |
|  0%   51C    P8               8W /  60W |      0MiB / 15356MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
[root@localhost:~]
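To compare results across different slot layouts, a plain count of the GPUs the driver sees is easier to track than the full table. A sketch assuming the usual `nvidia-smi -L` line format ("GPU <index>: <name> (UUID: ...)"); count_gpus is a hypothetical helper:

```shell
# Hypothetical helper: count "GPU <n>: ..." lines from `nvidia-smi -L` on stdin.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:'
}

# On the host (skipped where nvidia-smi is not installed):
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L | count_gpus
fi
```

For the configuration shown above, with one P4 and two A2 cards all detected, the count would be expected to match the number of installed cards.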
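A few big open questions:
1. With a single A2 card installed on its own, nvidia-smi does not detect the A2.
2. With two A2 cards installed, nvidia-smi detects only one A2.
3. With a P4 card installed first in PCIe slot order, followed by one A2 card, nvidia-smi detects all devices.
4. With a P4 card installed first in PCIe slot order, followed by two A2 cards, nvidia-smi detects all devices.
5. Going by PCIe slot order, only A2 cards installed in slots after the P4's slot are detected.
I have heard that this is a problem related to the NVIDIA Ampere architecture (2021.4): either the motherboard (Cascade Lake, 2019.4) is too old, or its BIOS version is too old.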
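One way to narrow this down is to check whether a missing A2 is visible on the PCIe bus at all, or is enumerated but not picked up by the driver. A sketch, assuming `lspci` on the host lists the cards with "NVIDIA" in the description; count_nvidia_pci is a hypothetical helper:

```shell
# Hypothetical helper: count lspci lines on stdin that mention NVIDIA.
count_nvidia_pci() {
    grep -ci 'nvidia'
}

# On the host (skipped unless both tools are present):
if command -v lspci >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
    pci=$(lspci | count_nvidia_pci) || true
    drv=$(nvidia-smi -L | grep -c '^GPU ') || true
    echo "PCI devices: $pci, GPUs seen by driver: $drv"
fi
```

If the PCI count is higher than the driver count, the card is enumerated but not initialized by the driver, which would fit the BIOS or platform theory better than a seating problem. If the two counts match and both are short, the card is not being enumerated at all.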