Preface
To install Docker, see my earlier post "Docker in Practice: Installing Docker with 1panel".
Installing the NVIDIA Container Toolkit
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
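Installation
Prerequisites
- Read the section on platform support.
- Install the NVIDIA GPU driver for your Linux distribution. NVIDIA recommends installing the driver with your distribution's package manager; see the NVIDIA Driver Installation Quickstart Guide for details. Alternatively, you can install the driver by downloading a .run installer.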
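Before going further, a quick pre-flight check can confirm the driver is actually in place. A minimal sketch, assuming the driver installation put nvidia-smi on PATH; require_cmd and check_nvidia_driver are hypothetical helper names, not part of any NVIDIA tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given command is on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical pre-flight check. Assumption: the driver ships nvidia-smi,
# so its absence suggests the driver is not installed yet.
check_nvidia_driver() {
  if ! require_cmd nvidia-smi; then
    echo "nvidia-smi not found: install the NVIDIA driver first" >&2
    return 1
  fi
  # Print only the driver version instead of the full status table.
  nvidia-smi --query-gpu=driver_version --format=csv,noheader
}
```

Running check_nvidia_driver either prints the driver version (575.64.03 on the machine used in this post) or fails with a message.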
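There is a known issue on systems that use the systemd cgroup driver: running systemctl daemon-reload causes containers to lose access to the GPUs they requested. See the troubleshooting documentation for more information.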
Installing with apt: Ubuntu, Debian (did not work here)
- Configure the production repository:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
(Optional) Configure the repository to use experimental packages:
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
- Update the package list from the repository:
sudo apt-get update
- Install the NVIDIA Container Toolkit packages:
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo apt-get install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
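The repository setup above boils down to two text edits on the .list file: the first sed adds a signed-by option pointing at the downloaded keyring, and the optional second sed uncomments the experimental line. A minimal sketch of the same transformation as a function, so it can be tried on the file content without touching /etc/apt; render_nvidia_list is a hypothetical name:

```shell
#!/usr/bin/env bash
# Keyring path written by the gpg --dearmor step above.
KEYRING=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Hypothetical helper: read the upstream .list content on stdin and print
# the version apt should use. Pass "experimental" to also uncomment the
# experimental line, like the optional sed step.
render_nvidia_list() {
  local mode="${1:-stable}"
  local out
  # Same substitution as the setup command: bind each deb line to the keyring.
  out=$(sed "s#deb https://#deb [signed-by=${KEYRING}] https://#g")
  if [ "$mode" = "experimental" ]; then
    # Same as: sed -i -e '/experimental/ s/^#//g'
    out=$(printf '%s\n' "$out" | sed -e '/experimental/ s/^#//g')
  fi
  printf '%s\n' "$out"
}
```

For example: curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | render_nvidia_list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list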
Unfortunately, this failed with errors:
Err:7 https://nvidia.github.io/libnvidia-container/stable/deb/amd64 InRelease Could not handshake: Error in the pull function. [IP: 185.199.110.153 443]
Err:8 https://nvidia.github.io/libnvidia-container/experimental/deb/amd64 InRelease Could not handshake: Error in the pull function. [IP: 185.199.110.153 443]
Reading package lists... Done
W: Failed to fetch https://nvidia.github.io/libnvidia-container/stable/deb/amd64/InRelease Could not handshake: Error in the pull function. [IP: 185.199.110.153 443]
W: Failed to fetch https://nvidia.github.io/libnvidia-container/experimental/deb/amd64/InRelease Could not handshake: Error in the pull function. [IP: 185.199.110.153 443]
W: Some index files failed to download. They have been ignored, or old ones used instead.
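"Could not handshake" means the TLS connection to nvidia.github.io broke off before any package data arrived (as far as I know, the "Error in the pull function" wording comes from GnuTLS, which apt's HTTPS transport uses), so this looks like a network problem rather than a repository problem. A minimal sketch for checking reachability outside apt with curl; probe_url is a hypothetical helper, and the 10 second default timeout is an arbitrary choice:

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch a URL over HTTPS with a timeout, independent
# of apt. Prints the HTTP status code ("000" if no response was received)
# and returns curl's exit status, so non-zero means the fetch failed.
probe_url() {
  local url="$1" timeout="${2:-10}"
  curl -sS -o /dev/null --max-time "$timeout" -w '%{http_code}\n' "$url"
}
```

For example: probe_url https://nvidia.github.io/libnvidia-container/stable/deb/amd64/InRelease. A 200 means the host can reach the repository; a curl error with status 000 points at the same connectivity problem apt ran into.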
Installing from downloaded .deb files (works)
- First, look at the list file referenced in the setup command above:
https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list
Its contents are:
deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /
#deb https://nvidia.github.io/libnvidia-container/experimental/deb/$(ARCH) /
- Open https://nvidia.github.io/libnvidia-container/stable/deb; the page shows "Unsupported distribution or misconfigured repository settings".
- On that page, find the "GitHub Pages repository structure" section:
stable/deb: https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/stable/deb
experimental/deb: https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/experimental/deb
stable/rpm: https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/stable/rpm
experimental/rpm: https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/experimental/rpm
- There it is, the packages' real home:
https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/stable/deb/amd64
- Download the following files and install them:
sudo dpkg -i nvidia-container-toolkit_1.17.7-1_amd64.deb nvidia-container-toolkit-base_1.17.7-1_amd64.deb libnvidia-container1_1.17.7-1_amd64.deb libnvidia-container-tools_1.17.7-1_amd64.deb
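The manual download can be scripted. A minimal sketch, assuming the files shown in the GitHub tree view above can be fetched from raw.githubusercontent.com under the same gh-pages path (GitHub's usual raw-file URL layout); nvidia_deb_urls is a hypothetical helper, and only amd64 with 1.17.7-1 was actually used in this post:

```shell
#!/usr/bin/env bash
# Assumption: GitHub serves the gh-pages tree shown above as raw files here.
NVIDIA_DEB_BASE="https://raw.githubusercontent.com/NVIDIA/libnvidia-container/gh-pages/stable/deb"

# Hypothetical helper: print one download URL per package installed with
# dpkg above. Defaults match this post (1.17.7-1, amd64); other versions
# or architectures are untested assumptions.
nvidia_deb_urls() {
  local version="${1:-1.17.7-1}" arch="${2:-amd64}" pkg
  for pkg in nvidia-container-toolkit nvidia-container-toolkit-base \
             libnvidia-container1 libnvidia-container-tools; do
    printf '%s/%s/%s_%s_%s.deb\n' "$NVIDIA_DEB_BASE" "$arch" "$pkg" "$version" "$arch"
  done
}
```

For example, nvidia_deb_urls | xargs -n 1 wget downloads the four files into the current directory, after which the dpkg -i command above installs them.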
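Configuration
Prerequisites
- You have installed a supported container engine (Docker, containerd, CRI-O, Podman).
- You have installed the NVIDIA Container Toolkit.
Configuring Docker
- Configure the container runtime with the following nvidia-ctk command: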
sudo nvidia-ctk runtime configure --runtime=docker
The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host, updating it so that Docker can use the NVIDIA Container Runtime:
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
- Restart the Docker daemon:
sudo systemctl restart docker
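To confirm the nvidia-ctk step took effect, check that daemon.json now contains the nvidia runtime entry. A minimal sketch using plain grep rather than a JSON parser, so it only looks for the two strings and does not validate the structure; has_nvidia_runtime is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a Docker daemon.json registers a runtime
# named "nvidia" pointing at nvidia-container-runtime. Text match only;
# it does not parse or validate the JSON.
has_nvidia_runtime() {
  local file="${1:-/etc/docker/daemon.json}"
  [ -r "$file" ] || return 1
  grep -q '"nvidia"' "$file" && grep -q 'nvidia-container-runtime' "$file"
}
```

After the restart, docker info | grep -i runtimes should also list nvidia among the available runtimes.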
Configuring containerd (for Kubernetes)
- Configure the container runtime with the following nvidia-ctk command:
sudo nvidia-ctk runtime configure --runtime=containerd
The nvidia-ctk command modifies the /etc/containerd/config.toml file on the host, updating it so that containerd can use the NVIDIA Container Runtime.
- Restart containerd:
sudo systemctl restart containerd
Running a sample workload
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html
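After installing and configuring the toolkit and installing the NVIDIA GPU driver, you can verify your installation by running a sample workload.
- Run a sample CUDA container: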
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Your output should look similar to the following:
Thu Aug 14 13:54:11 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.03              Driver Version: 575.64.03      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
| 31%   37C    P8             34W /  450W |    1280MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
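Besides reading the table by eye, the driver and CUDA versions can be pulled out of the nvidia-smi banner line. A minimal sketch with sed; driver_version and cuda_version are hypothetical helpers that read nvidia-smi output on stdin and rely on the "Driver Version:" and "CUDA Version:" labels shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: extract version numbers from the nvidia-smi banner
# line read on stdin. Lines without the label produce no output.
driver_version() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p'
}

cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p'
}
```

For example, sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi | cuda_version should print 12.9 for the output shown above.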
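Installing the Docker version of Ollama with 1panel
Install it from the App Store:
- Check the option to allow external access to the port.
- GPU configuration:
reservations:
  devices:
    - capabilities:
        - gpu
      count: all
      driver: nvidia
- Change the model mount to a different host path:
volumes:
  - /home/d/.ollama:/root/.ollama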
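For reference, the 1panel settings above correspond roughly to a plain docker run: the GPU reservation maps to --gpus all, the port option to -p, and the mount to -v. A minimal sketch that only prints the command so it can be reviewed first; it assumes the official ollama/ollama image and Ollama's default port 11434, which may differ from what the 1panel app template actually uses, and ollama_run_cmd is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a docker run command roughly equivalent to the
# 1panel settings above. Assumptions: image ollama/ollama, port 11434.
ollama_run_cmd() {
  local models_dir="${1:-/home/d/.ollama}" port="${2:-11434}"
  # --gpus all plays the role of the "count: all, driver: nvidia" reservation.
  printf '%s ' docker run -d --name ollama \
    --gpus all \
    -p "${port}:11434" \
    -v "${models_dir}:/root/.ollama" \
    ollama/ollama
  printf '\n'
}
```

Once the printed command looks right, bash -c "$(ollama_run_cmd)" runs it.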
GPU monitoring
Very handy: no more worrying about forgetting to type nvidia-smi.
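Models
I previously wrote a post about installing Ollama directly, "Deploying a local LLM with Ollama (works without a GPU too) to auto-generate git commit messages in IDEA and VS Code"; in that setup the models live in /usr/share/ollama/.ollama/models.
With the Docker mount they now live in /home/d/.ollama/models; then just click "Sync from server" (從服務器同步).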
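If the models from the earlier native install should be reused, they need to end up under the new mount before syncing. A minimal sketch that copies the old model store (its blobs and manifests directories) to the mounted path; copy_ollama_models is a hypothetical helper, and the default paths are the two mentioned in this post:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy an existing Ollama model store into the
# directory mounted into the container. Defaults are the paths from this
# post; adjust them to your setup.
copy_ollama_models() {
  local src="${1:-/usr/share/ollama/.ollama/models}"
  local dst="${2:-/home/d/.ollama/models}"
  [ -d "$src" ] || { echo "no model store at $src" >&2; return 1; }
  mkdir -p "$dst"
  # "src/." copies the contents rather than the directory itself;
  # -a keeps timestamps and permissions.
  cp -a "$src/." "$dst/"
}
```

The old store belongs to the ollama service user, so the copy may need sudo; afterwards, click "Sync from server" in 1panel.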