開源 GPU 集群管理器 GPUStack 輕松拉起deepseek各版本模型

GPUStack 是一個用于運行 AI 模型的開源 GPU 集群管理器。
項目地址：gpustack/gpustack: Manage GPU clusters for running AI modelshttps://github.com/gpustack/gpustackhttps://github.com/gpustack/gpustackhttps://github.com/gpustack/gpustackhttps://github.com/gpustack/gpustack

核心特性

廣泛的硬件兼容性：支持管理 Apple Mac、Windows PC 和 Linux 服務器上不同品牌的 GPU。
廣泛的模型支持：從大語言模型 LLM、多模態模型 VLM 到 Diffusion 擴散模型、STT 與 TTS 語音模型、文本嵌入和重排序模型的廣泛支持。
異構 GPU 支持與擴展：輕松添加異構 GPU 資源，按需擴展算力規模。
分布式推理：支持單機多卡并行和多機多卡并行推理。
多推理后端支持：支持 llama-box（基于 llama.cpp 和 stable-diffusion.cpp）、vox-box 和 vLLM 作為推理后端。
輕量級 Python 包：最小的依賴和操作開銷。
OpenAI 兼容 API：提供兼容 OpenAI 標準的 API 服務。
用戶和 API 密鑰管理：簡化用戶和 API 密鑰的管理流程。
GPU 指標監控：實時監控 GPU 性能和利用率。
Token 使用和速率統計：有效跟蹤 token 使用情況，并管理速率限制。

安裝

Linux 或 macOS

GPUStack 提供了安裝腳本，可以將其安裝為 Linux 的 systemd 服務或 macOS 的 launchd 服務，默認端口為 80。要使用此方法安裝 GPUStack，執行以下命令：

curl -sfL https://get.gpustack.ai | INSTALL_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple sh -s -

Windows

以管理員身份運行 PowerShell（避免使用 PowerShell ISE），然后執行以下命令安裝 GPUStack：

$env:INSTALL_INDEX_URL = "https://pypi.tuna.tsinghua.edu.cn/simple"
Invoke-Expression (Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content

其他安裝方式

有關手動安裝、Docker 安裝或詳細配置選項，請參考安裝文檔https://docs.gpustack.ai/latest/installation/installation-script/https://docs.gpustack.ai/latest/installation/installation-script/https://docs.gpustack.ai/latest/installation/installation-script/https://docs.gpustack.ai/latest/installation/installation-script/

本次實驗選擇linux安裝

curl -sfL https://get.gpustack.ai | INSTALL_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple sh -s -  --port 9090

等待中...

安裝完成

相關端口與進程都啟動成功

訪問GPUStack

在瀏覽器中打開?http://myserver，訪問 GPUStack 界面。
訪問地址：?http://localhost:9090

使用“admin”用戶名和默認密碼登錄 GPUStack。

獲取默認密碼

Linux or macOS

cat /var/lib/gpustack/initial_admin_password

Windows

Get-Content -Path "$env:APPDATA\gpustack\initial_admin_password" -Raw

部署模型

模型分類根據自己想要的模型進行部署

選擇好模型點保存

之后模型就會開始下載? （running既是代表可用）

模型資源占用情況

測試并發可以四個問題同時回答

納管多個GPU work節點

主節點獲取token? ?cat /var/lib/gpustack/token

(base) root@DESKTOP-TUR5ISE:~# cat /var/lib/gpustack/token
8f297e35a55fa652837188acedfd8323

注冊 Worker?(注意：mytoken?為第一步獲取到的 Token)

?

Linux 或 MacOS

curl -sfL https://get.gpustack.ai | sh -s - --server-url http://localhost:9090 --token ${mytoken}

Windows

Invoke-Expression "& { $((Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content) } --server-url http://localhost:9090 --token ${mytoken}"

加入一臺同事的M2 Pro?芯片 mac電腦測試
work節點運行
?

pip3 config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simplecurl -sfL https://get.gpustack.ai | sh -s - --server-url http://10.176.20.121:9090 --token 8f297e35a55fa652837188acedfd8323

可以看到新增work

新增GPU

手動調度GPU運行模型

之后重新部署后生效

dify 添加?GPUStack API
?

本文來自互聯網用戶投稿，該文觀點僅代表作者本人，不代表本站立場。本站僅提供信息存儲空間服務，不擁有所有權，不承擔相關法律責任。
如若轉載，請注明出處：http://www.pswp.cn/web/68151.shtml
繁體地址，請注明出處：http://hk.pswp.cn/web/68151.shtml
英文地址，請注明出處：http://en.pswp.cn/web/68151.shtml

如若內容造成侵權/違法違規/事實不符，請聯系多彩編程網進行投訴反饋email:809451989@qq.com，一經查實，立即刪除！