【AIGC魔童】DeepSeek v3 Inference Deployment: vLLM/SGLang/LMDeploy

- (1) Deploying DeepSeek for inference with vLLM
- (2) Deploying DeepSeek for inference with SGLang
- (3) Deploying DeepSeek for inference with LMDeploy

(1) Deploying DeepSeek for inference with vLLM


GitHub: https://github.com/vllm-project/vllm

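vLLM v0.6.6 supports DeepSeek-V3 inference in FP8 and BF16 modes on both NVIDIA and AMD GPUs. Beyond the standard techniques, vLLM also offers pipeline parallelism, which lets you run the model across multiple machines connected over a network.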

For detailed guidance, refer to the vLLM instructions. Feel free to follow the enhancement plan as well.
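To see why the choice between FP8 and BF16, and the option of spreading the model over several machines, matters in practice, a rough weights-only memory estimate can help. The sketch below assumes the publicly reported total of 671 billion parameters for DeepSeek-V3 (a figure not stated in this article) and treats every parameter as stored in a single dtype. The helper names are made up for illustration.

```python
# Rough weights-only memory estimate for DeepSeek-V3.
# Assumption: 671e9 total parameters (publicly reported size, not from this article).
TOTAL_PARAMS = 671e9
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weight_gib(dtype: str) -> float:
    """Return the size of the weights alone, in GiB, for the given dtype."""
    return TOTAL_PARAMS * BYTES_PER_PARAM[dtype] / 2**30


def per_gpu_gib(dtype: str, num_gpus: int) -> float:
    """Weights held by each GPU if they are sharded evenly across num_gpus."""
    return weight_gib(dtype) / num_gpus


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_gib(dtype):.0f} GiB total, "
              f"{per_gpu_gib(dtype, 8):.0f} GiB per GPU on 8 GPUs")
```

Under these assumptions the weights alone come to about 625 GiB in FP8 and about 1250 GiB in BF16. The estimate ignores the KV cache, activations, and framework overhead, so real requirements are higher.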

pip install vllm

Calling DeepSeek R1

vLLM now supports the DeepSeek R1 model. The model keyword can be found in the list of supported models: https://docs.vllm.ai/en/latest/models/supported_models.html


Calling DeepSeek from Python:

from vllm import LLM

# The model ID must be passed as a quoted string.
llm = LLM(model="deepseek-ai/DeepSeek-V3", task="generate")
output = llm.generate("Hello, my name is")
print(output)

(2) Deploying DeepSeek for inference with SGLang


GitHub: https://github.com/sgl-project/sglang

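SGLang currently supports MLA optimizations, DP Attention, FP8 (W8A8), FP8 KV Cache, and Torch Compile, delivering state-of-the-art latency and throughput among open-source frameworks.

Notably, SGLang v0.4.1 fully supports running DeepSeek-V3 on both NVIDIA and AMD GPUs, making it a highly versatile and robust solution.

SGLang also supports multi-node tensor parallelism, which lets you run the model across multiple networked machines.

Multi-Token Prediction (MTP) is under development; progress can be tracked in the optimization plan.

Launch instructions from the SGLang team: https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3

Installing SGLang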

pip install "sglang[all]>=0.4.1.post5" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer

Calling DeepSeek R1

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code

The service now starts on port 30000. You can then call the DeepSeek R1 model through the OpenAI-style API:

import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

# Chat completion
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    temperature=0,
    max_tokens=64,
)
print(response)
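For readers who prefer not to install the openai package, the same call can be expressed as a plain HTTP POST to the server's /v1/chat/completions path. The following is a minimal standard-library sketch of what such a request might look like. The helper names build_chat_request and extract_reply are hypothetical, and the response shape assumes the usual OpenAI-compatible layout (choices, then message, then content).

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, messages: list,
                       temperature: float = 0.0, max_tokens: int = 64,
                       api_key: str = "EMPTY") -> urllib.request.Request:
    """Build (but do not send) a POST to the chat-completions endpoint."""
    body = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_reply(payload: dict) -> str:
    """Return the assistant text from an OpenAI-style response body."""
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    req = build_chat_request(
        "http://127.0.0.1:30000/v1", "default",
        [{"role": "user", "content": "List 3 countries and their capitals."}],
    )
    # Sending requires a running server; uncomment to perform the call.
    # with urllib.request.urlopen(req) as resp:
    #     print(extract_reply(json.load(resp)))
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect before a server is available.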

(3) Deploying DeepSeek for inference with LMDeploy


GitHub: https://github.com/InternLM/lmdeploy

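LMDeploy is a flexible, high-performance inference and serving framework tailored to large language models, and it now supports DeepSeek-V3. It offers both offline pipeline processing and online deployment, and integrates seamlessly with PyTorch-based workflows.

For comprehensive step-by-step instructions on running DeepSeek-V3 with LMDeploy, see InternLM/lmdeploy#2960.

Installing LMDeploy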

git clone -b support-dsv3 https://github.com/InternLM/lmdeploy.git
cd lmdeploy
pip install -e .

Single-task inference: write and run a Python script

from lmdeploy import pipeline, PytorchEngineConfig

if __name__ == "__main__":
    pipe = pipeline("deepseek-ai/DeepSeek-R1-FP8", backend_config=PytorchEngineConfig(tp=8))
    messages_list = [
        [{"role": "user", "content": "Who are you?"}],
        [{"role": "user", "content": "Translate the following content into Chinese directly: DeepSeek-V3 adopts innovative architectures to guarantee economical training and efficient inference."}],
        [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
    ]
    output = pipe(messages_list)
    print(output)

Calling the online service

# run
lmdeploy serve api_server deepseek-ai/DeepSeek-R1-FP8 --tp 8 --backend pytorch

You can then call the DeepSeek R1 model on port 23333:

from openai import OpenAI

client = OpenAI(api_key='YOUR_API_KEY', base_url="http://0.0.0.0:23333/v1")
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Write a piece of quicksort code in C++."}],
    temperature=0.8,
    top_p=0.8,
)
print(response)
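A model of this size can take a long time to load, so a client script may try to connect before the server is listening. One possible way to handle this is to poll the port before sending the first request. The sketch below uses only the standard library; wait_for_port is a hypothetical helper, and the ports are the ones quoted in this article (30000 for SGLang, 23333 for LMDeploy).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 600.0,
                  interval: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); wait and retry.
            time.sleep(interval)
    return False


# Example usage (assumes the LMDeploy server from this article on port 23333):
# if wait_for_port("127.0.0.1", 23333):
#     ...create the OpenAI client and send requests...
```

An open port only shows that the server process is accepting connections. It does not by itself guarantee that the model has finished loading, so the first request may still need a retry.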
