1 Update the system environment
According to the official vLLM documentation, vLLM has the following requirements for macOS, Xcode, and Clang on the Apple M1 platform:
OS: macOS Sonoma or later
SDK: Xcode 15.4 or later with Command Line Tools
Compiler: Apple Clang >= 15.0.0
Update macOS and Xcode from the App Store, then install the Command Line Tools matching your Xcode version:
https://developer.apple.com/download/all/?q=Command%20Line%20Tools
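To check the "Apple Clang >= 15.0.0" requirement programmatically, a small helper can parse the version triple out of `clang --version` output and compare it as a tuple. This is an illustrative sketch (the sample string and function names are my own, not from the vLLM docs); on the target Mac you would feed in the real command output.

```python
import re

def clang_version(output: str) -> tuple:
    # Extract the version triple from a line like
    # "Apple clang version 15.0.0 (clang-1500.0.40.1)"
    m = re.search(r"clang version (\d+)\.(\d+)\.(\d+)", output)
    if m is None:
        raise ValueError("could not parse clang version")
    return tuple(int(x) for x in m.groups())

def meets_requirement(output: str, minimum=(15, 0, 0)) -> bool:
    # Tuple comparison gives the correct ordering for version triples
    return clang_version(output) >= minimum

# On the target Mac, obtain real output with:
#   subprocess.run(["clang", "--version"], capture_output=True, text=True).stdout
sample = "Apple clang version 15.0.0 (clang-1500.0.40.1)"
print(meets_requirement(sample))  # True: 15.0.0 meets the 15.0.0 minimum
```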
2 Install Anaconda and initialize a virtual environment
Download and install the Apple M1 (arm64) build of Anaconda,
for example Anaconda3-2025.06-0-MacOSX-arm64.pkg
https://www.anaconda.com/download-success
Initialize the conda virtual environment:
conda create -n vllm python=3.12
conda activate vllm
3 Install vLLM
1) Download vLLM
git clone https://github.com/vllm-project/vllm.git
In most cases git clone will fail, so download a vLLM release tarball directly instead; here we use v0.9.2, links below.
https://github.com/vllm-project/vllm/releases/download/v0.9.2/vllm-0.9.2.tar.gz
https://github.com/vllm-project/vllm/releases
2) Install vLLM
Install the dependencies first:
cd vllm
pip install -r requirements/cpu.txt
conda install cmake
conda install ninja
Then install vLLM itself:
pip install -e .
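After the editable install finishes, it is worth confirming that the package is importable from the active environment. A minimal check (the helper name is my own):

```python
import importlib.util

def is_installed(pkg: str) -> bool:
    # find_spec returns None when the package cannot be located for import
    return importlib.util.find_spec(pkg) is not None

print("vllm installed:", is_installed("vllm"))
```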
Note that the steps above must be run in the built-in macOS Terminal; under iTerm you may hit compilation errors.
4 Verify vLLM
vLLM downloads Hugging Face models locally, into the ~/.cache/huggingface/hub folder by default.
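The cache location can be redirected with environment variables. The sketch below approximates how huggingface_hub resolves the hub cache (HF_HUB_CACHE takes precedence, then HF_HOME, then the ~/.cache/huggingface default); it mirrors the documented behavior but is not the library's own code.

```python
import os

def hf_hub_cache() -> str:
    # Approximation of huggingface_hub's cache resolution order:
    # HF_HUB_CACHE > HF_HOME/hub > ~/.cache/huggingface/hub
    if "HF_HUB_CACHE" in os.environ:
        return os.environ["HF_HUB_CACHE"]
    hf_home = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
    return os.path.join(hf_home, "hub")

print(hf_hub_cache())
```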
The test code is as follows.
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from vllm.entrypoints.llm import LLM
from vllm.sampling_params import SamplingParams

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
llm = LLM(model=model_name, max_model_len=128)
sampling_params = SamplingParams(temperature=0.9, max_tokens=100)

prompt = "Where is the capital of China?"
output = llm.generate(prompt, sampling_params)
print(output)
print(output[0].outputs[0].text)
In addition, vLLM can also run as a server.
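In server mode, `vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` exposes an OpenAI-compatible API (on port 8000 by default). As a sketch of what a request to its /v1/completions endpoint looks like, the snippet below only builds and prints the JSON body; it does not contact a server, and the prompt text is illustrative.

```python
import json

# Request body for vLLM's OpenAI-compatible /v1/completions endpoint.
# Assumes `vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` is running
# at the default address http://localhost:8000.
payload = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "prompt": "Where is the capital of China?",
    "max_tokens": 100,
    "temperature": 0.9,
}
print(json.dumps(payload, indent=2))
# Send it with e.g.:
#   curl http://localhost:8000/v1/completions \
#        -H "Content-Type: application/json" -d "$(python this_script.py)"
```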
The current issue is slow inference; the next step is to look into effective quantization methods, such as llama.cpp's int4 quantization, link below.
https://blog.csdn.net/liliang199/article/details/149246699
References
---
vllm
https://github.com/vllm-project/vllm.git
vLLM CPU install doc
https://docs.vllm.ai/en/latest/getting_started/installation/cpu.html
mac command line tools
https://developer.apple.com/download/all/?q=Command%20Line%20Tools
Beginner's guide: deploying a large model locally on a Mac with vLLM
https://www.53ai.com/news/OpenSourceLLM/2025040116542.html
hf-mirror
https://hf-mirror.com/