Deploying the DeepSeek-R1 Model with llama.cpp

1. Introduction to llama.cpp

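llama.cpp runs inference for Meta's LLaMA model (and other models) in pure C/C++. Its main goal is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware, both locally and in the cloud. Key features:

- Plain C/C++ implementation without any dependencies
- Apple silicon is a first-class citizen: optimized via the ARM NEON, Accelerate and Metal frameworks
- AVX, AVX2, AVX512 and AMX support for x86 architectures
- 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit and 8-bit integer quantization for faster inference and reduced memory use
- Custom CUDA kernels for running LLMs on NVIDIA GPUs (AMD GPUs are supported via HIP, Moore Threads MTT GPUs via MUSA)
- Vulkan and SYCL backend support
- CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity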
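To make the memory benefit of quantization concrete, here is a back-of-the-envelope sketch in Python. It assumes the weights dominate memory use and that every weight takes exactly the nominal number of bits. Real GGUF quantization types also store per-block scaling data, so actual files come out somewhat larger. The 1.5e9 parameter count is simply the nominal size of a "1.5B" model, not a measured value.

```python
def estimate_weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Ignores quantization metadata (block scales), the KV cache and
    runtime buffers, so treat the result as a lower bound.
    """
    return n_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    n_params = 1.5e9  # assumption: nominal parameter count of a "1.5B" model
    for bits in (16, 8, 4, 2):
        size = estimate_weight_size_gib(n_params, bits)
        print(f"{bits:>2}-bit weights: ~{size:.2f} GiB")
```

Halving the bits per weight halves the estimate, which is why a 3-bit or 4-bit file fits on machines where the 16-bit original would not.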

GitHub: https://github.com/ggerganov/llama.cpp
Downloads: https://github.com/ggerganov/llama.cpp/releases

2. Installing llama.cpp

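llama.cpp is an inference framework that reimplements the LLaMA inference code in C/C++. It supports dynamic batching and hybrid (CPU+GPU) inference.
llama.cpp only loads models in GGUF format. You can convert a model to GGUF yourself or download a ready-made GGUF file from Hugging Face or a similar platform.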
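Because llama.cpp accepts only GGUF files, it can be useful to check a download before trying to load it. The sketch below relies on the fact that a GGUF file begins with the four ASCII bytes "GGUF" followed by a 32-bit version number. It assumes a little-endian file (the common case), and the function name is made up for this example.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_version(path):
    """Return the GGUF version number, or None if the magic bytes are missing."""
    with Path(path).open("rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # Assumption: little-endian layout for the uint32 version field.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling read_gguf_version on a model file returns None for anything that is not GGUF, such as a raw safetensors checkpoint or an HTML error page saved in place of the model.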

2.1 llama.cpp: CPU-only build with AVX512 support

Download (Windows): https://github.com/ggerganov/llama.cpp/releases/download/b4658/llama-b4658-bin-win-avx512-x64.zip
Server options reference: https://github.com/ggerganov/llama.cpp/tree/master/examples/server
After the download finishes, extract the archive to D:\llama-b4658-bin-win-avx512-x64.

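Linux

## llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp/
make
## Download and convert the model
conda create -n llamacpp python=3.12
conda activate llamacpp
pip install -r requirements.txt
### Download the model into the models/ directory
cd models
sudo apt-get install git-lfs
# enable Git LFS for the current user
git lfs install
git clone https://www.modelscope.cn/qwen/Qwen2-0.5B-Instruct.git
cd ..
### Convert to GGUF (assumed step, implied by the F16 .gguf path below; uses the repository's convert_hf_to_gguf.py)
python convert_hf_to_gguf.py models/Qwen2-0.5B-Instruct --outtype f16 --outfile models/Qwen2-0.5B-Instruct/Qwen2-0.5B-Instruct-F16.gguf
### Inference test
./llama-cli -m models/Qwen2-0.5B-Instruct/Qwen2-0.5B-Instruct-F16.gguf -p hello -n 256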

2.2 The DeepSeek-R1 model

Download: https://hf-mirror.com/lmstudio-community/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/tree/main
This article uses DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_L.gguf as the example.


2.3 Deploying the DeepSeek-R1 model with llama.cpp

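Open a Windows command prompt in the directory that contains DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_L.gguf and run the following commands. The first switches the console to UTF-8, the second puts the extracted llama.cpp binaries on the PATH, and the third starts the server on port 8080:
chcp 65001
set PATH=D:\llama-b4658-bin-win-avx512-x64;%PATH%
llama-server -m DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_L.gguf --port 8080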
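Loading the model takes a moment, so scripts that talk to the server may want to wait until it is ready. The following Python sketch polls llama-server's /health endpoint until it answers with HTTP 200. The function name, timeout and polling interval are arbitrary choices for this example, and it assumes the server is reachable on 127.0.0.1:8080 as started above.

```python
import http.client
import time


def wait_until_ready(host="127.0.0.1", port=8080, timeout_s=60.0, interval_s=1.0):
    """Poll GET /health until it returns HTTP 200 or the timeout expires.

    Returns True once the server reports ready, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        conn = http.client.HTTPConnection(host, port, timeout=5)
        try:
            conn.request("GET", "/health")
            if conn.getresponse().status == 200:
                return True
        except (OSError, http.client.HTTPException):
            # Not listening yet, or the connection was dropped: retry.
            pass
        finally:
            conn.close()
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A typical use is to call wait_until_ready() right after launching llama-server and to send the first prompt only if it returns True.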


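Open http://127.0.0.1:8080/ in a browser to test the server. You can also call the completion endpoint directly, for example with curl from a POSIX shell (such as Git Bash or WSL on Windows):

curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'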
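The same request can be sent from Python using only the standard library. This is a minimal sketch, not an official client: the function names are invented for this example, and it assumes the server replies with a JSON object whose "content" field holds the generated text.

```python
import json
import urllib.request


def build_completion_request(prompt, n_predict=128, base_url="http://localhost:8080"):
    """Build the POST request for llama-server's /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, n_predict=128, base_url="http://localhost:8080"):
    """Send the request and return the generated text."""
    req = build_completion_request(prompt, n_predict, base_url)
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # Assumption: the generated text is returned under the "content" key.
    return reply["content"]
```

With the server running, complete("Building a website can be done in 10 simple steps:") mirrors the curl call above. Building the request in its own function makes the payload easy to inspect without a live server.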
