Running DeepSeek-R1 Locally with .NET Core

My journey to chatting with a local LLM on my MacBook Pro using .NET.

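Today, all you need is a browser to use ChatGPT or other genAI tools. As developers, we can do more sophisticated things by integrating directly with the OpenAI API and similar services. But what if we want to run an LLM on our own machine, just to have someone to chat with or to build something fun?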

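DeepSeek's recently released models have caused a stir in the software and technology industry. Thanks to distillation, smaller models that are cheaper to run can now perform comparably well on specific tasks. Python is the dominant stack in both the genAI and ML worlds. I know Python well myself, but in keeping with the tradition of my #EverythingInCSharp series, this post documents how I ran the deepseek-ai/DeepSeek-R1-Distill-Llama-8B model from a C# program.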

1. Prerequisites

Python 3+
.NET 8+
Git with LFS support
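
Before going further, it can help to confirm that the tools above are on your PATH. The following is a minimal sketch, not part of the original walkthrough; the helper names have and check_tool are hypothetical, and it only checks for presence, not versions.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a command is available on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print one status line per tool.
check_tool() {
    if have "$1"; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

check_tool python3
check_tool dotnet
check_tool git
check_tool git-lfs   # Git LFS installs a separate git-lfs executable
```

To check versions as well, python3 --version and dotnet --list-sdks print what is installed.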

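Because the .NET library used here cannot load Hugging Face models in .safetensors format directly, you need to either convert the .safetensors files to .gguf (GPT-Generated Unified Format) or download a .gguf file that someone else has already converted and uploaded to Hugging Face (for example, this one). If you do not want to convert the model yourself, skip to step 5.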

2. Download the model

Clone the model repository with the following git command. Note that the model files are about 15 GB:

git clone https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
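
If Git LFS is not set up, a clone can finish quickly and leave small text pointer files where the weights should be. The sketch below is my own addition rather than part of the original steps. It relies on the fact that an LFS pointer file starts with a line naming the Git LFS spec; the function names is_lfs_pointer and check_weights are hypothetical.

```shell
#!/bin/sh
# Succeed if the file looks like a Git LFS pointer rather than real data.
# A pointer is a short text file whose first line names the LFS spec URL.
is_lfs_pointer() {
    head -c 200 "$1" 2>/dev/null | head -n 1 \
        | grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Report, for each .safetensors file in a directory, whether it was
# actually downloaded or is still only a pointer.
check_weights() {
    dir="$1"
    for f in "$dir"/*.safetensors; do
        [ -e "$f" ] || { echo "no .safetensors files in $dir"; return 1; }
        if is_lfs_pointer "$f"; then
            echo "pointer only: $f"
        else
            echo "downloaded: $f"
        fi
    done
}

# Example call (assumes the clone above ran in the current directory):
#   check_weights DeepSeek-R1-Distill-Llama-8B
```

If any file is reported as a pointer, running git lfs install once and then git lfs pull inside the repository should fetch the real weights.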

3. Clone the llama.cpp repository

You will need the conversion script inside it to convert the Hugging Face format to GGUF.

git clone https://github.com/ggerganov/llama.cpp.git

4. Convert the model

Before I could run the conversion script successfully, I had to:

4.1 Set up a virtual environment

cd llama.cpp
python3 -m venv .
source bin/activate
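
Activating a virtual environment sets the VIRTUAL_ENV variable, so a quick check before installing packages avoids putting them into the system Python by accident. This is a small sketch of my own; the function name venv_status is hypothetical.

```shell
#!/bin/sh
# Print whether a Python virtual environment is active in this shell.
# The activate script exports VIRTUAL_ENV; it is unset otherwise.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "no virtual environment active"
    fi
}

venv_status
```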

4.2 Install the following packages

python3 -m pip install numpy torch sentencepiece gguf safetensors transformers

Then run the conversion script:

python3 convert_hf_to_gguf.py --outfile your_filename.gguf ../DeepSeek-R1-Distill-Llama-8B

When it finishes, you will see the following: [screenshot of the conversion script's output]
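
Independently of the console output, one quick sanity check on the result is to look at the file's first four bytes, since GGUF files begin with the ASCII magic "GGUF". The sketch below is my own addition, and is_gguf is a hypothetical helper name. It only confirms the file header, not that the model is complete or loadable.

```shell
#!/bin/sh
# Succeed if the file starts with the 4-byte GGUF magic.
is_gguf() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# Example call (your_filename.gguf is the --outfile used above):
#   is_gguf your_filename.gguf && echo "looks like a GGUF file"
```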

5. Create a new C# console application

dotnet new console

6. Install the required packages

Because the model is Llama-based, we need LLamaSharp:

dotnet add package LLamaSharp

I am using a MacBook Pro, so I also need to install a matching backend package:

dotnet add package LLamaSharp.Backend.Cpu

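If you are not on a Mac, see the official list of available backends.

7. Write the chat session code

I will copy the sample code from the LLamaSharp README to get a minimal chat session running: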

using LLama;
using LLama.Common;
using LLama.Sampling;

string modelPath = @"DeepSeek-R1-Distill-Llama-8B.gguf"; // change it to your own model path.

var parameters = new ModelParams(modelPath)
{
    ContextSize = 1024, // The longest length of chat as memory.
    GpuLayerCount = 5 // How many layers to offload to GPU. Please adjust it according to your GPU memory.
};
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

// Add chat histories as prompt to tell AI how to act.
var chatHistory = new ChatHistory();
chatHistory.AddMessage(AuthorRole.System, "Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.");
chatHistory.AddMessage(AuthorRole.User, "Hello, Bob.");
chatHistory.AddMessage(AuthorRole.Assistant, "Hello. How may I help you today?");

ChatSession session = new(executor, chatHistory);

InferenceParams inferenceParams = new InferenceParams()
{
    MaxTokens = 256, // No more than 256 tokens should appear in answer. Remove it if antiprompt is enough for control.
    AntiPrompts = new List<string> { "User:" }, // Stop generation once antiprompts appear.

    SamplingPipeline = new DefaultSamplingPipeline(),
};

Console.ForegroundColor = ConsoleColor.Yellow;
Console.Write("The chat session has started.\nUser: ");
Console.ForegroundColor = ConsoleColor.Green;
string userInput = Console.ReadLine() ?? "";

while (userInput != "exit")
{
    await foreach ( // Generate the response streamingly.
        var text
        in session.ChatAsync(
            new ChatHistory.Message(AuthorRole.User, userInput),
            inferenceParams))
    {
        Console.ForegroundColor = ConsoleColor.White;
        Console.Write(text);
    }
    Console.ForegroundColor = ConsoleColor.Green;
    userInput = Console.ReadLine() ?? "";
}

8. Run it and try it out

dotnet run

On a MacBook with a 12-core M3 Pro CPU, token generation used about 45% of the CPU and roughly 16 GB of memory. The memory usage is about the same as the size of the model itself.
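
The match between memory use and model size can be checked with back-of-the-envelope arithmetic: weight memory is roughly the parameter count times the bytes per weight. The sketch below is my own illustration. It assumes about 8 billion parameters and 16-bit weights (which is consistent with the roughly 15 GB download, but I have not verified which output type the conversion used), and estimate_gb is a hypothetical helper name.

```shell
#!/bin/sh
# Rough weight-memory estimate in GB (1 GB = 10^9 bytes).
#   $1 = parameter count in billions
#   $2 = bits per weight
estimate_gb() {
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_gb 8 16   # 16-bit weights: prints 16.0
estimate_gb 8 8    # 8-bit weights: prints 8.0
```

This counts only the weights. The context (KV cache) and runtime buffers add to it, so treat the result as a lower bound rather than a prediction.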

[Figure: CPU usage while generating tokens]

[Figure: Memory usage while generating tokens]

[Figure: Memory usage when not generating tokens]

Enjoy! 🎉 I can't wait to see what your C# code can do with your local LLM!

