Tutorial: Quickly Deploying Google's Open Model Gemma Locally (with WasmEdge)


1. Introducing Gemma
2. Deploying Gemma
- 2.1 Deployment tools
- 2.2 Deployment steps
3. Building an ultra-lightweight AI agent
4. Summary

1. Introducing Gemma

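Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. It can run directly on a local computer, even without a GPU: a CPU alone is enough, only slower.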

2. Deploying Gemma

2.1 Deployment tools

Gemma is deployed on Linux with a single tool, WasmEdge, which runs the model.

WasmEdge: https://github.com/wasmedge/wasmedge

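🤩 WasmEdge is the easiest and fastest way to run LLMs on your own device. 🤩

WasmEdge is a lightweight, high-performance, and extensible WebAssembly runtime, which its developers describe as the fastest Wasm virtual machine available today. It is an official sandbox project hosted by the CNCF. Its use cases include modern web application architectures (isomorphic and Jamstack applications), microservices on the edge cloud, serverless SaaS APIs, embedded functions, smart contracts, and smart devices.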


2.2 Deployment steps

Install WasmEdge with LLM support

A single command installs the WasmEdge runtime together with LLM support.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasmedge_rustls wasi_nn-ggml

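The --plugins option takes a list of plugins to install; here it installs wasmedge_rustls and wasi_nn-ggml. The wasmedge_rustls plugin enables TLS and HTTPS networking. The wasi_nn-ggml plugin enables WasmEdge to run AI inference programs on large language models (LLMs) such as Gemma.

After installation, run source ~/.bashrc (in the author's setup, /home/server/.bashrc) so that the wasmedge command is available in the current shell immediately.

Alternatively, you can follow the WasmEdge installation guide to download and copy the installation files manually.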
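To confirm the installation from a script, the sketch below checks that wasmedge is on PATH and looks for the two plugins. It assumes the installer's default layout, with plugins under ~/.wasmedge/plugin, and that the plugin file names contain "wasinn" and "rustls" once lowercased and stripped of underscores and hyphens. Both are assumptions, so adjust the directory and keywords to what you actually find on disk.

```python
import shutil
from pathlib import Path

# Assumption: the default install.sh layout puts plugins in ~/.wasmedge/plugin.
DEFAULT_PLUGIN_DIR = Path.home() / ".wasmedge" / "plugin"

# Assumed keywords for the two plugins installed above (wasi_nn-ggml, wasmedge_rustls).
WANTED_PLUGINS = ("wasinn", "rustls")


def _normalize(name: str) -> str:
    """Lowercase a file name and drop '_' and '-' so 'WasiNN' and 'wasi_nn' compare equal."""
    return name.lower().replace("_", "").replace("-", "")


def wasmedge_on_path() -> bool:
    """True if a `wasmedge` executable can be found on PATH (e.g. after `source ~/.bashrc`)."""
    return shutil.which("wasmedge") is not None


def missing_plugins(plugin_dir=DEFAULT_PLUGIN_DIR, wanted=WANTED_PLUGINS) -> list:
    """Return the keywords in `wanted` that match no file name in `plugin_dir`."""
    plugin_dir = Path(plugin_dir)
    names = [_normalize(p.name) for p in plugin_dir.iterdir()] if plugin_dir.is_dir() else []
    return [kw for kw in wanted if not any(kw in n for n in names)]
```

Usage: after installing, `wasmedge_on_path()` should be True and `missing_plugins()` should return an empty list.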

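Download the LLM chat application in Wasm

Next, download llama-chat.wasm, a tiny (about 2 MB) cross-platform binary that lets you chat with the model on the command line. It is compiled from Rust, needs no other dependencies, and runs the same way across different environments, which shows how efficient this approach is.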

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm
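If you prefer to script the download, the following sketch fetches a release asset from the same "latest/download" URL and checks that the result really is a WebAssembly module: every Wasm binary begins with the four bytes \0asm, so an HTML error page saved by mistake is caught immediately. The helper names are illustrative and not part of LlamaEdge.

```python
import urllib.request
from pathlib import Path

# Same URL pattern as the curl command above.
RELEASE_BASE = "https://github.com/LlamaEdge/LlamaEdge/releases/latest/download"

# Every WebAssembly binary module starts with these magic bytes.
WASM_MAGIC = b"\x00asm"


def release_url(asset: str) -> str:
    """Build the download URL of an asset in the latest LlamaEdge release."""
    return f"{RELEASE_BASE}/{asset}"


def is_wasm(path) -> bool:
    """Return True if the file at `path` starts with the Wasm magic number."""
    with open(path, "rb") as f:
        return f.read(4) == WASM_MAGIC


def download_wasm(asset: str, dest_dir=".") -> Path:
    """Download `asset` (urllib follows redirects, like curl -L) and verify it is Wasm."""
    dest = Path(dest_dir) / asset
    urllib.request.urlretrieve(release_url(asset), dest)
    if not is_wasm(dest):
        raise ValueError(f"{dest} is not a WebAssembly module")
    return dest
```

Usage: `download_wasm("llama-chat.wasm")`; the same helper also works for llama-api-server.wasm in section 3.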

Download the GGUF file of the Gemma-7b-it model (gemma-7b-it-Q5_0.gguf). The file is 5.88 GB, so the download may take a while.

curl -LO https://huggingface.co/second-state/Gemma-7b-it-GGUF/resolve/main/gemma-7b-it-Q5_0.gguf
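A 5.88 GB download can be interrupted or replaced by an error page. As a quick sanity check, the sketch below reads the GGUF header: a GGUF file begins with the ASCII magic GGUF followed by a little-endian uint32 format version. It only validates the header, not the whole file; comparing the file size with the one listed on the Hugging Face page is a complementary check.

```python
import struct

GGUF_MAGIC = b"GGUF"


def gguf_version(path):
    """Return the GGUF format version stored in the file header, or None if the
    file does not start with the GGUF magic bytes."""
    with open(path, "rb") as f:
        header = f.read(8)  # 4-byte magic + 4-byte little-endian uint32 version
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Usage: `gguf_version("gemma-7b-it-Q5_0.gguf")` returns the format version for a complete download, or None if the file is not GGUF at all.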

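Full list of downloadable models: https://github.com/LlamaEdge/LlamaEdge/blob/main/models.md

Besides Gemma, WasmEdge also supports Llama2, CodeLlama, CodeShell, Mistral, MistralLite, TinyLlama, Baichuan, BELLE, Alpaca, Vicuna, OpenChat, StarCoder, OpenBuddy, and more!

Chat with the Gemma-7b-it model on the CLI

Now that everything is set up, you can start chatting with the Gemma-7b-it model from the command line.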

wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-7b-it-Q5_0.gguf llama-chat.wasm -p gemma-instruct -c 4096

The portable Wasm application automatically uses the hardware accelerators (such as GPUs) available on your device.

[You]:
Create JSON for the following: There are 3 people, two males, One is named Mark. Another is named Joe. And a third person, who is a woman, is named Sam. The women is age 30 and the two men are both 19.

[Bot]:
{"people": [{"name": "Mark","age": 19},{"name": "Joe","age": 19},{"name": "Sam","age": 30}]
}
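Model output should not be trusted blindly. The sketch below parses a reply as JSON, tolerating an optional Markdown code fence around it, and compares the result with the facts stated in the prompt. Run on the reply above, it shows that the names and ages are right but that no gender information survived. The key name "gender" is a hypothetical choice, since the prompt does not specify one.

```python
import json
import re

# Matches an optional Markdown fence such as ```json ... ``` around the payload.
FENCE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)


def extract_json(reply: str):
    """Parse the model reply as JSON, unwrapping a Markdown code fence if present."""
    match = FENCE.search(reply)
    return json.loads(match.group(1) if match else reply)


def check_people(data, expected):
    """Compare parsed people against expected {name: age}; return a list of problems."""
    problems = []
    people = {p.get("name"): p for p in data.get("people", [])}
    for name, age in expected.items():
        person = people.get(name)
        if person is None:
            problems.append(f"{name}: missing")
            continue
        if person.get("age") != age:
            problems.append(f"{name}: age {person.get('age')} != {age}")
        # "gender" is a hypothetical key name; the prompt does not specify one.
        if "gender" not in person:
            problems.append(f"{name}: no gender field")
    return problems


reply = '{"people": [{"name": "Mark","age": 19},{"name": "Joe","age": 19},{"name": "Sam","age": 30}]\n}'
data = extract_json(reply)
print(check_people(data, {"Mark": 19, "Joe": 19, "Sam": 30}))
```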


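You can use the same llama-chat.wasm file to run other LLMs, such as OpenChat, CodeLlama, and Mistral, by pointing --nn-preload at that model's GGUF file and choosing its prompt template with -p.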
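Since only the model file, the Wasm app, and the prompt template change between runs, the command line can be generated. The sketch below rebuilds the command used above as an argument list for Python's subprocess module. It assumes the --nn-preload value follows WasmEdge's alias:backend:target:model-path layout (default:GGML:AUTO:... in this tutorial); the function and parameter names are our own.

```python
import shlex


def wasmedge_cmd(model_file, wasm_app="llama-chat.wasm", prompt_template="gemma-instruct",
                 ctx_size=4096, alias="default", backend="GGML", target="AUTO"):
    """Build the argument list for running a LlamaEdge Wasm app under wasmedge."""
    return [
        "wasmedge",
        "--dir", ".:.",  # map the current directory into the Wasm sandbox
        # Assumed layout: alias:backend:target:model-path
        "--nn-preload", f"{alias}:{backend}:{target}:{model_file}",
        wasm_app,
        "-p", prompt_template,  # prompt template matching the model family
        "-c", str(ctx_size),    # context size in tokens
    ]


# Prints the same command as the chat example above.
print(shlex.join(wasmedge_cmd("gemma-7b-it-Q5_0.gguf")))
```

To start the chat, pass the list to `subprocess.run(wasmedge_cmd("gemma-7b-it-Q5_0.gguf"), check=True)`. Passing `wasm_app="llama-api-server.wasm"` yields the API server command from section 3.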

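3. Building an ultra-lightweight AI agent

Create an OpenAI-compatible API service

Once you have fine-tuned a model with domain knowledge, or want to self-host a model such as Llama2, running it only from the CLI is not enough. Next, we set up an OpenAI-compatible API service for the open-source model so that the fine-tuned model can be integrated into other workflows.

This assumes you have already installed WasmEdge with the ggml plugin and downloaded the model you need.

First, download the Wasm file that implements the API server from the terminal. It is also a cross-platform, portable Wasm application that can run on many CPU and GPU devices.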

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Download the chatbot web UI so that you can interact with the model through a chat interface.

curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz
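The three shell commands above can also be done from Python. This sketch downloads the same archive, refuses any member whose path would land outside the target directory (a minimal guard against path traversal, not a full audit; symlinks are not checked), extracts it, and deletes the archive. The function names are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path

UI_URL = "https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz"


def safe_extract(archive, dest="."):
    """Extract a .tar.gz into `dest`, rejecting members that would escape it."""
    dest = Path(dest).resolve()
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    return dest


def install_chatbot_ui(dest="."):
    """Equivalent of: curl -LO ...; tar xzf chatbot-ui.tar.gz; rm chatbot-ui.tar.gz"""
    archive = Path(dest) / "chatbot-ui.tar.gz"
    urllib.request.urlretrieve(UI_URL, archive)
    try:
        safe_extract(archive, dest)
    finally:
        archive.unlink()
```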

Start the API server for the model with the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-7b-it-Q5_0.gguf llama-api-server.wasm -p gemma-instruct -c 4096

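Once the log shows that the server is listening, open http://localhost:8080/ in a browser (or the server's IP address on port 8080 from another machine; 0.0.0.0 in the log means the server listens on all network interfaces) to chat through the web UI. The log below comes from a run with the smaller gemma-2b-it-Q5_0.gguf model; the two [error] lines at the top did not prevent the server from starting.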

server@dev-fj-srv:~/code$ wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-2b-it-Q5_0.gguf llama-api-server.wasm -p gemma-instruct   -c 4096
[2024-03-01 09:46:45.391] [error] instantiation failed: module name conflict, Code: 0x60
[2024-03-01 09:46:45.391] [error]     At AST node: module
[INFO] Socket address: 0.0.0.0:8080
[INFO] Model name: default
[INFO] Model alias: default
[INFO] Prompt context size: 4096
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 512
[INFO] Temperature for sampling: 1
[INFO] Top-p sampling (1.0 = disabled): 1
[INFO] Penalize repeat sequence of tokens: 1.1
[INFO] Presence penalty (0.0 = disabled): 0
[INFO] Frequency penalty (0.0 = disabled): 0
[INFO] Prompt template: GemmaInstruct
[INFO] Log prompts: false
[INFO] Log statistics: false
[INFO] Log all information: false
[INFO] Starting server ...
[INFO] Plugin version: b2230 (commit 89febfed)
[INFO] Listening on http://0.0.0.0:8080
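When the server is launched from a script, you usually want to wait until it is ready before sending requests. The sketch below takes the address from the "Listening on" line of a log like the one above and polls the TCP port until it accepts connections. Relying on the log text is an assumption about its format, which may change between LlamaEdge versions; the fallback of port 8080 is taken from this log, and the 0.0.0.0 bind address is replaced with localhost for connecting.

```python
import re
import socket
import time
from urllib.parse import urlparse

LISTEN_RE = re.compile(r"\[INFO\] Listening on (\S+)")


def listen_url(log_lines):
    """Return the URL from the first '[INFO] Listening on ...' line, or None."""
    for line in log_lines:
        match = LISTEN_RE.search(line)
        if match:
            return match.group(1)
    return None


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


def wait_for_server(log_lines, timeout=60.0):
    """Wait for the address announced in the log (assumed default: 0.0.0.0:8080)."""
    url = urlparse(listen_url(log_lines) or "http://0.0.0.0:8080")
    # 0.0.0.0 means "all interfaces" when binding; connect via localhost instead.
    host = "localhost" if url.hostname in (None, "0.0.0.0") else url.hostname
    return wait_for_port(host, url.port or 8080, timeout)
```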


You can test the model with the following command.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"system", "content": "You are a helpful assistant. Answer each question in one sentence."}, {"role":"user", "content": "Who is Robert Oppenheimer?"}], "model":"llama-2-chat"}'
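The same request can be sent from Python using only the standard library. This sketch mirrors the curl command and reads the answer from choices[0].message.content, where OpenAI-style chat completion responses carry it; we assume the LlamaEdge server follows that response shape. The model value is copied from the curl example, while the startup log reports the model name as default, so adjust it if your server expects that name.

```python
import json
import urllib.request

API_URL = "http://localhost:8080/v1/chat/completions"
SYSTEM_PROMPT = "You are a helpful assistant. Answer each question in one sentence."


def build_payload(question, system=SYSTEM_PROMPT, model="llama-2-chat"):
    """Build the same JSON body as the curl example."""
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
        "model": model,
    }


def extract_reply(response):
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def ask(question, url=API_URL, **kwargs):
    """POST the question to the API server and return the model's answer."""
    body = json.dumps(build_payload(question, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    # Generation on a CPU can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=300) as resp:
        return extract_reply(json.load(resp))
```

Usage, with the server running: `print(ask("Who is Robert Oppenheimer?"))`.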

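4. Summary

This tutorial deploys Gemma with WasmEdge. You can customize the deployment environment and adjust configuration parameters to suit your needs, keep full control over the model and data, and freely extend the setup with further development.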

If reposting, please credit the source: http://www.pswp.cn/news/713716.shtml