Testing DeepSeek R1 locally for RAG with Ollama and Kibana

By Dave Erickson and Jakob Reiter, Elastic

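Everyone is talking about DeepSeek R1, the new large language model from the Chinese hedge fund High-Flyer. They have released a powerful chain-of-thought reasoning LLM with open weights, and the news is full of speculation about what this means for the industry. For those who want to try the new model with RAG and all the vector database smarts of Elasticsearch, here is a quick tutorial to get you started with DeepSeek R1 for local inference. Along the way we'll use Elastic's Playground feature and even discover some pros and cons of Deepseek R1 for RAG.

Here is a diagram of what we'll configure in this tutorial: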

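Setting up local inference with Ollama

Ollama is a great way to quickly test a curated set of open-source models for local inference, and it is a popular tool among AI developers.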

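Running Ollama on bare metal

Installing locally on Mac, Linux, or Windows is the easiest way to take advantage of whatever local GPU capability you may have, especially if you have M-series Apple silicon. Once Ollama is installed, you can download and run DeepSeek R1 with the following command.

You may need to adjust the parameter size to one that fits your hardware. Available sizes are listed at https://ollama.com/library/deepseek-r1.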

ollama run deepseek-r1:7b

You can chat with the model in the terminal, but the model keeps running after you press Ctrl+D or type "/bye" to exit. To see which models are still running, type:

ollama ps
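
The same check can be scripted. The sketch below assumes Ollama's HTTP API exposes the loaded models at /api/ps on the default port 11434 and returns a "models" array whose entries carry a "name" field. The helper names are ours, for illustration only.

```python
import json
import urllib.request


def running_model_names(ps_payload: dict) -> list[str]:
    """Pull model names out of an /api/ps style payload (assumed shape)."""
    return [model["name"] for model in ps_payload.get("models", [])]


def fetch_running_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a local Ollama which models are currently loaded."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=10) as resp:
        return running_model_names(json.load(resp))
```

Calling fetch_running_models() needs a running Ollama; running_model_names() can be exercised on any saved payload.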

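Running Ollama in a container

Alternatively, the quickest way to get Ollama running is with a container engine such as Docker. Using your local machine's GPU is not always straightforward, depending on your environment, but a quick test setup is not hard as long as your container has enough RAM and storage for multi-GB models.

Getting Ollama up and running in Docker is as simple as: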

mkdir ollama_deepseek
cd ollama_deepseek
mkdir ollama
docker run -d -v ./ollama:/root/.ollama -p 11434:11434 \
--name ollama ollama/ollama

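This creates a directory named "ollama" in the current directory and mounts it inside the container to store the Ollama configuration as well as the models. Depending on the number of parameters, models range from a few GB to tens of GB, so make sure to pick a volume with enough free space.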
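
Before pulling a large model, it can be worth checking free space on the volume you are about to mount. This is a minimal sketch using only the Python standard library; the 20% headroom factor and the function names are our own assumptions, not anything Ollama requires.

```python
import shutil


def free_gb(path: str = ".") -> float:
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_room_for(model_gb: float, path: str = ".", headroom: float = 1.2) -> bool:
    """True if `path` can hold a model of `model_gb` GiB plus some headroom."""
    return free_gb(path) >= model_gb * headroom
```

For example, has_room_for(5, "./ollama") asks whether a roughly 5 GB model would fit in the mounted directory with a little space to spare.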

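Note: if your machine happens to have an Nvidia GPU, make sure to install the Nvidia Container Toolkit and add "--gpus=all" to the docker run command above.

Once the Ollama container is up and running on your machine, you can pull a model such as deepseek-r1 with: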

docker exec -it ollama ollama pull deepseek-r1:7b

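As with the bare-metal approach, you may need to adjust the parameter size to one that fits your hardware. Available sizes are listed at https://ollama.com/library/deepseek-r1.

Once the model pull finishes, you can type "/bye" to exit the prompt. To verify that the model is still running: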

docker exec -it ollama ollama ps

Testing our local inference with curl

To test local inference with curl, you can run the following command. We set stream to false so that we can easily read the JSON response:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "stream": false,
  "prompt": "Why is Elastic so cool?"
}'
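
The same request can be made from Python with nothing but the standard library. This is a sketch under the assumptions of the curl call above (Ollama on localhost:11434, model deepseek-r1:7b) and that the non-streaming reply carries its text in a "response" field. The function names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str,
    model: str = "deepseek-r1:7b",
    base_url: str = "http://localhost:11434",
) -> urllib.request.Request:
    """Build (but do not send) a non-streaming /api/generate request."""
    body = {"model": model, "stream": False, "prompt": prompt}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (needs a running Ollama)."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=300) as resp:
        return json.load(resp)["response"]
```

Splitting request construction from sending keeps the payload easy to inspect without a model loaded.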

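Testing the "OpenAI-compatible" Ollama endpoint with a RAG prompt

Conveniently, Ollama also serves a REST endpoint that mimics OpenAI's behavior, for compatibility with a wide range of tools, including Kibana.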

curl http://localhost:11434/v1/chat/completions -d '{
  "model": "deepseek-r1:7b",
  "stream": false,
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful AI Assistant that uses the following context to answer questions only use the following context. \n\nContext:  The color of the sky today is purple. "
    },
    {
      "role": "user",
      "content": "What does the sky look like today?"
    }
  ]
}'
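
In a real RAG flow the context in the system message comes from retrieved passages rather than a hard-coded sentence. The sketch below shows one way to assemble such a payload; the numbering of passages and the exact prompt wording are our own choices modeled on the curl example, not a format Ollama or Kibana prescribes.

```python
def build_rag_messages(passages: list[str], question: str) -> list[dict]:
    """Put numbered context passages in the system message, the question in the user message."""
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    system = (
        "You are a helpful AI Assistant that uses the following context to "
        "answer questions. Only use the following context.\n\n"
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


def build_chat_payload(
    passages: list[str], question: str, model: str = "deepseek-r1:7b"
) -> dict:
    """Body for an OpenAI-style chat completions call, non-streaming."""
    return {
        "model": model,
        "stream": False,
        "messages": build_rag_messages(passages, question),
    }
```

Serializing the returned dict with json.dumps gives a body you can pass to curl with -d.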

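Testing this more complex prompt produces content that includes a <think> section, in which the model reasons through the question as it was trained to do.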

<think> 
Okay, so I need to figure out what the user is asking for here. They provided a context where the sky is described as purple today and then asked about how the sky looks. At first glance, it seems straightforward—maybe they just want confirmation or more details on why the sky is that color.
Wait, but maybe there's something deeper. Purple skies aren't something I encounter every day. It usually happens at certain times of the year, like during sunrise or sunset with the sun setting in pink or orange. Could this be a hint about the time of day? Or perhaps it's just an unusual natural phenomenon? 
I should consider if \"purple\" is a typo. Maybe they meant something else like blue or gray. But since they specifically said purple, I'll go with that. Purple skies can happen when there are atmospheric conditions that scatter light differently, maybe due to pollution or cloud cover affecting the sunset.
So, putting it all together, the user might be looking for an explanation of why today's sky is purple and what that implies about the weather or time of day. Alternatively, they could just want a simple statement confirming that the sky looks purple today.
</think>
The color of the sky today is described as purple. This unusual shade can occur due to atmospheric conditions affecting light scattering, such as during sunrise/sunset with pollution or cloud cover influencing the sunset's hues.
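
If you consume this output in your own code, you will usually want to separate the reasoning from the final answer. Here is a minimal sketch that assumes the model emits at most one <think>...</think> block, as in the response above. The function name is ours.

```python
import re

# Non-greedy match so only the first <think> block is captured; DOTALL lets it span lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty when no <think> block exists."""
    match = _THINK_RE.search(content)
    if match is None:
        return "", content.strip()
    reasoning = match.group(1).strip()
    answer = (content[: match.start()] + content[match.end():]).strip()
    return reasoning, answer
```

The answer half is what you would show to an end user, while the reasoning half is handy for debugging how the model treated the context.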

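Connecting Ollama to Kibana

A good way to get going with Elasticsearch is the "start-local" development script.

Make sure your Kibana and Elasticsearch can reach your Ollama over the network. If you are using a local container setup of the Elastic Stack, that may mean replacing "localhost" with "host.docker.internal" or "host.containers.internal" to get a network path to the host.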
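
The host substitution is easy to get wrong by hand when ports and paths are involved. The sketch below rewrites only the host part of a URL; which alias works (host.docker.internal or host.containers.internal) depends on your container engine, and the helper name is our own.

```python
from urllib.parse import urlsplit, urlunsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def container_reachable_url(url: str, host_alias: str = "host.docker.internal") -> str:
    """Swap a loopback host for the container-to-host alias, keeping port and path."""
    parts = urlsplit(url)
    if parts.hostname not in LOCAL_HOSTS:
        return url  # already points somewhere other than loopback
    netloc = host_alias if parts.port is None else f"{host_alias}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

For Podman-style setups, pass host_alias="host.containers.internal" instead.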

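In Kibana, navigate to Stack Management > Alerts and Insights > Connectors.

What to do if you see this common setup warning

You need to make sure xpack.encryptedSavedObjects.encryptionKey is set correctly. This is a commonly missed step when running a local Docker install of Kibana, so I'll list the steps to fix it in Docker syntax.

Make sure to persist the kibana/config directory so that changes are saved when the container shuts down. My Kibana container volumes look like this in docker-compose.yml: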

services:
  kibana:
    ...
    volumes:
      - certs:/usr/share/kibana/config/certs
      - kibanadata:/usr/share/kibana/data
      - kibanaconfig:/usr/share/kibana/config
...
volumes:
  certs:
    driver: local
  esdata01:
    driver: local
  kibanadata:
    driver: local
  kibanaconfig:
    driver: local

Now you can create the keystore and enter a value so that connector secrets are not stored in plain text.

## generate some new keys for me and print them to the terminal
docker exec -it kibana_1 bin/kibana-encryption-keys generate
## create a new keystore
docker exec -it kibana_1 bin/kibana-keystore create
docker exec -it kibana_1 bin/kibana-keystore add xpack.encryptedSavedObjects.encryptionKey
## You'll be prompted to paste in a value
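
If you would rather produce the value yourself than copy it from the generate command, any sufficiently long random string should do. The sketch below relies on our understanding that Kibana expects this key to be at least 32 characters long; treat that threshold, and the function names, as assumptions to verify against your Kibana version.

```python
import secrets

MIN_KEY_LENGTH = 32  # assumed minimum length for the encryption key


def new_encryption_key(length: int = MIN_KEY_LENGTH) -> str:
    """Random hex string of at least `length` characters."""
    n_bytes = (length + 1) // 2  # each byte yields two hex characters
    return secrets.token_hex(n_bytes)


def is_long_enough(key: str) -> bool:
    """Check a candidate key against the assumed minimum length."""
    return len(key) >= MIN_KEY_LENGTH
```

Paste the result of new_encryption_key() when the keystore add command prompts for a value.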

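Do a full restart of the entire cluster to make sure the changes take effect.

Creating the connector

From the connector configuration screen (in Kibana, Stack Management > Alerts and Insights > Connectors), create a connector and select the "OpenAI" type.

Configure the connector with the following settings: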

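Connector name: Deepseek (Ollama)
Select an OpenAI provider: Other (OpenAI Compatible Service)
URL: http://localhost:11434/v1/chat/completions
- Adjust this to the correct path for your Ollama. If you are calling from inside a container, remember to substitute host.docker.internal or the equivalent.
Default model: deepseek-r1:7b
API key: make one up. An entry is required, but the value does not matter.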

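Note that testing a custom connector to Ollama from the connector settings is currently broken in 8.17, but this has been fixed in the upcoming Kibana 8.18 release.

Our connector looks like this:

Ingesting data with embedding vectors into Elasticsearch

If you are already familiar with Playground and have data set up, you can skip to the Playground step below. If you need some quick test data, we first need to make sure the _inference API is set up. Starting with 8.17, machine learning allocations are dynamic, so to download and turn on the e5 multilingual dense vector model we can simply run the following in Kibana Dev Tools.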

GET /_inference

POST /_inference/text_embedding/.multilingual-e5-small-elasticsearch
{
  "input": "are internet memes about deepseek sound investment advice?"
}

If you have not done so already, this will trigger the download of the e5 model from Elastic's model repository.
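
To sanity-check what the endpoint returns, you can save the Console response and inspect it in Python. This sketch assumes the response has a top-level "text_embedding" list whose items each hold an "embedding" array; that shape and the helper names are assumptions, so compare them with the response you actually see.

```python
import math


def extract_embeddings(response: dict) -> list[list[float]]:
    """Collect the vectors from a text_embedding style response (assumed shape)."""
    return [item["embedding"] for item in response.get("text_embedding", [])]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Embedding two related sentences and comparing them with cosine_similarity is a quick way to confirm the model is loaded and producing sensible vectors.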

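Next, let's load a public-domain book as our RAG context. You can download "Alice's Adventures in Wonderland" from Project Gutenberg: link. Save it as a .txt file.

Navigate to Elasticsearch > Home > Upload a file.

Select or drag and drop the text file, then click the "Import" button.

On the "Import data" screen, select the "Advanced" tab and set the index name to "book_alice".

Select the "Add additional field" option, which sits directly below "Automatically created fields". Select "Add semantic text field" and change the inference endpoint to ".multilingual-e5-small-elasticsearch". Select "Add", then "Import".

When the load and inference are complete, we are ready to head to Playground.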

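Testing RAG in Playground

In Kibana, navigate to Elasticsearch > Playground.

On the Playground screen, you should see a green checkmark and "LLM Connected", indicating that a connector already exists. This is the Ollama connector we just created above. A detailed guide to Playground can be found here.

Click the blue Add data sources button and select the book_alice index we created earlier, or another index you have previously configured that uses the inference API for embeddings.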

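Deepseek is a chain-of-thought model with strong alignment characteristics. From a RAG perspective, this is both good and bad. The chain-of-thought training may help Deepseek rationalize seemingly contradictory statements in the citations, but its strong alignment to training knowledge may make it prefer its own version of world facts over our context grounding. While well intentioned, such strong alignment is known to make LLMs hard to instruct on topics covered by our private knowledge or on topics that are not well represented in the training dataset.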

In our Playground setup, we entered the following system prompt: "You are an assistant for question-answering tasks using relevant text passages from the book Alice in wonderland", and accepted the other default settings.

For the question "Who was at the tea party?", we get the answer "The March Hare, the Hatter, and the Dormouse were at the tea party. [Citation: position 1 and 2]", which is correct.

We can see from the <think> tags that Deepseek really did deliberate over the contents of the citations to answer the question.

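Testing the limits of alignment

Let's create an intellectually challenging scenario for Deepseek as a test. We'll create an index of conspiracy theories that Deepseek's training data knows to be untrue.

In Kibana Dev Tools, let's create the following index and data: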

PUT /classic_conspiracies
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "copy_to": "content_semantic"
      },
      "content_semantic": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch"
      }
    }
  }
}

POST /classic_conspiracies/_doc/1
{
  "content": "birds aren't real, the government replaced them with drones a long time ago"
}

POST /classic_conspiracies/_doc/2
{
  "content": "tinfoil hats are necessary to prevent our brains from being read"
}

POST /classic_conspiracies/_doc/3
{
  "content": "ancient aliens influenced early human civilizations, this explains why things made out of stone are marginally similar on different continents"
}
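
Before handing this index to Playground, you may want to confirm that the semantic_text field retrieves what you expect. The sketch below builds a search request that uses the semantic query type against the content_semantic field defined above and renders it in Console form. The helper names and the default size of 3 are our own choices.

```python
import json


def semantic_search_body(question: str, field: str = "content_semantic", size: int = 3) -> dict:
    """Search body that runs a semantic query against a semantic_text field."""
    return {
        "size": size,
        "query": {"semantic": {"field": field, "query": question}},
    }


def as_console_request(index: str, body: dict) -> str:
    """Render the search in the form you would paste into Kibana Dev Tools."""
    return f"GET /{index}/_search\n{json.dumps(body, indent=2)}"
```

Printing as_console_request("classic_conspiracies", semantic_search_body("are birds real?")) gives a request you can paste straight into Dev Tools.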

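These conspiracy theories will be the grounding for our LLM. Despite an aggressive system prompt, Deepseek will not accept our version of the facts. If we know that our private data is more trustworthy, better grounded, or better aligned with our organization's needs, this is unacceptable:

For the test question "are birds real?" (for an explanation, see Know Your Meme), we get the answer "In the provided context, birds are not considered real, but in reality, they are real animals. [Context: position 1]". This test shows that DeepSeek R1 is very capable, even at the 7B parameter level. Depending on our data set, however, it may not be the best choice for RAG.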