AI Series: RAG (Retrieval-Augmented Generation) for Large Language Models (Part 2) -- Using LlamaIndex

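Introduction

This article follows the previous one, AI Series: RAG (Retrieval-Augmented Generation) for Large Language Models (Part 1), and uses LlamaIndex as a worked example of implementing RAG. If you are interested in the background, see Part 1.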

What is LlamaIndex?

The LlamaIndex website introduces it as follows:

LlamaIndex is a framework for building context-augmented LLM applications.
LlamaIndex provides tooling to enable context augmentation. A popular example is Retrieval-Augmented Generation (RAG) which combines context with LLMs at inference time. Another is finetuning.
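In other words, LlamaIndex is a framework for building LLM applications that draw on external context. It supplies the tooling for that context augmentation; RAG, which combines context with the LLM at inference time, is one popular example, and fine-tuning is another.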

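LlamaIndex provides many tools for implementing RAG; see the official website for details. This article covers one way to implement it, the one the code example below follows:

[Figure: diagram of this RAG implementation]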

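LlamaIndex Code

The code in this section draws on the RAG Starter Tutorial (OpenAI), the Starter Tutorial (Local Models), and related documents on the LlamaIndex website.

The RAG Starter Tutorial (OpenAI) gives an example that uses the OpenAI service and implements RAG in just 5 lines of code.
If you use the OpenAI service, you can skip the embedding model and LLM setup below and simply set the OPENAI_API_KEY environment variable.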
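
The choice between the two setups comes down to whether OPENAI_API_KEY is present. A minimal sketch of that check, using only the standard library, might look like the following; `pick_backend` and its return values are hypothetical names introduced here for illustration and are not part of LlamaIndex.

```python
import os


def pick_backend(env=None):
    """Return "openai" if OPENAI_API_KEY is set and non-blank, else "local".

    `env` defaults to the real process environment; passing a dict makes
    the function easy to try out without touching os.environ.
    """
    env = os.environ if env is None else env
    return "openai" if env.get("OPENAI_API_KEY", "").strip() else "local"


backend = pick_backend()
print(f"Using the {backend} setup")
```

With "local", the embedding model and LLM sections below apply; with "openai", they can be skipped.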

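Setting the Embedding Model

Because I don't have an OpenAI API token, I use bge-small-en-v1.5 from BAAI (Beijing Academy of Artificial Intelligence), hosted on HuggingFace, as the embedding model.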

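# Requires the llama-index-embeddings-huggingface package
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# Use BAAI/bge-small-en-v1.5 for every embedding computed through LlamaIndex
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")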

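Setting the LLM

For the large language model (LLM), I use the gemma:2b model on a locally running Ollama server. My personal laptop is low-spec (M1 chip, 8 GB of RAM), so I can only run the model with the fewest parameters. That does not stop us from demonstrating the basic principles of the RAG workflow: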

from llama_index.llms.ollama import Ollama
gemma_2b = Ollama(model="gemma:2b", request_timeout=30.0)
Settings.llm = gemma_2b

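The LlamaIndex website also provides code examples for Hugging Face models (local and remote) and for other model types; see Hugging face LLMs.

Indexing

This part of the code follows the example in the official RAG Starter Tutorial (OpenAI). The difference is that I use a PDF document on my local disk that introduces Llama 2, and I will later ask questions about Llama 2.

The code covers both creating the embedding vector index and persisting it. Persistence is optional; without it, only two lines of code are needed.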

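import os
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"
if not os.path.exists(PERSIST_DIR):
    # Load every document found under ./articles
    documents = SimpleDirectoryReader("./articles").load_data()
    # Split the documents into chunks, compute their embeddings, and index them
    index = VectorStoreIndex.from_documents(documents)
    # Persist the index to local disk for later runs
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # Load the existing index straight from local disk
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)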
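
Building the index involves three steps in a single call: chunking, embedding, and indexing. To give a feel for the first step only, here is a rough standard-library sketch of word-based chunking with overlap. It is a simplification and not how LlamaIndex splits text internally; `chunk_words`, `chunk_size`, and `overlap` are hypothetical names chosen for this illustration.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


for chunk in chunk_words("a b c d e f g", chunk_size=3, overlap=1):
    print(chunk)
```

Smaller chunks make retrieval more precise but give the model less context per hit; larger chunks do the opposite.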

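Query Engine

LlamaIndex provides a query engine, which uses a retriever to find the semantically closest documents in the index and passes them, combined with the original question, to the LLM.
Again, only two lines of code are needed.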

# Build a query engine on top of the index, then ask a question
query_engine = index.as_query_engine()
response = query_engine.query("Is there Llama 2 70B?")
print(response)
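
To make the retrieve-then-combine step more concrete, here is a toy, standard-library sketch of the idea. It is not LlamaIndex's implementation: word counts stand in for the real embedding vectors, the three sample chunks are made up for the demo, and the prompt wording and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are assumptions introduced for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts, standing in for a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_chunks):
    """Merge the retrieved context with the original question."""
    context = "\n".join(context_chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Given the context information and not prior knowledge, "
        "answer the query.\n"
        f"Query: {question}\nAnswer:"
    )


# Made-up sample chunks for the demo
chunks = [
    "Llama 2 is released in 7B, 13B and 70B parameter sizes.",
    "Bananas are rich in potassium.",
    "The weather in Paris is mild in spring.",
]
question = "Is there Llama 2 70B?"
top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The printed prompt is what the LLM would receive in this toy setup: the best-matching chunk followed by the question.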

If you want more flexibility, you can also define the retriever explicitly.
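
As a hedged sketch of what that might look like, the helper below builds a `VectorIndexRetriever` and wraps it in a `RetrieverQueryEngine`. It assumes the `index` and the `Settings` configured in the earlier sections; `build_query_engine` and the default `top_k=3` are illustrative choices, not something the original code uses.

```python
def build_query_engine(index, top_k=3):
    """Wrap `index` in an explicitly configured retriever (illustrative helper)."""
    # Imported here so that defining the helper does not itself need llama_index.
    from llama_index.core.query_engine import RetrieverQueryEngine
    from llama_index.core.retrievers import VectorIndexRetriever

    # Fetch the top_k most similar chunks for each question.
    retriever = VectorIndexRetriever(index=index, similarity_top_k=top_k)
    # The query engine passes the retrieved chunks plus the question to the LLM.
    return RetrieverQueryEngine(retriever=retriever)


# Usage, assuming `index` was built as in the Indexing section:
# query_engine = build_query_engine(index, top_k=2)
# print(query_engine.query("Is there Llama 2 70B?"))
```

Defining the retriever separately makes settings such as the number of retrieved chunks visible and easy to tune.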

Verification

Question:

Is there Llama 2 70B?

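gemma:2b is really a toy model and can only handle simple questions like this one. On anything more complex, its answers are a mess.

Running the program, gemma:2b answers correctly, based on the augmented context obtained through retrieval: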

Yes, Llama2 70B is mentioned in the context information. It is a large language model that outperforms all open-source models.

If I skip the program above and ask the model the same question directly, it replies that it cannot answer:

I am unable to access external sources or provide real-time information, therefore I cannot answer this question.

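Impressions

LlamaIndex makes implementing RAG simple. Its structure looks clean and elegant.
Real production use, however, involves many more details, such as chunk granularity, the various retrieval options, and prompt customization. LlamaIndex supports many of these, but how well they work in practice remains to be verified.
LlamaIndex's design philosophy and its development are worth following.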

References

What is LlamaIndex? https://docs.llamaindex.ai/en/stable/
LlamaIndex RAG Starter Tutorial (OpenAI) https://docs.llamaindex.ai/en/stable/getting_started/starter_example/
LlamaIndex RAG Starter Tutorial (Local Models) https://docs.llamaindex.ai/en/stable/getting_started/starter_example_local/
LlamaIndex query engine https://docs.llamaindex.ai/en/stable/module_guides/deploying/query_engine/
LlamaIndex Hugging face LLMs https://docs.llamaindex.ai/en/stable/examples/llm/huggingface/
