Content source:
Docs
Prerequisites:
Retrieval Augmented Generation (RAG)
LlamaIndex
LlamaIndex is a context-augmentation framework for LLMs, designed to enhance the capabilities of large language models (LLMs) by integrating them with specific context datasets.
XTuner
The fine-tuning and evaluation toolkit for large models integrated in the InternLM (Shusheng) ecosystem.
Environment setup:
Create the environment
The server already ships with a preset conda setup; run the wrapped bash command provided by the InternStudio server to create the environment, then activate it, as follows:
studio-conda -t llamaindex -o pytorch-2.1.2
conda activate llamaindex
pip install llama-index==0.10.38 llama-index-llms-huggingface==0.2.0 "transformers[torch]==4.41.1" "huggingface_hub[inference]==0.23.1" huggingface_hub==0.23.1 sentence-transformers==2.7.0 sentencepiece==0.2.0
cd ~
mkdir llamaindex_demo
mkdir model
cd ~/llamaindex_demo
touch download_hf.py
vim download_hf.py
Download the RAG embedding model:
Press i to enter insert mode, then type the following into download_hf.py:
import os

# Set the environment variable so downloads go through the HF mirror
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

# Download the model
os.system('huggingface-cli download --resume-download sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 --local-dir /root/model/sentence-transformer')
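As an alternative to shelling out to huggingface-cli, the same download can be done with the huggingface_hub Python API. A minimal sketch, assuming the huggingface_hub version installed above; note that HF_ENDPOINT has to be set before the library is imported:

import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'  # must be set before importing huggingface_hub

from huggingface_hub import snapshot_download

# Download the embedding model to the same local directory used later in this guide
snapshot_download(
    repo_id="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    local_dir="/root/model/sentence-transformer",
)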
Run the download:
conda activate llamaindex
python download_hf.py
Download the NLTK resources:
cd /root
git clone https://gitee.com/yzy0612/nltk_data.git --branch gh-pages
cd nltk_data
mv packages/* ./
cd tokenizers
unzip punkt.zip
cd ../taggers
unzip averaged_perceptron_tagger.zip
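To confirm the unzipped resources will actually be picked up (NLTK searches /root/nltk_data by default when running as root), here is a quick check. This is only a sketch, assuming nltk is available in the environment as a dependency of llama-index:

import nltk

# Both calls raise LookupError if the resource is not on NLTK's search path
print(nltk.data.find("tokenizers/punkt"))
print(nltk.data.find("taggers/averaged_perceptron_tagger"))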
LlamaIndex HuggingFaceLLM
cd ~/model
ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b/ ./
cd ~/llamaindex_demo
touch llamaindex_internlm.py
vim llamaindex_internlm.py
Press i and paste the following code:

from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core.llms import ChatMessage

llm = HuggingFaceLLM(
    model_name="/root/model/internlm2-chat-1_8b",
    tokenizer_name="/root/model/internlm2-chat-1_8b",
    model_kwargs={"trust_remote_code": True},
    tokenizer_kwargs={"trust_remote_code": True}
)

rsp = llm.chat(messages=[ChatMessage(content="xtuner是什么?")])
print(rsp)

When the code is in, press ESC and type :wq to save and quit. Then run:

conda activate llamaindex
cd ~/llamaindex_demo/
python llamaindex_internlm.py
This step checks that the bare model produces reasonable output on its own, so it can be compared with the results after RAG is added later.
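For reference, llm.chat takes a list of messages, so the same test can also include a system prompt. A minimal sketch reusing the llm object defined in llamaindex_internlm.py; the system prompt text is purely illustrative:

from llama_index.core.llms import ChatMessage, MessageRole

# Prepend a system message before the user question (illustrative prompt text)
rsp = llm.chat(messages=[
    ChatMessage(role=MessageRole.SYSTEM, content="你是一个简洁的中文助手。"),
    ChatMessage(role=MessageRole.USER, content="xtuner是什么?"),
])
print(rsp)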
conda activate llamaindex
pip install llama-index-embeddings-huggingface llama-index-embeddings-instructor
cd ~/llamaindex_demo
mkdir data
cd data
git clone https://github.com/InternLM/xtuner.git
mv xtuner/README_zh-CN.md ./
cd ~/llamaindex_demo
touch llamaindex_RAG.py
Write the following Python code into llamaindex_RAG.py:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface import HuggingFaceLLM

# Use the downloaded sentence-transformer as the embedding model
embed_model = HuggingFaceEmbedding(
    model_name="/root/model/sentence-transformer"
)
Settings.embed_model = embed_model

# Use InternLM2-Chat-1.8B as the LLM
llm = HuggingFaceLLM(
    model_name="/root/model/internlm2-chat-1_8b",
    tokenizer_name="/root/model/internlm2-chat-1_8b",
    model_kwargs={"trust_remote_code": True},
    tokenizer_kwargs={"trust_remote_code": True}
)
Settings.llm = llm

# Build a vector index over the documents in the data directory and query it
documents = SimpleDirectoryReader("/root/llamaindex_demo/data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("xtuner是什么?")
print(response)

Then run it:

conda activate llamaindex
cd ~/llamaindex_demo/
python llamaindex_RAG.py
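Rebuilding the vector index from scratch on every run is slow once the knowledge base grows. A sketch of how the index can be persisted to disk and reloaded with LlamaIndex's storage API, assuming Settings.embed_model and Settings.llm are configured as in the script above and using /root/llamaindex_demo/storage as an assumed persist directory:

import os
from llama_index.core import (
    VectorStoreIndex, SimpleDirectoryReader, StorageContext,
    load_index_from_storage,
)

PERSIST_DIR = "/root/llamaindex_demo/storage"  # assumed location, change as needed

if not os.path.exists(PERSIST_DIR):
    # First run: build the index from the documents and save it
    documents = SimpleDirectoryReader("/root/llamaindex_demo/data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # Later runs: reload the saved index instead of re-embedding everything
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine()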
Checkpoint tasks
Complete the following tasks and record the process with screenshots:
- Run InternLM2 1.8B through llamaindex, ask it "你是誰" (Who are you?), and screenshot the output.
- Implement knowledge-base retrieval through llamaindex, ask the two questions below, and screenshot the results.
- Question 1: xtuner是什么? (What is XTuner?)
- Question 2: xtuner支持哪些模型? (Which models does XTuner support?)
A 10% GPU allocation is not enough to finish the RAG part of the assignment; since you have come this far, ask the teaching assistant directly for the 30% allocation. The run below shows the failure on 10%:
/root/.conda/envs/llamaindex/bin/python /root/data/data/llamaindex_RAG.py
/root/.conda/envs/llamaindex/lib/python3.10/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_id" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:39<00:00, 19.95s/it]
Some parameters are on the meta device device because they were offloaded to the cpu.
Traceback (most recent call last):
  File "/root/data/data/llamaindex_RAG.py", line 23, in <module>
    response = query_engine.query("xtuner是什么?")
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/base/base_query_engine.py", line 52, in query
    query_result = self._query(str_or_query_bundle)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 190, in _query
    response = self._response_synthesizer.synthesize(
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/response_synthesizers/base.py", line 241, in synthesize
    response_str = self.get_response(
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/response_synthesizers/compact_and_refine.py", line 43, in get_response
    return super().get_response(
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/response_synthesizers/refine.py", line 183, in get_response
    response = self._give_response_single(
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/response_synthesizers/refine.py", line 238, in _give_response_single
    program(
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/response_synthesizers/refine.py", line 84, in __call__
    answer = self._llm.predict(
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/llms/llm.py", line 438, in predict
    response = self.complete(formatted_prompt, formatted=True)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/core/llms/callbacks.py", line 429, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/llama_index/llms/huggingface/base.py", line 358, in complete
    tokens = self._model.generate(
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/transformers/generation/utils.py", line 1758, in generate
    result = self._sample(
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/transformers/generation/utils.py", line 2397, in _sample
    outputs = self(
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm2-chat-1_8b/modeling_internlm2.py", line 1060, in forward
    logits = self.output(hidden_states)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.conda/envs/llamaindex/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 316.00 MiB. GPU 0 has a total capacty of 7.99 GiB of which 198.00 MiB is free. Process 1736952 has 35.84 GiB memory in use. Process 364582 has 7.80 GiB memory in use. Of the allocated memory 7.25 GiB is allocated by PyTorch, and 63.81 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
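If a larger GPU slice is not available, a common mitigation is to load the model in half precision and cap the generation length when constructing HuggingFaceLLM. This is only a sketch, not verified on this environment; the specific context_window and max_new_tokens values are illustrative:

import torch
from llama_index.llms.huggingface import HuggingFaceLLM

# Half-precision weights roughly halve GPU memory for the 1.8B model;
# context_window / max_new_tokens limit activation memory during generation.
llm = HuggingFaceLLM(
    model_name="/root/model/internlm2-chat-1_8b",
    tokenizer_name="/root/model/internlm2-chat-1_8b",
    context_window=2048,
    max_new_tokens=256,
    model_kwargs={"trust_remote_code": True, "torch_dtype": torch.float16},
    tokenizer_kwargs={"trust_remote_code": True},
)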
Comparison of results with and without RAG:
Task screenshots
The tasks combined into one .py script:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.llms import ChatMessage
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface import HuggingFaceLLM

embed_model = HuggingFaceEmbedding(
    model_name="/root/model/sentence-transformer"
)
Settings.embed_model = embed_model

llm = HuggingFaceLLM(
    model_name="/root/model/internlm2-chat-1_8b",
    tokenizer_name="/root/model/internlm2-chat-1_8b",
    model_kwargs={"trust_remote_code": True},
    tokenizer_kwargs={"trust_remote_code": True}
)
Settings.llm = llm
rsp = llm.chat(messages=[ChatMessage(content="你是誰?")])
print("你是誰?",rsp)
documents = SimpleDirectoryReader("/root/data/data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("xtuner是什么?")
print("xtuner是什么?",response)
response = query_engine.query("xtuner支持哪些模型")
print("xtuner支持哪些模型",response)
Output for "你是誰?" (Who are you?)
Output for "xtuner是什么?" (What is XTuner? RAG result)
Output for "xtuner支持哪些模型?" (Which models does XTuner support?)