By default, chromadb uses the sentence-transformers/all-MiniLM-L6-v2 embedding model. On the program's first run, if a collection's add or query call does not specify embeddings or query_embeddings, chromadb automatically downloads that embedding model. Because downloads from the default Hugging Face backend are often very slow, it is worth pointing the client at a mirror site to speed up the model download.
The steps on Windows are as follows:
1. Install huggingface_hub:
pip install huggingface_hub
2. Set the Hugging Face mirror endpoint as an environment variable:
set HF_ENDPOINT=https://hf-mirror.com
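Note that `set` only affects the current cmd session. As an alternative (not part of the original steps), you can also set the endpoint from Python before importing any Hugging Face library, since huggingface_hub reads HF_ENDPOINT at import time:

```python
import os

# Point the Hugging Face Hub client at the mirror; this must happen
# before huggingface_hub / sentence_transformers are imported.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

print(os.environ["HF_ENDPOINT"])
```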
3. Check that the environment variable was set successfully:
hf env
4. Download the target model (here, all-MiniLM-L6-v2) to a local folder:
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 --local-dir ./models/all-MiniLM-L6-v2 --resume-download --local-dir-use-symlinks False
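If you prefer to stay in Python, huggingface_hub exposes the same download as snapshot_download. A minimal sketch (the wrapper function name and paths are illustrative, not from the original steps):

```python
from huggingface_hub import snapshot_download


def download_model(repo_id: str, local_dir: str) -> str:
    """Download a full snapshot of repo_id into local_dir; returns the local path."""
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Usage (downloads the model files on first call):
# download_model("sentence-transformers/all-MiniLM-L6-v2",
#                "./models/all-MiniLM-L6-v2")
```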
5. Example of loading the local model (here, all-MiniLM-L6-v2) in a program:
from sentence_transformers import SentenceTransformer

# Local model path (replace with your actual path);
# a raw string (r"") avoids escape issues in Windows paths
model_path = r".\models\all-MiniLM-L6-v2"
model = SentenceTransformer(model_path)  # load the model from disk

# Input sentences
sentences = ["This is an example sentence.", "Each sentence is converted."]
embeddings = model.encode(sentences)  # produces 384-dimensional vectors

# Print the results
print("Embedding shape:", embeddings.shape)
for i, emb in enumerate(embeddings):
    print(f"First 5 dimensions for '{sentences[i]}': {emb[:5]}")
6. Example of using the local embedding model with chromadb:
import chromadb
from sentence_transformers import SentenceTransformer

# Local model path (replace with your actual path);
# a raw string (r"") avoids escape issues in Windows paths
model_path = r".\models\all-MiniLM-L6-v2"
model = SentenceTransformer(model_path)  # load the model from disk

chroma_client = chromadb.Client()
collection = chroma_client.create_collection(name="my_collection")

# Documents
documents = [
    "This is a document about pineapple",
    "This is an island of the USA",
    "This is a location where there are many tourists",
    "This is a document about oranges",
]

# Convert the documents to vectors with the local model
embeddings = model.encode(documents)

# Add the records to the collection
collection.add(
    embeddings=embeddings,
    ids=["id1", "id2", "id3", "id4"],
    documents=documents,
)

# Query text
query_texts = ["This is a query document about hawaii"]

# Convert the query text to a vector with the local model
query_embeddings = model.encode(query_texts)

# Query the collection. Pass only query_embeddings: passing query_texts as
# well would make Chroma embed the text itself, which is what we are avoiding.
results = collection.query(
    query_embeddings=query_embeddings,
    n_results=2,  # how many results to return
)
print(results)
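Under the hood, collection.query simply compares the query vector against the stored document vectors and returns the closest ones. A minimal NumPy sketch of that ranking step, using toy 3-dimensional vectors in place of the real 384-dimensional embeddings (cosine similarity shown here; Chroma's default metric is L2 distance):

```python
import numpy as np


def rank_by_cosine(query_vec, doc_vecs, n_results=2):
    """Return indices of the n_results documents closest to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q  # cosine similarity of each document with the query
    return np.argsort(-sims)[:n_results]


# Toy "embeddings": real ones come from model.encode(...)
docs = np.array([[1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0],
                 [0.0, 1.0, 0.0]])
query = np.array([1.0, 0.05, 0.0])
print(rank_by_cosine(query, docs))  # indices of the two nearest documents
```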