LangChain and Milvus in Practice: Adding and Deleting Vector Data

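Introduction: In AI application development, vector databases have become a core component for large-scale semantic search and similarity matching. Using working code examples, this article walks through integrating the LangChain framework with the Milvus vector database and shows how to manage the vector data stored in it.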

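The article focuses on two key operations: inserting data and deleting it. It shows how to configure the DashScope embedding model so that text content is vectorized and written to the vector store, how IDs are assigned when documents are inserted in a batch, and how those IDs are then used to delete specific records.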

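Overview

This article explains how to use the LangChain framework with the Milvus vector database, with a complete walkthrough of inserting and deleting vector data. The worked example covers the core operations you need when managing a vector database.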

This article continues from the previous one: A Detailed Look at the VectorStore Design in the New LangChain (CSDN blog)

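Technical Requirements and Goals

The main goals of this walkthrough are:

Connect LangChain to a Milvus vector database
Insert vector data in a batch
Delete data by ID
Understand good practice for vector database operations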

Environment Setup and Dependency Installation

Official Documentation

LangChain official documentation: Milvus | 🦜️🔗 LangChain

Installing Dependencies

pip install langchain_milvus langchain_community dashscope
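Before running the examples, it can help to confirm which of the packages they import are actually installed. The following is a minimal sketch using only the standard library; the distribution names in REQUIRED are assumptions inferred from the imports used in this article, and installed_versions is a hypothetical helper, not part of any of these libraries.

```python
from importlib import metadata

# Assumed distribution names, inferred from the imports used in this article.
REQUIRED = ["langchain-milvus", "langchain-community", "langchain-core", "dashscope"]


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can then be installed with pip before continuing.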

Core Implementation

Importing the Required Libraries

from langchain_community.embeddings import DashScopeEmbeddings
# Note: older versions used `from langchain.vectorstores import Milvus`
from langchain_milvus import Milvus  # the newer import path is recommended
from langchain_core.documents import Document

Initializing the Embedding Model and Vector Store

# Configure the DashScope embedding model
embeddings = DashScopeEmbeddings(
    model="text-embedding-v2",  # second-generation general-purpose text embedding model
    max_retries=3,
    dashscope_api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",  # replace with your actual API key
)

# Initialize the Milvus vector store
vector_store = Milvus(
    embeddings,
    connection_args={"uri": "http://192.168.19.152:19530"},  # Milvus server address
    collection_name="langchain_example",  # collection name
)
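Hardcoding the API key and server address is convenient for a demo but easy to leak into version control. One possible alternative, sketched below, reads both from environment variables. The helper names embedding_kwargs and connection_args are hypothetical, and the variable names DASHSCOPE_API_KEY and MILVUS_URI are assumptions chosen for this example; only the model name, retry count, and default URI come from the code in this article.

```python
import os


def embedding_kwargs(env_var="DASHSCOPE_API_KEY"):
    """Build keyword arguments for DashScopeEmbeddings without hardcoding the key."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "model": "text-embedding-v2",
        "max_retries": 3,
        "dashscope_api_key": api_key,
    }


def connection_args(env_var="MILVUS_URI", default="http://192.168.19.152:19530"):
    """Return Milvus connection arguments, falling back to the article's address."""
    return {"uri": os.environ.get(env_var, default)}


# Usage with the classes imported in this article (needs the packages installed):
# embeddings = DashScopeEmbeddings(**embedding_kwargs())
# vector_store = Milvus(
#     embeddings,
#     connection_args=connection_args(),
#     collection_name="langchain_example",
# )
```

Failing early with a clear error when the key is missing is a design choice of this sketch; it avoids a less obvious authentication failure on the first embedding request.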

Preparing the Test Dataset

# Create a varied set of sample documents
document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)
document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)
document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)
document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)
document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)
document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)
document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)
document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)
document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)
document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

# Collect all documents in a list
documents = [
    document_1, document_2, document_3, document_4, document_5,
    document_6, document_7, document_8, document_9, document_10,
]

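Inserting Data

# Generate a unique ID for each document
ids = [str(i + 1) for i in range(len(documents))]
print("Generated document IDs:", ids)

# Insert all documents in one batch; add_documents returns the IDs of the inserted documents
result = vector_store.add_documents(documents=documents, ids=ids)
print("Insert result:", result)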
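Sequential IDs such as "1" through "10" depend on the order of the list: if the list is reordered, a given ID ends up pointing at a different document. One alternative, sketched here as an assumption rather than as part of the original example, is to derive each ID from the document's content so that the same text and source always map to the same ID. The NAMESPACE value and the content_id helper are hypothetical names introduced for illustration.

```python
import uuid

# Hypothetical namespace for this example; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "langchain_example")


def content_id(text, source=""):
    """Derive a stable string ID from a document's source and text."""
    return str(uuid.uuid5(NAMESPACE, f"{source}\n{text}"))


# Two (text, source) pairs taken from the sample documents in this article.
samples = [
    ("I had chocolate chip pancakes and scrambled eggs for breakfast this morning.", "tweet"),
    ("The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.", "news"),
]
ids = [content_id(text, source) for text, source in samples]
print(ids)
```

With Document objects, the same idea would read content_id(doc.page_content, doc.metadata.get("source", "")), and the resulting list can be passed as the ids argument of add_documents. How the store treats an ID that already exists is not covered by this sketch.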

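Deleting Data

# Delete the document with the given ID
result = vector_store.delete(ids=["1"])
print("Delete result:", result)

# Fields in the statistics returned by the delete operation:
# insert count: number of rows inserted
# delete count: number of rows deleted
# upsert count: number of rows upserted (updated or inserted)
# timestamp: timestamp of the operation
# success count: number of successes
# err count: number of errors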

Analyzing the Result

After the delete operation runs, the system returns detailed statistics in the following format:

(insert count: 0, delete count: 1, upsert count: 0, timestamp: 456798840753225732, success count: 0, err count: 0)

The result shows that one record was deleted (delete count: 1) and that no errors occurred (err count: 0).
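To check a deletion programmatically instead of reading the printed line, the summary can be parsed into a dictionary. The sketch below works on the text format shown above and assumes that str(result) produces exactly that format; parse_mutation_summary and deletion_succeeded are hypothetical helpers, not part of LangChain or Milvus.

```python
import re


def parse_mutation_summary(text):
    """Parse '(insert count: 0, delete count: 1, ...)' into a dict of integers."""
    pairs = re.findall(r"([a-z ]+):\s*(\d+)", text)
    return {key.strip(): int(value) for key, value in pairs}


def deletion_succeeded(summary, expected_deleted):
    """True if the expected number of rows was deleted and no errors were reported."""
    return (
        summary.get("delete count") == expected_deleted
        and summary.get("err count", 0) == 0
    )


# The sample output from this article, used here as plain text.
sample = (
    "(insert count: 0, delete count: 1, upsert count: 0, "
    "timestamp: 456798840753225732, success count: 0, err count: 0)"
)
summary = parse_mutation_summary(sample)
print(summary)
print(deletion_succeeded(summary, expected_deleted=1))
```

In the running example this would be called as parse_mutation_summary(str(result)) right after vector_store.delete(ids=["1"]).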
