AI Agent Development Learning Series - LangChain Memory (1): In-Memory Short-Term Memory

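In LangChain, in-memory short-term memory usually refers to "conversation buffer" memory classes such as ConversationBufferMemory. They keep the conversation history in process memory so that the model can see earlier turns as context and carry on a continuous dialogue.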

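Conversation buffer memory tools

Key features

ConversationBufferMemory keeps the whole conversation in memory, so the history (and its token cost) grows with every turn; the windowed and token-limited variants cap it so that memory and token usage stay bounded.
Suited to short conversations where context dependencies are shallow.
Several variants are available, such as window memory (ConversationBufferWindowMemory, which keeps only the most recent k turns) and memory that returns message objects instead of a single string.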

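Typical usage example

from langchain.memory import ConversationBufferMemory
# Create the conversation buffer memory object
memory = ConversationBufferMemory()
# Add a user message ("Hello, I am Alex!")
memory.chat_memory.add_user_message("你好，我是Alex！")
# Add the AI reply ("Hello, I am an AI assistant. How can I help you?")
memory.chat_memory.add_ai_message("你好，我是AI助手，請問有什么可以幫助你的嗎？")
# Load the current memory variables
memory.load_memory_variables({})

Result: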

{'history': 'Human: 你好，我是Alex！\nAI: 你好，我是AI助手，請問有什么可以幫助你的嗎？'}

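Window memory example

# Keep a sliding window over the most recent turns; turns that fall outside the window are dropped
from langchain.memory import ConversationBufferWindowMemory
# Create a memory window that keeps only the most recent turn
memory = ConversationBufferWindowMemory(k=1)
# Save the first turn
memory.save_context({"input": "你好，我是Alex！"}, {"output": "你好，我是AI助手，請問有什么可以幫助你的嗎？"})
# Save the second turn ("I want to learn painting." / "Sure, I will find some painting material for you."); the first turn drops out of the window
memory.save_context({"input": "我想學習繪畫。"}, {"output": "好的，我幫你找一些繪畫的資料。"})
# Load the memory variables inside the current window (only the latest turn is returned)
memory.load_memory_variables({})

Result: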

{'history': 'Human: 我想學習繪畫。\nAI: 好的，我幫你找一些繪畫的資料。'}
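
The windowing behavior can be illustrated without LangChain. The following is a minimal standard-library sketch; the class name WindowMemory is hypothetical and it only mimics the save_context / load_memory_variables interface used above. It drops old turns eagerly, which may differ from how LangChain stores messages internally.

```python
from collections import deque


class WindowMemory:
    """Hypothetical stand-in that keeps only the last k (human, ai) turns."""

    def __init__(self, k=1):
        # A bounded deque discards the oldest turn once k turns are stored.
        self.turns = deque(maxlen=k)

    def save_context(self, inputs, outputs):
        self.turns.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, _inputs=None):
        lines = []
        for human, ai in self.turns:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return {"history": "\n".join(lines)}


memory = WindowMemory(k=1)
memory.save_context({"input": "Hi, I am Alex!"}, {"output": "Hello, how can I help?"})
memory.save_context({"input": "I want to learn painting."}, {"output": "Sure, I will find some material."})
window_history = memory.load_memory_variables({})["history"]
print(window_history)
```

With k=1 only the second turn is printed; raising k to 2 would return both turns.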

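Key points

Short-term memory makes a conversation feel more natural, because the model can "remember" the recent context.
Suited to casual chat, Q&A, and chains of follow-up instructions.
For long-term memory, combine it with tools such as a vector database.

In-memory short-term memory lets the model "remember" recent conversation content, giving a smooth, natural multi-turn experience.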

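Building a list of memory entities

ConversationEntityMemory is LangChain's "entity memory" tool. It uses the LLM to identify, track, and store the entities that come up in a conversation (people, places, organizations, proper nouns, and so on), so that the model can keep understanding and referring to information about them across turns.

Key features

Automatic entity extraction: after each turn, the entities in the input are identified (for example “小王” (Xiao Wang) or “三劍客” (the "Three Musketeers")).
A separate memory per entity: descriptions and context related to each entity are stored for later reference.
Suited to complex conversations involving multiple roles, objects, and topics.

Typical usage example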

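from langchain_openai import ChatOpenAI
from langchain.memory import ConversationEntityMemory
from pydantic import SecretStr
import os
import dotenv
# Load environment variables
dotenv.load_dotenv()
# Initialize the Tencent Hunyuan model through its OpenAI-compatible endpoint
llm = ChatOpenAI(
    model="hunyuan-lite",
    temperature=0,
    api_key=SecretStr(os.environ.get("HUNYUAN_API_KEY", "")),
    base_url="https://api.hunyuan.cloud.tencent.com/v1",
)
# Create the entity memory object
memory = ConversationEntityMemory(llm=llm)
# An input that mentions several entities
# ("Xiao Wang, Xiao Li and Xiao Huang often travel together; the three are known as the 'Three Musketeers'.")
_input = {
    "input": "小王、小李和小黃經常結伴同游，三人合稱“三劍客”。",
}
# Load the memory variables (no context has been saved yet)
memory.load_memory_variables(_input)
# Save the conversation context (the input and the AI reply "Sounds fun, I want to join them too!")
memory.save_context(
    _input,
    {"output": "看起來很好玩，我也想加入他們！"},
)
# Query the entity memory ("From the text above, find out who the 'Three Musketeers' are")
memory.load_memory_variables({"input": "請從上文中找出‘三劍客’分別是誰"})

Result: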

{'history': 'Human: 小王、小李和小黃經常結伴同游，三人合稱“三劍客”。\nAI: 看起來很好玩，我也想加入他們！','entities': {'三劍客': '','小王': '小王、小李和小黃經常結伴同游，三人合稱“三劍客”。','小李': '小李與小王、小黃結伴同游，三人合稱“三劍客”。','小黃': '小黃是小王、小李和小黃三人中的其中一人，他們經常一起游玩并合稱為“三劍客”。'}}
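
To make the per-entity bookkeeping concrete, here is a minimal sketch of the idea using only the standard library. It is not LangChain's implementation: EntityStore and extract_entities are hypothetical names, and plain substring matching stands in for the LLM that the real class uses for entity extraction and for updating each entity's description.

```python
from collections import defaultdict


def extract_entities(text, known_entities):
    """Hypothetical stand-in for the LLM extraction step: substring match."""
    return [name for name in known_entities if name in text]


class EntityStore:
    """Keeps a separate list of notes for every entity that is mentioned."""

    def __init__(self, known_entities):
        self.known_entities = list(known_entities)
        self.notes = defaultdict(list)

    def save_context(self, user_input, ai_output):
        # Attach the user's sentence to every entity it mentions.
        for name in extract_entities(user_input, self.known_entities):
            self.notes[name].append(user_input)

    def load(self, query):
        # Return only the entities that the current query refers to.
        found = extract_entities(query, self.known_entities)
        return {name: " ".join(self.notes[name]) for name in found}


store = EntityStore(["Alice", "Bob", "Acme"])
store.save_context("Alice works at Acme.", "Noted.")
store.save_context("Bob is Alice's manager.", "Got it.")
entity_view = store.load("What do we know about Alice and Bob?")
print(entity_view)
```

Only the entities named in the query come back, each with its own accumulated notes, which mirrors the shape of the 'entities' dictionary in the result above.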

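Key points

ConversationEntityMemory lets the model "remember" and track the important entities in a conversation.
Entity information is updated dynamically, and multiple entities are managed side by side.
Suited to dialogue systems and intelligent assistants that need to track information about multiple roles and objects.

ConversationEntityMemory lets the model "recognize people and remember facts" across turns, making for a smarter conversation with better recall.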

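Building memory with a knowledge graph

ConversationKGMemory is LangChain's "knowledge graph memory" tool. During the conversation it extracts knowledge triples (subject, predicate, object) and builds and maintains a knowledge graph of the conversation, so that the model can understand and track factual relationships for multi-turn reasoning and question answering.

Key features

Automatic triple extraction: after each turn, information of the form "who - did what - to what" is identified (for example, "Lisa holds my phone." → Lisa-holds-my phone).
Knowledge graph construction: the triples are stored in memory as a structured knowledge network.
Supports entity-relationship tracking and factual reasoning; suited to conversations that require understanding complex relationships.

Typical usage example

Code: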

from langchain_openai import ChatOpenAI
from langchain.memory import ConversationKGMemory
from pydantic import SecretStr
import os
import dotenv
# Load environment variables
dotenv.load_dotenv()
# Initialize the Tencent Hunyuan model through its OpenAI-compatible endpoint
llm = ChatOpenAI(
    model="hunyuan-lite",
    temperature=0,
    api_key=SecretStr(os.environ.get("HUNYUAN_API_KEY", "")),
    base_url="https://api.hunyuan.cloud.tencent.com/v1",
)
# Create the knowledge graph memory object
memory = ConversationKGMemory(llm=llm)
# Save the first turn
memory.save_context(
    {"input": "Where is Lisa?"},
    {"output": "She is in the meeting room. What do you want to know about her?"},
)
# Save the second turn
memory.save_context(
    {"input": "My phone is in her hand."},
    {"output": "I will tell her."},
)
# Query the memory variables to get the knowledge graph facts relevant to the input
memory.load_memory_variables({"input": "Where is Lisa?"})

Result:

{'history': 'On Lisa: Lisa is in the meeting room. Lisa holds phone.'}

Code:

# Get the entities mentioned in the given input
memory.get_current_entities("Who holds my phone?")

Result:

['Lisa']

Code:

# Extract the knowledge triples from the input sentence
memory.get_knowledge_triplets("Lisa holds my phone.")

Result:

[KnowledgeTriple(subject='Lisa', predicate='holds', object_='my phone')]
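
How stored triples turn into an "On Lisa: ..." style answer can be sketched with a small in-memory store. This is an illustrative, standard-library sketch under stated assumptions: Triple and TripleStore are hypothetical names, the triples are entered by hand rather than extracted by an LLM, and the real class keeps its graph in its own data structure.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object_: str


class TripleStore:
    """Indexes triples by subject so facts about an entity can be looked up."""

    def __init__(self):
        self.by_subject = defaultdict(list)

    def add(self, triple):
        # Skip exact duplicates so repeated statements do not pile up.
        if triple not in self.by_subject[triple.subject]:
            self.by_subject[triple.subject].append(triple)

    def describe(self, entity):
        triples = self.by_subject.get(entity, [])
        if not triples:
            return ""
        facts = [f"{t.subject} {t.predicate} {t.object_}." for t in triples]
        return f"On {entity}: " + " ".join(facts)


store = TripleStore()
store.add(Triple("Lisa", "is in", "the meeting room"))
store.add(Triple("Lisa", "holds", "phone"))
store.add(Triple("Lisa", "holds", "phone"))  # duplicate, ignored
kg_summary = store.describe("Lisa")
print(kg_summary)
```

Looking up an entity with no stored triples returns an empty string, so the caller can tell that nothing is known about it.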

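Key points

ConversationKGMemory gives a dialogue system the ability to track factual relationships by building and using a knowledge graph automatically.
Suited to intelligent dialogue scenarios that need multi-turn reasoning, factual Q&A, and relationship tracking.
It works together with the LLM to extend the knowledge graph dynamically.

ConversationKGMemory lets the model "draw a knowledge graph" as the conversation proceeds, giving it stronger factual understanding and multi-turn reasoning.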

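Handling long conversations in memory: summarization and token counting

The code below shows how to use LangChain with the Tencent Hunyuan model (hunyuan-lite) to implement summary-based conversation memory. The main steps are: load the API key from environment variables, initialize the Hunyuan model (which the memory calls to write the summary), record the context (input and output) of several turns with ConversationSummaryMemory, and finally fetch the current summary with load_memory_variables. Each saved turn is folded into a running summary, so what comes back as history is a condensed account of the conversation.

Code: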

# Import the required libraries
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI
from pydantic import SecretStr
import os
import dotenv
# Load environment variables
dotenv.load_dotenv()
# Initialize the Tencent Hunyuan model through its OpenAI-compatible endpoint
llm = ChatOpenAI(
    model="hunyuan-lite",
    temperature=0,
    api_key=SecretStr(os.environ.get("HUNYUAN_API_KEY", "")),
    base_url="https://api.hunyuan.cloud.tencent.com/v1",
)
# Create the conversation summary memory object
memory = ConversationSummaryMemory(llm=llm)
# Save the first turn
memory.save_context(
    {"input": "Where is Lisa?"},
    {"output": "She is in the meeting room. What do you want to know about her?"},
)
# Save the second turn
memory.save_context(
    {"input": "My phone is in her hand."},
    {"output": "I will tell her."},
)
# Load the memory variables (the summary)
memory.load_memory_variables({})

Result:

{'history': "The human asks where Lisa is. The AI informs that Lisa is in the meeting room and inquires about her. The human then tells the AI that his phone is in Lisa's hand. The AI promises to tell Lisa."}
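
The per-turn update can be pictured as folding each new exchange into the previous summary. Below is a minimal sketch of that loop using only the standard library. RunningSummaryMemory and toy_summarize are hypothetical names, and the "summarizer" simply concatenates text where the real class would ask the LLM to condense it.

```python
def toy_summarize(previous_summary, new_lines):
    """Hypothetical stand-in for the LLM call; it concatenates rather than condenses."""
    parts = [previous_summary] if previous_summary else []
    parts.extend(new_lines)
    return " ".join(parts)


class RunningSummaryMemory:
    """Updates one summary string every time a turn is saved."""

    def __init__(self, summarize):
        self.summarize = summarize
        self.buffer = ""      # the current summary
        self.messages = []    # the raw lines are kept as well

    def save_context(self, inputs, outputs):
        new_lines = [f"Human: {inputs['input']}", f"AI: {outputs['output']}"]
        self.messages.extend(new_lines)
        # New summary = f(previous summary, latest exchange)
        self.buffer = self.summarize(self.buffer, new_lines)

    def load_memory_variables(self, _inputs=None):
        return {"history": self.buffer}


memory = RunningSummaryMemory(toy_summarize)
memory.save_context({"input": "Where is Lisa?"}, {"output": "She is in the meeting room."})
memory.save_context({"input": "My phone is in her hand."}, {"output": "I will tell her."})
summary = memory.load_memory_variables({})["history"]
print(summary)
```

Swapping toy_summarize for a function that calls a model would leave the rest of the loop unchanged, which is the design point: the memory only needs a function from (old summary, new lines) to a new summary.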

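Code:

# Get the message history of the current conversation
messages = memory.chat_memory.messages
# Generate a fresh summary from the full message history, starting from an empty existing summary
memory.predict_new_summary(messages, "")

Result: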

"The human asks where Lisa is. The AI informs the human that Lisa is in the meeting room and asks if there is anything specific they want to know about her. The human mentions that Lisa's phone is in her hand. The AI promises to tell Lisa."

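Using ChatMessageHistory to get a conversation summary quickly

ChatMessageHistory is the LangChain class for managing and storing conversation message history. It records each message exchanged between the user and the AI (user messages and AI replies) in a structured form. Common usage:

After creating a ChatMessageHistory object, add user and AI messages with the add_user_message and add_ai_message methods.
The messages can be passed as conversation memory to memory classes such as ConversationSummaryMemory for summarizing, reviewing, or managing context.
Combined with an LLM (such as Tencent Hunyuan), the history in ChatMessageHistory can be used to generate a summary or to feed later reasoning.

In short, ChatMessageHistory handles "storage and management of the raw conversation content", providing the underlying support for conversation memory and context handling.

Code: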

# Use ChatMessageHistory to get a conversation summary quickly
from langchain.memory import ChatMessageHistory, ConversationSummaryMemory  # message history and summary memory classes
from langchain_openai import ChatOpenAI  # OpenAI-compatible chat client, used here to call Tencent Hunyuan
from pydantic import SecretStr
import os
import dotenv
dotenv.load_dotenv()  # Load environment variables
# Initialize the Tencent Hunyuan model
llm = ChatOpenAI(
    model="hunyuan-lite",
    temperature=0,
    api_key=SecretStr(os.environ.get("HUNYUAN_API_KEY", "")),
    base_url="https://api.hunyuan.cloud.tencent.com/v1",
)
history = ChatMessageHistory()  # Create the message history object
# Add the user message and the AI reply
history.add_user_message("你好，我是Alex！")
history.add_ai_message("你好，我是AI助手，請問有什么可以幫助你的嗎？")
# Build a summary memory object from the existing history
memory = ConversationSummaryMemory.from_messages(
    llm=llm,
    chat_memory=history,
    return_messages=True,
)
memory.buffer  # Inspect the current summary

Result:

'The human introduces themselves as Alex and greets the AI assistant. The AI assistant responds with a friendly welcome and inquires about the assistance the human needs.'

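Code:

# Use ChatMessageHistory to get a conversation summary quickly, this time seeding it with an initial summary
from langchain.memory import ChatMessageHistory, ConversationSummaryMemory  # message history and summary memory classes
from langchain_openai import ChatOpenAI  # OpenAI-compatible chat client, used here to call Tencent Hunyuan
from pydantic import SecretStr
import os
import dotenv
dotenv.load_dotenv()  # Load environment variables
# Initialize the Tencent Hunyuan model
llm = ChatOpenAI(
    model="hunyuan-lite",  # Hunyuan model name
    temperature=0,  # Sampling temperature
    api_key=SecretStr(os.environ.get("HUNYUAN_API_KEY", "")),  # API key read from the environment
    base_url="https://api.hunyuan.cloud.tencent.com/v1",  # Tencent Hunyuan API endpoint
)
history = ChatMessageHistory()  # Create the message history object
# Add the user message and the AI reply
history.add_user_message("你好，我是Alex！")
history.add_ai_message("你好，我是AI助手，請問有什么可以幫助你的嗎？")
# Build a summary memory object from the existing history
memory = ConversationSummaryMemory.from_messages(
    llm=llm,
    chat_memory=history,
    return_messages=True,
    buffer="\nThe AI asks if there is anything else it can help with.",  # Initial summary content
)
memory.load_memory_variables({})  # Load the memory variables (the summary)

Result: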

{'history': [SystemMessage(content='The AI asks if there is anything else it can help with. The human introduces themselves as Alex and asks if the AI can assist him.')]}
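
The two runs above differ only in whether a starting summary is supplied. As a rough illustration of replaying a stored history into a summary, here is a standard-library sketch. MessageHistory, summary_from_messages, and toy_summarize are hypothetical names, the chunk size of 2 is an assumption made for the example, and the toy summarizer only records who spoke.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str      # "human" or "ai"
    content: str


@dataclass
class MessageHistory:
    """Plain container for the raw messages, in the order they were added."""
    messages: list = field(default_factory=list)

    def add_user_message(self, text):
        self.messages.append(Message("human", text))

    def add_ai_message(self, text):
        self.messages.append(Message("ai", text))


def toy_summarize(previous_summary, chunk):
    """Hypothetical stand-in for the LLM: notes the speakers instead of condensing."""
    speakers = ", ".join(m.role for m in chunk)
    return f"{previous_summary} [{speakers}]".strip()


def summary_from_messages(history, summarize, buffer="", step=2):
    # Walk the stored history in fixed-size chunks, folding each into the summary.
    for start in range(0, len(history.messages), step):
        buffer = summarize(buffer, history.messages[start:start + step])
    return buffer


history = MessageHistory()
history.add_user_message("Hi, I am Alex!")
history.add_ai_message("Hello, how can I help?")
history.add_user_message("I want to learn painting.")
history.add_ai_message("Sure.")
seeded = summary_from_messages(history, toy_summarize, buffer="Earlier chat.")
unseeded = summary_from_messages(history, toy_summarize)
print(seeded)
print(unseeded)
```

The seeded run carries the initial text through every fold, which is why the seeded result above begins with the supplied sentence.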

ConversationSummaryBufferMemory

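ConversationSummaryBufferMemory is a LangChain memory class that combines a conversation summary with a token-limited buffer of recent messages. When the history grows long, it summarizes the older messages that no longer fit in the buffer and keeps only the most recent messages verbatim, balancing completeness of context against efficiency.

Key features and usage

Automatic summarization: when the accumulated history exceeds the configured token threshold (max_token_limit), the LLM is called to summarize the oldest messages, saving memory and context length.
Recent-message buffer: the most recent messages that fit within the token limit are kept verbatim, so the details of the latest exchange are not lost. The cutoff is set by token count, not by a fixed number of turns k.
Context fusion: in later turns the model sees both the summary and the recent verbatim messages, which improves coherence and efficiency in long conversations.
Typical use: conversations with many turns where token usage or context length must still be kept down, such as customer-service bots and long-running chatbots.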

# When a conversation runs long and carries a lot of content, ConversationSummaryBufferMemory can store a summary of it
# It decides automatically, based on the token count, whether summarization is needed
# When the token count exceeds the threshold, summarization is triggered
# The buffer keeps the most recent messages that fit within the token limit
# Older messages are removed from the buffer, and are summarized before removal
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI
from pydantic import SecretStr
import os
import dotenv
dotenv.load_dotenv()
class PatchedChatOpenAI(ChatOpenAI):
    def get_num_tokens_from_messages(self, messages):
        # Handle both str and list content; this is only a rough estimate of the token count
        text = "".join(
            str(item)
            for m in messages
            for item in (m.content if isinstance(m.content, list) else [m.content])
        )
        return len(text)  # You could also divide by 2 or use another rough estimate
# Use PatchedChatOpenAI in place of ChatOpenAI
llm = PatchedChatOpenAI(
    model="hunyuan-lite",
    temperature=0,
    api_key=SecretStr(os.environ.get("HUNYUAN_API_KEY", "")),
    base_url="https://api.hunyuan.cloud.tencent.com/v1",
)
memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=100,
    return_messages=True,
)
memory.save_context(
    {"input": "Where is Lisa?"},
    {"output": "She is in the meeting room. What do you want to know about her?"},
)
memory.save_context(
    {"input": "My phone is in her hand."},
    {"output": "I will tell her."},
)
memory.save_context(
    {"input": "I need her help too. Can you reach to her now?"},
    {"output": "I will try my best."},
)
memory.load_memory_variables({})

Result:

{'history': [SystemMessage(content="The human asks where Lisa is. The AI responds that Lisa is in the meeting room and inquires about the purpose of the question. The human explains that Lisa's phone is in her hand."),AIMessage(content='I will tell her.'),HumanMessage(content='I need her help too. Can you reach to her now?'),AIMessage(content='I will try my best.')]}

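ConversationSummaryBufferMemory lets a dialogue system retain a long history without overrunning the token limit: recent messages stay verbatim and older ones survive as a summary, which makes it a very practical memory tool for long conversations.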
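
The prune-then-summarize step can be sketched with the standard library alone. This is an illustration under stated assumptions, not LangChain's code: SummaryBufferMemory, count_tokens, and toy_summarize are hypothetical names, tokens are estimated as one per character (the same rough idea as the patched model above), and the summarizer concatenates the pruned text where the real class would call the LLM.

```python
def count_tokens(messages):
    # Rough estimate: one "token" per character of message text.
    return sum(len(text) for _role, text in messages)


def toy_summarize(previous_summary, pruned):
    """Hypothetical stand-in for the LLM summarizer; it concatenates the pruned text."""
    parts = [previous_summary] if previous_summary else []
    parts.extend(f"{role}: {text}" for role, text in pruned)
    return " ".join(parts)


class SummaryBufferMemory:
    """Keeps recent messages verbatim and folds older ones into a summary."""

    def __init__(self, max_token_limit, summarize):
        self.max_token_limit = max_token_limit
        self.summarize = summarize
        self.summary = ""
        self.messages = []

    def save_context(self, user_text, ai_text):
        self.messages.append(("human", user_text))
        self.messages.append(("ai", ai_text))
        # Pop the oldest messages until the verbatim buffer fits the limit.
        pruned = []
        while count_tokens(self.messages) > self.max_token_limit:
            pruned.append(self.messages.pop(0))
        if pruned:
            self.summary = self.summarize(self.summary, pruned)

    def load(self):
        return {"summary": self.summary, "recent": list(self.messages)}


memory = SummaryBufferMemory(max_token_limit=100, summarize=toy_summarize)
memory.save_context("Where is Lisa?", "She is in the meeting room. What do you want to know about her?")
memory.save_context("My phone is in her hand.", "I will tell her.")
memory.save_context("I need her help too. Can you reach to her now?", "I will try my best.")
state = memory.load()
print(state["summary"])
print(state["recent"])
```

With the same three turns and the same limit of 100, the sketch keeps the same three recent messages verbatim as the result above and moves the first three messages into the summary.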

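Conversation Token Buffer: using token length to decide when to flush memory

ConversationTokenBufferMemory is a LangChain memory class that manages conversation history by token count. It keeps only the most recent messages whose total token count stays within a configured limit and discards the rest, which keeps the context length under control and prevents it from exceeding the model's token limit.

Key features and usage

Trimming by token count: the cutoff is not a number of turns or messages. The total token count of the stored messages is measured, and only the newest messages whose total stays within max_token_limit are kept.
Suited to long conversations: as content accumulates, the oldest messages are dropped automatically, so the context passed to the model never grows too long.
Resource efficient: suited to situations that are sensitive to token length and need strict control of context size, such as model inference or API calls with token limits.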

from langchain.memory import ConversationTokenBufferMemory
from langchain_openai import ChatOpenAI
from pydantic import SecretStr
import os
import dotenv
dotenv.load_dotenv()
class PatchedChatOpenAI(ChatOpenAI):
    def get_num_tokens_from_messages(self, messages):
        # Handle both str and list content; this is only a rough estimate of the token count
        text = "".join(
            str(item)
            for m in messages
            for item in (m.content if isinstance(m.content, list) else [m.content])
        )
        return len(text)  # You could also divide by 2 or use another rough estimate
# Use PatchedChatOpenAI in place of ChatOpenAI
llm = PatchedChatOpenAI(
    model="hunyuan-lite",
    temperature=0,
    api_key=SecretStr(os.environ.get("HUNYUAN_API_KEY", "")),
    base_url="https://api.hunyuan.cloud.tencent.com/v1",
)
memory = ConversationTokenBufferMemory(
    llm=llm,
    max_token_limit=100,
    return_messages=True,
)
memory.save_context(
    {"input": "Where is Lisa?"},
    {"output": "She is in the meeting room. What do you want to know about her?"},
)
memory.save_context(
    {"input": "My phone is in her hand."},
    {"output": "I will tell her."},
)
memory.save_context(
    {"input": "I need her help too. Can you reach to her now?"},
    {"output": "I will try my best."},
)
memory.save_context(
    {"input": "I do not need to find her anymore"},
    {"output": "Okay. I got it."},
)
memory.load_memory_variables({})

Result:

{'history': [AIMessage(content='I will try my best.'),HumanMessage(content='I do not need to find her anymore'),AIMessage(content='Okay. I got it.')]}
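
The trimming rule is easy to check by hand. The sketch below replays the four turns with a character count as the token estimate, matching the patched get_num_tokens_from_messages above; trim_to_token_limit and char_count are hypothetical helper names, not LangChain functions.

```python
def char_count(messages):
    # Same rough estimate as the patched model: total characters of message text.
    return sum(len(text) for _role, text in messages)


def trim_to_token_limit(messages, max_token_limit, count=char_count):
    """Drop the oldest messages until the remaining ones fit within the limit."""
    kept = list(messages)
    while kept and count(kept) > max_token_limit:
        kept.pop(0)
    return kept


turns = [
    ("Where is Lisa?", "She is in the meeting room. What do you want to know about her?"),
    ("My phone is in her hand.", "I will tell her."),
    ("I need her help too. Can you reach to her now?", "I will try my best."),
    ("I do not need to find her anymore", "Okay. I got it."),
]

buffer = []
for user_text, ai_text in turns:
    buffer += [("human", user_text), ("ai", ai_text)]
    # Trim after every saved turn, as the memory does on save_context.
    buffer = trim_to_token_limit(buffer, max_token_limit=100)

print(buffer)
```

After the fourth turn the buffer holds 129 characters; dropping "I will tell her." (16) leaves 113, still over the limit, and dropping the next message (46) leaves 67, so the three messages shown in the result above remain. Note that trimming works per message, so a kept buffer can begin with an AI message whose matching user message has already been dropped.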