本地大模型編程實戰(23)用智能體(Agent)實現基于SQL數據構建問答系統(2)

本文將用 智能體(Agent) 實現對 SQLite 數據庫的查詢：用戶用自然語言提出問題，智能體也用自然語言根據數據庫的查詢結果回答問題。

本次將分別在英文、中文環境下，使用 qwen2.5 、 MFDoom/deepseek-r1-tool-calling:7b 以及 llama3.1 做實驗。

準備

您可以按下面的步驟準備本地編程環境。

計算機
本文涉及的代碼可以在沒有顯存的環境中執行。建議最低配置為：

CPU: Intel i5-8400 2.80GHz
內存: 16GB

Visual Studio Code 和 venv
Visual Studio Code 是很受歡迎的開發工具，建議用 venv 創建虛擬環境, 詳見：
在Visual Studio Code中配置venv。
Ollama
基于 Ollama 平臺，我們可以在本地方便的使用 llama3.1、qwen2.5、deepseek 等各種 LLM(大語言模型)。詳見：
在langchian中使用本地部署的llama3.1大模型。

創建 `SQLite`

我們直接使用之前創建好的 SQLite 數據庫：

# 獲取當前執行的程序文件的文件夾路徑
current_folder = os.path.dirname(os.path.abspath(__file__))db_file_path = os.path.join(current_folder, 'assert/Chinook.db')from langchain_community.utilities import SQLDatabasedb = SQLDatabase.from_uri(f"sqlite:///{db_file_path}")

智能體(Agent)

LangChain 有一個 SQL智能體，它提供了一種比鏈更靈活的與 SQL 數據庫交互的方式。使用 SQL智能體 的主要優點是：

它可以根據數據庫的架構以及數據庫的內容（如描述特定表）回答問題
它可以通過運行生成的查詢、捕獲執行棧并正確地重新生成它來從錯誤中恢復
它可以根據需要多次查詢數據庫以回答用戶問題
… 等等

創建 `SQLite` 工具

為了初始化智能體，我們將使用 SQLDatabaseToolkit 創建一組工具：

創建和執行查詢
檢查查詢語法
檢索表描述
… 等等

def create_tools(llm_model_name):"""創建工具"""llm = ChatOllama(model=llm_model_name,temperature=0, verbose=True)toolkit = SQLDatabaseToolkit(db=db, llm=llm)tools = toolkit.get_tools()print(tools)return tools

系統提示詞

我們來創建指導智能體的中英文提示詞。

英文版

system = """You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct SQLite query to run, then look at the results of the query and return the answer.
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most 5 results.
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
You have access to tools for interacting with the database.
Only use the given tools. Only use the information returned by the tools to construct your final answer.
You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.You have access to the following tables: {table_names}
""".format(table_names=db.get_usable_table_names()
)system_message = SystemMessage(content=system)

中文版

system = """您是設計用于與 SQL 數據庫交互的代理。用中文回答問題。
給定一個輸入問題，創建一個語法正確的 SQLite 查詢來運行，然后查看查詢結果并返回答案。
除非用戶指定他們希望獲得的特定數量的示例，否則請始終將查詢限制為最多 5 個結果。
您可以按相關列對結果進行排序，以返回數據庫中最有趣的示例。
切勿查詢特定表中的所有列，僅詢問給定問題的相關列。
您可以使用與數據庫交互的工具。
僅使用給定的工具。僅使用工具返回的信息來構建最終答案。
在執行查詢之前，您必須仔細檢查查詢。如果在執行查詢時出現錯誤，請重寫查詢并重試。請勿對數據庫執行任何 DML 語句（INSERT、UPDATE、DELETE、DROP 等）。您有權訪問以下數據庫表： {table_names}
""".format(table_names=db.get_usable_table_names()
)system_message = SystemMessage(content=system)

上述提示詞對大模型生成SQL語句的行為做了比較嚴格的限制，以防止生成錯誤的SQL破壞數據庫。

初始化智能體

使用 create_react_agent 方法初始化智能體，定義問答方法。

def ask(llm_model_name,question):"""詢問智能體"""tools = create_tools(llm_model_name)llm = ChatOllama(model=llm_model_name,temperature=0, verbose=True)agent_executor = create_react_agent(llm, tools, state_modifier=system_message)for s in agent_executor.stream({"messages": [HumanMessage(content=question)]}):print(s)print("----")

React智能體（React Agent）指的是一種能自主決策和執行任務的AI智能體，它結合了大語言模型（LLM）和工具調用，可以根據環境和任務需求動態調整自己的行為。
簡單理解：

React = 解釋 + 計算（Reason + Act）
- 先分析當前的任務和數據（Reason）
- 然后做出相應的行動（Act）
如何工作？
- 先閱讀輸入信息
- 決定是否調用某個工具（如數據庫查詢、API 調用）
- 處理返回的結果，再次分析，繼續執行任務
示例：
- 您輸入：“明天的天氣如何？”
- 智能體會先思考：“這個問題需要調用天氣 API。”
- 然后它調用天氣 API，獲取數據后再回復你：“明天是晴天，氣溫 20°C。”

見證效果

下面我們出3個同樣的中文、英文問題，看看這三款大模型的表現如何。

當用中文提問時，系統提示詞也用中文版；反之亦然。

英文問題

    questions = ["How many Employees are there?","Which country's customers spent the most?","Describe the PlaylistTrack table"]

我們先看看 qwen2.5 是如何思考回答 “How many Employees are there?” 的：

[QuerySQLDatabaseTool(description="Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.", ...)]
{'agent': {'messages': [AIMessage(content='',..., tool_calls=[{'name': 'sql_db_schema', 'args': {'table_names': 'Employee'}, ..., 'type': 'tool_call'}], ...)]}}
----
{'tools': {'messages': [ToolMessage(content='\nCREATE TABLE "Employee" (\n\t"EmployeeId" INTEGER NOT NULL, \n\t"LastName" NVARCHAR(20) NOT NULL, \n\t"FirstName" NVARCHAR(20) NOT NULL, \n\t"Title" NVARCHAR(30), \n\t"ReportsTo" INTEGER, \n\t"BirthDate" DATETIME, \n\t"HireDate" DATETIME, \n\t"Address" NVARCHAR(70), \n\t"City" NVARCHAR(40), \n\t"State" NVARCHAR(40), \n\t"Country" NVARCHAR(40), \n\t"PostalCode" NVARCHAR(10), \n\t"Phone" NVARCHAR(24), \n\t"Fax" NVARCHAR(24), \n\t"Email" NVARCHAR(60), \n\tPRIMARY KEY ("EmployeeId"), \n\tFOREIGN KEY("ReportsTo") REFERENCES "Employee" ("EmployeeId")\n)\n\n/*\n3 rows from Employee table:\nEmployeeId\tLastName\tFirstName\tTitle\tReportsTo\tBirthDate\tHireDate\tAddress\tCity\tState\tCountry\tPostalCode\tPhone\tFax\tEmail\n1\tAdams\tAndrew\tGeneral Manager\tNone\t1962-02-18 00:00:00\t2002-08-14 00:00:00\t11120 Jasper Ave NW\tEdmonton\tAB\tCanada\tT5K 2N1\t+1 (780) 428-9482\t+1 (780) 428-3457\tandrew@chinookcorp.com\n2\tEdwards\tNancy\tSales Manager\t1\t1958-12-08 00:00:00\t2002-05-01 00:00:00\t825 8 Ave SW\tCalgary\tAB\tCanada\tT2P 2T3\t+1 (403) 262-3443\t+1 (403) 262-3322\tnancy@chinookcorp.com\n3\tPeacock\tJane\tSales Support Agent\t2\t1973-08-29 00:00:00\t2002-04-01 00:00:00\t1111 6 Ave SW\tCalgary\tAB\tCanada\tT2P 5M5\t+1 (403) 262-3443\t+1 (403) 262-6712\tjane@chinookcorp.com\n*/', name='sql_db_schema',...)]}}
----
{'agent': {'messages': [AIMessage(content='', ..., tool_calls=[{'name': 'sql_db_query_checker', 'args': {'query': 'SELECT COUNT(*) FROM Employee'}, ..., 'type': 'tool_call'}], ...)]}}
----
{'tools': {'messages': [ToolMessage(content='The provided SQL query is:\n\n```sql\nSELECT COUNT(*) FROM Employee\n```\n\nThis query appears to be correct and does not contain any of the common mistakes listed. It simply counts all rows in the `Employee` table.\n\nTherefore, the final SQL query remains:\n\n```sql\nSELECT COUNT(*) FROM Employee\n```', name='sql_db_query_checker', ...)]}}
----
{'agent': {'messages': [AIMessage(content='', ..., tool_calls=[{'name': 'sql_db_query', 'args': {'query': 'SELECT COUNT(*) FROM Employee'}, ..., 'type': 'tool_call'}], ...)]}}
----
{'tools': {'messages': [ToolMessage(content='[(8,)]', name='sql_db_query', ...)]}}
----
{'agent': {'messages': [AIMessage(content='There are 8 employees in the database.', ...)]}}
----

從上面的輸出來看，智能體是一步一步推理出最終的答案的：

QuerySQLDatabaseTool
給出提示：此工具的輸入是詳細且正確的 SQL 查詢，輸出是數據庫的結果。如果查詢不正確，將返回錯誤消息。如果返回錯誤，請重寫查詢，檢查查詢，然后重試。如果您遇到“字段列表”中未知列“xxxx”的問題，請使用 sql_db_schema 查詢正確的表字段。
agent
生成名為 sql_db_schema 的 tool_call ,確定表名為 Employee。
tools
執行前面tool_call，生成名為 sql_db_schema 的 ToolMessage ，找出了表 Employee 的 DML ，并成功查詢出3條信息。
agent
生成名為 sql_db_query_checker 的 tool_call，其中包含 SQL語句。
tools
驗證前面 tool_call 中的SQL是否正確。生成名為 sql_db_query_checker 的 ToolMessage，其內容顯示最終確定了正確的SQL語句。
agent
生成名為 sql_db_query 的 tool_call，其中包含SQL語句：SELECT COUNT(*) FROM Employee
tools
執行前面的 tool_call ，生成名為 sql_db_query 的 ToolMessage ，其中已經包含了 SQLite的返回值。
agent
基于前面的結果生成了最終回答。

通過上述的推理過程顯示：智能體與工具進行了多輪交互后才給出最終回答。

下表直接列出各個大模型針對這3個問題的處理結果：

	How many Employees are there?	Which country’s customers spent the most?	Describe the PlaylistTrack table
`qwen2.5`	There are 8 employees in the database.	The country with the highest spending by customers is USA, followed by Canada, France, Brazil, and Germany.\n\nHere are the top 5 countries based on total invoice amount:\n\n1. USA - $523.06\n…	The `PlaylistTrack` table has two columns: `PlaylistId` and `TrackId`. It is a junction table used to establish a many-to-many relationship between the `Playlist` and `Track` tables…
`llama3.1`	There are 8 Employees.	The country’s customers that spent the most are from the USA, with a total spend of $523.06.	The PlaylistTrack table contains information about the tracks in each playlist. It has two columns: PlaylistId and TrackId,…
`MFDoom/deepseek-r1-tool-calling:7b`	There are \boxed{8} Employees in the database.	未推理出正確的的SQL	Alright, …, it looks like this table has two main columns: PlaylistId and TrackId…

從上述結果來看，qwen2.5給出的回答最清晰全面，llama3.1也不錯，MFDoom/deepseek-r1-tool-calling:7b 未能正確回答第2個問題。

中文問題

    questions = ["有多少名員工？","哪個國家的顧客花費最多？","描述 PlaylistTrack 表" ]

	有多少名員工？	哪個國家的顧客花費最多？	描述 PlaylistTrack 表
`qwen2.5`	數據庫中一共有 8 名員工。	花費最多的國家及其總金額如下：\n\n1. 美國 (USA) - 總\金額: $523.06…\n\n因此，顧客花費最多的國家是美國。	`PlaylistTrack` 表包含兩個列：`PlaylistId` 和 `TrackId`。這兩個列共同作為主鍵，表示播放列表和其中的歌曲之間的關系。…
`llama3.1`	有 8 名員工。	答案是：美國。	根據工具的輸出，PlaylistTrack 表是一個連接表，它將Playlist和Track兩個表關聯起來。它有兩個列：PlaylistId和TrackId，…
`MFDoom/deepseek-r1-tool-calling:7b`	未正確做答	未正確做答	未正確做答