The previous article, Querying the Neo4j Graph Database with an LLM (1), introduced using GraphCypherQAChain to query Neo4j. That approach is quick and simple, but it is hard to customize and may run into trouble in a production environment.
This article uses the langgraph framework to query the Neo4j graph database with an LLM (large language model). It lets us define clear, complex workflows and can handle more demanding application scenarios.
Below is a visualization of the LangGraph flow we are about to build:
Table of Contents
- Defining the State
- First Node: guardrails
- Node: generate_cypher (the Cypher that Queries Neo4j)
- Enhancing the Prompt with Few-Shot Examples
- Inferring Cypher from the Prompt
- Node: Executing the Cypher Query
- Generating the Final Answer
- Building the Workflow
- Seeing It in Action
- Summary
- Code
- References
Defining the State
We start by defining the input, output, and overall state of the LangGraph application.
You can think of a state as the data format that nodes use to exchange data. All three states inherit from TypedDict.
```python
from operator import add
from typing import Annotated, List
from typing_extensions import TypedDict


class InputState(TypedDict):
    """Input"""
    question: str


class OverallState(TypedDict):
    """Overall state"""
    question: str
    next_action: str
    cypher_statement: str
    cypher_errors: List[str]
    database_records: List[dict]
    steps: Annotated[List[str], add]


class OutputState(TypedDict):
    """Output"""
    answer: str
    steps: List[str]
    cypher_statement: str
```
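The `Annotated[List[str], add]` on `steps` deserves a note: LangGraph treats the second argument of `Annotated` as a reducer, so each node's returned `steps` list is appended to the accumulated value instead of overwriting it, while plain keys such as `cypher_statement` are simply replaced. A minimal illustration of the reducer semantics (plain Python, no LangGraph required):

```python
from operator import add

# How LangGraph merges a node's partial update into the state:
# a plain key's new value overwrites the old one, but a key
# annotated with a reducer (here `add`) combines old and new.
accumulated_steps = ["guardrail"]
node_update = ["generate_cypher"]
merged = add(accumulated_steps, node_update)  # list concatenation
print(merged)  # ['guardrail', 'generate_cypher']
```

This is why every node below returns `"steps": ["..."]` as a one-element list: the reducer concatenates them into a full execution trace.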
First Node: guardrails
The first node, guardrails, is a simple guardrail step: we check whether the question is about movies or their cast. If it is not, we tell the user that we cannot answer other kinds of questions; otherwise, we move on to the Cypher generation node.
```python
from typing import Literal

from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama
from pydantic import BaseModel, Field

guardrails_system = """
As an intelligent assistant, your primary objective is to decide whether a given question is related to movies or not.
If the question is related to movies, output "movie". Otherwise, output "end".
To make this decision, assess the content of the question and determine if it refers to any movie, actor, director, film industry,
or related topics. Provide only the specified output: "movie" or "end".
"""

guardrails_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", guardrails_system),
        ("human", "{question}"),
    ]
)


class GuardrailsOutput(BaseModel):
    decision: Literal["movie", "end"] = Field(
        description="Decision on whether the question is related to movies"
    )


llm_llama = ChatOllama(model="llama3.1", temperature=0, verbose=True)
guardrails_chain = guardrails_prompt | llm_llama.with_structured_output(GuardrailsOutput)


def guardrails(state: InputState) -> OverallState:
    """Decides if the question is related to movies or not."""
    guardrails_output = guardrails_chain.invoke({"question": state.get("question")})
    database_records = None
    if guardrails_output.decision == "end":
        database_records = (
            "This question is not about movies or their cast. "
            "Therefore I cannot answer this question."
        )
    return {
        "next_action": guardrails_output.decision,
        "database_records": database_records,
        "steps": ["guardrail"],
    }
```
This node uses llama3.1 with a prompt to decide whether the input is related to movies. If it is, the node returns movie, and later we generate Cypher and query the Neo4j graph database; if not, it returns end and the question is handed straight to the LLM.
Node: generate_cypher (the Cypher that Queries Neo4j)
Enhancing the Prompt with Few-Shot Examples
Converting natural language into accurate Cypher queries is genuinely hard. One way to strengthen this step is to provide relevant few-shot examples that guide the LLM's query generation. For this we use the SemanticSimilarityExampleSelector to dynamically pick the most relevant examples.
```python
# Few-shot prompting
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_neo4j import Neo4jVector
from langchain_ollama import OllamaEmbeddings

examples = [
    {
        "question": "How many artists are there?",
        "query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)",
    },
    {
        "question": "Which actors played in the movie Casino?",
        "query": "MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(a) RETURN a.name",
    },
    {
        "question": "How many movies has Tom Hanks acted in?",
        "query": "MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)",
    },
    {
        "question": "List all the genres of the movie Schindler's List",
        "query": "MATCH (m:Movie {title: 'Schindler\\'s List'})-[:IN_GENRE]->(g:Genre) RETURN g.name",
    },
    {
        "question": "Which actors have worked in movies from both the comedy and action genres?",
        "query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name",
    },
    {
        "question": "Which directors have made movies with at least three different actors named 'John'?",
        "query": "MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name",
    },
    {
        "question": "Identify movies where directors also played a role in the film.",
        "query": "MATCH (p:Person)-[:DIRECTED]->(m:Movie), (p)-[:ACTED_IN]->(m) RETURN m.title, p.name",
    },
    {
        "question": "Find the actor with the highest number of movies in the database.",
        "query": "MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1",
    },
]

embeddings = OllamaEmbeddings(model="nomic-embed-text")
example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples, embeddings, Neo4jVector, k=5, input_keys=["question"]
)
```
Inferring Cypher from the Prompt
Next we implement the Cypher generation chain. The prompt combines the graph schema, the dynamically selected few-shot examples, and the user's question; together they let the LLM produce a Cypher query that retrieves the relevant information from the graph database.
```python
import os

from langchain_core.output_parsers import StrOutputParser
from langchain_neo4j import Neo4jGraph


def create_enhanced_graph():
    """Create the Neo4j graph object."""
    os.environ["NEO4J_URI"] = "bolt://localhost:7687"
    os.environ["NEO4J_USERNAME"] = "neo4j"
    os.environ["NEO4J_PASSWORD"] = "neo4j"
    enhanced_graph = Neo4jGraph(enhanced_schema=True)
    # print(enhanced_graph.schema)
    return enhanced_graph


enhanced_graph = create_enhanced_graph()

text2cypher_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            (
                "Given an input question, convert it to a Cypher query. No pre-amble. "
                "Do not wrap the response in any backticks or anything else. "
                "Respond with a Cypher statement only!"
            ),
        ),
        (
            "human",
            (
                """You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.
Do not wrap the response in any backticks or anything else. Respond with a Cypher statement only!
Here is the schema information
{schema}

Below are a number of examples of questions and their corresponding Cypher queries.

{fewshot_examples}

User input: {question}
Cypher query:"""
            ),
        ),
    ]
)

llm_qwen = ChatOllama(model="qwen2.5", temperature=0, verbose=True)
text2cypher_chain = text2cypher_prompt | llm_qwen | StrOutputParser()


def generate_cypher(state: OverallState) -> OverallState:
    """Generates a Cypher statement based on the provided schema and user input."""
    NL = "\n"
    fewshot_examples = (NL * 2).join(
        [
            f"Question: {el['question']}{NL}Cypher:{el['query']}"
            for el in example_selector.select_examples({"question": state.get("question")})
        ]
    )
    generated_cypher = text2cypher_chain.invoke(
        {
            "question": state.get("question"),
            "fewshot_examples": fewshot_examples,
            "schema": enhanced_graph.schema,
        }
    )
    return {"cypher_statement": generated_cypher, "steps": ["generate_cypher"]}
```
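One practical caveat: despite the "no backticks" instruction, smaller local models sometimes wrap their answer in a Markdown code fence anyway. A defensive post-processing step (my own addition, not part of the original chain; the helper name is hypothetical) could normalize the generated statement before it is executed:

```python
import re


def strip_cypher_fences(text: str) -> str:
    """Remove a surrounding ``` or ```cypher Markdown fence, if present."""
    text = text.strip()
    match = re.match(r"^```(?:cypher)?\s*(.*?)\s*```$", text, flags=re.DOTALL)
    return match.group(1) if match else text


print(strip_cypher_fences("```cypher\nMATCH (m:Movie) RETURN m.title\n```"))
print(strip_cypher_fences("MATCH (m:Movie) RETURN m.title"))  # unfenced input passes through
```

If needed, this could be piped after `StrOutputParser()` via `RunnableLambda`, but it is purely optional hardening.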
Node: Executing the Cypher Query
Now we add a node that executes the generated Cypher statement. If the graph database returns no results, we should say so to the LLM explicitly, because leaving the context empty can sometimes lead the LLM to hallucinate.
You could add validate-query and correct-query nodes in front of this one to improve accuracy. Be aware, though, that such nodes do not always deliver the expected benefit, since they can make mistakes of their own, so treat them with care.
```python
no_results = "I couldn't find any relevant information in the database"


def execute_cypher(state: OverallState) -> OverallState:
    """Executes the given Cypher statement."""
    records = enhanced_graph.query(state.get("cypher_statement"))
    return {
        "database_records": records if records else no_results,
        "next_action": "end",
        "steps": ["execute_cypher"],
    }
```
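As one sketch of the validation idea mentioned above (my own addition; a production validator might instead run `EXPLAIN` against Neo4j to catch syntax errors), even a purely textual check can reject statements containing write clauses before they reach the database:

```python
import re

# Clauses that modify the graph; a QA workflow should only read.
WRITE_CLAUSES = ("CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE", "DROP")


def is_read_only(cypher: str) -> bool:
    """Heuristic check: reject Cypher containing graph-modifying clauses."""
    tokens = re.findall(r"[A-Za-z_]+", cypher.upper())
    return not any(tok in WRITE_CLAUSES for tok in tokens)


print(is_read_only("MATCH (m:Movie) RETURN m.title"))   # True
print(is_read_only("MATCH (m:Movie) DETACH DELETE m"))  # False
```

A node wrapping this check could set `next_action` back to Cypher generation on failure, mirroring the guardrail pattern.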
Generating the Final Answer
The final step is to generate the answer, combining the initial question with the graph database output to produce a relevant response.
```python
generate_final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant"),
        (
            "human",
            (
                """Use the following results retrieved from a database to provide
a succinct, definitive answer to the user's question.

Respond as if you are answering the question directly.

Results: {results}
Question: {question}"""
            ),
        ),
    ]
)

generate_final_chain = generate_final_prompt | llm_llama | StrOutputParser()


def generate_final_answer(state: OverallState) -> OutputState:
    """Generates the final answer from the question and the database records."""
    final_answer = generate_final_chain.invoke(
        {"question": state.get("question"), "results": state.get("database_records")}
    )
    return {"answer": final_answer, "steps": ["generate_final_answer"]}
```
Building the Workflow
Now we assemble the LangGraph workflow.
First, define the conditional-edge function:
```python
def guardrails_condition(
    state: OverallState,
) -> Literal["generate_cypher", "generate_final_answer"]:
    if state.get("next_action") == "end":
        return "generate_final_answer"
    elif state.get("next_action") == "movie":
        return "generate_cypher"
```
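Since the routing depends only on the `next_action` field, it can be sanity-checked without any LLM or database by feeding it hand-built states (plain dicts standing in for `OverallState`; the function is restated here so the snippet runs standalone):

```python
from typing import Literal


def guardrails_condition(
    state: dict,
) -> Literal["generate_cypher", "generate_final_answer"]:
    # Same logic as the conditional edge above.
    if state.get("next_action") == "end":
        return "generate_final_answer"
    elif state.get("next_action") == "movie":
        return "generate_cypher"


print(guardrails_condition({"next_action": "movie"}))  # generate_cypher
print(guardrails_condition({"next_action": "end"}))    # generate_final_answer
```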
This function is attached after the guardrails node; based on the guardrail's decision in the previous step, it routes execution to the appropriate next node.
The following code connects the nodes and edges above into a complete workflow:
```python
from langgraph.graph import END, START, StateGraph

langgraph = StateGraph(OverallState, input=InputState, output=OutputState)
langgraph.add_node(guardrails)
langgraph.add_node(generate_cypher)
langgraph.add_node(execute_cypher)
langgraph.add_node(generate_final_answer)

langgraph.add_edge(START, "guardrails")
langgraph.add_conditional_edges(
    "guardrails",
    guardrails_condition,
)
langgraph.add_edge("generate_cypher", "execute_cypher")
langgraph.add_edge("execute_cypher", "generate_final_answer")
langgraph.add_edge("generate_final_answer", END)

langgraph = langgraph.compile()
```
Seeing It in Action
Everything is ready. Let's put two questions to the compiled langgraph workflow and see how it performs:

```python
def ask(question: str):
    response = langgraph.invoke({"question": question})
    print(f'response:\n{response["answer"]}')


ask("What's the weather in Spain?")
ask("What was the cast of the Casino?")
```
The first question has nothing to do with movies, so Neo4j is not queried and the LLM answers it directly:
I'm happy to help with that! Unfortunately, I don't have access to real-time weather information for specific locations like Spain. However, I can suggest checking a reliable weather website or app, such as AccuWeather or Weather.com, for the most up-to-date forecast.Would you like me to provide some general information about Spain's climate instead?
The second question takes noticeably longer to run, and the final answer is:
The cast of the movie "Casino" included James Woods, Joe Pesci, Robert De Niro, and Sharon Stone.
Nice!
Summary
This article demonstrated using the more elaborate langgraph framework to build a graph-shaped workflow that handles queries against graph data.
In my view, the drawback of this approach is that it takes more work; the benefits are a clear structure and easy customization, which make it better suited for building complex AI applications or agents in a production environment.
Code
All of the code and related resources for this article have been shared; see:
- github
- gitee
To make the code easy to find, the number at the start of each program file name matches the document number in this article series.
References
- Build a Question Answering application over a Graph Database
🪐 Thanks for reading, and good luck! 🪐