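Building an Intelligent Search Agent from Scratch: A Complete Hands-On Project with LangGraph, Real-Time Search, and PDF Export

Conventional AI chat systems are limited to the knowledge in their pre-training data and cannot retrieve real-time information. This article walks through building a LangGraph-based agent that decides when a web search is needed, maintains conversation context across turns, and can export a conversation as a PDF document.

The system's core features are: automatic web-search triggering driven by a decision mechanism, context state management across multiple turns, a multi-strategy search mechanism with fallback, traceable information sources, and professionally formatted PDF generation.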

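LangGraph Architecture

LangGraph is a framework for building stateful, multi-actor applications, and it is particularly suited to integration with large language models. Compared with a conventional chat interface, it supports more complex workflow management.

The framework has four main strengths. First, state management across conversation turns lets the system remember and use earlier interactions. Second, conditional routing based on user input or context lets the system choose a different processing path for each situation. Third, multi-step workflows with decision points make complex business logic possible. Fourth, human-in-the-loop interaction allows manual intervention when needed.

Architecturally, LangGraph can be viewed as a state machine for AI applications: each node represents a functional module (such as input classification, information search, or response generation), and the edges between nodes define data flow and control flow.

The search agent uses a modular architecture in which six core components work together: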

    ┌─────────────────────────────────────────────────────────────┐
    │                    Enhanced Search Agent                    │
    ├─────────────────────────────────────────────────────────────┤
    │  ┌─────────────────┐  ┌─────────────────┐  ┌──────────────┐ │
    │  │ Search Trigger  │  │ Search Strategy │  │ Result       │ │
    │  │ Intelligence    │  │ Manager         │  │ Processor    │ │
    │  └─────────────────┘  └─────────────────┘  └──────────────┘ │
    ├─────────────────────────────────────────────────────────────┤
    │  ┌─────────────────┐  ┌─────────────────┐  ┌──────────────┐ │
    │  │ Context         │  │ Multi-Source    │  │ PDF          │ │
    │  │ Manager         │  │ Search Engine   │  │ Generator    │ │
    │  └─────────────────┘  └─────────────────┘  └──────────────┘ │
    ├─────────────────────────────────────────────────────────────┤
    │                  LangGraph Workflow Engine                  │
    └─────────────────────────────────────────────────────────────┘

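Basic Modules

Search Trigger Intelligence Module

The search trigger intelligence module is the system's core decision component: it determines automatically when a web search is needed. It uses pattern-based analysis of the input rather than a single simple keyword match.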

    class SearchTriggerIntelligence:
        def __init__(self):
            # Time-sensitive keywords that signal a need for current information
            self.temporal_keywords = {
                'immediate': ['now', 'currently', 'today', 'this week'],
                'recent': ['latest', 'recent', 'new', 'fresh', 'updated'],
                'trending': ['trending', 'popular', 'viral', 'breaking'],
                'temporal_markers': ['2025', '2024', 'this year'],
                'news_indicators': ['news', 'developments', 'updates']
            }
            # Topic categories that need up-to-date information
            self.current_info_topics = {
                'technology': ['ai', 'artificial intelligence', 'tech', 'software'],
                'finance': ['market', 'stock', 'crypto', 'bitcoin', 'economy'],
                'news': ['politics', 'election', 'government', 'policy'],
                'science': ['research', 'study', 'discovery', 'breakthrough']
            }

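The module analyzes user input along several dimensions to reach a search decision. Temporal indicator analysis picks up words that signal a need for timeliness, such as "latest", "current", or "2024". Topic category analysis covers areas that need up-to-date information, such as technology, finance, news, and science. Book pattern detection identifies queries about specific publications. Uncertainty signal detection triggers a search when the AI system expresses the limits of its own knowledge.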
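The article does not show the decision method itself, so the following is only a minimal sketch of how the temporal and topic signals might be combined. The `SearchDecision` fields mirror the attributes used later in `classify_input`; the shortened keyword lists, the weights, and the 0.5 threshold are assumptions made for illustration, and book-pattern and uncertainty detection are left out.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SearchDecision:
    should_search: bool
    confidence: float
    reasoning: str
    topic_category: Optional[str] = None
    suggested_queries: List[str] = field(default_factory=list)

# Shortened, illustrative keyword lists (assumptions, not the full lists above)
TEMPORAL_KEYWORDS = ['latest', 'recent', 'today', 'currently', 'breaking', '2024', '2025']
TOPIC_KEYWORDS = {
    'technology': ['ai', 'software', 'tech'],
    'finance': ['stock', 'bitcoin', 'market'],
}

def _contains(text: str, keyword: str) -> bool:
    # Whole-word match, so that 'ai' does not fire on 'said'
    return re.search(rf'\b{re.escape(keyword)}\b', text) is not None

def should_trigger_search(user_input: str) -> SearchDecision:
    text = user_input.lower()
    temporal_hits = [k for k in TEMPORAL_KEYWORDS if _contains(text, k)]
    topic = next(
        (name for name, words in TOPIC_KEYWORDS.items()
         if any(_contains(text, w) for w in words)),
        None,
    )
    # Assumed weights: a temporal signal alone is enough, a topic alone is not
    score = (0.5 if temporal_hits else 0.0) + (0.3 if topic else 0.0)
    should_search = score >= 0.5
    reasoning = f"temporal={temporal_hits}, topic={topic}"
    queries = [user_input.strip()] if should_search else []
    return SearchDecision(should_search, score, reasoning, topic, queries)
```

Whole-word matching matters here: a plain substring test would treat "said" as a mention of "ai" and trigger unnecessary searches.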

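Context-Aware Conversation Management

One of the main technical challenges for a conversational AI system is keeping context coherent across multi-turn interactions. This system addresses it by resolving contextual references in the user's input.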

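    import re
    from typing import Dict, Optional

    def _resolve_contextual_references(self, user_input: str, context: Optional[Dict] = None) -> str:
        """
        Resolve contextual references in the user input by analyzing the conversation history.
        """
        # Regular expressions for follow-up requests
        follow_up_patterns = [
            # The optional word-count group lets "give me a 500 word summary" match
            r'^(give me|provide|write|create)\s+(a\s+)?(\d+[\s-]+word\s+)?(summary|overview|analysis)',
            r'^(tell me more|more about|elaborate|expand)',
            r'^(summarize|analyze|explain)\s+(it|this|that)',
            r'^\d+[\s-]+word\s+(summary|analysis|overview)'
        ]
        is_follow_up = any(re.match(pattern, user_input.lower()) for pattern in follow_up_patterns)
        if is_follow_up:
            # Extract the most recently discussed topic; context may be None
            recent_topic = self._extract_recent_topic((context or {}).get('messages', []))
            if recent_topic:
                # Keep the user's wording (request type, word count) and attach the topic
                return f"{user_input.strip()} (topic: {recent_topic})"
        return user_input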

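The key is understanding what the user's request refers to. When a user who has been discussing a particular book asks "give me a 500-word summary", the system recognizes that the request points to that book and not to a generic summary.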
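The helper `_extract_recent_topic` is referenced but not shown. One way it could work, sketched here under simplifying assumptions, is to walk the history backwards and return the first topic a user message names. Messages are plain dicts with `role` and `content` keys in this sketch (the real system uses LangChain message objects), and the two regular expressions are hypothetical heuristics, not the author's patterns.

```python
import re
from typing import List, Optional

# Hypothetical heuristics: "the book <Title> by <Author>" or "about <Capitalized Phrase>"
TOPIC_PATTERNS = [
    re.compile(r"\bbook\s+(?P<topic>.+?)\s+by\s+[A-Z]"),
    re.compile(r"\babout\s+(?P<topic>[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*)"),
]

def extract_recent_topic(messages: List[dict]) -> Optional[str]:
    """Return the topic named in the most recent user message that has one."""
    for msg in reversed(messages):
        if msg.get("role") != "user":
            continue  # only user turns introduce topics in this sketch
        for pattern in TOPIC_PATTERNS:
            match = pattern.search(msg.get("content", ""))
            if match:
                return match.group("topic").strip()
    return None
```

Because follow-up requests such as "give me a 500 word summary" match neither pattern, the scan skips them and falls through to the earlier message that introduced the book.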

Building the LangGraph Workflow

The workflow is built on LangGraph and uses a state graph to manage the whole conversation flow.

    from typing import Annotated, Dict, List, TypedDict

    from langgraph.graph import StateGraph, START, END
    from langgraph.graph.message import add_messages

    class ConversationState(TypedDict):
        messages: Annotated[list, add_messages]
        user_input: str
        conversation_type: str
        context: dict
        session_id: str
        needs_web_search: bool
        search_results: List[Dict]
        search_queries: List[str]
        sources: List[str]

    def create_workflow():
        workflow = StateGraph(ConversationState)
        # Register the nodes
        workflow.add_node("classify_input", classify_input)
        workflow.add_node("search_web_information", search_web_information)
        workflow.add_node("generate_search_enhanced_response", generate_search_enhanced_response)
        workflow.add_node("handle_chat", handle_chat)
        # Define the connections between nodes
        workflow.add_edge(START, "classify_input")
        workflow.add_conditional_edges(
            "classify_input",
            route_conversation,
            {
                "search_web_information": "search_web_information",
                "handle_chat": "handle_chat"
            }
        )
        # Search feeds the response node, and both branches end the run
        workflow.add_edge("search_web_information", "generate_search_enhanced_response")
        workflow.add_edge("generate_search_enhanced_response", END)
        workflow.add_edge("handle_chat", END)
        return workflow.compile()
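The conditional edge relies on a `route_conversation` function that the article does not show. A minimal sketch, assuming it routes only on the `needs_web_search` flag written by `classify_input`, could look like this. `RoutingState` is a hypothetical, trimmed-down stand-in for `ConversationState` so that the example runs on its own.

```python
from typing import TypedDict

class RoutingState(TypedDict, total=False):
    # Only the field this sketch reads; the real state has more fields
    needs_web_search: bool

def route_conversation(state: RoutingState) -> str:
    """Return the name of the next node; the names must match the edge mapping."""
    if state.get("needs_web_search"):
        return "search_web_information"
    # A missing or false flag falls back to the plain chat path
    return "handle_chat"
```

The returned strings are looked up in the mapping passed to `add_conditional_edges`, so a value outside that mapping would fail at run time.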

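Intelligent Input Classification

Input classification is the decision entry point for the whole system: it analyzes the user input and determines how it will be processed downstream.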

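    from langchain_core.messages import HumanMessage

    VALID_TYPES = {"chat", "research", "task", "help"}

    def classify_input(state: ConversationState) -> ConversationState:
        """Classify the user input and decide whether a search is needed."""
        # Build the context
        context = {
            'messages': state.get('messages', []),
            'session_id': state.get('session_id', ''),
            'conversation_history': state.get('context', {})
        }
        # Ask the search intelligence module (a module-level instance) for a decision
        search_decision = search_intelligence.should_trigger_search(
            state['user_input'], context
        )
        # Classify the conversation type with the LLM (a module-level `llm`)
        prompt = f"""
        Classify the following user input into one of the categories below.
        Input: "{state['user_input']}"
        Categories:
        - chat: everyday conversation, general questions, small talk
        - research: the user asks for research or analysis on a specific topic
        - task: the user wants a solution to a specific task or problem
        - help: the user needs to understand or learn a specific concept
        Return only the category name.
        """
        response = llm.invoke([HumanMessage(content=prompt)])
        conversation_type = response.content.strip().lower()
        if conversation_type not in VALID_TYPES:
            conversation_type = "chat"  # fall back when the model returns anything else
        return {
            **state,
            "conversation_type": conversation_type,
            "needs_web_search": search_decision.should_search,
            "search_queries": search_decision.suggested_queries,
            "context": {
                **state.get("context", {}),
                "search_decision": {
                    "confidence": search_decision.confidence,
                    "reasoning": search_decision.reasoning,
                    "topic_category": search_decision.topic_category,
                    "urgency_level": search_decision.urgency_level
                }
            }
        }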

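Multi-Strategy Web Search

Once the system decides that a web search is needed, it applies a layered strategy to improve the chance of success and the quality of the results.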

    import logging

    logger = logging.getLogger(__name__)

    def search_web_information(state: ConversationState) -> ConversationState:
        """Run the multi-strategy web search."""
        search_attempts = 0
        max_attempts = 3
        all_search_results = []
        # topic_category comes from the decision stored by classify_input
        topic_category = state.get('context', {}).get('search_decision', {}).get('topic_category')
        # Strategy 1: use the suggested search queries
        suggested_queries = state.get('search_queries', [])
        for query in suggested_queries[:2]:
            search_attempts += 1
            try:
                results = search_tool.search(query, max_results=6)
                if results:
                    all_search_results.extend(results)
                    break
            except Exception as e:
                logger.warning("Search failed for %r: %s", query, e)
        # Strategy 2: use enhanced queries when nothing has been found yet
        if not all_search_results and search_attempts < max_attempts:
            enhanced_queries = _generate_enhanced_queries(
                state['user_input'],
                topic_category
            )
            for query in enhanced_queries[:2]:
                search_attempts += 1
                try:
                    results = search_tool.search(query, max_results=5)
                    if results:
                        all_search_results.extend(results)
                        break
                except Exception as e:
                    logger.warning("Search failed for %r: %s", query, e)
        # Strategy 3: fall back to a simplified query
        if not all_search_results:
            simplified_query = _create_simplified_query(state['user_input'])
            try:
                results = search_tool.search(simplified_query, max_results=5)
                all_search_results.extend(results or [])
            except Exception as e:
                logger.warning("Fallback search failed for %r: %s", simplified_query, e)
        return {
            **state,
            "search_results": all_search_results,
            "sources": [r.get('url', '') for r in all_search_results]
        }
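The helpers `_generate_enhanced_queries` and `_create_simplified_query` are called above but not defined in the article. The sketch below shows one plausible shape for them, written as standalone functions: the enhanced queries add a recency hint and the topic category, and the simplified query strips filler words. The stopword list, the suffixes, and the five-term cap are all assumptions chosen for illustration.

```python
import re
from typing import List, Optional

# Small illustrative stopword list (an assumption, not from the original project)
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "for", "to", "me", "about",
    "what", "is", "are", "tell", "please", "give", "and",
}

def create_simplified_query(user_input: str, max_terms: int = 5) -> str:
    """Keep only the first few content words of the input."""
    words = re.findall(r"[\w'-]+", user_input.lower())
    keywords = [w for w in words if w not in STOPWORDS]
    # Fall back to the raw input if every word was a stopword
    return " ".join(keywords[:max_terms]) or user_input.strip()

def generate_enhanced_queries(user_input: str, topic_category: Optional[str] = None) -> List[str]:
    """Build query variants that bias the search engine toward recent results."""
    base = user_input.strip()
    queries = [f"{base} latest"]
    if topic_category:
        queries.append(f"{base} {topic_category} news")
    return queries
```

For example, `create_simplified_query("Tell me about the latest AI developments in 2024")` reduces the input to its content words, which tends to work better with search engines than a full conversational sentence.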

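Context-Aware Response Generation

The response generation module combines the search results with the conversation context to produce a coherent, relevant reply.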

    from langchain_core.messages import AIMessage, HumanMessage

    def generate_search_enhanced_response(state: ConversationState) -> ConversationState:
        """Generate a response from the search results and the conversation context."""
        # Build the conversation context from the last six messages
        conversation_context = ""
        recent_messages = state.get('messages', [])[-6:]
        if recent_messages:
            conversation_context = "\n\n**Conversation context:**\n"
            for msg in recent_messages:
                if hasattr(msg, 'content') and not msg.content.startswith('🔍'):
                    role = "User" if isinstance(msg, HumanMessage) else "Assistant"
                    conversation_context += f"{role}: {msg.content[:200]}...\n"
        # Format the search results
        search_context = ""
        if state.get('search_results'):
            search_context = "\n\n🔍 **Web search results**:\n"
            for i, result in enumerate(state['search_results'][:10], 1):
                search_context += f"{i}. **{result.get('title', 'No title')}**\n"
                search_context += f"   {result.get('snippet', 'No description')}\n"
                search_context += f"   Source: {result.get('url', 'No URL')}\n\n"
        prompt = f"""
        Answer the user's question using the information below: "{state['user_input']}"
        {conversation_context}
        {search_context}
        Response guidelines:
        1. **Context continuity**: take the earlier conversation into account
        2. **Timeliness**: prefer the most recent data in the search results
        3. **Source transparency**: state where the information comes from
        4. **Priority**: recent sources take precedence over training data
        5. **Relevance**: answer the user's question directly
        """
        response = llm.invoke([HumanMessage(content=prompt)])
        return {
            **state,
            # The add_messages reducer appends these to the stored history
            "messages": [
                HumanMessage(content=state["user_input"]),
                AIMessage(content=response.content)
            ]
        }

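Advanced Modules

PDF Document Generation

A distinctive feature of the system is that it can export any conversation as a professionally formatted PDF document. The feature is built on the ReportLab library, which gives full control over document layout and styles.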

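    import os
    import time
    from xml.sax.saxutils import escape

    from reportlab.lib.colors import HexColor
    from reportlab.lib.pagesizes import A4
    from reportlab.lib.styles import ParagraphStyle, getSampleStyleSheet
    from reportlab.platypus import Paragraph, SimpleDocTemplate

    PDF_DIR = "exports"  # assumed output directory; the original does not specify one

    def generate_pdf_from_markdown(content: str, title: str, session_id: str) -> str:
        """Generate a PDF document from Markdown content."""
        os.makedirs(PDF_DIR, exist_ok=True)
        filename = f"chat_{session_id}_{int(time.time())}.pdf"  # assumed naming scheme
        doc = SimpleDocTemplate(os.path.join(PDF_DIR, filename), pagesize=A4)
        # Document styles; heading and body styles reuse ReportLab's sample stylesheet
        styles = getSampleStyleSheet()
        title_style = ParagraphStyle(
            'CustomTitle',
            fontSize=24,
            spaceAfter=30,
            alignment=1,  # centered
            textColor=HexColor('#2c3e50')
        )
        heading_style, subheading_style, body_style = styles['Heading1'], styles['Heading2'], styles['BodyText']
        # Paragraph parses XML-like markup, so raw text is escaped first
        story = [Paragraph(escape(title), title_style)]
        # Parse the Markdown line by line and convert it to PDF elements
        for line in content.split('\n'):
            if not line.strip():
                continue  # skip blank lines
            if line.startswith('# '):
                story.append(Paragraph(escape(line[2:]), heading_style))
            elif line.startswith('## '):
                story.append(Paragraph(escape(line[3:]), subheading_style))
            else:
                story.append(Paragraph(escape(line), body_style))
        doc.build(story)
        return filename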
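The converter above handles headings only, while model responses usually also contain inline emphasis such as `**bold**`. ReportLab's `Paragraph` accepts a small XML-like markup that includes `<b>` and `<i>`, so one option is to translate inline Markdown before building each paragraph. The helper below is a hypothetical addition, not part of the original project; it uses only the standard library and covers bold and italic, leaving links and inline code aside.

```python
import re
from xml.sax.saxutils import escape

def markdown_inline_to_reportlab(line: str) -> str:
    """Convert **bold** and *italic* to the <b>/<i> tags Paragraph understands."""
    # Escape &, < and > first so user text cannot break the paragraph markup
    text = escape(line)
    # Bold must be handled before italic, because ** also contains *
    text = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", text)
    # Single asterisks that are not part of a double asterisk
    text = re.sub(r"(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)", r"<i>\1</i>", text)
    return text
```

The result would be passed to `Paragraph` in place of the raw line, for example `Paragraph(markdown_inline_to_reportlab(line), body_style)`. A lone asterisk, such as a bullet marker, is left unchanged.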

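Error Handling and Graceful Degradation

The system handles errors and degrades gracefully, so it can still return a useful response when an external service fails.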

    def handle_search_errors(self, error: Exception, query: str) -> SearchResults:
        """Handle search errors and degrade gracefully."""
        message = str(error).lower()
        if "rate limit" in message:
            return self._create_rate_limit_response(query)
        elif "network" in message:
            return self._create_network_error_response(query)
        else:
            return self._create_knowledge_based_fallback(query)

Web Interface Integration

The system provides a Flask-based web interface that supports real-time chat and PDF export.

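    import uuid
    from flask import jsonify, request, session

    @app.route('/chat', methods=['POST'])
    def chat():
        """Main chat endpoint with PDF generation."""
        data = request.get_json(silent=True) or {}
        user_message = data.get('message', '')
        # Store the session ID so that later requests reuse it
        if 'session_id' not in session:
            session['session_id'] = str(uuid.uuid4())
        session_id = session['session_id']
        # Run the LangGraph workflow; loading earlier messages for the session is omitted here
        initial_state = {"user_input": user_message, "messages": [], "session_id": session_id}
        result = workflow_app.invoke(initial_state)
        # Assumes the last message in the result is the assistant's reply
        last_response = result["messages"][-1].content if result.get("messages") else ""
        response_data = {'response': last_response, 'session_id': session_id}
        # Generate a PDF for responses longer than 10 words
        word_count = len(last_response.split())
        if word_count > 10:
            pdf_filename = generate_pdf_from_markdown(
                last_response,
                f"Chat Export: {user_message[:50]}...",
                session_id
            )
            if pdf_filename:
                response_data['pdf_available'] = True
                response_data['pdf_filename'] = pdf_filename
        return jsonify(response_data)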

Performance Optimization and Scalability

Caching Strategy

Caching recent search results reduces the number of API calls and improves response time.

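    from functools import lru_cache
    from typing import Dict, List

    # Cache recent search results to reduce API calls.
    # lru_cache has no expiry and hands every caller the same list object,
    # so callers should treat the result as read-only.
    @lru_cache(maxsize=100)
    def cached_search(query: str, max_results: int) -> List[Dict]:
        return search_tool.search(query, max_results=max_results)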
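Because `lru_cache` never expires entries, a cached answer to a "latest news" query can stay stale for as long as the process runs. One alternative, sketched here as an assumption and not as the project's implementation, is a small time-to-live cache wrapped around the search function. `TTLSearchCache` and its parameters are hypothetical names, and the sketch does not bound the number of entries.

```python
import time
from typing import Callable, Dict, List, Tuple

class TTLSearchCache:
    """Cache search results for a limited time (illustrative sketch)."""

    def __init__(self, search_fn: Callable[[str, int], List[dict]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._search_fn = search_fn
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so that expiry can be tested
        self._entries: Dict[Tuple[str, int], Tuple[float, List[dict]]] = {}

    def search(self, query: str, max_results: int) -> List[dict]:
        # Normalize the key so trivially different queries share one entry
        key = (query.strip().lower(), max_results)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return list(hit[1])  # return a copy so callers cannot alter the cache
        results = self._search_fn(query, max_results)
        self._entries[key] = (now, results)
        return list(results)
```

A short lifetime of a few minutes keeps most of the savings for repeated questions within a conversation while limiting how stale a time-sensitive answer can become.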

Asynchronous Processing

For production deployments, search operations should be handled asynchronously.

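    import asyncio
    import aiohttp  # used by search_single_query, which is defined elsewhere
    from typing import Dict, List

    async def async_search(queries: List[str]) -> List[Dict]:
        """Run several search queries concurrently."""
        tasks = [search_single_query(query) for query in queries]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        # Drop failed queries and flatten the per-query lists
        # (assumes each call returns a list of results)
        return [item for r in results if not isinstance(r, Exception) for item in r]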
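`search_single_query` is not shown in the article. If the underlying search client is blocking, one way to get concurrency without rewriting it is to run each call in a worker thread and cap the number of simultaneous requests with a semaphore. The sketch below takes that approach; `search_all`, `search_fn`, and `max_concurrency` are hypothetical names, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
from typing import Callable, Dict, List

async def search_all(queries: List[str],
                     search_fn: Callable[[str], List[Dict]],
                     max_concurrency: int = 3) -> List[Dict]:
    """Run blocking searches concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def search_single_query(query: str) -> List[Dict]:
        async with semaphore:
            # Run the blocking call in a thread so the event loop stays free
            return await asyncio.to_thread(search_fn, query)

    outcomes = await asyncio.gather(
        *(search_single_query(q) for q in queries),
        return_exceptions=True,
    )
    merged: List[Dict] = []
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            continue  # a failed query does not discard the others
        merged.extend(outcome)
    return merged
```

`asyncio.gather` returns outcomes in the order of the input queries, so the merged list keeps that order even when individual searches finish at different times.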

Testing and Validation

Unit Tests

    def test_book_pattern_detection():
        intelligence = SearchTriggerIntelligence()
        decision = intelligence.should_trigger_search(
            "Tell me about the book Nexus by Yuval Noah Harari"
        )
        assert decision.should_search
        assert decision.topic_category == "books"
        assert decision.urgency_level == "high"

Integration Tests

    def test_end_to_end_workflow():
        # Exercises the full graph, so it needs the LLM and the search backend to be reachable
        workflow = create_workflow()
        initial_state = {
            "user_input": "Latest AI developments 2024",
            "messages": [],
            "session_id": "test_session"
        }
        result = workflow.invoke(initial_state)
        assert result["needs_web_search"]
        assert len(result["search_results"]) > 0

Deployment and Operations

Environment Setup

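    # Install the required dependencies
    pip install langgraph langchain-ollama flask reportlab beautifulsoup4

    # Start the Ollama server (it runs in the foreground, so use a separate terminal)
    ollama serve
    ollama pull llama3.2:latest

    # Run the application
    python web_chatbot_with_pdf.py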

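Production Checklist

A production deployment should cover the following: logging and monitoring; rate limiting for the search API; authentication and session management; HTTPS and security headers; a database for persisting conversation records; and backup and recovery procedures.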
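Of the items above, rate limiting for the search API is the easiest to illustrate in a few lines. The following is a minimal token-bucket sketch using only the standard library; the class name and parameters are assumptions, and it is not thread-safe, so a multi-threaded Flask deployment would need a lock around `try_acquire` or a shared store.

```python
import time
from typing import Callable

class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self._tokens = float(capacity)  # start with a full bucket
        self._clock = clock             # injectable for testing
        self._last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        now = self._clock()
        # Refill in proportion to the elapsed time, capped at capacity
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A search wrapper could call `try_acquire()` before each request and return a cached or knowledge-based answer when it returns `False`, which fits the graceful degradation approach described earlier.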