Microsoft GraphRAG: End-to-End Usage and My Own Utility Scripts

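Table of contents

- I. Environment Setup
  - 1. Install Python
  - 2. Install dependencies
  - 3. Configure the LLM API key
- II. Initialize the Project
- III. Upload Documents & Build the Index
- IV. Question Answering (CLI)
  - Example:
- V. Calling GraphRAG from Code
  - Tool overview
  - Core tools in detail
  - 1. simple_graphrag_integration.py - core question answering
  - 2. view_graph.py - basic data inspection
  - 3. visualize_graph.py - graph visualization
  - 4. export_to_neo4j.py - Neo4j database import
  - 5. export_to_csv_for_neo4j.py - bulk CSV export
  - 6. parquet_viewer.py - file viewing and conversion
  - 7. view_communities.py - community analysis
- VI. Tips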

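I previously wrote an article on building a knowledge graph end to end, and some readers messaged me asking how to use GraphRAG. So today I'll walk through Microsoft's GraphRAG, along with some utility scripts I wrote myself. I've packaged each one as a standalone file, so if you need them you can take them straight from the repository.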

I. Environment Setup

1. Install Python

Requires Python 3.10+ (3.11 recommended).

# Creating a virtual environment is recommended
python3 -m venv graphrag_env
source graphrag_env/bin/activate  # Linux/Mac
graphrag_env\Scripts\activate     # Windows

2. Install dependencies

GraphRAG uses Poetry to manage dependencies.

pip install poetry
# Clone the official repository
git clone https://github.com/microsoft/graphrag.git
cd graphrag
# Install dependencies
poetry install

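3. Configure the LLM API key

GraphRAG supports Azure OpenAI and OpenAI by default.

Create a .env file (or set these as environment variables). The variable names must match the ones your settings.yaml references:

OPENAI_API_KEY=your_openai_key
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_MODEL=gpt-4o-mini   # or gpt-4o, gpt-5, etc.
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

If you use Azure OpenAI, replace them with: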

AZURE_OPENAI_API_KEY=your_azure_key
AZURE_OPENAI_ENDPOINT=https://xxx.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT=gpt-4o
AZURE_OPENAI_EMBEDDING_DEPLOYMENT=text-embedding-3-small
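
Before indexing, it can help to confirm that the .env file actually contains the keys you expect. The following is a minimal sketch using only the standard library; `parse_env` and `missing_keys` are hypothetical helper names for illustration, not part of GraphRAG, and the parser only handles simple KEY=VALUE lines.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and '#' comment lines."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "gpt-4o-mini   # or gpt-4o"
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(values: Dict[str, str], required: List[str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


sample = """
OPENAI_API_KEY=your_openai_key
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_MODEL=gpt-4o-mini   # or gpt-4o
"""
env = parse_env(sample)
print(missing_keys(env, ["OPENAI_API_KEY", "OPENAI_EMBEDDING_MODEL"]))
```

In a real project you would read the text with `Path(".env").read_text()` and pass the list of variables your own configuration needs.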

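II. Initialize the Project

From the directory that should contain the project (the command below creates ./my_graphrag, for example ~/my_graphrag), run:

graphrag init --root ./my_graphrag

It generates a settings.yaml configuration file and a data directory structure: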

my_graphrag/
├── settings.yaml
├── input/         # put documents here
├── output/        # index and results
└── state/         # intermediate state

If the directory structure does not exist, create it manually: mkdir -p my_graphrag/input
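
If you prefer to handle the missing-directory case from Python instead of the shell, a small sketch with `pathlib` could look like this. `ensure_layout` is a hypothetical helper, and the folder names are taken from the tree above; the demo runs in a temporary directory so it does not touch a real project.

```python
import tempfile
from pathlib import Path
from typing import List


def ensure_layout(root: Path) -> List[str]:
    """Create input/ and output/ under root if missing; return the names created."""
    created = []
    for name in ("input", "output"):
        folder = root / name
        if not folder.exists():
            folder.mkdir(parents=True)
            created.append(name)
    return created


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my_graphrag"
    first = ensure_layout(project)   # creates both folders
    second = ensure_layout(project)  # nothing left to create
    print(first, second)
```

Calling it twice is harmless, which makes it safe to run at the start of any script that writes into the project.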

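III. Upload Documents & Build the Index

Put the documents you want to query (txt/pdf/markdown, etc.) into the input/ directory. Plain-text files are the safest choice; formats such as PDF may need to be converted to text first, depending on your GraphRAG version and input settings.

For example:

input/
└── demo.txt

Then build the index:

  graphrag index --root ./my_graphrag

GraphRAG automatically performs:

- Document chunking (text units)
- Entity & relationship extraction
- Knowledge graph construction
- Community detection & summary generation
- Vector storage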
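
To make the first step in the list above concrete, here is a minimal sketch of splitting text into overlapping chunks. It is an illustration of the general idea only: it splits by characters, whereas GraphRAG's own chunker is configured in settings.yaml, and `split_text` with its default sizes is a made-up helper.

```python
from typing import List


def split_text(text: str, size: int = 20, overlap: int = 5) -> List[str]:
    """Split text into chunks of at most `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


text = "abcdefghij" * 5  # 50 characters
chunks = split_text(text, size=20, overlap=5)
print(len(chunks), [len(chunk) for chunk in chunks])
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, which is why overlapping windows are common in retrieval pipelines.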

The final results are written to the output/ directory.

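IV. Question Answering (CLI)

GraphRAG supports three query modes: local, global, and drift.

Example:

# Local search (local)
graphrag query --root ./my_graphrag --method local -q "Please summarize the company's main business as described in the report."
# Global search (global)
graphrag query --root ./my_graphrag --method global -q "How does the company differ from its competitors?"
# DRIFT search (drift)
graphrag query --root ./my_graphrag --method drift -q "Which departments are related to AI?"

The result is printed directly to the terminal.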
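
If you want to trigger the same CLI queries from a script, one option is to wrap the command with `subprocess`. This is a sketch that assumes the `graphrag` executable is on your PATH and uses only the flags shown above; `build_query_command` and `run_query` are hypothetical helper names.

```python
import subprocess
from typing import List

VALID_METHODS = ("local", "global", "drift")


def build_query_command(root: str, method: str, question: str) -> List[str]:
    """Build the argument list for `graphrag query` using the flags from the examples above."""
    if method not in VALID_METHODS:
        raise ValueError(f"unknown method: {method}")
    return ["graphrag", "query", "--root", root, "--method", method, "-q", question]


def run_query(root: str, method: str, question: str) -> str:
    """Run the CLI and return its standard output (requires graphrag to be installed)."""
    completed = subprocess.run(
        build_query_command(root, method, question),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


cmd = build_query_command("./my_graphrag", "local", "What is the company's main business?")
print(cmd)
```

Passing the arguments as a list avoids shell quoting problems when the question contains quotes or non-ASCII characters.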

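V. Calling GraphRAG from Code

Once the environment is set up, you can use GraphRAG's capabilities in your own project as needed. Below I list some utility scripts.

Tool overview

Tool	Main function	Use case
`simple_graphrag_integration.py`	Core query service	Integrating GraphRAG question answering into an application
`view_graph.py`	Basic data inspection	Getting quick summary statistics for the knowledge graph
`visualize_graph.py`	Graph visualization	Rendering the knowledge graph as a network image
`view_communities.py`	Community analysis	Viewing and analyzing the community structure of the graph
`parquet_viewer.py`	File viewer	Viewing and converting GraphRAG output files
`export_to_neo4j.py`	Neo4j integration	Importing the knowledge graph into a Neo4j database
`export_to_csv_for_neo4j.py`	CSV export	Generating Neo4j-compatible bulk-import files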

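Core tools in detail

1. simple_graphrag_integration.py - core question answering

Main classes:

from simple_graphrag_integration import SimpleGraphRAGChatBot, SimpleGraphRAGService

# Basic service
service = SimpleGraphRAGService("./graphrag/my_graphrag")
result = service.query("What is RAG?", method="auto")
# Chatbot
chatbot = SimpleGraphRAGChatBot("./graphrag/my_graphrag")
response = chatbot.chat("Introduce this knowledge base")

Query methods:

- local - answers specific questions based on the related entities
- global - answers broad questions based on the community reports
- auto - automatically picks the most suitable method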
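
How `auto` decides is not shown here, so the following is only a guess at what such routing could look like: a keyword heuristic that sends summary-style questions to global and everything else to local. `choose_method` and the `GLOBAL_HINTS` list are assumptions made up for this sketch, not the actual logic in simple_graphrag_integration.py.

```python
# Hypothetical hint words that suggest a broad, summary-style question
GLOBAL_HINTS = ("summarize", "summary", "overall", "overview", "compare", "trend", "themes")


def choose_method(question: str) -> str:
    """Pick 'global' for summary-style questions and 'local' otherwise."""
    lowered = question.lower()
    if any(hint in lowered for hint in GLOBAL_HINTS):
        return "global"
    return "local"


print(choose_method("Give me an overview of the knowledge base"))
print(choose_method("What is RAG?"))
```

A keyword list is crude and language-specific; a production router might instead ask the LLM to classify the question, at the cost of an extra call.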

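Core features:

- Natural-language question answering
- Intelligent context construction
- Conversation history management
- Multiple query strategies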

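2. view_graph.py - basic data inspection

import pandas as pd


def view_entities():
    """Print the entity count and the first few entities."""
    df = pd.read_parquet("./graphrag/my_graphrag/output/entities.parquet")
    print(f"Total entities: {len(df)}")
    print(df[['title', 'type', 'description']].head())


def view_relationships():
    """Print the relationship count and the first few relationships."""
    df = pd.read_parquet("./graphrag/my_graphrag/output/relationships.parquet")
    print(f"Total relationships: {len(df)}")
    print(df[['source', 'target', 'description']].head())

Sample output:

=== Entity information ===
Total entities: 68
                title     type                        description
0                 RAG  CONCEPT  Retrieval-augmented generation...
1                 LLM  CONCEPT               Large language model...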

3. visualize_graph.py - graph visualization

import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd


def create_network_graph():
    """Create a network-graph visualization."""
    # Load the data
    entities_df = pd.read_parquet("my_graphrag/output/entities.parquet")
    relationships_df = pd.read_parquet("my_graphrag/output/relationships.parquet")

    # Build the graph: one node per entity, one edge per relationship
    G = nx.Graph()
    for _, entity in entities_df.iterrows():
        G.add_node(entity['title'])
    for _, rel in relationships_df.iterrows():
        G.add_edge(rel['source'], rel['target'])

    # Draw the graph and save it to disk
    plt.figure(figsize=(15, 10))
    pos = nx.spring_layout(G, k=3, iterations=50)
    nx.draw(G, pos, with_labels=True, node_color='lightblue')
    plt.savefig('knowledge_graph.png', dpi=300)

Output file: knowledge_graph.png - a high-resolution image of the knowledge graph

4. export_to_neo4j.py - Neo4j database import

from neo4j import GraphDatabase


class Neo4jExporter:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def create_entities(self, entities_df):
        """Create entity nodes."""
        query = """
        CREATE (e:Entity {
            title: $title,
            type: $type,
            description: $description
        })
        """
        with self.driver.session() as session:
            for _, entity in entities_df.iterrows():
                # Pass only the fields the query uses rather than every parquet column
                session.run(query, title=entity['title'], type=entity['type'],
                            description=entity['description'])

    def create_relationships(self, relationships_df):
        """Create relationships between existing entity nodes."""
        query = """
        MATCH (a:Entity {title: $source})
        MATCH (b:Entity {title: $target})
        CREATE (a)-[r:RELATED_TO {description: $description}]->(b)
        """
        with self.driver.session() as session:
            for _, rel in relationships_df.iterrows():
                session.run(query, source=rel['source'], target=rel['target'],
                            description=rel['description'])

Common Cypher queries:

// Find the most important (most connected) entities
MATCH (n:Entity)
RETURN n.title, COUNT { (n)--() } AS connections
ORDER BY connections DESC LIMIT 10

// Find the shortest path between two entities
MATCH path = shortestPath((a:Entity {title: 'RAG'})-[*]-(b:Entity {title: 'LLM'}))
RETURN path
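
If you do not have a Neo4j instance at hand, the same two questions can be answered roughly in plain Python. The sketch below counts connections per entity and finds a shortest path with breadth-first search over (source, target) pairs; the edge list is made-up sample data, and `top_connected` and `shortest_path` are hypothetical helpers.

```python
from collections import Counter, deque
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def top_connected(edges: List[Edge], limit: int = 10) -> List[Tuple[str, int]]:
    """Count how many edges touch each entity and return the highest counts."""
    degree: Counter = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree.most_common(limit)


def shortest_path(edges: List[Edge], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over an undirected view of the edges."""
    neighbors: Dict[str, Set[str]] = {}
    for source, target in edges:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path connects the two entities


edges = [
    ("RAG", "LLM"),
    ("RAG", "Retrieval"),
    ("RAG", "Knowledge Graph"),
    ("Retrieval", "Vector Store"),
    ("LLM", "Prompt"),
]
print(top_connected(edges, 1))
print(shortest_path(edges, "Vector Store", "Prompt"))
```

With real data, the edge list could be built from the `source` and `target` columns of relationships.parquet.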

5. export_to_csv_for_neo4j.py - bulk CSV export

import pandas as pd


def export_for_neo4j_csv():
    """Export Neo4j-compatible CSV files."""
    # Load the data
    entities_df = pd.read_parquet("my_graphrag/output/entities.parquet")
    relationships_df = pd.read_parquet("my_graphrag/output/relationships.parquet")

    # Prepare the node CSV
    nodes_df = entities_df[['title', 'type', 'description']].copy()
    nodes_df.columns = ['title:ID', 'type', 'description']
    nodes_df[':LABEL'] = 'Entity'

    # Prepare the relationship CSV
    rels_df = relationships_df[['source', 'target', 'description']].copy()
    rels_df.columns = [':START_ID', ':END_ID', 'description']
    rels_df[':TYPE'] = 'RELATED_TO'

    # Save the files
    nodes_df.to_csv('neo4j_nodes.csv', index=False)
    rels_df.to_csv('neo4j_relationships.csv', index=False)

Neo4j import command:

neo4j-admin database import full \
  --nodes=neo4j_nodes.csv \
  --relationships=neo4j_relationships.csv \
  neo4j
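
Before running a bulk import, it may be worth checking that every relationship row points at a node that exists in the node file, since dangling references are a likely source of import problems. This is a standard-library sketch that assumes the column headers used above; `find_dangling` is a hypothetical helper and the CSV text is sample data.

```python
import csv
import io
from typing import List, Tuple


def find_dangling(nodes_csv: str, rels_csv: str) -> List[Tuple[str, str]]:
    """Return (start, end) pairs whose start or end ID is missing from the node CSV."""
    node_ids = {row["title:ID"] for row in csv.DictReader(io.StringIO(nodes_csv))}
    dangling = []
    for row in csv.DictReader(io.StringIO(rels_csv)):
        if row[":START_ID"] not in node_ids or row[":END_ID"] not in node_ids:
            dangling.append((row[":START_ID"], row[":END_ID"]))
    return dangling


nodes = (
    "title:ID,type,description,:LABEL\n"
    "RAG,CONCEPT,retrieval-augmented generation,Entity\n"
    "LLM,CONCEPT,large language model,Entity\n"
)
rels = (
    ":START_ID,:END_ID,description,:TYPE\n"
    "RAG,LLM,uses,RELATED_TO\n"
    "RAG,GPU,runs on,RELATED_TO\n"
)
print(find_dangling(nodes, rels))
```

For the exported files, pass in the text of neo4j_nodes.csv and neo4j_relationships.csv, for example via `Path(...).read_text(encoding="utf-8")`.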

6. parquet_viewer.py - file viewing and conversion

import pandas as pd


def view_parquet_info(file_path):
    """Print basic information about a parquet file."""
    df = pd.read_parquet(file_path)
    print(f"File: {file_path}")
    print(f"Rows: {len(df)}, columns: {len(df.columns)}")
    print(f"Column names: {list(df.columns)}")
    print(df.head(3))


def convert_to_csv(parquet_path):
    """Convert a parquet file to CSV and return the CSV path."""
    df = pd.read_parquet(parquet_path)
    csv_path = parquet_path.replace('.parquet', '.csv')
    df.to_csv(csv_path, index=False, encoding='utf-8')
    return csv_path

7. view_communities.py - community analysis

import pandas as pd


def view_community_reports():
    """Print a short overview of each community report."""
    df = pd.read_parquet("my_graphrag/output/community_reports.parquet")
    print(f"Total communities: {len(df)}")
    for i, row in df.iterrows():
        print(f"\n--- Community {i} ---")
        print(f"Title: {row.get('title', 'N/A')}")
        print(f"Level: {row.get('level', 'N/A')}")
        # Show only the first 200 characters of the report
        content = str(row.get('full_content', ''))[:200] + '...'
        print(f"Content preview: {content}")
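
When there are many communities, grouping the reports by level can be easier to read than one long listing. The sketch below works on plain dictionaries with the same `title`, `level`, and `full_content` fields used above; `summarize_by_level` is a hypothetical helper and the reports are sample data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def summarize_by_level(reports: List[dict], preview_chars: int = 200) -> Dict[int, List[Tuple[str, str]]]:
    """Group reports by level and keep a (title, preview) pair for each one."""
    grouped = defaultdict(list)
    for report in reports:
        content = str(report.get("full_content", ""))
        preview = content[:preview_chars]
        if len(content) > preview_chars:
            preview += "..."  # mark previews that were actually truncated
        grouped[report.get("level", -1)].append((report.get("title", "N/A"), preview))
    return dict(sorted(grouped.items()))


reports = [
    {"title": "RAG ecosystem", "level": 0, "full_content": "x" * 300},
    {"title": "Retrieval methods", "level": 1, "full_content": "short text"},
    {"title": "LLM vendors", "level": 1, "full_content": ""},
]
summary = summarize_by_level(reports, preview_chars=20)
for level, items in summary.items():
    print(level, [title for title, _ in items])
```

With pandas, `df.to_dict("records")` turns the community_reports DataFrame into the list of dictionaries this function expects.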

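Together, these tools cover GraphRAG data processing and application integration, from basic inspection to more advanced analysis, so they can serve different levels of need.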

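VI. Tips

Choosing a method
- local → suited to precise, fact-based questions (narrow scope)
- global → suited to summary and synthesis questions (broad scope)
- drift → a compromise between the two when resources are limited
Prompt tuning
Editing the prompts section of settings.yaml can improve the style of the answers.
Document updates
If you add or modify documents, just run the index command again: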
```
graphrag index --root ./my_graphrag
```