[GenBI in Practice] Data Querying and Chart Visualization with a Python Script and the DeepSeek API

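Introduction

Generative BI (GenBI) is changing how we interact with data. It lets users ask questions in natural language and receive data insights automatically, without writing complex SQL queries or building charts by hand. This article walks through a hands-on example that uses Python and the DeepSeek API (or a similar large language model API) to implement a simple GenBI pipeline:

Input: data as a Markdown table, plus a natural language query.
Processing: the DeepSeek API's language understanding and code generation turn the natural language query into Python code, which then processes the Markdown table data.
Output: a visualization built with the Plotly library.

In this hands-on project you will learn how to:

Use the DeepSeek API (or another LLM API).
Parse Markdown table data.
Convert a natural language query into Python code.
Process data with Python code and generate a chart with Plotly.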

Prerequisites

Python 3.7 or later
The required Python libraries, installed with:
```
pip install requests pandas plotly
```
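A DeepSeek API key (or a key for another LLM API)
If you do not have a DeepSeek API key, you can substitute another LLM such as OpenAI, ERNIE Bot (文心一言), Tongyi Qianwen (通義千問), or Zhipu AI (智譜AI). The API call code then needs minor changes.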

Project Structure

```
genbi_project/
├── main.py          # main program
├── data.md          # sample Markdown data file (optional)
└── requirements.txt # dependency list
```

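Step 1: Set the DeepSeek API Key (or Another LLM API Key)
First, obtain a DeepSeek API key: register an account on the DeepSeek website, then find your API key in the console.
If you do not have a DeepSeek API key, you can use another large language model API, for example:

OpenAI: visit https://platform.openai.com/ to register and obtain an API key.
ERNIE Bot (文心一言): visit https://yiyan.baidu.com/ to register and obtain an API key.
Tongyi Qianwen (通義千問): visit https://tongyi.aliyun.com/ to register and obtain an API key.
Zhipu AI (智譜AI): visit https://open.bigmodel.cn/ to register and obtain an API key.

Store the API key in an environment variable, or set it directly in code (storing a key in plain text in code is not recommended):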

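```python
# main.py (example using an environment variable)
import os

# Read the API key from an environment variable (recommended)
DEEPSEEK_API_KEY = os.environ.get("DEEPSEEK_API_KEY")
# Or set the API key directly (not recommended)
# DEEPSEEK_API_KEY = "YOUR_DEEPSEEK_API_KEY"
```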
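To make the environment-variable approach more concrete, here is a minimal sketch of a loader that fails fast with a clear message when the key is missing. The helper name `load_api_key` and its `env` parameter are illustrative assumptions, not part of the original script.

```python
import os


def load_api_key(name="DEEPSEEK_API_KEY", env=None):
    """Return the API key stored under `name`, or raise if it is missing.

    `load_api_key` and `env` are hypothetical names used for illustration;
    `env` defaults to the real process environment.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before running main.py."
        )
    return key


# Usage with a stand-in mapping, so the example runs without a real key.
print(load_api_key(env={"DEEPSEEK_API_KEY": "sk-example"}))
```

Checking for the key at startup surfaces a configuration problem before any request is sent, rather than as an authentication error from the API.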

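Using Another LLM's API Key and Call
If you use a different LLM, change the following:

API key: replace DEEPSEEK_API_KEY with the API key of the LLM you use.
API call: replace the call_deepseek_api function with one that calls your LLM's API. Each LLM API has its own conventions (request URL, parameters, response format, and so on), so follow its official documentation.
Prompt: the prompt may need tuning for the characteristics of each LLM to get the best results.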
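One way to keep these provider-specific changes in a single place is a small configuration table. The sketch below is only an illustration: `ProviderConfig`, `PROVIDERS`, and `build_request` are hypothetical names, the OpenAI URL, model name, and environment variable are assumptions to verify against that provider's documentation, and the approach only fits providers that expose an OpenAI-style chat completions endpoint. Providers with a different request format need their own call function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderConfig:
    """Settings that differ between chat-completion providers."""
    url: str
    model: str
    key_env: str  # name of the environment variable holding the API key


# The DeepSeek entry mirrors the URL and model used in this article.
# The OpenAI entry is an assumption; check the provider's documentation.
PROVIDERS = {
    "deepseek": ProviderConfig(
        url="https://api.deepseek.com/v1/chat/completions",
        model="deepseek-coder",
        key_env="DEEPSEEK_API_KEY",
    ),
    "openai": ProviderConfig(
        url="https://api.openai.com/v1/chat/completions",
        model="gpt-4o-mini",
        key_env="OPENAI_API_KEY",
    ),
}


def build_request(provider, prompt, api_key, max_tokens=1000, temperature=0.5):
    """Build (url, headers, body) for an OpenAI-style chat completion call."""
    cfg = PROVIDERS[provider]
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = {
        "model": cfg.model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return cfg.url, headers, body


url, headers, body = build_request("deepseek", "hello", "test-key")
print(url, body["model"])
```

The returned tuple can be passed straight to `requests.post(url, headers=headers, json=body)`, so switching providers becomes a change of one string.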

Step 2: Define the DeepSeek API Call Function

Create a function that calls the DeepSeek API:

```python
# main.py
import requests


def call_deepseek_api(prompt, model="deepseek-coder", max_tokens=1000, temperature=0.5):
    """Call the DeepSeek API.

    Args:
        prompt: The prompt sent to the model.
        model: Name of the model to use.
        max_tokens: Maximum number of tokens to generate.
        temperature: Controls the randomness of the output.

    Returns:
        The text generated by the model, or None if an error occurs.
    """
    if DEEPSEEK_API_KEY is None:
        raise ValueError("DeepSeek API key not found.  Set the DEEPSEEK_API_KEY environment variable.")

    url = "https://api.deepseek.com/v1/chat/completions"  # Adjust the URL per the DeepSeek API docs
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {DEEPSEEK_API_KEY}",
    }
    data = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    try:
        response = requests.post(url, headers=headers, json=data)
        response.raise_for_status()  # Raise an exception if the request failed
        return response.json()["choices"][0]["message"]["content"]
    except requests.exceptions.RequestException as e:
        print(f"Error calling DeepSeek API: {e}")
        return None
```
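Note that `call_deepseek_api` only catches `requests.exceptions.RequestException`, so a response whose JSON body lacks the expected `choices` structure would raise a `KeyError` or `IndexError` instead of returning None. A minimal sketch of a more defensive extraction step follows; the helper name `extract_message_content` is a hypothetical addition, and the sample payloads are made up for illustration.

```python
def extract_message_content(payload):
    """Return the first choice's message text, or None if the shape is unexpected."""
    try:
        content = payload["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return None
    return content if isinstance(content, str) else None


# Made-up payloads: one well-formed, one error-shaped.
ok = {"choices": [{"message": {"role": "assistant", "content": "fig = None"}}]}
bad = {"error": {"message": "rate limited"}}
print(extract_message_content(ok))
print(extract_message_content(bad))
```

Inside `call_deepseek_api`, the return line could then become `return extract_message_content(response.json())`, which keeps the "None on failure" contract the rest of the script relies on.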

Step 3: Parse the Markdown Table Data

Create a function that parses the Markdown table data and converts it to a Pandas DataFrame:

```python
# main.py
import re

import pandas as pd


def parse_markdown_table(markdown_table):
    """Parse Markdown table data and convert it to a Pandas DataFrame.

    Args:
        markdown_table: The Markdown table as a string.

    Returns:
        A Pandas DataFrame, or None if parsing fails.
    """
    try:
        # Split the table into lines
        lines = markdown_table.strip().split('\n')
        header = [s.strip() for s in re.split(r"\|", lines[0])[1:-1]]
        # Skip the header row and the separator line beneath it
        lines = lines[2:]
        data = []
        for line in lines:
            # Split cells with a regex, allowing whitespace around each |
            row = [s.strip() for s in re.split(r"\s*\|\s*", line)[1:-1]]
            data.append(row)
        df = pd.DataFrame(data, columns=header)
        return df
    except Exception as e:
        print(f"Error parsing Markdown table: {e}")
        return None
```
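Every cell produced by this kind of parser is a string, so numeric columns must be converted before they can be sorted or plotted correctly. The dependency-free sketch below illustrates the idea on plain dictionaries; `coerce_cell` and `parse_rows` are hypothetical helpers, not part of the original script. With a DataFrame, `pd.to_numeric(df["Sales"])` does the same job for a single column.

```python
import re


def coerce_cell(text):
    """Convert a cell to int or float when it looks numeric, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def parse_rows(markdown_table):
    """Parse a Markdown table into a list of dicts with numeric cells converted."""
    lines = [ln for ln in markdown_table.strip().split("\n") if ln.strip()]
    header = [s.strip() for s in lines[0].strip().strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the header row and the |---| separator
        cells = [s.strip() for s in re.split(r"\s*\|\s*", line.strip())[1:-1]]
        rows.append({h: coerce_cell(c) for h, c in zip(header, cells)})
    return rows


table = """
| Region | Sales |
|---|---|
| Europe | 950000 |
| Asia | 800000.5 |
"""
rows = parse_rows(table)
print(rows)
```

Without the conversion, sorting the Sales column would compare strings, so "950000" would rank above "1200000".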

Step 4: Build the Prompt and Call the DeepSeek API

Create a function that builds the prompt, calls the DeepSeek API, and returns the generated Python code:

````python
# main.py
def generate_python_code(markdown_table, query):
    """Build the prompt, call the DeepSeek API, and return the generated Python code.

    Args:
        markdown_table: The Markdown table as a string.
        query: The natural language query.

    Returns:
        The generated Python code as a string, or None if generation fails.
    """
    prompt = f"""
You are a helpful assistant that generates Python code to analyze data and create visualizations.
You are given a Markdown table and a natural language query.
Generate Python code (using pandas and plotly) to:
1.  Parse the Markdown table into a pandas DataFrame.
2.  Process the DataFrame to answer the query.
3.  Create a visualization (using plotly) of the result.
4.  Print the figure in JSON format using `fig.to_json()`. Do *not* use `fig.show()`.

Markdown Table:
```markdown
{markdown_table}
```

Natural Language Query:
{query}

Python Code:

"""
    code = call_deepseek_api(prompt)
    return code
````

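Chat models often wrap the code they return in a Markdown fence, sometimes with explanatory text around it, and passing such a reply directly to `exec` would fail with a syntax error. The following is a minimal sketch of a cleanup step, assuming the reply contains at most one relevant fenced block; `extract_python_code` is a hypothetical helper, not part of the original script.

```python
import re

TICKS = "`" * 3  # a Markdown code fence, built indirectly to keep this example readable
# An opening fence with an optional language tag, the body, then a closing fence.
FENCE_RE = re.compile(TICKS + r"[ \t]*[\w+-]*[ \t]*\r?\n(.*?)" + TICKS, re.DOTALL)


def extract_python_code(reply):
    """Return the body of the first fenced block, or the whole reply if unfenced."""
    match = FENCE_RE.search(reply)
    body = match.group(1) if match else reply
    return body.strip()


reply = f"Here is the code:\n{TICKS}python\nfig = 1\n{TICKS}\nHope this helps."
print(extract_python_code(reply))
```

In `generate_python_code`, the result of `call_deepseek_api(prompt)` could be passed through this helper before being returned, after checking that it is not None.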
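Step 5: Execute the Generated Python Code and Get the Visualization

Create a function that executes the generated Python code and returns the JSON representation of the Plotly chart: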

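```python
# main.py
import json


def execute_code_and_get_visualization(code):
    """Execute the generated Python code and return the Plotly chart as JSON.

    Args:
        code: The Python code to execute.

    Returns:
        The JSON representation of the Plotly chart (as a string),
        or None if execution fails.
    """
    try:
        # Run the code in one fresh namespace. Using a single dict for both
        # globals and locals lets functions defined by the generated code
        # see that code's own imports.
        namespace = {}
        exec(code, namespace)
        # Look for a 'fig' variable (the Plotly figure object)
        if 'fig' in namespace:
            fig = namespace['fig']
            # Convert the Plotly figure to JSON
            fig_json = fig.to_json()
            return fig_json
        else:
            print("Error: No 'fig' variable found in the generated code.")
            return None
    except Exception as e:
        print(f"Error executing generated code: {e}")
        return None
```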
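Because `exec` runs whatever the model returns, it is worth inspecting the code before executing it. The sketch below uses the standard `ast` module to list imported modules that fall outside an allow-list. It is only an illustration: `find_disallowed_imports` and `ALLOWED_MODULES` are hypothetical names, the allow-list is an assumption, and this check is not a sandbox, since built-ins such as `__import__` or `open` remain reachable.

```python
import ast

# Assumed allow-list for this tutorial's use case; adjust to your needs.
ALLOWED_MODULES = {"pandas", "plotly", "re", "json", "io", "math"}


def find_disallowed_imports(code, allowed=ALLOWED_MODULES):
    """Return sorted top-level module names imported by `code` that are not allowed.

    Raises SyntaxError if `code` is not valid Python.
    """
    tree = ast.parse(code)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.level == 0 and node.module:
                found.add(node.module.split(".")[0])
            else:
                found.add(".")  # treat relative imports as disallowed
    return sorted(found - set(allowed))


generated = "import pandas as pd\nimport os\nfrom subprocess import run\n"
print(find_disallowed_imports(generated))
```

A caller could refuse to execute the code whenever the returned list is non-empty. For untrusted input, running the generated code in a separate process or container is the safer choice.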

Step 6: Main Program Logic

```python
# main.py
def main():
    """Main program logic."""
    # Sample Markdown table data
    markdown_table = """
| Region | Sales | Profit |
|---|---|---|
| North America | 1200000 | 240000 |
| Europe | 950000 | 190000 |
| Asia | 800000 | 160000 |
| South America | 500000 | 80000 |
| Africa | 300000 | 45000 |
"""
    # Sample natural language query
    query = "Show a bar chart of sales by region, sorted in descending order."
    # Generate the Python code
    code = generate_python_code(markdown_table, query)
    if code:
        print("Generated Python code:\n", code)
        # Execute the code and get the visualization
        visualization_json = execute_code_and_get_visualization(code)
        if visualization_json:
            print("\nVisualization (JSON format):\n", visualization_json)
            # (Optional) Save the JSON data to a file
            with open("visualization.json", "w") as f:
                f.write(visualization_json)
            print("\nVisualization saved to visualization.json")
            # (Optional) Display the chart in a browser (requires additional JavaScript code)
        else:
            print("Failed to generate visualization.")
    else:
        print("Failed to generate Python code.")


if __name__ == "__main__":
    main()
```
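The optional browser display mentioned in the last comment can be approximated by wrapping the figure JSON in a small HTML page that loads plotly.js and calls `Plotly.newPlot`. The sketch below rests on several assumptions: `figure_json_to_html` is a hypothetical helper, the CDN URL and plotly.js version are assumed and should be checked against the Plotly documentation, and the sample figure is made up.

```python
import html
import json

# Assumed CDN build of plotly.js; verify the URL and version before relying on it.
PLOTLY_JS_URL = "https://cdn.plot.ly/plotly-2.35.2.min.js"


def figure_json_to_html(fig_json, title="GenBI chart"):
    """Wrap a Plotly figure JSON string in a standalone HTML page."""
    fig = json.loads(fig_json)  # also validates that the input is JSON
    # Escape "</" so figure text cannot close the <script> element early.
    data = json.dumps(fig.get("data", [])).replace("</", "<\\/")
    layout = json.dumps(fig.get("layout", {})).replace("</", "<\\/")
    return f"""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>{html.escape(title)}</title>
  <script src="{PLOTLY_JS_URL}"></script>
</head>
<body>
  <div id="chart"></div>
  <script>
    Plotly.newPlot("chart", {data}, {layout});
  </script>
</body>
</html>
"""


# A made-up figure in the same {"data": ..., "layout": ...} shape as fig.to_json().
sample = json.dumps({
    "data": [{"type": "bar", "x": ["A", "B"], "y": [2, 1]}],
    "layout": {"title": {"text": "Demo"}},
})
page = figure_json_to_html(sample)
print("Plotly.newPlot" in page)
```

Saving the returned string as an `.html` file next to `visualization.json` and opening it in a browser should display the chart, provided the CDN is reachable.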

The complete main.py code:

````python
import os
import requests
import pandas as pd
import re
import json
import plotly.express as px

# Read the API key from an environment variable (recommended)
DEEPSEEK_API_KEY = os.environ.get("DEEPSEEK_API_KEY")
# Or set the API key directly (not recommended)
# DEEPSEEK_API_KEY = "YOUR_DEEPSEEK_API_KEY"


def call_deepseek_api(prompt, model="deepseek-coder", max_tokens=1000, temperature=0.5):
    """Call the DeepSeek API.

    Args:
        prompt: The prompt sent to the model.
        model: Name of the model to use.
        max_tokens: Maximum number of tokens to generate.
        temperature: Controls the randomness of the output.

    Returns:
        The text generated by the model, or None if an error occurs.
    """
    if DEEPSEEK_API_KEY is None:
        raise ValueError("DeepSeek API key not found.  Set the DEEPSEEK_API_KEY environment variable.")

    url = "https://api.deepseek.com/v1/chat/completions"  # Adjust the URL per the DeepSeek API docs
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {DEEPSEEK_API_KEY}",
    }
    data = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    try:
        response = requests.post(url, headers=headers, json=data)
        response.raise_for_status()  # Raise an exception if the request failed
        return response.json()["choices"][0]["message"]["content"]
    except requests.exceptions.RequestException as e:
        print(f"Error calling DeepSeek API: {e}")
        return None


def parse_markdown_table(markdown_table):
    """Parse Markdown table data and convert it to a Pandas DataFrame.

    Args:
        markdown_table: The Markdown table as a string.

    Returns:
        A Pandas DataFrame, or None if parsing fails.
    """
    try:
        # Split the table into lines
        lines = markdown_table.strip().split('\n')
        header = [s.strip() for s in re.split(r"\|", lines[0])[1:-1]]
        # Skip the header row and the separator line beneath it
        lines = lines[2:]
        data = []
        for line in lines:
            # Split cells with a regex, allowing whitespace around each |
            row = [s.strip() for s in re.split(r"\s*\|\s*", line)[1:-1]]
            data.append(row)
        df = pd.DataFrame(data, columns=header)
        return df
    except Exception as e:
        print(f"Error parsing Markdown table: {e}")
        return None


def generate_python_code(markdown_table, query):
    """Build the prompt, call the DeepSeek API, and return the generated Python code.

    Args:
        markdown_table: The Markdown table as a string.
        query: The natural language query.

    Returns:
        The generated Python code as a string, or None if generation fails.
    """
    prompt = f"""
You are a helpful assistant that generates Python code to analyze data and create visualizations.
You are given a Markdown table and a natural language query.
Generate Python code (using pandas and plotly) to:
1.  Parse the Markdown table into a pandas DataFrame.
2.  Process the DataFrame to answer the query.
3.  Create a visualization (using plotly) of the result.
4.  Print the figure in JSON format using `fig.to_json()`. Do *not* use `fig.show()`.

Markdown Table:
```markdown
{markdown_table}
```

Natural Language Query:
{query}

Python Code:

"""
    code = call_deepseek_api(prompt)
    return code


def execute_code_and_get_visualization(code):
    """Execute the generated Python code and return the Plotly chart as JSON.

    Args:
        code: The Python code to execute.

    Returns:
        The JSON representation of the Plotly chart (as a string),
        or None if execution fails.
    """
    try:
        # Run the code in one fresh namespace. Using a single dict for both
        # globals and locals lets functions defined by the generated code
        # see that code's own imports.
        namespace = {}
        exec(code, namespace)
        # Look for a 'fig' variable (the Plotly figure object)
        if 'fig' in namespace:
            fig = namespace['fig']
            # Convert the Plotly figure to JSON
            fig_json = fig.to_json()
            return fig_json
        else:
            print("Error: No 'fig' variable found in the generated code.")
            return None
    except Exception as e:
        print(f"Error executing generated code: {e}")
        return None


def main():
    """Main program logic."""
    # Sample Markdown table data
    markdown_table = """
| Region | Sales | Profit |
|---|---|---|
| North America | 1200000 | 240000 |
| Europe | 950000 | 190000 |
| Asia | 800000 | 160000 |
| South America | 500000 | 80000 |
| Africa | 300000 | 45000 |
"""
    # Sample natural language query
    query = "Show a bar chart of sales by region, sorted in descending order."
    # Generate the Python code
    code = generate_python_code(markdown_table, query)
    if code:
        print("Generated Python code:\n", code)
        # Execute the code and get the visualization
        visualization_json = execute_code_and_get_visualization(code)
        if visualization_json:
            print("\nVisualization (JSON format):\n", visualization_json)
            # (Optional) Save the JSON data to a file
            with open("visualization.json", "w") as f:
                f.write(visualization_json)
            print("\nVisualization saved to visualization.json")
            # (Optional) Display the chart in a browser (requires additional JavaScript code)
        else:
            print("Failed to generate visualization.")
    else:
        print("Failed to generate Python code.")


if __name__ == "__main__":
    main()
````

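Results
After a successful run, the console prints the generated Python code and the chart data in JSON format.
The program also creates a visualization.json file in the current directory.
The Python code generated by the DeepSeek API (or another LLM API) might look like the following (the actual output may differ slightly):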

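```python
import pandas as pd
import plotly.express as px
import re

# The table from the prompt, embedded so the script can run on its own
markdown_table = """
| Region | Sales | Profit |
|---|---|---|
| North America | 1200000 | 240000 |
| Europe | 950000 | 190000 |
| Asia | 800000 | 160000 |
| South America | 500000 | 80000 |
| Africa | 300000 | 45000 |
"""


# Parse the Markdown table (same as the parse_markdown_table function above)
def parse_markdown_table(markdown_table):
    try:
        lines = markdown_table.strip().split('\n')
        header = [s.strip() for s in re.split(r"\|", lines[0])[1:-1]]
        lines = lines[2:]
        data = []
        for line in lines:
            row = [s.strip() for s in re.split(r"\s*\|\s*", line)[1:-1]]
            data.append(row)
        df = pd.DataFrame(data, columns=header)
        return df
    except Exception as e:
        print(f"Error parsing Markdown table: {e}")
        return None


# Parse the Markdown table into a DataFrame
df = parse_markdown_table(markdown_table)
# Convert the 'Sales' column to a numeric type
df['Sales'] = pd.to_numeric(df['Sales'])
# Sort by 'Sales' in descending order
df_sorted = df.sort_values('Sales', ascending=False)
# Create the bar chart
fig = px.bar(df_sorted, x='Region', y='Sales', title='Sales by Region (Sorted)')
# Convert the chart to JSON (important: do not call fig.show())
# print(fig.to_json())  # Commented out: execute_code_and_get_visualization already does this
```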