Exploring Efficient Uses of Streamlit in Testing: Document Reading as the Prelude to LLM Test-Case Generation

Groundwork for LLM Test-Case Generation: Document Reading, the Foundation of Your Automated Testing

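On the strong recommendation of fellow members of a chat group, I started learning Streamlit. This article shows how to build a multi-purpose document processing tool with Streamlit that reads, previews, converts, and exports a range of testing-related documents (YAML, JSON, DOCX, PDF, Excel, Markdown), with an option to preview content as Markdown, laying the groundwork for generating test cases with a large language model. The source code has been uploaded to a cloud drive and you are welcome to download and try it; the cloud drive address is in the screenshot at the end of the article.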

Feature Overview

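Supported document formats:
- YAML, JSON, DOCX, PDF, Excel, Markdown
Core features:
- File upload and preview
- Preview rendered as Markdown
- Format conversion (Excel → JSON/Markdown, JSON → Excel/Markdown)
- Export of files in the converted format
Reserved interface for test-case generation:
- Prepares the data for later integration of an LLM that generates test cases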

Installing Dependencies

Before you start, make sure the following Python libraries are installed:

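```bash
pip install streamlit pandas openpyxl python-docx pdfminer.six PyYAML markdown2 tabulate
```

(tabulate is an optional pandas dependency that DataFrame.to_markdown() needs when tables are exported as Markdown.)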
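Several of these packages are imported under a different name than the one you pass to pip (python-docx is imported as docx, pdfminer.six as pdfminer, PyYAML as yaml), which makes "No module named ..." errors confusing. A quick standard-library check before launching the app might look like the sketch below; the helper name missing_modules is hypothetical, and the mapping simply mirrors the imports used in this article.

```python
import importlib.util

# Import name used in the code -> pip package that provides it.
# Several import names differ from their pip package names.
REQUIRED = {
    "streamlit": "streamlit",
    "pandas": "pandas",
    "openpyxl": "openpyxl",
    "docx": "python-docx",
    "pdfminer": "pdfminer.six",
    "yaml": "PyYAML",
    "markdown2": "markdown2",
    "tabulate": "tabulate",  # optional: only needed by DataFrame.to_markdown()
}


def missing_modules(required=REQUIRED):
    """Return the pip package names whose import name cannot be found (hypothetical helper)."""
    return [pkg for mod, pkg in required.items() if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All dependencies are installed.")
```

Running it once after pip install prints either a ready-made install command or a confirmation.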

Full Implementation

```python
import base64
from io import BytesIO

import pandas as pd
import streamlit as st
import yaml
from docx import Document
from markdown2 import markdown
from pdfminer.high_level import extract_text

# Main title and subtitle
st.title("Document Processing Tool: Preparation for LLM Test-Case Generation")
st.subheader("Read, preview, and convert YAML/JSON/DOCX/PDF/Excel/Markdown files")

# Export format -> (file extension, MIME type) used for the download link
EXPORT_TYPES = {
    "JSON": ("json", "application/json"),
    "Markdown": ("md", "text/markdown"),
    "Excel": ("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
}


def file_uploader_section():
    """File upload and basic file information."""
    uploaded_file = st.file_uploader(
        "Upload a file",
        type=["yml", "json", "docx", "pdf", "xlsx", "md"],
        accept_multiple_files=False,
    )
    if uploaded_file:
        st.write(f"File name: {uploaded_file.name}")
        st.write(f"File type: {uploaded_file.type}")
        st.write(f"File size: {uploaded_file.size / 1024:.2f} KB")
    return uploaded_file


def file_reader(file):
    """Read the file according to its type and return its content as a string."""
    content = ""
    file_type = file.name.split(".")[-1].lower()
    if file_type == "yml":
        content = yaml.safe_load(file)
        # Dump back to a string; allow_unicode keeps non-ASCII text readable
        content = yaml.dump(content, allow_unicode=True, sort_keys=False)
    elif file_type == "json":
        df = pd.read_json(file)
        content = df.to_string(index=False)
    elif file_type == "docx":
        doc = Document(file)
        content = "\n".join(para.text for para in doc.paragraphs)
    elif file_type == "pdf":
        content = extract_text(file)
    elif file_type == "xlsx":
        df = pd.read_excel(file)
        content = df.to_string(index=False)
    elif file_type == "md":
        content = file.read().decode("utf-8")
    return content


def format_converter(content, convert_to_md):
    """Optionally render Markdown text for the preview; markdown2 returns HTML."""
    if convert_to_md:
        return markdown(content)
    return content


def file_exporter(file, converted_data, export_format):
    """Build an HTML download link for the exported file."""
    file_type = file.name.split(".")[-1].lower()
    buffer = BytesIO()
    file.seek(0)  # file_reader() has already read the stream to the end
    if export_format == "Original":
        data = file.read()
    elif export_format == "JSON":
        if file_type != "xlsx":
            st.error("Only Excel files can be exported as JSON")
            return None
        df = pd.read_excel(file)
        # force_ascii=False keeps non-ASCII text readable in the JSON output
        data = df.to_json(orient="records", force_ascii=False).encode("utf-8")
    elif export_format == "Markdown":
        if file_type in ("xlsx", "json"):
            df = pd.read_excel(file) if file_type == "xlsx" else pd.read_json(file)
            # DataFrame.to_markdown() requires the optional `tabulate` package
            data = df.to_markdown(index=False).encode("utf-8")
        else:
            data = converted_data.encode("utf-8")
    elif export_format == "Excel":
        if file_type != "json":
            st.error("Only JSON files can be exported as Excel")
            return None
        df = pd.read_json(file)
        df.to_excel(buffer, index=False)
        data = buffer.getvalue()
    else:
        st.error("Invalid format")
        return None

    # Clean file name (e.g. cases.json instead of cases.xlsx.json) and a real MIME type
    ext, mime = EXPORT_TYPES.get(export_format, (file_type, "application/octet-stream"))
    stem = file.name.rsplit(".", 1)[0]
    b64 = base64.b64encode(data).decode()
    return f'<a href="data:{mime};base64,{b64}" download="{stem}.{ext}">Download file</a>'


def conversion_options(file_type):
    """Build the export options for a given file type."""
    options = ["Original"]
    if file_type == "xlsx":
        options += ["JSON", "Markdown"]
    elif file_type == "json":
        options += ["Excel", "Markdown"]
    return options


def main():
    uploaded_file = file_uploader_section()
    if uploaded_file:
        content = file_reader(uploaded_file)
        file_type = uploaded_file.name.split(".")[-1].lower()

        # Document preview
        with st.expander("Document preview"):
            convert_to_md = st.checkbox("Preview as Markdown")
            converted_content = format_converter(content, convert_to_md)
            if convert_to_md:
                # markdown2 output is HTML, so HTML rendering must be allowed
                st.markdown(converted_content, unsafe_allow_html=True)
            else:
                st.text(converted_content)

        # Format conversion and export
        with st.expander("Format conversion and export"):
            options = conversion_options(file_type)
            selected_format = st.selectbox("Choose export format", options)
            if selected_format != "Original":
                export_link = file_exporter(uploaded_file, converted_content, selected_format)
                if export_link:
                    st.markdown(export_link, unsafe_allow_html=True)

        # Reserved interface for test-case generation
        with st.expander("Test-case generation (reserved)"):
            st.write("This feature requires integrating an NLP model and is not supported in the current version.")


if __name__ == "__main__":
    main()
```

Code Walkthrough

1. File upload and basic file information

```python
def file_uploader_section():
    """File upload and basic file information."""
    uploaded_file = st.file_uploader(
        "Upload a file",
        type=["yml", "json", "docx", "pdf", "xlsx", "md"],
        accept_multiple_files=False,
    )
    if uploaded_file:
        st.write(f"File name: {uploaded_file.name}")
        st.write(f"File type: {uploaded_file.type}")
        st.write(f"File size: {uploaded_file.size / 1024:.2f} KB")
    return uploaded_file
```

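Purpose: provides the file upload entry point and displays the file's name, type, and size
Key points:
- st.file_uploader accepts a list of allowed file types
- accept_multiple_files=False limits each upload to a single file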
(Screenshot: uploading an Excel document)
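Throughout the code, the file type is derived from the name with `file.name.split('.')[-1]`. Without normalization, a file named REPORT.XLSX yields "XLSX" and matches none of the lower-case branches, and a name without any dot yields the whole name. A slightly more defensive variant, sketched here with the standard library's pathlib (detect_file_type is a hypothetical helper, not part of the original tool), lower-cases the extension and returns an empty string when there is none:

```python
from pathlib import PurePath


def detect_file_type(filename: str) -> str:
    """Return the lower-case extension without the dot, or '' if there is none (hypothetical helper)."""
    # PurePath.suffix only looks at the last dot of the final path component
    return PurePath(filename).suffix.lstrip(".").lower()


for name in ["cases.xlsx", "REPORT.XLSX", "api.v2.json", "README"]:
    print(name, "->", repr(detect_file_type(name)))
```

Swapping it in for the split expression would leave the rest of the branching logic unchanged.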

2. Reading and parsing file content

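```python
def file_reader(file):
    """Read the file according to its type and return its content as a string."""
    content = ""
    file_type = file.name.split(".")[-1].lower()
    if file_type == "yml":
        content = yaml.safe_load(file)
        # Dump back to a string; allow_unicode keeps non-ASCII text readable
        content = yaml.dump(content, allow_unicode=True, sort_keys=False)
    elif file_type == "json":
        df = pd.read_json(file)
        content = df.to_string(index=False)
    # ... handling for the other file types ...
    return content
```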

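Purpose: parses the content according to the file type and returns it as a string
Key points:
- pandas handles the tabular data in Excel/JSON files
- yaml.dump() turns the parsed YAML object back into a string for later processing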
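Streamlit's UploadedFile is a subclass of io.BytesIO, so the YAML branch can be tried outside the app with an in-memory byte stream. The sketch below (yaml_to_text is a hypothetical helper) shows the parse-then-dump round trip; passing allow_unicode=True matters for Chinese test documents, because by default yaml.dump escapes non-ASCII characters, and sort_keys=False keeps the original key order.

```python
import io

import yaml


def yaml_to_text(stream) -> str:
    """Parse YAML from a binary file-like object and dump it back to a string (hypothetical helper)."""
    data = yaml.safe_load(stream)  # safe_load accepts binary streams such as BytesIO
    # allow_unicode keeps characters such as Chinese readable;
    # sort_keys=False preserves the key order of the source document
    return yaml.dump(data, allow_unicode=True, sort_keys=False)


sample = io.BytesIO("case: 登錄測試\nsteps:\n  - open page\n  - submit form\n".encode("utf-8"))
print(yaml_to_text(sample))
```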
(Screenshot: document preview)

3. Markdown rendering

```python
def format_converter(content, convert_to_md):
    """Optionally render Markdown text for the preview; markdown2 returns HTML."""
    if convert_to_md:
        return markdown(content)
    return content
```

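Purpose: optionally renders the text content as Markdown for the preview
Dependency: markdown2 converts Markdown-formatted text into HTML (it does not turn plain text into Markdown); the resulting HTML is then displayed with st.markdown(..., unsafe_allow_html=True)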
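Because markdown2 goes from Markdown to HTML, it cannot turn an Excel sheet or a JSON array into a Markdown document. pandas provides DataFrame.to_markdown() for that, which relies on the optional tabulate package. If you would rather avoid the extra dependency, a hand-written GitHub-flavored pipe table is enough for simple data. The sketch below is a hypothetical helper (records_to_markdown is not part of markdown2 or pandas) and assumes a list of flat dicts, such as the output of df.to_dict(orient="records").

```python
def records_to_markdown(records: list[dict]) -> str:
    """Render a list of flat dicts as a GitHub-flavored Markdown pipe table (hypothetical helper)."""
    if not records:
        return ""
    # Collect column names in first-seen order across all records
    columns: list[str] = []
    for row in records:
        for key in row:
            if key not in columns:
                columns.append(key)

    def cell(value) -> str:
        # None becomes an empty cell; pipes and newlines would break the table layout
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")

    lines = [
        "| " + " | ".join(cell(c) for c in columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in records:
        lines.append("| " + " | ".join(cell(row.get(c)) for c in columns) + " |")
    return "\n".join(lines)


print(records_to_markdown([
    {"Test Name": "Case A", "Status": "Pass", "Failure Reason": None},
    {"Test Name": "Case B", "Status": "Fail", "Failure Reason": "Request timeout"},
]))
```

For anything beyond flat records (alignment, number formatting), DataFrame.to_markdown() is the more complete option.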

4. File export

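```python
def file_exporter(file, converted_data, export_format):
    """Build an HTML download link for the exported file."""
    file_type = file.name.split(".")[-1].lower()
    buffer = BytesIO()
    file.seek(0)  # file_reader() has already read the stream to the end
    if export_format == "JSON":
        if file_type != "xlsx":
            st.error("Only Excel files can be exported as JSON")
            return None
        df = pd.read_excel(file)
        # force_ascii=False keeps non-ASCII text readable in the JSON output
        data = df.to_json(orient="records", force_ascii=False).encode("utf-8")
    elif export_format == "Excel":
        if file_type != "json":
            st.error("Only JSON files can be exported as Excel")
            return None
        df = pd.read_json(file)
        df.to_excel(buffer, index=False)
        data = buffer.getvalue()
    # ... handling for the other formats ...
    b64 = base64.b64encode(data).decode()
    href = f'<a ...>Download file</a>'
    return href
```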

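Purpose: generates a download link for the exported file
Key points:
- The file content is Base64-encoded and embedded in a data: URI, so the browser can download it straight from the link
- Converts in both directions between Excel and JSON, and from either one to Markdown (Excel → JSON/Markdown, JSON → Excel/Markdown)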
Sample Excel → JSON output:

```json
[
  {"Test Name":"用例A","Status":"Pass","Execution Time":1744365600000,"Failure Reason":null,"Duration (s)":5,"Tester":"測試A","Environment":"Test","Version":"v1.0"},
  {"Test Name":"用例B","Status":"Fail","Execution Time":1744365900000,"Failure Reason":"請求超時","Duration (s)":3,"Tester":"測試A","Environment":"Test","Version":"v1.0"}
]
```
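Two details in this export path are easy to miss. First, by the time the exporter runs, file_reader() has already read the uploaded stream to the end, and a second read of the same object continues from where the first stopped, so rewinding with seek(0) is the safe way to read it again. Second, DataFrame.to_json() escapes non-ASCII characters by default; values such as 用例A stay readable only with force_ascii=False. The sketch below demonstrates both on an in-memory JSON upload (json_records_roundtrip is a hypothetical helper, and io.BytesIO stands in for Streamlit's UploadedFile).

```python
import io

import pandas as pd


def json_records_roundtrip(upload: io.BytesIO) -> tuple[pd.DataFrame, str]:
    """Read a JSON upload twice, as the reader and the exporter do (hypothetical helper)."""
    first = pd.read_json(upload)   # leaves the stream position at the end
    upload.seek(0)                 # rewind before reading the same object again
    second = pd.read_json(upload)
    # force_ascii=False keeps non-ASCII values readable instead of \uXXXX escapes
    text = second.to_json(orient="records", force_ascii=False)
    return first, text


raw = '[{"Test Name": "用例A", "Status": "Pass"}, {"Test Name": "用例B", "Status": "Fail"}]'
df, exported = json_records_roundtrip(io.BytesIO(raw.encode("utf-8")))
print(df)
print(exported)
```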

5. Format conversion options

```python
def conversion_options(file_type):
    """Build the export options for a given file type."""
    options = ["Original"]
    if file_type == "xlsx":
        options += ["JSON", "Markdown"]
    elif file_type == "json":
        options += ["Excel", "Markdown"]
    return options
```

Purpose: builds the conversion options dynamically from the file type
Supported conversions:
- Excel → JSON/Markdown
- JSON → Excel/Markdown
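Since the supported conversions are just a lookup from source type to target formats, the if/elif chain in conversion_options can also be expressed as a table, so adding a new conversion pair means editing a single dict entry. A sketch of that variant, intended to return the same lists as the original function (the CONVERSIONS name is hypothetical):

```python
# Source file type -> export formats offered in addition to "Original"
CONVERSIONS: dict[str, list[str]] = {
    "xlsx": ["JSON", "Markdown"],
    "json": ["Excel", "Markdown"],
}


def conversion_options(file_type: str) -> list[str]:
    """Table-driven conversion options; unknown types only get 'Original'."""
    # Concatenation builds a new list, so the CONVERSIONS table is never mutated
    return ["Original"] + CONVERSIONS.get(file_type, [])


print(conversion_options("xlsx"))
print(conversion_options("pdf"))
```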

6. Main function logic

```python
def main():
    uploaded_file = file_uploader_section()
    if uploaded_file:
        content = file_reader(uploaded_file)
        file_type = uploaded_file.name.split(".")[-1].lower()
        # Preview section
        with st.expander("Document preview"):
            convert_to_md = st.checkbox("Preview as Markdown")
            converted_content = format_converter(content, convert_to_md)
            # Display the preview content
        # Conversion and export section
        with st.expander("Format conversion and export"):
            options = conversion_options(file_type)
            selected_format = st.selectbox("Choose export format", options)
            if selected_format != "Original":
                export_link = file_exporter(...)
                if export_link:
                    st.markdown(export_link, unsafe_allow_html=True)
        # Reserved for test-case generation
        with st.expander("Test-case generation (reserved)"):
            st.write("This feature requires integrating an NLP model and is not supported in the current version.")
```
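The reserved expander is where the extracted text would eventually be handed to a model, and this version deliberately leaves that open. Purely as an illustration of the data-preparation side, the sketch below wraps the string returned by file_reader() in a prompt and truncates it to a character budget. build_test_case_prompt, the prompt wording, and the 8,000-character default are all hypothetical, and no model API is called.

```python
def build_test_case_prompt(doc_name: str, content: str, max_chars: int = 8000) -> str:
    """Wrap extracted document text in a test-case generation prompt (hypothetical helper)."""
    text = content.strip()
    truncated = len(text) > max_chars
    if truncated:
        # Keep the beginning of the document; a real integration might split it into chunks instead
        text = text[:max_chars]
    note = "\n(Note: the document was truncated.)" if truncated else ""
    return (
        "You are a software test engineer. Based on the requirements document below, "
        "write functional test cases with steps and expected results.\n\n"
        f"Document: {doc_name}\n"
        "-----\n"
        f"{text}\n"
        "-----"
        f"{note}"
    )


print(build_test_case_prompt("login_spec.docx", "Users log in with a phone number and SMS code."))
```

Keeping the prompt assembly in a plain function like this means it can be tested without Streamlit and later called from inside the reserved expander.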