Automating Word Document Style Copying and Content Generation with Python

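In office automation, efficiently copying the styles and content of Word documents is a common requirement. This article walks through a complete code example that uses Python's python-docx library to deep-copy Word document styles and generate content dynamically, and draws on best practices from the knowledge base to streamline document processing.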

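I. Why automate Word document processing?

Processing Word documents by hand (copying styles, inserting tables or images) is time-consuming and error-prone. Python offers several libraries for automating these tasks, such as python-docx, pywin32 and Spire.Doc. python-docx, for example, reads and writes the paragraphs, tables and styles of .docx files directly, without requiring Microsoft Office.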

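II. Core implementation: deep copying of styles and tables

1. Copying a table (styles and content)

The following function, clone_table, copies a table's structure, styles and content: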

def clone_table(old_table, new_doc):
    """Create a new table in new_doc that mirrors old_table."""
    # Create a table with the same number of rows and columns
    new_table = new_doc.add_table(rows=len(old_table.rows), cols=len(old_table.columns))
    # Copy the table style (borders, shading, etc.)
    if old_table.style:
        new_table.style = old_table.style
    # Walk every cell, copying content and formatting
    for i, old_row in enumerate(old_table.rows):
        for j, old_cell in enumerate(old_row.cells):
            new_cell = new_table.cell(i, j)
            # Remove the new cell's empty default paragraph
            for paragraph in new_cell.paragraphs:
                new_cell._element.remove(paragraph._element)
            # Copy paragraphs run by run, with their formatting
            for old_paragraph in old_cell.paragraphs:
                new_paragraph = new_cell.add_paragraph()
                for old_run in old_paragraph.runs:
                    new_run = new_paragraph.add_run(old_run.text)
                    copy_paragraph_style(old_run, new_run)  # custom helper that copies run formatting
                new_paragraph.alignment = old_paragraph.alignment
            copy_cell_borders(old_cell, new_cell)  # copy the cell borders
    # Copy column widths
    for i, col in enumerate(old_table.columns):
        if col.width is not None:
            new_table.columns[i].width = col.width
    return new_table

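Key points:

Table style: new_table.style = old_table.style reuses the original table's style. python-docx records the style by its ID, so the target document must define a style with that ID for the formatting to appear.
Content and formatting handled separately: the new cell's default paragraph is removed first, then text and formatting are copied paragraph by paragraph and run by run.
Borders and column widths: copy_cell_borders and the column-width loop keep the copy visually consistent with the original.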
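Border values are easy to misread once a table leaves Word, for example when it is converted to HTML as in the full source at the end of the article: on border elements, `w:sz` is measured in eighths of a point, and `w:val` uses Word's own style names. A minimal sketch of a converter (the helper `word_border_to_css` and its style mapping are our own assumptions, not part of python-docx):

```python
# Word border styles (w:val) with a direct CSS counterpart; others fall back to solid.
CSS_BORDER_STYLES = {
    "single": "solid",
    "thick": "solid",
    "double": "double",
    "dotted": "dotted",
    "dashed": "dashed",
}


def word_border_to_css(val="single", sz="4", color="auto"):
    """Map the w:val, w:sz and w:color attributes of a border element to CSS."""
    if val in ("nil", "none"):
        return "none"
    width_pt = int(sz) / 8  # w:sz on borders is in eighths of a point
    css_style = CSS_BORDER_STYLES.get(val, "solid")
    css_color = "#000000" if color in (None, "auto") else f"#{color}"
    return f"{width_pt:g}pt {css_style} {css_color}"


print(word_border_to_css("single", "8", "FF0000"))  # 1pt solid #FF0000
```

The attribute values would come from an element such as `<w:top w:val="single" w:sz="8" w:color="FF0000"/>` inside a cell's `w:tcBorders`.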

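2. Copying document styles and generating content

The following function, clone_document, takes styles extracted from existing documents and fills in content dynamically: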

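from docx import Document


def clone_document(old_s, old_p, old_ws, new_doc_path):
    new_doc = Document()  # create the new document
    # Fill in the content item by item
    for para in old_p:
        if para == "br":
            # Page-break marker recorded during extraction
            new_doc.add_page_break()
            continue
        k, v = para["sn"], para["ct"]  # style key (sn) and content (ct)
        if k == "table":
            # Insert a table given as HTML (html_table_to_docx)
            html_table_to_docx(new_doc, v)
        elif "image" in v:
            # Insert an image (copy_inline_shapes); v is the image part name
            copy_inline_shapes(new_doc, k, [i for i in old_s if v in i][0][v])
        else:
            # Ordinary paragraph: look up its style records by key
            style = [i for i in old_s if i["style"]["sn"] == k]
            style_ws = [i for i in old_ws if i["style"]["sn"] == k]
            clone_paragraph(style[0], v, new_doc, style_ws[0])  # clone the paragraph style
    new_doc.save(new_doc_path)  # save the new document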

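Data structures:

old_s: style records extracted from the source document (1.docx in the main program). Each record holds a paragraph's style object, its runs (whose font settings are copied), its alignment and any embedded images, under a style key (sn) made of the style name and alignment, e.g. "Heading 1_CENTER".
old_p: the content items. Each is a dict with a style key (sn) and content (ct), or the string "br" for a page break; tables use sn "table" with HTML content, and images use the image file name as sn and the image part name as ct.
old_ws: style records extracted the same way from the template document (demo_template.docx); each new paragraph takes its paragraph style from here.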
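To make the item format concrete, here is a hypothetical 1.json in the shape described above, together with the routing rule clone_document applies to each item. The sample keys and texts are invented for illustration, and `classify_item` is a hypothetical helper that only mirrors the branch order:

```python
import json

# Hypothetical content items in the sn/ct format ("br" marks a page break)
SAMPLE_JSON = """
[
  {"sn": "Heading 1_CENTER", "ct": "Quarterly Report"},
  {"sn": "Normal_None", "ct": "Revenue grew in all regions."},
  {"sn": "table", "ct": "<table><tr><td>Q1</td><td>120</td></tr></table>"},
  {"sn": "image.png", "ct": "/word/media/image1.png"},
  "br"
]
"""


def classify_item(item):
    """Return which branch of clone_document would handle a content item."""
    if item == "br":
        return "page_break"
    style_key, content = item["sn"], item["ct"]
    if style_key == "table":
        return "table"
    if "image" in content:  # same substring heuristic as clone_document
        return "image"
    return "paragraph"


items = json.loads(SAMPLE_JSON)
print([classify_item(i) for i in items])
# ['paragraph', 'paragraph', 'table', 'image', 'page_break']
```

Because the image branch only looks for the substring "image", a text paragraph that mentions the word would be misrouted; a stricter test on the part name (for example the /word/media/ prefix Word uses for embedded pictures) would be sturdier.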

III. Walkthrough

1. Dependencies

First install python-docx, plus BeautifulSoup, which the HTML table conversion uses:

pip install python-docx beautifulsoup4

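2. Helper functions

The code above also relies on the following helpers; their implementations are in the complete source at the end of the article:

copy_paragraph_style: copies run formatting (bold, italic, underline, font name, size, color, etc.) from one run to another.
copy_cell_borders: copies a cell's border definition.
get_para_style: extracts style records and content items from a document; it is clone_document from wan_neng_copy_word.py, imported under this name.
html_table_to_docx: converts an HTML table into a Word table.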
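The delicate part of html_table_to_docx is placing HTML cells on Word's fixed grid when `colspan` and `rowspan` are present: a cell from an earlier row can already occupy a slot. The bookkeeping can be isolated as plain Python; this sketch mirrors the `used_cells` approach in the full source, with hypothetical names `column_count` and `place_cells` and cells reduced to `(colspan, rowspan)` pairs:

```python
def column_count(rows):
    """Estimate the grid width as the widest row, counting colspan."""
    return max((sum(colspan for colspan, _ in row) for row in rows), default=0)


def place_cells(rows, n_cols):
    """Return (row, col, colspan, rowspan) anchors for HTML cells on a Word grid.

    rows: one list per <tr>, holding (colspan, rowspan) for each cell in order.
    """
    used = [[False] * n_cols for _ in rows]
    anchors = []
    for r, cells in enumerate(rows):
        c = 0
        for colspan, rowspan in cells:
            # Skip slots already covered by a rowspan from an earlier row
            while c < n_cols and used[r][c]:
                c += 1
            if c >= n_cols:
                break
            anchors.append((r, c, colspan, rowspan))
            # Mark every slot this cell covers, clipped to the grid
            for rr in range(r, min(r + rowspan, len(rows))):
                for cc in range(c, min(c + colspan, n_cols)):
                    used[rr][cc] = True
            c += colspan
    return anchors


# Row 0: A spans two rows, B spans two columns; row 1 has two plain cells
rows = [[(1, 2), (2, 1)], [(1, 1), (1, 1)]]
print(place_cells(rows, column_count(rows)))
# [(0, 0, 1, 2), (0, 1, 2, 1), (1, 1, 1, 1), (1, 2, 1, 1)]
```

Each anchor then corresponds to one python-docx merge, `table.cell(r, c).merge(table.cell(r + rowspan - 1, c + colspan - 1))`, for cells that span more than one slot.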

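3. Main program

import json

from wan_neng_copy_word import clone_document as get_para_style

if __name__ == "__main__":
    # Extract style records from the template and from the source document
    body_ws, _ = get_para_style("demo_template.docx")
    body_s, body_p = get_para_style("1.docx")
    # Load the content items from a JSON file (this replaces body_p)
    with open("1.json", "r", encoding="utf-8") as f:
        body_p = json.load(f)
    # Generate the new document (clone_document here is the generation function above)
    clone_document(body_s, body_p, body_ws, "cloned_example.docx")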
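Because 1.json replaces the extracted body_p (and, per the comment in the full source, may be produced by an LLM), it can contain style keys that no extracted record provides, and clone_document then fails with an IndexError. A small pre-flight check, sketched under the assumption that items follow the sn/ct format described earlier (`find_unknown_styles` is a hypothetical helper, and the `known` set is sample data):

```python
import json


def find_unknown_styles(items, known_keys):
    """Return the paragraph style keys in `items` that are missing from known_keys."""
    unknown = []
    for item in items:
        if item == "br":  # page-break marker
            continue
        key, content = item["sn"], item["ct"]
        if key == "table" or "image" in content:  # same routing as clone_document
            continue
        if key not in known_keys and key not in unknown:
            unknown.append(key)
    return unknown


# Hypothetical keys; in practice build them from the extracted records, e.g.
# {r["style"]["sn"] for r in body_s} & {r["style"]["sn"] for r in body_ws}
known = {"Heading 1_CENTER", "Normal_None"}
items = json.loads(
    '[{"sn": "Normal_None", "ct": "ok"}, {"sn": "Title_LEFT", "ct": "draft"},'
    ' "br", {"sn": "table", "ct": "<table></table>"}]'
)
print(find_unknown_styles(items, known))  # ['Title_LEFT']
```

Intersecting the source and template keys matters because the paragraph branch looks the key up in both lists.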

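IV. Practical applications

Automated reports
Combine template styles with data pulled from a database to produce standardized reports.
Batch document processing
Convert many Excel tables into Word documents in one pass (see the combined use of pywin32 and python-docx in the knowledge base).
Blog content migration
Save a Word document as HTML, then import it into ZBlog or WordPress following the steps in the knowledge base (see knowledge base [2] and [5]).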
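For the report-generation scenario, the content items do not have to come from a Word file: any code that emits the sn/ct format can feed clone_document. A minimal sketch using an in-memory SQLite database; the `sales` table, its columns and the two style keys are illustrative assumptions (the keys must exist in the extracted style records), and `rows_to_items` is a hypothetical helper:

```python
import json
import sqlite3
from html import escape


def rows_to_items(conn):
    """Turn rows of a hypothetical `sales` table into 1.json-style content items."""
    rows = conn.execute("SELECT region, amount FROM sales ORDER BY region").fetchall()
    items = [{"sn": "Heading 1_CENTER", "ct": "Sales Report"}]
    for region, amount in rows:
        items.append({"sn": "Normal_None", "ct": f"{region}: {amount}"})
    # The same rows as an HTML table, the format html_table_to_docx consumes
    body = "".join(f"<tr><td>{escape(region)}</td><td>{amount}</td></tr>" for region, amount in rows)
    items.append({"sn": "table", "ct": f"<table>{body}</table>"})
    return items


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("North", 120), ("East", 95)])
items = rows_to_items(conn)
print(json.dumps(items, ensure_ascii=False, indent=2))
```

Writing `items` with `json.dump(items, f, ensure_ascii=False)` produces a file in the 1.json format read by the main program.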

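V. Common problems and optimization tips

1. Lost styles

Cause: Word formatting often relies on implicit inheritance. A run without an explicit font size, for example, takes it from its paragraph style or from the style that one is based on, so copying only explicitly set properties loses it.
Solution: set styles explicitly through python-docx's style properties, or use Spire.Doc for more complex style handling, as described in knowledge base [7].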
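One way to make inherited formatting explicit is to resolve it before copying: take the run's own value if set, otherwise walk the paragraph style and its `base_style` chain. A minimal sketch for font size; `effective_font_size` is a hypothetical helper that reads the python-docx attributes `font.size` and `base_style`, character styles and document defaults are not considered, and the demo uses stand-in objects so it runs without a .docx file:

```python
from types import SimpleNamespace


def effective_font_size(run, paragraph_style):
    """Return the first explicit font size on the run or along the style chain.

    Uses the attributes run.font.size, style.font.size and style.base_style.
    Returns None when the size comes only from document defaults (w:docDefaults),
    which this sketch does not read.
    """
    if run.font.size is not None:
        return run.font.size
    style = paragraph_style
    while style is not None:
        if style.font.size is not None:
            return style.font.size
        style = style.base_style
    return None


# Stand-ins for python-docx objects: "heading" is based on "normal" (size 11)
normal = SimpleNamespace(font=SimpleNamespace(size=11), base_style=None)
heading = SimpleNamespace(font=SimpleNamespace(size=None), base_style=normal)
plain_run = SimpleNamespace(font=SimpleNamespace(size=None))
print(effective_font_size(plain_run, heading))  # 11
```

With real objects, the result could be written onto the copied run (`new_run.font.size = ...`) using the source run and paragraph style stored in each style record, so the size no longer depends on inheritance in the target document.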

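2. Images or tables not embedded correctly

Cause: incorrect paths, or resources that fail to load.
Solution: use absolute image paths, and set sizes explicitly with docx.shared.Inches.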
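Both fixes can be prepared before calling add_picture: resolve the path against a known directory, and compute an explicit size. OOXML measures drawing sizes in EMU (914,400 per inch). A sketch under those assumptions; `fit_image` and `absolute_image_path` are hypothetical helpers, and the 6-inch maximum width is an arbitrary example:

```python
from pathlib import Path

EMU_PER_INCH = 914400  # OOXML drawing sizes are in English Metric Units


def fit_image(width_emu, height_emu, max_width_in=6.0):
    """Scale an image size (EMU) down to max_width_in inches, keeping its aspect ratio."""
    max_width_emu = int(max_width_in * EMU_PER_INCH)
    if width_emu <= max_width_emu:
        return width_emu, height_emu
    scale = max_width_emu / width_emu
    return max_width_emu, round(height_emu * scale)


def absolute_image_path(name, base_dir):
    """Resolve an image file name against an explicit base directory."""
    return Path(base_dir, name).resolve()


# An 8 x 4 inch image shrinks to 6 x 3 inches
print(fit_image(8 * EMU_PER_INCH, 4 * EMU_PER_INCH))  # (5486400, 2743200)
```

The results can then be passed as `run.add_picture(str(path), width=Emu(w), height=Emu(h))`, with `Emu` imported from docx.shared.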

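3. Performance

Large documents: avoid frequent add_paragraph calls and batch operations where possible.
Memory: drop references to Document objects you no longer need (for example doc = None) so they can be garbage-collected.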

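VI. Summary

With the code and explanations in this article, you can use Python to deep-copy Word document styles and generate content dynamically. Combined with other techniques from the knowledge base (such as ZBlog import and Office automation), this extends into a fully automated document workflow.

I hope this post helps you automate your documents efficiently. Questions, optimizations and feature ideas are welcome in the comments.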

# wan_neng_copy_word.py: extracts style records and content items from a .docx file
import re

from bs4 import BeautifulSoup
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

# Word border styles (w:val) mapped to CSS; anything not listed is drawn as solid
CSS_BORDER_STYLES = {"single": "solid", "double": "double", "dotted": "dotted",
                     "dashed": "dashed", "nil": "none", "none": "none"}


def _cell_css(cell):
    """Build inline CSS for a cell's alignment, background color and borders."""
    css = ""
    # Horizontal alignment is stored on the paragraph (w:pPr/w:jc), not in w:tcPr
    alignment = cell.paragraphs[0].alignment if cell.paragraphs else None
    if alignment == WD_ALIGN_PARAGRAPH.CENTER:
        css += "text-align:center;"
    elif alignment == WD_ALIGN_PARAGRAPH.RIGHT:
        css += "text-align:right;"
    tc_pr = cell._tc.tcPr
    if tc_pr is None:
        return css
    # Background color
    shd = tc_pr.find(qn("w:shd"))
    fill = shd.get(qn("w:fill")) if shd is not None else None
    if fill and fill != "auto":
        css += f"background-color:#{fill};"
    # Borders: w:sz is in eighths of a point
    borders = tc_pr.find(qn("w:tcBorders"))
    if borders is not None:
        for side in ["top", "left", "bottom", "right"]:
            border = borders.find(qn(f"w:{side}"))
            if border is None:
                continue
            color = border.get(qn("w:color"), "000000")
            if color == "auto":
                color = "000000"
            width_pt = int(border.get(qn("w:sz"), "4")) / 8
            style = CSS_BORDER_STYLES.get(border.get(qn("w:val"), "single"), "solid")
            css += f"border-{side}:{width_pt:g}pt {style} #{color};"
    return css


def docx_table_to_html(word_table):
    """Convert a python-docx table into an HTML <table> string."""
    soup = BeautifulSoup(features="html.parser")
    html_table = soup.new_tag("table", style="border-collapse: collapse;")
    # python-docx returns a merged cell once for every grid slot it covers,
    # so merges show up as the same <w:tc> element in neighbouring slots
    grid = [list(row.cells) for row in word_table.rows]
    for row_idx, cells in enumerate(grid):
        html_tr = soup.new_tag("tr")
        for col_idx, cell in enumerate(cells):
            tc = cell._tc
            # Slot covered by a colspan (same cell as the slot to the left)
            if col_idx > 0 and cells[col_idx - 1]._tc is tc:
                continue
            # Slot covered by a rowspan (same cell as the slot above)
            above = grid[row_idx - 1] if row_idx > 0 else []
            if col_idx < len(above) and above[col_idx]._tc is tc:
                continue
            # Count how far the cell extends to the right and downwards
            colspan = 1
            while col_idx + colspan < len(cells) and cells[col_idx + colspan]._tc is tc:
                colspan += 1
            rowspan = 1
            while (row_idx + rowspan < len(grid)
                   and col_idx < len(grid[row_idx + rowspan])
                   and grid[row_idx + rowspan][col_idx]._tc is tc):
                rowspan += 1
            td = soup.new_tag("td")
            td.string = cell.text.strip()
            if colspan > 1:
                td["colspan"] = str(colspan)
            if rowspan > 1:
                td["rowspan"] = str(rowspan)
            # Cell styles plus default padding
            td["style"] = _cell_css(cell) + "padding: 5px;"
            html_tr.append(td)
        html_table.append(html_tr)
    soup.append(html_table)
    return str(soup)


def set_cell_background(cell, color_hex):
    """Set a cell's background color from a hex string such as '#FFEEAA'."""
    color_hex = color_hex.lstrip("#")
    if not re.fullmatch(r"[0-9A-Fa-f]{6}", color_hex):
        raise ValueError(f"unsupported color: {color_hex!r}")
    shading_elm = OxmlElement("w:shd")
    shading_elm.set(qn("w:val"), "clear")
    shading_elm.set(qn("w:fill"), color_hex)
    cell._tc.get_or_add_tcPr().append(shading_elm)


def html_table_to_docx(doc, html_content):
    """Convert every <table> in the HTML into a Word table appended to doc.

    :param doc: python-docx Document instance
    :param html_content: HTML string
    """
    soup = BeautifulSoup(html_content, "html.parser")
    align_map = {
        "left": WD_ALIGN_PARAGRAPH.LEFT,
        "center": WD_ALIGN_PARAGRAPH.CENTER,
        "right": WD_ALIGN_PARAGRAPH.RIGHT,
    }
    for html_table in soup.find_all("table"):
        trs = html_table.find_all("tr")
        rows = len(trs)
        # Estimate the column count from the widest row (including colspan)
        cols = max((sum(int(c.get("colspan", 1)) for c in tr.find_all(["td", "th"])) for tr in trs),
                   default=0)
        if rows == 0 or cols == 0:
            continue
        table = doc.add_table(rows=rows, cols=cols)
        table.style = "Table Grid"
        # Grid slots already filled (needed for colspan/rowspan)
        used_cells = [[False] * cols for _ in range(rows)]
        for row_idx, tr in enumerate(trs):
            col_idx = 0
            for cell in tr.find_all(["td", "th"]):
                # Skip slots covered by a rowspan from an earlier row
                while col_idx < cols and used_cells[row_idx][col_idx]:
                    col_idx += 1
                if col_idx >= cols:
                    break  # avoid running past the grid
                colspan = int(cell.get("colspan", 1))
                rowspan = int(cell.get("rowspan", 1))
                text = cell.get_text(strip=True)
                # Parse inline CSS such as "text-align:center;background-color:#FFEEAA"
                css = {}
                for decl in cell.get("style", "").split(";"):
                    if ":" in decl:
                        prop, value = decl.split(":", 1)
                        css[prop.strip().lower()] = value.strip()
                # Accept the align attribute and CSS text-align (written by docx_table_to_html)
                align = cell.get("align") or css.get("text-align")
                bg_color = css.get("background-color") or css.get("background")
                word_cell = table.cell(row_idx, col_idx)
                # Merge cells for colspan/rowspan
                if colspan > 1 or rowspan > 1:
                    end_row = min(row_idx + rowspan - 1, rows - 1)
                    end_col = min(col_idx + colspan - 1, cols - 1)
                    word_cell = word_cell.merge(table.cell(end_row, end_col))
                para = word_cell.paragraphs[0]
                para.text = text
                if align in align_map:
                    para.alignment = align_map[align]
                if bg_color:
                    try:
                        set_cell_background(word_cell, bg_color)
                    except ValueError:
                        pass  # ignore color formats other than #RRGGBB
                # Mark the grid slots this cell covers
                for r in range(row_idx, min(row_idx + rowspan, rows)):
                    for c in range(col_idx, min(col_idx + colspan, cols)):
                        used_cells[r][c] = True
                col_idx += colspan
        # Empty paragraph as a separator after the table
        doc.add_paragraph()
    return doc


def copy_inline_shapes(old_paragraph):
    """Collect every inline shape (usually an image) in a paragraph.

    Returns a list of [image_bytes, "filename;partname", width, height].
    """
    images = []
    for shape in old_paragraph._element.xpath(".//w:drawing"):
        blip = shape.find(".//a:blip", namespaces={"a": "http://schemas.openxmlformats.org/drawingml/2006/main"})
        if blip is None:
            continue
        r_id = blip.get("{http://schemas.openxmlformats.org/officeDocument/2006/relationships}embed")
        if r_id is None:
            continue  # linked rather than embedded image
        image_part = old_paragraph.part.related_parts[r_id]
        image_name = image_part.filename + ";" + image_part.partname
        images.append([image_part.image.blob, image_name, image_part.image.width, image_part.image.height])
    return images


def is_page_break(element):
    """Return True if a body-level paragraph contains <w:br w:type="page"/>."""
    if element.tag != qn("w:p"):
        return False
    # w:br sits inside a run (w:r), so search all descendants
    return any(br.get(qn("w:type")) == "page" for br in element.iter(qn("w:br")))


def clone_paragraph(old_para):
    """Record a paragraph's style, runs, alignment and images.

    Returns (style, paras): the style record and the paragraph's content items.
    """
    # Style key: style name plus alignment, e.g. "Heading 1_CENTER";
    # it mainly serves to recognize heading levels
    key = old_para.style.name + "_" + str(old_para.alignment).split()[0]
    style = {
        "style": {"sn": key, "ct": old_para.style},
        "run_style": list(old_para.runs),
        "alignment": {"sn": key, "ct": old_para.alignment},
    }
    paras = [{"sn": key, "ct": run.text} for run in old_para.runs]
    for image in copy_inline_shapes(old_para):
        file_name, part_name = image[1].rsplit(";", 1)
        style[part_name] = [image]  # store only this image, not every image in the paragraph
        paras.append({"sn": file_name, "ct": part_name})
    return style, paras


def clone_document(old_doc_path):
    """Extract style records and content items from a .docx file.

    Returns (body_style, body_paras); a page break is recorded as the string "br".
    """
    old_doc = Document(old_doc_path)
    body_style, body_paras = [], []
    para_index = table_index = 0
    # Walk the body in document order so paragraphs and tables stay interleaved
    for element in old_doc.element.body.iterchildren():
        if element.tag == qn("w:p"):
            old_para = old_doc.paragraphs[para_index]
            para_index += 1
            style, paras = clone_paragraph(old_para)
            body_style.append(style)
            body_paras += paras
            if is_page_break(element):
                body_paras.append("br")
        elif element.tag == qn("w:tbl"):
            old_table = old_doc.tables[table_index]
            table_index += 1
            body_paras.append({"sn": "table", "ct": docx_table_to_html(old_table)})
    return body_style, body_paras


# Usage example
if __name__ == "__main__":
    body_s, body_p = clone_document("1.docx")
    print(body_p)
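A page break is stored as `<w:br w:type="page"/>` inside a run (`w:r`), not as a direct child of `w:p`, so a detector has to search the paragraph's descendants. The same check on raw XML with only the standard library (`has_page_break` and the sample fragments are our own illustration):

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def has_page_break(p_xml):
    """True if a <w:p> fragment contains <w:br w:type="page"/> anywhere inside it."""
    p = ET.fromstring(p_xml)
    return any(br.get(f"{{{W_NS}}}type") == "page" for br in p.iter(f"{{{W_NS}}}br"))


# The break sits inside a run (w:r), one level below the paragraph
PAGE_BREAK = f'<w:p xmlns:w="{W_NS}"><w:r><w:t>End</w:t><w:br w:type="page"/></w:r></w:p>'
LINE_BREAK = f'<w:p xmlns:w="{W_NS}"><w:r><w:t>Hello</w:t><w:br/></w:r></w:p>'
print(has_page_break(PAGE_BREAK), has_page_break(LINE_BREAK))  # True False
```

A plain `<w:br/>` without a type is an ordinary line break, which is why the `w:type` attribute has to be checked.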

# Generation script: rebuilds a document from 1.json using the extracted styles
import io
import json

from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

from wan_neng_copy_word import clone_document as get_para_style, html_table_to_docx


def copy_inline_shapes(new_doc, image_name, img):
    """Insert images into a new paragraph of new_doc.

    A file named image_name in the working directory, if present, is used
    instead of the extracted image bytes.
    """
    new_para = new_doc.add_paragraph()
    for image_bytes_src, _, w, h in img:
        try:
            with open(image_name, "rb") as f:
                image_bytes = f.read()
        except OSError:
            image_bytes = image_bytes_src
        # Keep the image's original size (w and h are python-docx Length values)
        new_para.add_run().add_picture(io.BytesIO(image_bytes), width=w, height=h)


def copy_paragraph_style(run_from, run_to):
    """Copy run-level formatting from run_from to run_to."""
    run_to.bold = run_from.bold
    run_to.italic = run_from.italic
    run_to.underline = run_from.underline
    run_to.font.size = run_from.font.size
    run_to.font.color.rgb = run_from.font.color.rgb
    run_to.font.name = run_from.font.name
    run_to.font.all_caps = run_from.font.all_caps
    run_to.font.strike = run_from.font.strike
    run_to.font.shadow = run_from.font.shadow


def clone_paragraph(para_style, text, new_doc, para_style_ws):
    """Add a paragraph containing `text`, styled from the extracted style records."""
    new_para = new_doc.add_paragraph()
    template_style = para_style_ws["style"]["ct"]
    source_style = para_style["style"]["ct"]
    # Note: this changes the template document's style object; assigning the style below
    # writes only its style ID, so the run-level copy is what carries the size
    template_style.font.size = source_style.font.size
    new_para.style = template_style
    new_run = new_para.add_run(text)
    copy_paragraph_style(para_style["run_style"][0], new_run)
    new_para.alignment = para_style["alignment"]["ct"]
    return new_para


def copy_cell_borders(old_cell, new_cell):
    """Copy a cell's border definition (w:tcBorders)."""
    old_tc_pr = old_cell._tc.tcPr
    old_border = old_tc_pr.find(qn("w:tcBorders")) if old_tc_pr is not None else None
    if old_border is None:
        return
    new_border = OxmlElement("w:tcBorders")
    for border_type in ["top", "left", "bottom", "right", "insideH", "insideV"]:
        old_element = old_border.find(qn(f"w:{border_type}"))
        if old_element is not None:
            new_element = OxmlElement(f"w:{border_type}")
            for attr, value in old_element.attrib.items():
                new_element.set(attr, value)
            new_border.append(new_element)
    new_cell._tc.get_or_add_tcPr().append(new_border)


def clone_table(old_table, new_doc):
    """Create a new table in new_doc that mirrors old_table."""
    new_table = new_doc.add_table(rows=len(old_table.rows), cols=len(old_table.columns))
    if old_table.style:
        new_table.style = old_table.style
    for i, old_row in enumerate(old_table.rows):
        for j, old_cell in enumerate(old_row.cells):
            new_cell = new_table.cell(i, j)
            for paragraph in new_cell.paragraphs:
                new_cell._element.remove(paragraph._element)
            for old_paragraph in old_cell.paragraphs:
                new_paragraph = new_cell.add_paragraph()
                for old_run in old_paragraph.runs:
                    new_run = new_paragraph.add_run(old_run.text)
                    copy_paragraph_style(old_run, new_run)
                new_paragraph.alignment = old_paragraph.alignment
            copy_cell_borders(old_cell, new_cell)
    for i, col in enumerate(old_table.columns):
        if col.width is not None:
            new_table.columns[i].width = col.width
    return new_table


def clone_document(old_s, old_p, old_ws, new_doc_path):
    """Build a new document from content items (old_p) and style records."""
    new_doc = Document()
    for para in old_p:
        if para == "br":
            new_doc.add_page_break()  # page-break marker from extraction
            continue
        k, v = para["sn"], para["ct"]
        if k == "table":
            html_table_to_docx(new_doc, v)
        elif "image" in v:
            copy_inline_shapes(new_doc, k, [i for i in old_s if v in i][0][v])
        else:
            # Use a source record that has at least one run to copy formatting from
            style = [i for i in old_s if i["style"]["sn"] == k and i["run_style"]]
            style_ws = [i for i in old_ws if i["style"]["sn"] == k]
            # Fall back to the source record when the template has no matching key
            clone_paragraph(style[0], v, new_doc, style_ws[0] if style_ws else style[0])
    new_doc.save(new_doc_path)


# Usage example
if __name__ == "__main__":
    body_ws, _ = get_para_style("demo_template.docx")
    body_s, body_p = get_para_style("1.docx")
    # Give body_p (or a compressed version of it) to an LLM: compress it if the LLM only
    # needs to follow the template styles, keep it whole if the content is needed or edited.
    # The LLM's answer is saved as 1.json and used to generate the Word document.
    with open("1.json", "r", encoding="utf-8") as f:
        body_p = json.load(f)
    print("Style extraction finished", body_p)
    clone_document(body_s, body_p, body_ws, "cloned_example.docx")

from docx import Document
from docx.enum.style import WD_STYLE_TYPE
from docx.enum.text import WD_ALIGN_PARAGRAPH

# Build demo_template.docx: a sample paragraph for every paragraph style,
# alignment and bold setting
doc = Document()
# All paragraph styles defined in the document
paragraph_styles = [style for style in doc.styles if style.type == WD_STYLE_TYPE.PARAGRAPH]
print(f"Found {len(paragraph_styles)} paragraph styles:")
for style in paragraph_styles:
    print(f"- {style.name}")

for align in [WD_ALIGN_PARAGRAPH.LEFT, WD_ALIGN_PARAGRAPH.RIGHT, WD_ALIGN_PARAGRAPH.CENTER, None]:
    for bold_flag in [True, False]:
        # Add a label paragraph and a sample paragraph for each style
        for style in paragraph_styles:
            heading = doc.add_paragraph()
            run = heading.add_run(f"Style name: {style.name}")
            run.bold = bold_flag
            para = doc.add_paragraph(f"This is a sample paragraph using the '{style.name}' style.", style=style)
            para.alignment = align
            # Separator line (optional)
            doc.add_paragraph("-" * 40)

# Save as demo_template.docx
doc.save("demo_template.docx")
print("\nTemplate containing all paragraph styles generated: demo_template.docx")
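The template covers every paragraph style under each of four alignment settings because the generation step looks styles up by the key built in clone_paragraph: the style name, an underscore, then the first word of `str(alignment)` (python-docx renders an alignment member as, e.g., `CENTER (1)`, and a missing alignment as `None`). A small sketch that lists the keys such a template can supply; `template_style_keys` is a hypothetical helper, and the alignment labels are written out by hand rather than read from python-docx:

```python
from itertools import product

# Alignment labels as they appear in style keys
ALIGNMENT_LABELS = ["LEFT", "RIGHT", "CENTER", "None"]


def template_style_keys(style_names):
    """All "style_alignment" keys a template like demo_template.docx can provide."""
    return {f"{name}_{label}" for name, label in product(style_names, ALIGNMENT_LABELS)}


keys = template_style_keys(["Normal", "Heading 1"])
needed = {"Heading 1_CENTER", "Title_LEFT"}
print(sorted(needed - keys))  # ['Title_LEFT']
```

A key missing from this set (here `Title_LEFT`) is one the template cannot style, which in the generation script leads to the fallback or an IndexError.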