Batch-Replacing Keywords in Excel and Word Files with Python

I. The Problem

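Sometimes you have a pile of Excel or Word files on hand and your manager suddenly asks for several terms to be changed in all of them. It is enough to make you despair. Each file has to be opened and run through find-and-replace one at a time. With a handful of files that is tolerable, but with hundreds or thousands you soon lose track: you can no longer tell which files are done, or whether the ones you did were changed completely.

So the question is: how do we replace keywords across many Excel and Word files without changing the files' formatting, and with the original versions removed afterward? The Office files may be spread across different folders, so each one has to stay in its original folder after replacement. The program should also run on both Windows and macOS.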

II. Approach

Because the solution has to work in more than one environment, we set VBA aside and use Python instead.

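1. Step one: read the replacement table "批量替換表.xlsx" and build a dictionary from it, so that all keywords can be replaced in one batch later. The files to process are the docx and xlsx files in the current directory and all of its subdirectories. Any doc and xls files need to be batch-converted to docx and xlsx first; see the following article.

Batch-converting doc and xls files to docx and xlsx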
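Step one can be sketched roughly as follows. This is a minimal illustration rather than the exact code of the final script: `build_mapping` and `load_mapping` are hypothetical helper names, and the sketch assumes column A holds the terms to find and column B their replacements. Values are converted to strings so that a numeric keyword can still be searched for in text.

```python
from typing import Iterable


def build_mapping(rows: Iterable[tuple]) -> dict:
    """Turn (old, new) pairs into a dict, skipping incomplete rows."""
    mapping = {}
    for old, new in rows:
        if old is None or new is None:
            continue  # a blank cell on either side means nothing to replace
        mapping[str(old)] = str(new)
    return mapping


def load_mapping(path: str = "批量替換表.xlsx") -> dict:
    """Read columns A and B of the active sheet into a replacement dict."""
    # Imported here so build_mapping can be tried without openpyxl installed.
    from openpyxl import load_workbook

    sheet = load_workbook(path).active
    # values_only=True yields plain cell values instead of Cell objects.
    rows = sheet.iter_rows(min_col=1, max_col=2, values_only=True)
    return build_mapping(rows)
```

Keeping the row-to-dictionary logic separate from the file reading makes it easy to check the mapping with a few hand-written tuples before pointing it at a real workbook.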

2. Step two: walk every file under the current directory and use an if condition on the file extension to pick out the docx and xlsx files. The code is as follows:

    for root, filefolder, files in os.walk(os.curdir):
        for file in files:
            if file.endswith(".docx"):
                file_path = os.path.join(root, file)
                for key, value in dic.items():
                    word_replace_keywords(file_path, key, value)
            # Skip the replacement table itself.
            elif file.endswith(".xlsx") and file != "批量替換表.xlsx":
                file_path = os.path.join(root, file)
                for key, value in dic.items():
                    excel_replace_keywords(file_path, key, value)
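The walk above can also be separated from the replacement itself, which makes it easy to preview which files would be touched. The sketch below is one possible way to do that and is not part of the original script: `find_office_files` is a hypothetical helper, and skipping names that start with `~$` (the lock files Word and Excel create while a document is open) is an added assumption about what should be ignored.

```python
import os

TABLE_NAME = "批量替換表.xlsx"


def find_office_files(top=".", table_name=TABLE_NAME):
    """Yield (kind, path) for every .docx and .xlsx file under `top`."""
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.startswith("~$") or name == table_name:
                continue  # skip Office lock files and the replacement table
            ext = os.path.splitext(name)[1].lower()
            if ext == ".docx":
                yield "word", os.path.join(root, name)
            elif ext == ".xlsx":
                yield "excel", os.path.join(root, name)


if __name__ == "__main__":
    # Dry run: list the files without modifying anything.
    for kind, path in find_office_files():
        print(kind, path)
```

Comparing the real extension (via `os.path.splitext`) instead of the trailing characters avoids matching a file that merely ends in the letters "docx", and lowercasing it also catches names such as `REPORT.DOCX`.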

3. Step three: process the docx and xlsx files separately, using the python-docx and openpyxl modules. A docx file is opened with Document(), and the following code does the replacement:

    def info_update(doc, old, new):
        # Replace run by run so that each run keeps its own formatting.
        for para in doc.paragraphs:
            for run in para.runs:
                if old in run.text:
                    run.text = run.text.replace(old, new)
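One limitation of a run-by-run loop is worth knowing about: Word often stores what looks like continuous text as several runs, so a keyword that straddles a run boundary is never matched. The sketch below shows one possible fallback and is not the author's method. `replace_across_runs` is a hypothetical helper that works on any objects with a `.text` attribute (python-docx runs, or the stand-ins used here), and its fallback deliberately trades formatting for completeness.

```python
from types import SimpleNamespace


def replace_across_runs(runs, old, new):
    """Replace `old` with `new` in run-like objects; return True if changed."""
    joined = "".join(run.text for run in runs)
    if old not in joined:
        return False
    in_run_hits = sum(run.text.count(old) for run in runs)
    if in_run_hits == joined.count(old):
        # Every occurrence sits inside a single run: replace run by run,
        # so each run keeps its own formatting.
        for run in runs:
            if old in run.text:
                run.text = run.text.replace(old, new)
    else:
        # At least one occurrence straddles a run boundary. Collapse the
        # text into the first run; the later runs' formatting is lost.
        runs[0].text = joined.replace(old, new)
        for run in runs[1:]:
            run.text = ""
    return True


# Stand-ins for python-docx runs: "world" is split across two runs.
runs = [SimpleNamespace(text="Hello wo"), SimpleNamespace(text="rld!")]
replace_across_runs(runs, "world", "there")
print([run.text for run in runs])  # → ['Hello there!', '']
```

With a real document this would be called as `replace_across_runs(para.runs, old, new)` for each paragraph. Whether losing the formatting of the later runs is acceptable depends on the documents involved.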

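For text inside the tables of a docx file, the following code replaces the keyword without changing its formatting. (xlsx files are handled cell by cell with openpyxl; see excel_replace_keywords in the full listing.)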

    def replace_cell_text_with_format(cell, keyword, replacement):
        for paragraph in cell.paragraphs:
            for run in paragraph.runs:
                if keyword in run.text:
                    # Assigning run.text swaps the text and leaves the run's
                    # formatting (font, size, bold, ...) untouched, so no
                    # attributes need to be copied by hand.
                    run.text = run.text.replace(keyword, replacement)

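4. Step four: save the modified file. Saving it back under its original path overwrites the source file, which is what the full listing does; os.remove() is only needed when the result is written under a new name and the old file has to be deleted.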
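If an interrupted save is a concern, one option is to write the result to a temporary file in the same folder and only then swap it in for the original. The sketch below illustrates that idea under stated assumptions: `save_replacing_original` is a hypothetical helper, not part of the author's script, and `save` stands for any one-argument callable that writes a file to the path it is given.

```python
import os
import tempfile


def save_replacing_original(save, path):
    """Call save(tmp_path), then swap the result in for `path`.

    `save` is any callable that writes a file to the given path, for
    example the bound method doc.save (python-docx) or wb.save (openpyxl).
    """
    folder = os.path.dirname(os.path.abspath(path))
    suffix = os.path.splitext(path)[1]
    # Same folder as the target, so the final swap stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=folder)
    os.close(fd)  # the library reopens the path itself
    try:
        save(tmp_path)
        os.replace(tmp_path, path)  # the old file is replaced in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # drop the partial file, keep the original
        raise
```

With python-docx this would be used as `save_replacing_original(doc.save, file_path)`. If the save fails halfway, the original document is still intact.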

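III. The Code

In the end we have a script of a little over 70 lines that handles multiple files and multiple keywords in one go, preserves the original formatting, and runs on both Windows and Mac. The code is as follows: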

    import os

    from docx import Document
    from openpyxl import load_workbook

    TABLE_NAME = "批量替換表.xlsx"


    def info_update(doc, old, new):
        # Body paragraphs: replace run by run so formatting is kept.
        for para in doc.paragraphs:
            for run in para.runs:
                if old in run.text:
                    run.text = run.text.replace(old, new)


    def replace_cell_text_with_format(cell, keyword, replacement):
        # Table cells: assigning run.text keeps the run's formatting.
        for paragraph in cell.paragraphs:
            for run in paragraph.runs:
                if keyword in run.text:
                    run.text = run.text.replace(keyword, replacement)


    def get_dic():
        # Column A holds the keywords to find, column B their replacements.
        workbook = load_workbook(TABLE_NAME)
        sht = workbook.active
        dic = {}
        for c1, c2 in zip(sht["A"], sht["B"]):
            if c1.value is not None and c2.value is not None:
                # Store strings so numeric keywords can be searched in text.
                dic[str(c1.value)] = str(c2.value)
        return dic


    def word_replace_keywords(file_path, keyword, replacement):
        doc = Document(file_path)
        info_update(doc, keyword, replacement)
        try:
            for table in doc.tables:
                # Skip tables that contain no text at all.
                if not any(cell.text for row in table.rows for cell in row.cells):
                    continue
                for row in table.rows:
                    for cell in row.cells:
                        if keyword in cell.text:
                            replace_cell_text_with_format(cell, keyword, replacement)
        except Exception as e:
            print("Error processing table:", e)
        doc.save(file_path)  # overwrite the original file in place


    def excel_replace_keywords(file_path, keyword, replacement):
        wb = load_workbook(file_path)
        for sheet_name in wb.sheetnames:
            sheet = wb[sheet_name]
            for row in sheet.iter_rows():
                for cell in row:
                    # Only touch text cells, so numbers and dates are not
                    # silently converted to strings.
                    if isinstance(cell.value, str) and keyword in cell.value:
                        cell.value = cell.value.replace(keyword, replacement)
        wb.save(file_path)  # overwrite the original file in place
        wb.close()


    def get_replaced(dic):
        for root, filefolder, files in os.walk(os.curdir):
            for file in files:
                if file.endswith(".docx"):
                    file_path = os.path.join(root, file)
                    for key, value in dic.items():
                        word_replace_keywords(file_path, key, value)
                # Skip the replacement table itself.
                elif file.endswith(".xlsx") and file != TABLE_NAME:
                    file_path = os.path.join(root, file)
                    for key, value in dic.items():
                        excel_replace_keywords(file_path, key, value)


    def main():
        dic = get_dic()
        get_replaced(dic)


    if __name__ == "__main__":
        main()

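The advantages of the code above: it is fast, it replaces everything in one go once the keywords are set up, and it works in more than one environment. Compared with VBA code, the Python version runs faster, is simpler to operate, and saves time and effort.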
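One thing to watch for when several keywords are applied one after another: an earlier replacement can produce text that a later keyword then matches, so a table containing A→B and B→C may turn A into C. If that matters, all keywords can be applied to a piece of text in a single pass. The sketch below uses the standard `re` module; `replace_all` is a hypothetical helper and is not part of the script above.

```python
import re


def replace_all(text, mapping):
    """Apply every old->new pair in `mapping` to `text` in one pass."""
    # Longest keywords first, so "Excel file" wins over "Excel".
    keys = sorted((k for k in mapping if k), key=len, reverse=True)
    if not keys:
        return text
    # re.escape makes characters such as "." or "(" match literally.
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is looked up once; replaced text is never rescanned.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


mapping = {"cat": "dog", "dog": "bird"}
print(replace_all("cat and dog", mapping))  # → dog and bird
```

Applied sequentially, the same table would give "bird and bird", because the "dog" produced by the first rule is caught by the second.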

IV. Notes

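1. Before running the code, install Python 3.9 or later, along with the openpyxl and python-docx modules.

2. Convert any doc and xls files to docx and xlsx before running the program; python-docx and openpyxl do not read the older formats.

3. Before running, create an xlsx file named "批量替換表.xlsx" in the same directory as the program, and run the program from that directory, since the script looks for the table and the target files relative to the current working directory. Put the keywords to find in column A and their replacements in column B.

4. If you run into problems, feel free to contact me or ask a question below.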
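The requirements in note 1 can be checked before the first run. The following is a small sketch under stated assumptions, not part of the original script: `missing_packages` is a hypothetical helper, and it only tests whether the modules can be found, not which versions are installed.

```python
import importlib.util
import sys

# import name -> name used with pip
REQUIRED = {"openpyxl": "openpyxl", "docx": "python-docx"}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 9):
        print("Note 1 asks for Python 3.9 or later; found", sys.version.split()[0])
    missing = missing_packages()
    if missing:
        print("Install the missing packages with: pip install " + " ".join(missing))
```

Note that python-docx is installed under the name `python-docx` but imported as `docx`, which is why the two names are kept side by side.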
