Converting PDF to JPG (and trimming excess white borders)

First, manually download Poppler for Windows from: https://github.com/oschwartz10612/poppler-windows/releases/tag/v24.08.0-0

Without it, pdf2image raises the following error:

PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

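After downloading, extract the archive, add it to the environment variables, and confirm that the installation works:

For example: extract to C:\software\poppler-24.08.0; add C:\software\poppler-24.08.0\Library\bin to the system PATH environment variable; then open a cmd prompt and run pdfinfo -v to verify.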
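The same check can be done from Python before running any conversion. The snippet below is a minimal sketch that only uses the standard library; the helper name find_poppler_tool is hypothetical and not part of pdf2image or Poppler.

```python
import shutil


def find_poppler_tool(tool="pdfinfo"):
    """Return the full path of a Poppler executable found on PATH, or None.

    find_poppler_tool is a hypothetical helper name used for illustration.
    """
    return shutil.which(tool)


if __name__ == "__main__":
    path = find_poppler_tool()
    if path:
        print(f"pdfinfo found at: {path}")
    else:
        print("pdfinfo not found; add Poppler's Library\\bin folder to PATH")
```

If this reports that pdfinfo is missing, the PATH entry has probably not taken effect yet; opening a new terminal after editing the environment variables usually helps.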

Full code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os

import numpy as np
from pdf2image import convert_from_path

# Adjust to the folder where Poppler was extracted
POPPLER_PATH = r'C:\software\poppler-24.08.0\Library\bin'


def pdf_to_jpg(folder_path, output_path):
    for root, dirs, files in os.walk(folder_path):
        # Mirror the input subfolder structure under the output directory
        out_dir = os.path.normpath(
            os.path.join(output_path, os.path.relpath(root, folder_path)))
        for file in files:
            if not file.lower().endswith(".pdf"):
                continue
            # Create the output directory
            os.makedirs(out_dir, exist_ok=True)
            images = convert_from_path(os.path.join(root, file),
                                       dpi=600,
                                       poppler_path=POPPLER_PATH)
            stem = os.path.splitext(file)[0]
            # Save each page as a JPEG file
            for i, image in enumerate(images, start=1):
                # The image can also be resized to a given width and height here
                # if width and height:
                #     image = image.resize((width, height))
                # Convert to an 8-bit grayscale image ("L" stands for luminance)
                gray_array = np.array(image.convert("L"))
                threshold = 240
                # Use the threshold to find the non-white part of the page
                mask = gray_array < threshold
                if mask.any():  # skip cropping on a completely blank page
                    coords = np.column_stack(np.where(mask))
                    y0, x0 = coords.min(axis=0)  # bounds of the non-white region
                    y1, x1 = coords.max(axis=0)
                    image = image.crop((int(x0), int(y0), int(x1) + 1, int(y1) + 1))
                # Number the pages of a multi-page PDF so they do not overwrite each other
                name = f"{stem}.jpg" if len(images) == 1 else f"{stem}_{i}.jpg"
                jpg_file = os.path.join(out_dir, name)
                image.save(jpg_file, 'JPEG')
                print(f'Saved {jpg_file}')


if __name__ == '__main__':
    # Folder containing the PDF files
    pdf_path = r'C:\datasets\D94_pdf'
    # Convert to images
    pdf_to_jpg(pdf_path, r'C:\datasets\D94_jpg')
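The border-trimming step can be tried on its own, without Poppler or a PDF. The sketch below is one possible variant, not the code above: it finds the bounding box from row and column projections of the mask instead of building the full coordinate list, which may use less memory on 600 dpi pages. The function name crop_white_border and the synthetic test page are assumptions made for illustration.

```python
import numpy as np
from PIL import Image


def crop_white_border(image, threshold=240):
    """Crop a PIL image to the bounding box of its non-white pixels.

    crop_white_border is a hypothetical helper name. Pixels whose
    grayscale value is below `threshold` are treated as content.
    """
    gray = np.asarray(image.convert("L"))
    mask = gray < threshold
    if not mask.any():
        # Completely blank page: return it unchanged
        return image
    rows = np.flatnonzero(mask.any(axis=1))  # row indices containing content
    cols = np.flatnonzero(mask.any(axis=0))  # column indices containing content
    box = (int(cols[0]), int(rows[0]), int(cols[-1]) + 1, int(rows[-1]) + 1)
    return image.crop(box)


if __name__ == "__main__":
    # Synthetic page: white background with a black block as "content"
    page = Image.new("RGB", (100, 80), "white")
    page.paste(Image.new("RGB", (40, 30), "black"), (20, 10))
    print(crop_white_border(page).size)
```

The threshold of 240 follows the value used in the full code; scans with a gray or noisy background may need a lower value.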

