Merging multiple PDF files with Python

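Say you have the boring job of merging several dozen PDF documents into a single PDF file. Each of them has a cover sheet as its first page, but you don't want the cover sheet repeated in the final result. Even though there are lots of free programs for combining PDFs, many of them simply merge entire files together. Let's write a Python program that lets you choose which pages go into the combined PDF. At a high level, here is what the program will do:

Find all PDF files in the current working directory.

Sort the filenames so the PDFs are added in order.

Write each page, excluding the first page, of each PDF to the output file.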

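In terms of implementation, your code will need to do the following:

Call os.listdir() to find all the files in the working directory and remove any non-PDF files.

Call Python's sort() list method to alphabetize the filenames.

Create a PdfFileWriter object for the output PDF.

Loop over each PDF file, creating a PdfFileReader object for it.

Loop over each page (except the first) in each PDF file.

Add the pages to the output PDF.

Write the output PDF to a file named allminutes.pdf.

For this project, open a new file editor window and save it as combinePdfs.py.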

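Step 1: Find all the PDF files

First, your program needs to get a list of all files with the .pdf extension in the current working directory and sort them. Make your code look like the following: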


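After the shebang line and the descriptive comment about what the program does, this code imports the os and PyPDF2 modules. The os.listdir('.') call returns a list of every file in the current working directory. The code loops over this list and adds only the files with the .pdf extension to pdfFiles. Afterward, the list is sorted alphabetically by passing the key = str.lower keyword argument to sort(). A PdfFileWriter object is created to hold the combined PDF pages. Finally, a few comments outline the rest of the program.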

#! /usr/bin/python3
# combinePdfs.py - Combines all the PDFs in the current working directory into
# a single PDF.

import PyPDF2, os

# Get all the PDF filenames.
pdfFiles = []
for filename in os.listdir('.'):
    if filename.endswith('.pdf'):
        pdfFiles.append(filename)
pdfFiles.sort(key = str.lower)

pdfWriter = PyPDF2.PdfFileWriter()

# TODO: Loop through all the PDF files.

# TODO: Loop through all the pages (except the first) and add them.

# TODO: Save the resulting PDF to a file.
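The effect of the key = str.lower argument is easy to miss. As a small illustration (the file names here are made up, not taken from the original), the sketch below applies the same filter and sort to a hard-coded list instead of the result of os.listdir('.'):

```python
# Hypothetical file names standing in for the result of os.listdir('.').
names = ['b.pdf', 'C.pdf', 'notes.txt', 'a.pdf']

# Keep only the names ending in '.pdf', as the loop in combinePdfs.py does.
pdf_files = [name for name in names if name.endswith('.pdf')]

# The default sort compares character codes, so uppercase letters come first.
default_order = sorted(pdf_files)

# Sorting with key=str.lower ignores case, matching the program's sort() call.
case_insensitive_order = sorted(pdf_files, key=str.lower)

print(default_order)           # ['C.pdf', 'a.pdf', 'b.pdf']
print(case_insensitive_order)  # ['a.pdf', 'b.pdf', 'C.pdf']
```

Note that endswith('.pdf') is itself case-sensitive, so a file named REPORT.PDF would be skipped by this filter.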

Step 2: Open each PDF file

Now the program must read each PDF file in pdfFiles. Add the following:

#! /usr/bin/python3
# combinePdfs.py - Combines all the PDFs in the current working directory into
# a single PDF.

import PyPDF2, os

# Get all the PDF filenames.
pdfFiles = []
for filename in os.listdir('.'):
    if filename.endswith('.pdf'):
        pdfFiles.append(filename)
pdfFiles.sort(key = str.lower)

pdfWriter = PyPDF2.PdfFileWriter()

# Loop through all the PDF files.
for filename in pdfFiles:
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)

    # TODO: Loop through all the pages (except the first) and add them.

# TODO: Save the resulting PDF to a file.

For each PDF, the loop opens the file in read-binary mode by calling open() with 'rb' as the second argument. The open() call returns a File object, which is passed to PyPDF2.PdfFileReader().
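Because the program selects files by extension only, a non-PDF file renamed to .pdf would still be handed to PdfFileReader. One possible guard, sketched here with a hypothetical helper named looks_like_pdf that is not part of the original program, is to read the first bytes in 'rb' mode and check for the %PDF- signature that well-formed PDF files start with:

```python
import os
import tempfile


def looks_like_pdf(path):
    """Return True if the file at path begins with the bytes b'%PDF-'.

    Hypothetical helper for illustration; not part of combinePdfs.py.
    """
    # 'rb' opens the file for reading in binary mode, so read() returns bytes.
    with open(path, 'rb') as file_obj:
        return file_obj.read(5) == b'%PDF-'


# Demonstrate on two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    real_path = os.path.join(tmp_dir, 'real.pdf')
    fake_path = os.path.join(tmp_dir, 'fake.pdf')
    with open(real_path, 'wb') as f:
        f.write(b'%PDF-1.4\n')  # only a header, which is enough for this check
    with open(fake_path, 'wb') as f:
        f.write(b'plain text')
    real_result = looks_like_pdf(real_path)
    fake_result = looks_like_pdf(fake_path)

print(real_result, fake_result)
```

This only inspects the header; a file that passes the check can still be damaged further in, so it is a cheap first filter rather than a validation.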

Step 3: Add each page

For each PDF, you want to loop over every page except the first. Add this code to your program:

#! /usr/bin/python3
# combinePdfs.py - Combines all the PDFs in the current working directory into
# a single PDF.

import PyPDF2, os

# Get all the PDF filenames.
pdfFiles = []
for filename in os.listdir('.'):
    if filename.endswith('.pdf'):
        pdfFiles.append(filename)
pdfFiles.sort(key = str.lower)

pdfWriter = PyPDF2.PdfFileWriter()

# Loop through all the PDF files.
for filename in pdfFiles:
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)

    # Loop through all the pages (except the first) and add them.
    for pageNum in range(1, pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

# TODO: Save the resulting PDF to a file.

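The code inside the for loop copies each Page object individually to the PdfFileWriter object. Remember, you want to skip the first page. Since PyPDF2 considers 0 to be the first page, your loop should start at 1 and go up to, but not include, the integer in pdfReader.numPages.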
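To make the indexing concrete, here is a small sketch using a hypothetical helper, pages_to_copy, that lists the zero-based page indexes the loop would visit for a document with a given page count:

```python
def pages_to_copy(num_pages):
    """Return the zero-based page indexes left after skipping page 0.

    Hypothetical helper for illustration; it mirrors range(1, numPages).
    """
    return list(range(1, num_pages))


four_page = pages_to_copy(4)  # a cover page plus three content pages
one_page = pages_to_copy(1)   # a document that is only a cover page

print(four_page)  # [1, 2, 3]
print(one_page)   # []
```

A PDF that consists of only a cover page therefore contributes nothing to the output.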

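Step 4: Save the results

After these nested for loops are done looping, the pdfWriter variable will contain a PdfFileWriter object with the pages of all the PDFs combined. The last step is to write this content to a file on the hard drive. Add this code to your program: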

#!/usr/bin/python3
# combinePdfs.py - Combines all the PDFs in the current working directory into
# a single PDF.

import PyPDF2, os

# Get all the PDF filenames.
# Note: this listing reads from a fixed directory rather than '.';
# adjust the path for your own system.
pdfFiles = []
for filename in os.listdir('/home/hux/books/python'):
    if filename.endswith('.pdf'):
        pdfFiles.append('/home/hux/books/python/'+filename)
pdfFiles.sort(key = str.lower)

pdfWriter = PyPDF2.PdfFileWriter()

# Loop through all the PDF files.
for filename in pdfFiles:
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj, strict=False)

    # Loop through all the pages (except the first) and add them.
    for pageNum in range(1, pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

# Save the resulting PDF to a file.
pdfOutput = open('allminutes.pdf', 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()
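The PdfFileReader, PdfFileWriter, numPages, getPage() and addPage() names used above were removed in PyPDF2 3.0. As a sketch of how the same program might look with the newer PdfReader and PdfWriter names (assuming PyPDF2 2.x or later is installed; the function names find_pdfs and merge_without_covers and the directory argument are illustrative choices, not from the original):

```python
import os


def find_pdfs(directory='.'):
    """Return paths of the .pdf files in directory, sorted case-insensitively."""
    names = [name for name in os.listdir(directory) if name.endswith('.pdf')]
    names.sort(key=str.lower)
    return [os.path.join(directory, name) for name in names]


def merge_without_covers(pdf_paths, output_path):
    """Copy every page except the first from each PDF into output_path."""
    # Imported here so find_pdfs() can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    open_files = []
    try:
        for path in pdf_paths:
            file_obj = open(path, 'rb')
            open_files.append(file_obj)  # keep open until the output is written
            reader = PdfReader(file_obj, strict=False)
            # Start at index 1 to skip the cover page at index 0.
            for page_num in range(1, len(reader.pages)):
                writer.add_page(reader.pages[page_num])
        with open(output_path, 'wb') as output_file:
            writer.write(output_file)
    finally:
        # Close the input files whether or not the merge succeeded.
        for file_obj in open_files:
            file_obj.close()


# Example call, left commented out because it needs real PDFs on disk:
# merge_without_covers(find_pdfs('.'), 'allminutes.pdf')
```

If the output file is written into the same directory that is scanned, a second run would pick up allminutes.pdf as an input, so it may be worth writing the result elsewhere or filtering it out.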
