Converting PDF Files to Images
1. Add the Maven dependencies
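The implementation mainly relies on the pdfbox and hutool libraries:
pdfbox converts PDF pages to images;
hutool handles reading the file and writing the converted images.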
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.27</version>
</dependency>
<dependency>
    <groupId>cn.hutool</groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.8.34</version>
</dependency>
2. Implementation
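Approach:
Load the PDF with PDFBox, render each page to a BufferedImage with PDFRenderer, and write each image out with hutool. PDFBox can load a document either from a byte array or from an input stream. The input-stream variant also accepts a MemoryUsageSetting, which decides whether the data is buffered in main memory or in a temporary file.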
2.1 Byte array
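import cn.hutool.core.img.ImgUtil;
import cn.hutool.core.io.FileUtil;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public void pdf2Image() {
    // For simplicity, read a local file (fill in the paths)
    File file = new File("");
    // Output directory: each page goes to its own image file (file naming is illustrative)
    File outDir = new File("");
    String formatName = "png";
    // Reads the whole PDF into memory at once
    byte[] bytes = FileUtil.readBytes(file);
    try (PDDocument document = PDDocument.load(bytes)) {
        PDFRenderer pdfRenderer = new PDFRenderer(document);
        int numberOfPages = document.getNumberOfPages();
        for (int i = 0; i < numberOfPages; i++) {
            // Render page i (0-based) as a BufferedImage at 50 DPI
            BufferedImage bufferedImage = pdfRenderer.renderImageWithDPI(i, 50);
            // One PNG file holds one image, so write each page separately
            File outFile = new File(outDir, "page-" + (i + 1) + "." + formatName);
            try (OutputStream outputStream = new FileOutputStream(outFile)) {
                ImgUtil.write(bufferedImage, formatName, outputStream);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}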
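The byte-array approach does convert the PDF to images. However, FileUtil.readBytes loads the entire file into memory at once, so a large PDF can exhaust the heap and cause an OutOfMemoryError.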
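The PDF bytes are not the only memory cost. Every rendered page is also held as an uncompressed BufferedImage. PDF page sizes are given in points (1/72 inch), so rendering at a given DPI yields roughly points × DPI / 72 pixels per side. The minimal sketch below estimates the raw size of one rendered A4 page. The class and method names are hypothetical. It assumes 4 bytes per pixel (BufferedImage.TYPE_INT_RGB), and PDFBox's own pixel rounding may differ by one pixel.

```java
// Hypothetical helper: rough memory estimate for one rendered page.
// Assumes 4 bytes per pixel (BufferedImage.TYPE_INT_RGB); PDFBox's exact
// rounding of the pixel size may differ by one pixel.
class RenderSizeEstimate {

    // Converts a length in PDF points (1/72 inch) to whole pixels at the given DPI
    public static int toPixels(float points, float dpi) {
        return (int) Math.floor(points * dpi / 72f);
    }

    // Estimated raw pixel memory, in bytes, for a page of the given size in points
    public static long rawBytes(float widthPt, float heightPt, float dpi) {
        long width = toPixels(widthPt, dpi);
        long height = toPixels(heightPt, dpi);
        return width * height * 4L;
    }

    public static void main(String[] args) {
        // A4 (210 mm x 297 mm) expressed in points
        float a4Width = 595.28f;
        float a4Height = 841.89f;
        for (float dpi : new float[] {50f, 150f, 300f}) {
            System.out.printf("%.0f DPI: %d x %d px, about %.1f MiB%n",
                    dpi, toPixels(a4Width, dpi), toPixels(a4Height, dpi),
                    rawBytes(a4Width, a4Height, dpi) / (1024.0 * 1024.0));
        }
    }
}
```

At the 50 DPI used in this article, an A4 page stays under 1 MiB of raw pixels. At 300 DPI it is roughly 2480 × 3507 pixels, about 33 MiB. For that reason, it is worth writing each page out as soon as it has been rendered.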
2.2 File input stream
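import cn.hutool.core.img.ImgUtil;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import java.awt.image.BufferedImage;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public void pdf2Image() {
    File file = new File("");
    // Output directory: each page goes to its own image file (file naming is illustrative)
    File outDir = new File("");
    String formatName = "png";
    // try-with-resources closes both the stream and the document
    try (InputStream is = new BufferedInputStream(new FileInputStream(file));
         // Buffer the PDF data in a temporary file instead of on the heap
         PDDocument document = PDDocument.load(is, MemoryUsageSetting.setupTempFileOnly())) {
        PDFRenderer pdfRenderer = new PDFRenderer(document);
        int numberOfPages = document.getNumberOfPages();
        for (int i = 0; i < numberOfPages; i++) {
            // Render page i (0-based) as a BufferedImage at 50 DPI
            BufferedImage bufferedImage = pdfRenderer.renderImageWithDPI(i, 50);
            // Write each page straight to disk so rendered images do not pile up in memory
            File outFile = new File(outDir, "page-" + (i + 1) + "." + formatName);
            try (OutputStream outputStream = new FileOutputStream(outFile)) {
                ImgUtil.write(bufferedImage, formatName, outputStream);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}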
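A related pitfall is writing every page into one ByteArrayOutputStream and then into a single file. This does not produce a multi-page image. PNG holds a single image, so an image reader decodes the first PNG and ignores the bytes after it. The self-contained sketch below shows this with the JDK's ImageIO, using two blank images as stand-ins for rendered pages. The class and method names are hypothetical.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo: two PNGs written back to back into one stream
class ConcatenatedPngDemo {

    // Writes both images as PNG into the same in-memory stream
    static byte[] writeBoth(BufferedImage first, BufferedImage second) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            ImageIO.write(first, "png", baos);
            ImageIO.write(second, "png", baos);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Decodes the bytes as a single image and returns {width, height}
    static int[] decodedSize(byte[] bytes) {
        try {
            BufferedImage img = ImageIO.read(new ByteArrayInputStream(bytes));
            return new int[] {img.getWidth(), img.getHeight()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage page1 = new BufferedImage(40, 30, BufferedImage.TYPE_INT_RGB);
        BufferedImage page2 = new BufferedImage(80, 60, BufferedImage.TYPE_INT_RGB);
        int[] size = decodedSize(writeBoth(page1, page2));
        // Only the first image comes back; the second PNG's bytes are ignored
        System.out.println(size[0] + " x " + size[1]);
    }
}
```

Writing one output file per page avoids this problem. It also means each rendered image can be released before the next page is processed.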