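Building a Simple OCR Demo with pytesseract

Introduction

pytesseract is a Python wrapper for Google's Tesseract OCR (optical character recognition) engine. It is widely used and gives Python code direct access to Tesseract's text recognition.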

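Build

The steps below build a simple OCR demo with pytesseract.
Step 1: Install the required libraries
Install pytesseract, Pillow (for image handling), and OpenCV (optional, but very useful when processing images) in your Python environment.
Open a terminal or command prompt and run: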

pip install pytesseract Pillow opencv-python
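
Before moving on, it can help to confirm that the three packages landed in the interpreter you plan to use. The sketch below relies only on the standard library; the helper name check_packages is made up for this example. Note that the import names (PIL, cv2) differ from the pip names (Pillow, opencv-python).

```python
import importlib.util

# Import names differ from pip names: Pillow -> PIL, opencv-python -> cv2.
REQUIRED = {"pytesseract": "pytesseract", "Pillow": "PIL", "opencv-python": "cv2"}


def check_packages(required=REQUIRED):
    """Return {pip_name: True/False} depending on whether each module is importable."""
    return {
        pip_name: importlib.util.find_spec(module_name) is not None
        for pip_name, module_name in required.items()
    }


if __name__ == "__main__":
    for pip_name, installed in check_packages().items():
        print(f"{pip_name}: {'installed' if installed else 'MISSING'}")
```

Any package reported as MISSING was probably installed into a different Python environment than the one running the check.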

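Step 2: Install the Tesseract OCR engine
pytesseract is only a Python interface; it needs the Tesseract OCR engine installed on the system to do any work. Installation depends on the operating system:
Windows: Download the installer from the release page of the official Tesseract GitHub repository. Note the installation path, since you may need to point your code at the Tesseract executable later.
macOS: Install with Homebrew: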

    brew install tesseract

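Chinese recognition: To recognize Chinese text, make sure that:

The Chinese language data has been installed with brew install tesseract-lang.
The call to image_to_string passes lang='chi_sim' (Simplified Chinese) or lang='chi_tra' (Traditional Chinese).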

Linux (Ubuntu/Debian): Install with apt-get:

    sudo apt-get install tesseract-ocr
    sudo apt-get install libtesseract-dev
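
Whichever installer you used, a quick way to confirm that the engine is reachable is to ask it for its version. This is a minimal sketch using only the standard library; tesseract_version is a hypothetical helper name, and it assumes the executable is called tesseract and is on your PATH.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of `tesseract --version`, or None if it cannot be run."""
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Older builds may print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version or "tesseract was not found on PATH")
```

If nothing is found, the engine is either not installed or not on PATH, in which case the script in Step 3 needs an explicit path.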

Step 3: Write the Python code
Create a Python file (for example simple_ocr.py) and paste in the following code.

import os

import cv2
import pytesseract
from PIL import Image  # only needed if you load the image with Pillow instead of OpenCV

# Absolute path of this script, and the directory that contains it
script_path = os.path.abspath(__file__)
script_dir = os.path.dirname(script_path)

# Set this explicitly if Tesseract is not on your PATH (common on Windows).
# tesseract_cmd_path = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # Windows
tesseract_cmd_path = r'/opt/homebrew/bin/tesseract'  # macOS/Linux: find yours with `which tesseract`

# Check that the Tesseract executable exists at the given path
if not os.path.exists(tesseract_cmd_path):
    print(f"Error: Tesseract executable not found at {tesseract_cmd_path}")
    print("Please update 'tesseract_cmd_path' in the script to your Tesseract installation path.")
else:
    pytesseract.pytesseract.tesseract_cmd = tesseract_cmd_path

# Path of the image to run OCR on (replace with your own file)
image_path = 'test_image.png'
image_path = os.path.join(script_dir, image_path)

# Check that the image file exists
if not os.path.exists(image_path):
    print(f"Error: Image file not found at {image_path}")
    print("Please make sure the image file exists and the path is correct.")
else:
    try:
        # Option 1: load the image with Pillow
        # img = Image.open(image_path)
        # Option 2: load it with OpenCV, which makes later preprocessing easier.
        # cv2.imread returns a NumPy array, or None if the file cannot be decoded.
        img_cv = cv2.imread(image_path)
        if img_cv is None:
            raise ValueError(f"OpenCV could not read {image_path}")

        # image_to_string accepts a PIL Image or a NumPy array.
        # OpenCV loads images as BGR, so convert to RGB before handing the array over.
        img_np = cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB)

        # lang selects the language: 'eng' for English, 'chi_sim' for Simplified Chinese.
        # The matching language data must be installed, e.g.
        #   macOS: brew install tesseract-lang
        #   Ubuntu/Debian: sudo apt-get install tesseract-ocr-chi-sim
        text = pytesseract.image_to_string(img_np, lang='chi_sim')  # or lang='eng'

        # Print the recognized text
        print("---- OCR result ----")
        print(text)
        print("----------------")
    except pytesseract.TesseractNotFoundError:
        print("Error: Tesseract is not installed or could not be found.")
    except Exception as e:
        print(f"Error during OCR: {e}")

Step 4: Prepare a test image
Create an image file named test_image.png that contains the text you want to recognize, and place it in the same directory as the Python script.
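
If you have no suitable image at hand, one can be generated with Pillow. The sketch below is only an illustration: make_test_image and its default text are made up here, and Pillow's built-in default font is small and limited to basic Latin characters, so the result should be read with lang='eng' rather than 'chi_sim'. A real scan or screenshot is a better test of recognition quality.

```python
from PIL import Image, ImageDraw


def make_test_image(path="test_image.png", text="Hello OCR 12345", size=(400, 100)):
    """Draw black text on a white background and save it to `path`."""
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    # No font argument: Pillow falls back to its built-in default font.
    draw.text((10, 40), text, fill="black")
    img.save(path)
    return path


if __name__ == "__main__":
    print("Wrote", make_test_image())
```

To render Chinese text instead, you would need to pass a font that contains CJK glyphs to draw.text, which depends on the fonts installed on your machine.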

Step 5: Run the code
In a terminal or command prompt, change to the directory that contains simple_ocr.py and run:

python simple_ocr.py

If everything works, the text recognized from the image is printed to the console.

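Notes:

Tesseract installation path:

On Windows, be sure to change tesseract_cmd_path to the actual location of tesseract.exe on your system.
On macOS or Linux, the default /opt/homebrew/bin/tesseract matches a Homebrew install on Apple Silicon; on other setups (for example /usr/local/bin on Intel Macs or /usr/bin with apt), run which tesseract and use the path it prints. If Tesseract is already on your PATH, you can also comment out the line that sets pytesseract.pytesseract.tesseract_cmd and let pytesseract locate the executable itself.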
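
One way to avoid editing the path by hand is to look the executable up at runtime. The following is a sketch, not part of the original script: find_tesseract and CANDIDATE_PATHS are hypothetical names, and the listed locations are common defaults that may not match your system.

```python
import os
import shutil

# Common install locations; these are assumptions and may not match your system.
CANDIDATE_PATHS = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",  # default Windows installer location
    "/opt/homebrew/bin/tesseract",  # Homebrew on Apple Silicon
    "/usr/local/bin/tesseract",     # Homebrew on Intel Macs
    "/usr/bin/tesseract",           # apt on Ubuntu/Debian
]


def find_tesseract(candidates=CANDIDATE_PATHS, name="tesseract"):
    """Return a usable Tesseract path, preferring whatever is on PATH, else None."""
    on_path = shutil.which(name)
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_tesseract()
    print("Tesseract found at:", path)
    # In simple_ocr.py you would then set, when path is not None:
    # pytesseract.pytesseract.tesseract_cmd = path
```

Checking PATH first means a package-manager install is picked up without any configuration, and the hard-coded candidates only serve as a fallback.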

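Language packs:

To recognize non-English text (Chinese, for example), install the matching Tesseract language pack and pass the lang argument to pytesseract.image_to_string, for example lang='chi_sim'.
Installing a language pack usually means placing the corresponding .traineddata file in the tessdata folder under the Tesseract installation directory.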
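
To see which language packs Tesseract can actually find, you can parse the output of its --list-langs option. This is a rough sketch under the assumption that the executable is named tesseract and is on PATH; parse_list_langs and installed_languages are names invented for this example.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # The first line is a header that starts with 'List of available languages'.
    return [line for line in lines if not line.lower().startswith("list of")]


def installed_languages(executable="tesseract"):
    """Return the installed language codes, or [] if Tesseract cannot be run."""
    path = shutil.which(executable)
    if path is None:
        return []
    result = subprocess.run([path, "--list-langs"], capture_output=True, text=True)
    # Older builds may write the list to stderr instead of stdout.
    return parse_list_langs(result.stdout or result.stderr)


if __name__ == "__main__":
    langs = installed_languages()
    print("Installed languages:", langs)
    print("chi_sim available:", "chi_sim" in langs)
```

If chi_sim is missing from the list, image_to_string with lang='chi_sim' will fail until the corresponding .traineddata file is installed.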

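Image quality:

OCR accuracy depends heavily on the quality of the input image. Sharp, high-contrast images with correctly oriented text are much easier to recognize.
For noisy or distorted images, you may need to preprocess with a library such as OpenCV (binarization, denoising, rotation correction) to improve recognition. The script already loads the image with OpenCV and converts its color space, which leaves room for such preprocessing.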

Going further with pytesseract:

Other functions, such as image_to_data (text position information) and image_to_boxes (character bounding boxes), let you build more complex OCR applications.
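
As a starting point for image_to_data, the sketch below asks for dictionary output and keeps only rows that contain a recognized word. The helper names words_with_boxes and ocr_words and the min_conf parameter are hypothetical; the filtering rule is one reasonable choice, not something prescribed by pytesseract.

```python
def words_with_boxes(data, min_conf=0.0):
    """Turn image_to_data's dict output into a list of (text, (left, top, width, height))."""
    words = []
    for i, text in enumerate(data["text"]):
        # conf is -1 for non-word rows (blocks, lines); it may arrive as a string or a number.
        conf = float(data["conf"][i])
        if text.strip() and conf >= min_conf:
            box = (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
            words.append((text, box))
    return words


def ocr_words(image, lang="eng"):
    """Run Tesseract on a PIL image or NumPy array and return words with boxes."""
    import pytesseract  # imported here so the helper above works without it

    data = pytesseract.image_to_data(image, lang=lang, output_type=pytesseract.Output.DICT)
    return words_with_boxes(data)
```

Calling ocr_words(img_np, lang='chi_sim') with the array from Step 3 would return each recognized word together with its pixel coordinates, which is enough to draw rectangles with OpenCV or to group words into lines.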
