I. Tesseract
1. Download the Windows version of Tesseract.
2. Install it and note the install path; the script needs it.
3. Save the following as a .py file:
import pytesseract
from PIL import Image


def ocr_local_image(image_path):
    try:
        # Path noted during installation (step 2)
        pytesseract.pytesseract.tesseract_cmd = r'D:\Programs\Tesseract-OCR\tesseract.exe'
        img = Image.open(image_path)
        # Use only the English model to keep the test simple
        text = pytesseract.image_to_string(img, lang='eng')
        return text.strip()
    except Exception as e:
        # Include the reason so a failure is not silent
        return f"error: {e}"


if __name__ == "__main__":
    result = ocr_local_image('1.jpg')
    print(result)
4. Run the script, and you're done.
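Instead of hard-coding the install path from step 2, the script could look for the executable itself. The following is a minimal sketch using only the standard library; the helper name `find_tesseract` and its candidate list are illustrative assumptions, not part of pytesseract.

```python
import os
import shutil


# Hypothetical helper, not part of pytesseract.
def find_tesseract(candidates=(), name="tesseract"):
    """Return a path to the Tesseract executable, or None if not found."""
    # 1) Prefer an executable that is already on PATH.
    on_path = shutil.which(name)
    if on_path:
        return on_path
    # 2) Fall back to explicit locations, e.g. the path noted in step 2.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    exe = find_tesseract([r"D:\Programs\Tesseract-OCR\tesseract.exe"])
    print(exe or "tesseract not found")
```

If a path comes back, assign it to `pytesseract.pytesseract.tesseract_cmd` before calling `image_to_string`.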
II. PaddleOCR
Tesseract's Chinese support is weak, so let's also try PaddleOCR, which is said to be very good at Chinese.
Install the CPU version:
python -m uv pip install paddlepaddle==3.0.0rc1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
Install the GPU version:
python -m uv pip install paddlepaddle-gpu==3.0.0rc1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
Install the dependencies:
uv pip install paddleocr
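After installing, it can be useful to confirm which build (CPU or GPU) Python actually picks up. This is a small sketch, assuming the `paddle` package from the commands above is importable; the function name `describe_paddle_device` is made up for illustration.

```python
# Hypothetical helper for a quick sanity check after installation.
def describe_paddle_device():
    """Report which device the installed PaddlePaddle build will use."""
    try:
        import paddle
    except ImportError:
        return "paddle not installed"
    # get_device() returns a string such as "cpu" or "gpu:0".
    return paddle.device.get_device()


if __name__ == "__main__":
    print(describe_paddle_device())
```

For a fuller self-test, `paddle.utils.run_check()` runs a small computation and reports whether the installation works.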
Type the following command in a terminal and press Enter, and you're done:
paddleocr --image_dir 1.jpg
If the output is too cluttered, extract just the text in code:
from paddleocr import PaddleOCR

ocr = PaddleOCR(lang='ch')  # 'ch' (Chinese) or 'en' (English)
img_path = '3.jpg'
result = ocr.ocr(img_path)

for res in result:
    if res is None:  # nothing was detected in this image
        continue
    for line in res:
        # Print only the text (usually at line[1][0]) instead of the whole line
        print(line[1][0])
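The loop above relies on each entry having the shape `[box, (text, score)]`, grouped in one list per image. Assuming that nested-list shape (other PaddleOCR versions may return something different), low-confidence lines can be dropped as well. The function `extract_text` and the sample data are illustrative only; the sample is hand-written, not real OCR output.

```python
# Hypothetical helper; assumes result is a list of pages, each a list of
# [box, (text, score)] entries, as the loop above does.
def extract_text(result, min_score=0.0):
    """Collect recognized strings whose confidence is at least min_score."""
    texts = []
    for page in result:
        if not page:  # a page with no detections may be None or empty
            continue
        for box, (text, score) in page:
            if score >= min_score:
                texts.append(text)
    return texts


# Hand-written sample in the assumed shape, not real OCR output.
sample = [[
    [[[0, 0], [10, 0], [10, 5], [0, 5]], ("hello", 0.98)],
    [[[0, 6], [10, 6], [10, 11], [0, 11]], ("w0rld", 0.41)],
]]

print(extract_text(sample, min_score=0.5))  # prints ['hello']
```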
GTX 1660 Ti (6 GB), recognition time: 0.6 s
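The post does not say how the time was measured. One way to take such a measurement, sketched here with a made-up helper `time_call`, is to run an untimed warm-up call first (so one-off setup such as model loading is excluded) and then average several timed calls.

```python
import time


# Hypothetical timing helper; not how the figure above was necessarily obtained.
def time_call(fn, *args, warmup=1, repeats=3):
    """Return the average wall-clock seconds per call of fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # Stand-in workload; with PaddleOCR this could be time_call(ocr.ocr, '3.jpg')
    seconds = time_call(sum, range(100_000))
    print(f"{seconds:.4f}s per call")
```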