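🍁 Preface
I have been reading papers lately, and typing LaTeX formulas by hand while taking notes is very time-consuming. I tried HapiGo's LaTeX formula recognition and liked it, but it only comes with 30 free uses. After searching online, I found that deploying LaTeX-OCR locally allows unlimited formula recognition. Below is how I deployed LaTeX-OCR, along with a few optimizations I worked out myself.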
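🌿 Deployment
Installing the LaTeX-OCR recognition tool on M1
The post above already explains the process well, so I won't repeat it in detail. One thing to note: the path in step 3 of that post must be changed to the path on your own machine.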
sudo cp -r /opt/homebrew/Cellar/pyqt@5/5.15.7_2/lib/python3.9/site-packages/* /Users/rey/miniconda3/lib/python3.9/site-packages/
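The parts to change are mainly /5.15.7_2, python3.9, and rey; pressing the Tab key auto-completes each path segment quickly.
In case the link goes dead, I have also copied the author's steps below: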
pip install "pix2tex[gui]"
brew install pyqt@5
sudo cp -r /opt/homebrew/Cellar/pyqt@5/5.15.7_2/lib/python3.9/site-packages/* /Users/rey/miniconda3/lib/python3.9/site-packages/
pip install pynput screeninfo
conda install pytorch torchvision
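If hand-editing the path in step 3 feels error-prone, the machine-specific parts can be looked up instead of typed. The following is only a sketch and not part of the original post: build_copy_command is a hypothetical helper, and it assumes that Homebrew lives under /opt/homebrew and that you run it with the Python interpreter whose site-packages should receive the PyQt5 files.

```python
import glob
import shlex
import sysconfig


def build_copy_command(cellar_root="/opt/homebrew/Cellar/pyqt@5"):
    """Return the `cp` command for this machine, or None if PyQt5 is not found.

    cellar_root is an assumption: it is where Homebrew on Apple Silicon
    normally keeps the pyqt@5 formula.
    """
    # Homebrew stores each installed version in its own directory, e.g. 5.15.7_2.
    candidates = sorted(glob.glob(f"{cellar_root}/*/lib/python*/site-packages"))
    if not candidates:
        return None
    source = candidates[-1]  # last match in sorted order; check it if several exist
    target = sysconfig.get_paths()["purelib"]  # site-packages of this interpreter
    return f"sudo cp -r {shlex.quote(source)}/* {shlex.quote(target)}/"


if __name__ == "__main__":
    print(build_copy_command() or "pyqt@5 was not found under the Homebrew Cellar")
```

The script only prints the command; it is still worth reading the printed paths before running it yourself.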
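Besides the usage mentioned in the image, which is to run python -m pix2tex latexocr on the command line, you can also use the LaTeX-OCR GUI: type latexocr or pix2tex_gui in a terminal, wait a moment (it takes about 30 seconds to open), and the GUI window appears.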
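🌱 Optimizations
I made two main improvements:
- Packaged the program as a Mac app
- Registered a global hotkey, so that pressing it while the program runs in the background invokes OCR formula recognition directly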
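Packaging as an app
- Open the Mac's "Automator" app
- Search for "Run Script" and double-click "Run Shell Script"; the corresponding action then appears on the right
- Choose the shell type. You can run echo $SHELL in a terminal to see which shell you are using; mine is /bin/zsh
- Write the script. First run where latexocr in a terminal, which prints the absolute path of the executable. On Unix systems, typing the absolute path of an executable into a terminal runs it directly: the shell locates the file at that path and executes it. So this executable path is the only command the script needs.
- Test it: click Run in the upper-right corner; after a short wait the GUI window pops up, which means the test succeeded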
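- Package it as an app
- Change the icon. The default icon of an Automator app is quite ugly; the icon on the left is the one I switched to, which looks much better. Searching for "OCR icon" in a search engine turns up similar icons, so you can pick one you like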
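To replace the icon:
- Copy your icon image to the clipboard
- Find the newly packaged app in "Applications", select it, and press command + I to show its Info window
- Select the icon in the upper-left corner
- Press command + V to paste the icon image you just copied; the icon is replaced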
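Coming back to the script step above: since the Automator script is just the absolute path of the executable, it can be generated and sanity-checked rather than copied by hand. This is a sketch under my own assumptions; automator_script is a hypothetical helper, and shutil.which only approximates what where latexocr prints (it returns the first match on PATH).

```python
import os
import shlex
from shutil import which


def automator_script(executable):
    """Return the one-line shell script for Automator, given an absolute path."""
    if not os.path.isabs(executable):
        raise ValueError("expected an absolute path, got: " + executable)
    if not (os.path.isfile(executable) and os.access(executable, os.X_OK)):
        raise FileNotFoundError("not an executable file: " + executable)
    # Quote the path so that spaces in directory names survive the shell.
    return shlex.quote(executable)


if __name__ == "__main__":
    path = which("latexocr")  # first latexocr found on the current PATH
    print(automator_script(path) if path else "latexocr was not found on PATH")
```

Using the absolute path matters here because a script run from Automator may not see the same PATH as your interactive terminal.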
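Registering a global hotkey
One inconvenience in daily use is that you have to open the LaTeX-OCR window before the shortcut can trigger a screenshot for recognition, which is tedious. I wondered whether the program could listen for a hotkey while running in the background, so that OCR could be triggered without explicitly opening the window. After some experimenting, I found that this can be achieved by modifying the source code and using the pynput library.
Find the file gui.py under /opt/miniconda3/lib/python3.12/site-packages/pix2tex (this is roughly the path; adjust it for your own machine), then open and edit it. (Friendly reminder: back up the file before modifying it.)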
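For the backup reminder, here is a small sketch of how the file could be located and copied. Both function names are hypothetical, and the .bak suffix is just my choice; the lookup assumes pix2tex is installed in the interpreter that runs the script.

```python
import importlib.util
import shutil
from pathlib import Path


def locate_gui_file():
    """Return the path of pix2tex/gui.py, or None if pix2tex is not installed."""
    spec = importlib.util.find_spec("pix2tex")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / "gui.py"


def backup_file(path, suffix=".bak"):
    """Copy path to path + suffix, refusing to overwrite an existing backup."""
    src = Path(path)
    dst = src.with_name(src.name + suffix)
    if dst.exists():
        raise FileExistsError(f"backup already exists: {dst}")
    shutil.copy2(src, dst)  # copy2 also keeps the file's timestamps
    return dst


if __name__ == "__main__":
    # Only prints the location; call backup_file(...) yourself once it looks right.
    print(locate_gui_file() or "pix2tex is not installed in this interpreter")
```

Refusing to overwrite an existing backup is deliberate: it keeps the pristine copy safe if the script is run again after gui.py has already been edited.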
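I mainly changed the following:
- Changed the default output format from LaTeX-$ to Raw, so the recognized result is no longer wrapped in $
- Added a system tray icon
- Set the hotkey to option + ctrl. The reason is that pynput reports an option + letter combination as a special character rather than as a key combination; for example, pressing option + z is reported as Ω. So I chose option + ctrl, a combination that produces no special character
- Added background hotkey listening, so the app can detect the hotkey even while running in the background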
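The hotkey handling in the list above boils down to tracking which modifier keys are currently held. Here is a dependency-free sketch of that idea. ComboTracker and the string key names are hypothetical stand-ins of mine; in the modified gui.py the events come from pynput's Listener callbacks, with values such as Key.alt and Key.ctrl.

```python
class ComboTracker:
    """Fire a callback once when every key of a modifier-only combo is held."""

    def __init__(self, combo, on_trigger):
        self.combo = frozenset(combo)
        self.on_trigger = on_trigger
        self.pressed = set()

    def press(self, name):
        if name not in self.combo:
            return  # keys outside the combo are ignored
        self.pressed.add(name)
        if self.pressed == self.combo:
            # Forget the held keys so that key repeat does not fire again.
            self.pressed.clear()
            self.on_trigger()

    def release(self, name):
        self.pressed.discard(name)


if __name__ == "__main__":
    tracker = ComboTracker({"alt", "ctrl"}, lambda: print("hotkey triggered"))
    tracker.press("alt")
    tracker.press("ctrl")  # both keys are now held, so the callback runs
    tracker.release("alt")
    tracker.release("ctrl")
```

Because only modifier keys are tracked, no printable character is ever involved, which is how the option + letter problem is sidestepped.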
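If you are interested, you can also use a text comparison tool (such as Beyond Compare) to compare my modified code with the original.
Finally, I am pasting the source code here; if you need the source file, you can also download it from the network drive.
Network drive link
Source code (modified version)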
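If you don't have a GUI diff tool at hand, Python's standard difflib module can produce a similar comparison. A minimal sketch follows; diff_text is a hypothetical helper, the file labels are placeholders, and the two sample strings are made up for illustration rather than taken from gui.py.

```python
import difflib


def diff_text(original, modified, fromfile="gui.py.bak", tofile="gui.py"):
    """Return a unified diff of two text blobs as a single string."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            modified.splitlines(keepends=True),
            fromfile=fromfile,
            tofile=tofile,
        )
    )


if __name__ == "__main__":
    before = "title = 'LaTeX OCR'\nformat_type = 'LaTeX-$'\n"
    after = "title = 'LaTeX OCR'\nformat_type = 'Raw'\n"
    print(diff_text(before, after))
```

For real files, read both with Path(...).read_text() and pass the contents in.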
from shutil import which
import io
import subprocess
import sys
import os
import re
import tempfile
import threading

from PyQt6 import QtCore, QtGui
from PyQt6.QtCore import Qt, pyqtSlot, pyqtSignal, QThread, QTimer, QEvent
from PyQt6.QtGui import QGuiApplication
from PyQt6.QtWebEngineWidgets import QWebEngineView
from PyQt6.QtWidgets import QMainWindow, QApplication, QMessageBox, QVBoxLayout, QWidget, \
    QPushButton, QTextEdit, QFormLayout, QHBoxLayout, QDoubleSpinBox, QLabel, QRadioButton, \
    QSystemTrayIcon, QMenu
from pynput.mouse import Controller
from pynput import keyboard
from pynput.keyboard import Key, Listener
from PIL import ImageGrab, Image, ImageEnhance
import numpy as np
from screeninfo import get_monitors
from pix2tex import cli
from pix2tex.utils import in_model_path
from latex2sympy2 import latex2sympy

import pix2tex.resources.resources

ACCEPTED_IMAGE_SUFFIX = ['png', 'jpg', 'jpeg']


def to_sympy(latex):
    normalized = re.sub(r'operatorname\*{(\w+)}', r'\g<1>', latex)
    sympy_expr = latex2sympy(f'${normalized}$')
    return sympy_expr


class WebView(QWebEngineView):
    def __init__(self, app) -> None:
        super().__init__()
        self.setAcceptDrops(True)
        self._app = app

    def dragEnterEvent(self, event):
        if event.mimeData().urls():
            event.accept()
        else:
            event.ignore()

    def dropEvent(self, event):
        urls = event.mimeData().urls()
        self._app.returnFromMimeData(urls)


class App(QMainWindow):
    isProcessing = False
    globalHotkeyPressed = pyqtSignal()  # Added: global hotkey signal

    def __init__(self, args=None):
        super().__init__()
        self.args = args
        self.model = cli.LatexOCR(self.args)
        self.initUI()
        self.snipWidget = SnipWidget(self)

        # Initialize the system tray
        self.initTray()

        # Connect the global hotkey signal
        self.globalHotkeyPressed.connect(self.onClick)

        # Start listening for the global hotkey
        self.hotkey_thread = threading.Thread(target=self.start_global_hotkey_listener, daemon=True)
        self.hotkey_thread.start()

        self.show()

    def initTray(self):
        """Initialize the system tray."""
        self.tray = QSystemTrayIcon(self)
        self.tray.setIcon(QtGui.QIcon(':/icons/icon.svg'))

        # Create the tray menu
        tray_menu = QMenu()
        self.show_action = tray_menu.addAction("Show window")
        self.show_action.triggered.connect(self.showNormal)
        quit_action = tray_menu.addAction("Quit")
        quit_action.triggered.connect(QApplication.quit)

        self.tray.setContextMenu(tray_menu)
        self.tray.show()

    def start_global_hotkey_listener(self):
        """Start listening for the global hotkey."""
        # Set of keys that are currently held down
        keys_pressed = set()

        def on_press(key):
            try:
                # Detect the Option/Alt key
                if key == Key.alt or key == Key.alt_l or key == Key.alt_r:
                    keys_pressed.add('alt')
                # Detect the Ctrl key
                elif key == Key.ctrl or key == Key.ctrl_l or key == Key.ctrl_r:
                    keys_pressed.add('ctrl')

                # Check whether Alt and Ctrl are held at the same time
                if 'alt' in keys_pressed and 'ctrl' in keys_pressed:
                    # Make sure the signal is emitted on the main thread
                    QtCore.QMetaObject.invokeMethod(self, "globalHotkeyPressed", QtCore.Qt.ConnectionType.QueuedConnection)
                    # Clear the key set to avoid repeated triggering
                    keys_pressed.clear()
            except Exception as e:
                print(f"Hotkey listener error: {e}")

        def on_release(key):
            try:
                # Remove the key from the set when it is released
                if key == Key.alt or key == Key.alt_l or key == Key.alt_r:
                    keys_pressed.discard('alt')
                elif key == Key.ctrl or key == Key.ctrl_l or key == Key.ctrl_r:
                    keys_pressed.discard('ctrl')
            except Exception as e:
                print(f"Hotkey listener error (release): {e}")

        with Listener(on_press=on_press, on_release=on_release) as listener:
            listener.join()

    def closeEvent(self, event):
        """Handle the window close event."""
        if self.tray.isVisible():
            self.hide()
            event.ignore()

    def initUI(self):
        self.setWindowTitle("LaTeX OCR")
        QApplication.setWindowIcon(QtGui.QIcon(':/icons/icon.svg'))
        self.left = 300
        self.top = 300
        self.width = 500
        self.height = 300
        self.setGeometry(self.left, self.top, self.width, self.height)
        self.format_type = 'Raw'  # Changed by 秋窗: initial format
        self.raw_prediction = ''

        # Create LaTeX display
        self.webView = WebView(self)
        self.webView.setHtml("")
        self.webView.setMinimumHeight(80)

        # Create textbox
        self.textbox = QTextEdit(self)
        # self.textbox.textChanged.connect(self.displayPrediction)
        self.textbox.textChanged.connect(self.onTextboxChange)
        self.textbox.setMinimumHeight(40)

        self.format_textbox = QTextEdit(self)
        # self.textbox.textChanged.connect(self.displayPrediction)
        self.format_textbox.textChanged.connect(self.onFormatTextboxChange)
        self.format_textbox.setMinimumHeight(40)

        # format types
        format_types = QHBoxLayout()
        self.format_label = QLabel('Format:', self)
        self.format_type0 = QRadioButton('Raw', self)
        self.format_type0.toggled.connect(self.onFormatChange)
        self.format_type1 = QRadioButton('LaTeX-$', self)
        self.format_type0.setChecked(True)  # Changed by 秋窗: select the Raw format by default
        self.format_type1.toggled.connect(self.onFormatChange)
        self.format_type2 = QRadioButton('LaTeX-$$', self)
        self.format_type2.toggled.connect(self.onFormatChange)
        self.format_type3 = QRadioButton('Sympy', self)
        self.format_type3.toggled.connect(self.onFormatChange)
        format_types.addWidget(self.format_label)
        format_types.addWidget(self.format_type0)
        format_types.addWidget(self.format_type1)
        format_types.addWidget(self.format_type2)
        format_types.addWidget(self.format_type3)

        # error output
        self.error = QTextEdit(self)
        self.error.setReadOnly(True)
        self.error.setTextColor(Qt.GlobalColor.red)
        self.error.setMinimumHeight(12)

        # Create temperature text input
        self.tempField = QDoubleSpinBox(self)
        self.tempField.setValue(self.args.temperature)
        self.tempField.setRange(0, 1)
        self.tempField.setSingleStep(0.1)

        # Create snip button
        if sys.platform == "darwin":
            self.snipButton = QPushButton('Snip [Option+Ctrl]', self)  # Changed: button text
            self.snipButton.clicked.connect(self.onClick)
        else:
            self.snipButton = QPushButton('Snip [Alt+Ctrl]', self)  # Changed: button text
            self.snipButton.clicked.connect(self.onClick)

        self.shortcut = QtGui.QShortcut(QtGui.QKeySequence('Ctrl+Alt+Z'), self)  # Changed: shortcut
        self.shortcut.activated.connect(self.onClick)

        # Create retry button
        self.retryButton = QPushButton('Retry', self)
        self.retryButton.setEnabled(False)
        self.retryButton.clicked.connect(self.returnSnip)

        # Create layout
        centralWidget = QWidget()
        centralWidget.setMinimumWidth(200)
        self.setCentralWidget(centralWidget)

        lay = QVBoxLayout(centralWidget)
        lay.addWidget(self.webView, stretch=4)
        lay.addWidget(self.textbox, stretch=2)
        lay.addLayout(format_types)
        lay.addWidget(self.format_textbox, stretch=2)
        lay.addWidget(self.error, stretch=1)
        buttons = QHBoxLayout()
        buttons.addWidget(self.snipButton)
        buttons.addWidget(self.retryButton)
        lay.addLayout(buttons)
        settings = QFormLayout()
        settings.addRow('Temperature:', self.tempField)
        lay.addLayout(settings)

        self.installEventFilter(self)

    def toggleProcessing(self, value=None):
        if value is None:
            self.isProcessing = not self.isProcessing
        else:
            self.isProcessing = value
        if self.isProcessing:
            text = 'Interrupt'
            func = self.interrupt
        else:
            if sys.platform == "darwin":
                text = 'Snip [Option+Ctrl]'  # Changed: button text
            else:
                text = 'Snip [Alt+Ctrl]'  # Changed: button text
            func = self.onClick
            self.retryButton.setEnabled(True)
        self.shortcut.setEnabled(not self.isProcessing)
        self.snipButton.setText(text)
        self.snipButton.clicked.disconnect()
        self.snipButton.clicked.connect(func)
        self.displayPrediction()

    def eventFilter(self, obj, event):
        if event.type() == QEvent.Type.KeyRelease:
            if event.key() == Qt.Key.Key_V and event.modifiers() == Qt.KeyboardModifier.ControlModifier:
                clipboard = QApplication.clipboard()
                img = clipboard.image()
                if not img.isNull():
                    self.returnSnip(Image.fromqimage(img))
                else:
                    self.returnFromMimeData(clipboard.mimeData().urls())

        return super().eventFilter(obj, event)

    @pyqtSlot()
    def onClick(self):
        """Called when the snip button is clicked or the hotkey is pressed."""
        # Make sure the window is visible
        if self.isHidden():
            self.showNormal()
            self.activateWindow()
            self.raise_()

        self.close()
        if os.environ.get('SCREENSHOT_TOOL') == "gnome-screenshot":
            self.snip_using_gnome_screenshot()
        elif os.environ.get('SCREENSHOT_TOOL') == "spectacle":
            self.snip_using_spectacle()
        elif os.environ.get('SCREENSHOT_TOOL') == "grim":
            self.snip_using_grim()
        elif os.environ.get('SCREENSHOT_TOOL') == "pil":
            self.snipWidget.snip()
        elif which('gnome-screenshot'):
            self.snip_using_gnome_screenshot()
        elif which('grim') and which('slurp'):
            self.snip_using_grim()
        else:
            self.snipWidget.snip()

    @pyqtSlot()
    def interrupt(self):
        if hasattr(self, 'thread'):
            self.thread.terminate()
            self.thread.wait()
            self.toggleProcessing(False)

    def snip_using_gnome_screenshot(self):
        try:
            with tempfile.NamedTemporaryFile() as tmp:
                subprocess.run(["gnome-screenshot", "--area", f"--file={tmp.name}"])
                # Use `tmp.name` instead of `tmp.file` due to compatibility issues between Pillow and tempfile
                self.returnSnip(Image.open(tmp.name))
        except:
            print(f"Failed to load saved screenshot! Did you cancel the screenshot?")
            print("If you don't have gnome-screenshot installed, please install it.")
            self.returnSnip()

    def snip_using_spectacle(self):
        try:
            with tempfile.NamedTemporaryFile() as tmp:
                subprocess.run(["spectacle", "-r", "-b", "-n", "-o", f"{tmp.name}"])
                self.returnSnip(Image.open(tmp.name))
        except:
            print(f"Failed to load saved screenshot! Did you cancel the screenshot?")
            print("If you don't have spectacle installed, please install it.")
            self.returnSnip()

    def snip_using_grim(self):
        try:
            p = subprocess.run('slurp',
                               check=True,
                               capture_output=True,
                               text=True)
            geometry = p.stdout.strip()

            p = subprocess.run(['grim', '-g', geometry, '-'],
                               check=True,
                               capture_output=True)
            self.returnSnip(Image.open(io.BytesIO(p.stdout)))
        except:
            print(f"Failed to load saved screenshot! Did you cancel the screenshot?")
            print("If you don't have slurp and grim installed, please install them.")
            self.returnSnip()

    def returnFromMimeData(self, urls):
        if not urls or not urls[0]:
            return

        image_url = urls[0]
        if image_url and image_url.scheme() == 'file' and image_url.fileName().split('.')[-1] in ACCEPTED_IMAGE_SUFFIX:
            image_path = image_url.toLocalFile()
            return self.returnSnip(Image.open(image_path))

    def returnSnip(self, img=None):
        self.toggleProcessing(True)
        self.retryButton.setEnabled(False)

        if img:
            width, height = img.size
            if width <= 0 or height <= 0:
                self.toggleProcessing(False)
                self.retryButton.setEnabled(True)
                self.show()
                return

            if width < 100 or height < 100:  # too small size will make OCR wrong
                scale_factor = max(100 / width, 100 / height)
                new_width = int(width * scale_factor)
                new_height = int(height * scale_factor)
                img = img.resize((new_width, new_height), Image.Resampling.LANCZOS)

            contrast = ImageEnhance.Contrast(img)
            img = contrast.enhance(1.5)
            sharpness = ImageEnhance.Sharpness(img)
            img = sharpness.enhance(1.5)

        self.show()
        try:
            self.model.args.temperature = self.tempField.value()
            if self.model.args.temperature == 0:
                self.model.args.temperature = 1e-8
        except:
            pass

        # Run the model in a separate thread
        self.thread = ModelThread(img=img, model=self.model)
        self.thread.finished.connect(self.returnPrediction)
        self.thread.finished.connect(self.thread.deleteLater)

        self.thread.start()

    def returnPrediction(self, result):
        self.toggleProcessing(False)
        success, prediction = result["success"], result["prediction"]

        if success:
            self.raw_prediction = prediction
            self.textbox.setText(prediction)
            self.format_textbox.setText(self.formatPrediction(prediction))
            self.displayPrediction(prediction)
            self.retryButton.setEnabled(True)
        else:
            self.webView.setHtml("")
            msg = QMessageBox()
            msg.setWindowTitle(" ")
            msg.setText("Prediction failed.")
            msg.exec()

    def onFormatChange(self):
        rb = self.sender()
        if rb.isChecked():
            self.format_type = rb.text()
            # Changed by 秋窗: with Raw as the initial format, leaving the next line active raises an error
            # self.format_textbox.setText(self.formatPrediction(self.raw_prediction))

    def formatPrediction(self, prediction, format_type=None):
        self.error.setText("")
        prediction = prediction or self.format_textbox.toPlainText()
        raw = prediction.strip('$')
        if len(raw) == 0:
            return ''

        format_type = format_type or self.format_type
        if format_type == "Raw":
            formatted = raw
        elif format_type == "LaTeX-$":
            formatted = f"${raw}$"
        elif format_type == "LaTeX-$$":
            formatted = f"$${raw}$$"
        elif format_type == "MathJax":
            formatted = raw
        elif format_type == "Sympy":
            try:
                formatted = str(to_sympy(raw))
            except Exception as e:
                print(e)
                formatted = raw
                self.error.setText("Failed to parse Sympy expr.")
        else:
            return raw
        return formatted

    def onTextboxChange(self):
        text = self.textbox.toPlainText()
        new_raw_prediction = self.formatPrediction(text, "Raw")
        if new_raw_prediction != self.raw_prediction:
            self.raw_prediction = new_raw_prediction
            self.format_textbox.setText(self.formatPrediction(self.raw_prediction))
            self.displayPrediction()

    def onFormatTextboxChange(self):
        text = self.format_textbox.toPlainText()
        clipboard = QApplication.clipboard()
        clipboard.setText(text)

    def displayPrediction(self, prediction=None):
        if self.isProcessing:
            pageSource = """<center>
            <img src="qrc:/icons/processing-icon-anim.svg" width="50", height="50">
            </center>"""
        else:
            if prediction is None:
                prediction = self.textbox.toPlainText().strip('$')
            pageSource = """
            <html>
            <head><script id="MathJax-script" src="qrc:MathJax.js"></script>
            <script>
            MathJax.Hub.Config({messageStyle: 'none',tex2jax: {preview: 'none'}});
            MathJax.Hub.Queue(
                function () {
                    document.getElementById("equation").style.visibility = "";
                }
                );
            </script>
            </head> """ + """
            <body>
            <div id="equation" style="font-size:1em; visibility:hidden">$${equation}$$</div>
            </body>
            </html>
                """.format(equation=prediction)
        self.webView.setHtml(pageSource)


class ModelThread(QThread):
    finished = pyqtSignal(dict)

    def __init__(self, img, model):
        super().__init__()
        self.img = img
        self.model = model

    def run(self):
        try:
            prediction = self.model(self.img)
            # replace <, > with \lt, \gt so it won't be interpreted as html code
            prediction = prediction.replace('<', '\\lt ').replace('>', '\\gt ')
            self.finished.emit({"success": True, "prediction": prediction})
        except Exception as e:
            import traceback
            traceback.print_exc()
            self.finished.emit({"success": False, "prediction": None})


class SnipWidget(QMainWindow):
    isSnipping = False

    def __init__(self, parent):
        super().__init__()
        self.parent = parent

        monitos = get_monitors()
        bboxes = np.array([[m.x, m.y, m.width, m.height] for m in monitos])
        x, y, _, _ = bboxes.min(0)
        w, h = bboxes[:, [0, 2]].sum(1).max(), bboxes[:, [1, 3]].sum(1).max()
        self.setGeometry(x, y, w-x, h-y)

        self.begin = QtCore.QPoint()
        self.end = QtCore.QPoint()

        self.mouse = Controller()

        # Create and start the timer
        self.factor = QGuiApplication.primaryScreen().devicePixelRatio()
        self.timer = QTimer(self)
        self.timer.timeout.connect(self.update_geometry_based_on_cursor_position)
        self.timer.start(500)

    def update_geometry_based_on_cursor_position(self):
        if not self.isSnipping:
            return

        # Update the geometry of the SnipWidget based on the current screen
        mouse_pos = QtGui.QCursor.pos()
        screen = QGuiApplication.screenAt(mouse_pos)
        if screen:
            self.factor = screen.devicePixelRatio()
            screen_geometry = screen.geometry()
            self.setGeometry(screen_geometry)

    def snip(self):
        self.isSnipping = True
        self.setWindowFlags(QtCore.Qt.WindowType.WindowStaysOnTopHint)
        QApplication.setOverrideCursor(QtGui.QCursor(QtCore.Qt.CursorShape.CrossCursor))

        self.show()

    def paintEvent(self, event):
        if self.isSnipping:
            brushColor = (0, 180, 255, 100)
            opacity = 0.3
        else:
            brushColor = (255, 255, 255, 0)
            opacity = 0

        self.setWindowOpacity(opacity)
        qp = QtGui.QPainter(self)
        qp.setPen(QtGui.QPen(QtGui.QColor('black'), 2))
        qp.setBrush(QtGui.QColor(*brushColor))
        qp.drawRect(QtCore.QRect(self.begin, self.end))

    def keyPressEvent(self, event):
        if event.key() == QtCore.Qt.Key.Key_Escape.value:
            QApplication.restoreOverrideCursor()
            self.close()
            self.parent.show()
        event.accept()

    def mousePressEvent(self, event):
        self.startPos = self.mouse.position

        self.begin = event.pos()
        self.end = self.begin
        self.update()

    def mouseMoveEvent(self, event):
        self.end = event.pos()
        self.update()

    def mouseReleaseEvent(self, event):
        self.isSnipping = False
        QApplication.restoreOverrideCursor()

        startPos = self.startPos
        endPos = self.mouse.position

        x1 = int(min(startPos[0], endPos[0]))
        y1 = int(min(startPos[1], endPos[1]))
        x2 = int(max(startPos[0], endPos[0]))
        y2 = int(max(startPos[1], endPos[1]))

        self.repaint()
        QApplication.processEvents()
        try:
            img = ImageGrab.grab(bbox=(x1, y1, x2, y2), all_screens=True)
        except Exception as e:
            if sys.platform == "darwin":
                img = ImageGrab.grab(bbox=(x1//self.factor, y1//self.factor,
                                           x2//self.factor, y2//self.factor), all_screens=True)
            else:
                raise e
        QApplication.processEvents()

        self.close()
        self.begin = QtCore.QPoint()
        self.end = QtCore.QPoint()
        self.parent.returnSnip(img)


def main(arguments):
    with in_model_path():
        if os.name != 'nt':
            os.environ['QTWEBENGINE_DISABLE_SANDBOX'] = '1'
        app = QApplication(sys.argv)
        ex = App(arguments)
        sys.exit(app.exec())
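Finally, here is a bug I ran into while using it: after I press option + ctrl to invoke OCR, if I accidentally click the mouse or trackpad at that moment and the screenshot fails, the following error appears:
At that point the transparent window is taller than the screen, and it cannot be dragged upward to reveal the part hidden below, so there is no way to close it. After some digging, I found this workaround:
Open the "Activity Monitor" app, search for Latex OCR, and quit the process.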
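If the stuck window also makes Activity Monitor awkward to reach, roughly the same thing can be done from a terminal. This is a sketch with a loud assumption: it assumes that the stuck process has latexocr somewhere in its command line, which I have not verified for the Automator-packaged app. Both function names are hypothetical, and pkill -f will end every process whose command line matches the pattern.

```python
import subprocess


def build_kill_command(pattern="latexocr"):
    """Return the pkill invocation that matches against full command lines."""
    # -f makes pkill match the pattern against the whole argument list.
    return ["pkill", "-f", pattern]


def kill_stuck_gui(pattern="latexocr"):
    """Run pkill and report whether at least one process was matched."""
    result = subprocess.run(build_kill_command(pattern), check=False)
    return result.returncode == 0  # pkill exits with 0 when something matched
```

Calling kill_stuck_gui() from a Python prompt should end the stuck process if the pattern assumption holds; if it returns False, nothing matched and Activity Monitor remains the fallback.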