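OCRBench: Evaluating the OCR Capabilities of Large Multimodal Models

Paper: OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models (arXiv:2305.07895)

OCRBench evaluates the OCR capabilities of large multimodal models (LMMs) on 10 text-related tasks. It contains 1,000 question-answer pairs, each stored with five fields: index, image, question, answer, and category (the task type). The categories and their sample counts are as follows: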

Task	Example image	Example question	Example answer	Samples
Key Information Extraction	(image omitted)	what is the total amount of this receipt? Answer this question using the text in the image directly.	['26.58']	200
Doc-oriented VQA	(image omitted)	Whats the Venue Name?	['the halfmoon']	200
Scene Text-centric VQA	(image omitted)	What is the title of the book?	['PENDRAGON']	200
Handwritten Mathematical Expression Recognition	(image omitted)	Please write out the expression of the formula in the image using LaTeX format.	['x = \\frac { 1 7 } { 5 }\n']	100
Irregular Text Recognition	(image omitted)	what is written in the image?	['COFFEE']	50
Regular Text Recognition	(image omitted)	what is written in the image?	['CHAIN']	50
Non-Semantic Text Recognition	(image omitted)	what is written in the image?	['espt']	50
Digit String Recognition	(image omitted)	what is the number in the image?	['9557']	50
Handwriting Recognition	(image omitted)	what is written in the image?	['bread']	50
Artistic Text Recognition	(image omitted)	what is written in the image?	['Home']	50
Total	-	-	-	1000
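
As a rough illustration of how the five fields might be read, the sketch below parses a tab-separated file with Python's standard csv module, counts rows per category, and turns an answer cell into a list. It assumes the header names match the field names listed above and that the answer cell is a Python-style list literal, as the examples in the table suggest. The in-memory sample and the helper names load_rows and parse_answers are made up for this example.

```python
import ast
import csv
import io
from collections import Counter

# Base64 image cells can exceed csv's default 128 KiB field limit.
csv.field_size_limit(2**31 - 1)


def load_rows(tsv_file):
    """Read tab-separated rows into dicts keyed by the header names."""
    return list(csv.DictReader(tsv_file, delimiter="\t"))


def parse_answers(answer_cell):
    """Turn a cell such as "['26.58']" into a Python list of strings."""
    return [str(a) for a in ast.literal_eval(answer_cell)]


# Tiny in-memory stand-in for the real file (image cells shortened).
sample_tsv = (
    "index\timage\tquestion\tanswer\tcategory\n"
    "0\tAAAA\twhat is written in the image?\t['CHAIN']\tRegular Text Recognition\n"
    "1\tBBBB\twhat is the number in the image?\t['9557']\tDigit String Recognition\n"
    "2\tCCCC\twhat is written in the image?\t['COFFEE']\tIrregular Text Recognition\n"
)

rows = load_rows(io.StringIO(sample_tsv))
category_counts = Counter(row["category"] for row in rows)
first_answers = parse_answers(rows[0]["answer"])
```

For a file on disk, the same helper can be given a handle opened with newline="" and encoding="utf-8"; the file name depends on where the benchmark was downloaded from.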

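Note that in the TSV file, images are stored as Base64-encoded strings. Base64 encoding converts a binary image file (PNG, JPEG, GIF) into a plain ASCII text string, roughly one third longer than the raw bytes, that can be embedded directly in HTML, CSS, or JSON.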
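
Base64 is a text-safe encoding, not a compression scheme: every 3 input bytes become 4 output characters. The short sketch below checks that ratio and the lossless round trip, using random bytes as a stand-in for real image data.

```python
import base64
import os

raw = os.urandom(3000)  # stand-in for the bytes of an image file
encoded = base64.b64encode(raw).decode("ascii")

# Every 3 input bytes become 4 output characters ('=' pads the last group).
expected_length = 4 * ((len(raw) + 2) // 3)
overhead = len(encoded) / len(raw)

# Decoding restores the original bytes exactly.
round_trip_ok = base64.b64decode(encoded) == raw
```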

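There are three ways to convert a Base64 string back into an image:

(1) Use an online tool, for example the Base64-to-image converter in the DopuBOX free online toolbox.

(2) Use a script: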

import base64

# 1. Paste the Base64-encoded string here
base64_data = "/9j/4AAQSkZJRgABAQAAAQABAAD/...(full string)/ALz44+gHAooA/9k="

# 2. Decode it and save the bytes as an image
with open("output.jpg", "wb") as f:
    f.write(base64.b64decode(base64_data))

print("Image saved as output.jpg")
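
A slightly more defensive variant of the script is sketched below. It accepts an optional data URI prefix, ignores stray whitespace, and reports invalid Base64 as a ValueError. The function name save_base64_image is made up for this example, and the demo payload is only the first three bytes of a JPEG file, not a viewable image.

```python
import base64
import binascii
import os
import tempfile


def save_base64_image(b64_text, path):
    """Decode a Base64 string and write the bytes to `path`.

    Accepts an optional "data:image/...;base64," prefix and ignores
    whitespace. Raises ValueError if the payload is not valid Base64.
    Returns the number of bytes written.
    """
    if b64_text.startswith("data:"):
        # Keep only the part after the first comma of the data URI.
        b64_text = b64_text.partition(",")[2]
    cleaned = "".join(b64_text.split())
    try:
        image_bytes = base64.b64decode(cleaned, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid or incomplete Base64 data") from exc
    with open(path, "wb") as f:
        f.write(image_bytes)
    return len(image_bytes)


# Demo: "/9j/" decodes to the bytes FF D8 FF, the start of a JPEG file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.jpg")
written = save_base64_image("data:image/jpeg;base64,/9j/", demo_path)
```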

(3) Preview directly in a browser

Put the following code in an HTML file:

<img src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQ...(full Base64 string).../9k=">

Open the HTML file in a browser to display the image.
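
If the image bytes are already available in Python, the HTML page can also be generated rather than written by hand. The following is a minimal sketch; build_preview_html is a hypothetical helper, and the three-byte payload is only a placeholder for real image data.

```python
import base64
import html


def build_preview_html(image_bytes, mime_type="image/jpeg"):
    """Return a minimal HTML page that embeds the image as a data URI."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    src = f"data:{mime_type};base64,{payload}"
    return (
        "<!DOCTYPE html>\n"
        "<html><body>\n"
        f'<img src="{html.escape(src, quote=True)}">\n'
        "</body></html>\n"
    )


# Placeholder payload: the first three bytes of a JPEG file.
page = build_preview_html(b"\xff\xd8\xff")
```

Writing page to a file with a .html extension and opening it in a browser gives the same result as the hand-written tag.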

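Notes

Encoding type: the string is the Base64 encoding of a JPEG image (it starts with /9j/).
Caution: make sure to copy the complete string, from the leading /9j/ through the final characters (/9k= in this example); otherwise the conversion will fail.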
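
The /9j/ prefix is not arbitrary: JPEG files begin with the bytes FF D8 FF, and those three bytes encode to /9j/ in Base64. The sketch below uses the same idea to guess the format of an encoded image from its first decoded bytes. The function name sniff_image_format and the small signature table are illustrative and cover only the three formats mentioned in this article.

```python
import base64

# Leading "magic" bytes of common image formats.
MAGIC_BYTES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
}


def sniff_image_format(b64_text):
    """Guess the image format from the first decoded bytes, or return None."""
    # 16 Base64 characters decode to 12 bytes, enough for these signatures.
    prefix = b64_text[:16]
    # Base64 decodes in groups of 4 characters, so drop any partial group.
    prefix = prefix[: len(prefix) - len(prefix) % 4]
    head = base64.b64decode(prefix)
    for name, magic in MAGIC_BYTES.items():
        if head.startswith(magic):
            return name
    return None


jpeg_guess = sniff_image_format("/9j/4AAQSkZJRgABAQ")
```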
