Preface
This post follows on from the previous one:
- OCR in Practice: PaddleOCR
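Table-Transformer and PubTables-1M
Table Transformer comes from Microsoft. It is based on DETR and trained on the PubTables-1M dataset; the model was introduced in the same work that proposed the dataset.
The paper, "PubTables-1M: Towards comprehensive table extraction from unstructured documents", was published at CVPR 2022.
The data consists of around one million tables from articles in the PubMed PMCOA database.
PubTables-1M covers three table-processing tasks, so Table Transformer can handle all three as well: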
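- Table detection (TD): locating tables
- Table structure recognition (TSR): rows, columns, spanning cells, grid cells, and text cells
- Functional analysis (FA): column header cells and projected row header cells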
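table-transformer
It is the first model to apply DETR to table-processing tasks, and it does so without any task-specific custom modules. It is abbreviated as TATR. In the paper's words: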
we apply the Detection Transformer (DETR) [2] for the first time to the tasks of TD, TSR, and FA, and demonstrate how with PubTables-1M all three tasks can be addressed with a transformer-based object detection framework without any special customization for these tasks.
For detailed information on the model weights and metrics, see the paper and the GitHub repository:
https://arxiv.org/abs/2110.00061
https://github.com/microsoft/table-transformer
The weights for each model are also officially available on Hugging Face:
https://huggingface.co/collections/microsoft/table-transformer-6564528e330b667bb267502e
The model versions and their differences are as follows.
According to the official note, microsoft/table-transformer-structure-recognition-v1.1-all is the best structure recognition model.
Hands-on code
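Table detection (TD)
The settings below speed up downloads by pointing to a mirror and save the models under the current folder. Set them before importing transformers so that they take effect.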
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0"
os.environ['HF_HUB_CACHE'] = './hf_models/'
os.environ['TRANSFORMERS_CACHE'] = './hf_models'
os.environ['HF_HOME'] = './hf_models'
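Open the image
from PIL import Image
table_img_path = './table.jpg'
image = Image.open(table_img_path).convert("RGB")
# File name without directory or extension, used later to name the cropped regions
file_name = table_img_path.split('/')[-1].split('.')[0]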
Load the model
from transformers import AutoImageProcessor, TableTransformerForObjectDetection
image_processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection")
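Model inference and post-processing
import torch
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
# target_sizes expects (height, width); PIL's image.size is (width, height)
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(outputs, threshold=0.9, target_sizes=target_sizes)[0]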
Parsing the results
for i, (score, label, box) in enumerate(zip(results["scores"], results["labels"], results["boxes"])):
    box = [round(v, 2) for v in box.tolist()]
    print(
        f"Detected {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )
    region = image.crop(box)  # crop the detected table region
    region.save(f'./{file_name}_{i}.jpg')
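Detected boxes can be tight, so cropping exactly at the box may cut into the table border or the outermost text. The following is a minimal sketch of adding a margin before cropping. The helper name `pad_box` and the 10-pixel padding are illustrative assumptions, not part of the original code.

```python
def pad_box(box, padding, image_size):
    """Expand an (x0, y0, x1, y1) box by `padding` pixels, clamped to the image."""
    x0, y0, x1, y1 = box
    width, height = image_size  # same order as PIL's Image.size: (width, height)
    return (
        max(0, x0 - padding),
        max(0, y0 - padding),
        min(width, x1 + padding),
        min(height, y1 + padding),
    )


# Example: a detection close to the left edge of an 800x600 image.
padded = pad_box((5.0, 100.0, 400.0, 300.0), padding=10, image_size=(800, 600))
print(padded)
```

In the loop above, this would be used as `image.crop(pad_box(box, 10, image.size))` in place of `image.crop(box)`.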
Table structure recognition (TSR)
Open the image and load the model
import torch
from PIL import Image
from transformers import DetrImageProcessor, TableTransformerForObjectDetection
# DetrFeatureExtractor is deprecated in recent transformers releases; DetrImageProcessor replaces it
feature_extractor = DetrImageProcessor()
file_path = "./locate_table.jpg"
image = Image.open(file_path).convert("RGB")
encoding = feature_extractor(image, return_tensors="pt")
model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-structure-recognition-v1.1-all")
print(model.config.id2label)
# {0: 'table', 1: 'table column', 2: 'table row', 3: 'table column header', 4: 'table projected row header', 5: 'table spanning cell'}
Model inference and post-processing
with torch.no_grad():
    outputs = model(**encoding)
# target_sizes expects (height, width); PIL's image.size is (width, height)
target_sizes = [image.size[::-1]]
results = feature_extractor.post_process_object_detection(outputs, threshold=0.6, target_sizes=target_sizes)[0]
# print(results)
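The post-processed results hold parallel lists of scores, labels, and boxes. One way to avoid filtering by hard-coded numeric ids is to group the boxes by class name first. This is a sketch only: `group_boxes_by_label` is a hypothetical helper, and the toy data below stands in for `results["labels"].tolist()` and `results["boxes"].tolist()`.

```python
from collections import defaultdict


def group_boxes_by_label(labels, boxes, id2label):
    """Map each class name to the list of boxes predicted for that class."""
    grouped = defaultdict(list)
    for label_id, box in zip(labels, boxes):
        grouped[id2label[label_id]].append(box)
    return dict(grouped)


# Toy data in place of real model output (assumed values, for illustration only).
id2label = {1: "table column", 2: "table row", 3: "table column header"}
labels = [1, 2, 1, 3]
boxes = [[0, 0, 50, 90], [0, 0, 100, 30], [50, 0, 100, 90], [0, 0, 100, 30]]

grouped = group_boxes_by_label(labels, boxes, id2label)
print(sorted(grouped))
```

With real output, the call would be `group_boxes_by_label(results["labels"].tolist(), results["boxes"].tolist(), model.config.id2label)`, after which `grouped.get("table row", [])` returns the row boxes.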
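Parsing the results
header
# label 3 = 'table column header'
headers_box_list = [results['boxes'][i].tolist() for i in range(len(results['boxes'])) if results['labels'][i].item()==3]
if headers_box_list:  # a header may not be detected above the threshold
    crop_image = image.crop(headers_box_list[0])
    crop_image.save('header.png')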
column
# label 1 = 'table column'
columns_box_list = [results['boxes'][i].tolist() for i in range(len(results['boxes'])) if results['labels'][i].item()==1]
print(len(columns_box_list))
row
# label 2 = 'table row'
rows_box_list = [results['boxes'][i].tolist() for i in range(len(results['boxes'])) if results['labels'][i].item()==2]
print(len(rows_box_list))
cell
from PIL import ImageDraw
cell_draw_image = image.copy()
cell_draw = ImageDraw.Draw(cell_draw_image)
# Each cell is the intersection of a column box and a row box
for col in columns_box_list:
    for row in rows_box_list:
        cell = intersection(col, row)  # user-defined helper
        if cell is not None:
            cell_draw.rectangle(cell, outline="red", width=3)
cell_draw_image.save("cells.png")
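The `intersection` helper is left for the reader to define. A minimal sketch, assuming boxes are `[x0, y0, x1, y1]` lists as returned by `.tolist()`, is shown here together with a hypothetical `build_cell_grid` that sorts rows top to bottom and columns left to right so the cells come out in reading order. Spanning cells are ignored in this sketch.

```python
def intersection(box_a, box_b):
    """Return the overlap of two [x0, y0, x1, y1] boxes, or None if they do not overlap."""
    x0 = max(box_a[0], box_b[0])
    y0 = max(box_a[1], box_b[1])
    x1 = min(box_a[2], box_b[2])
    y1 = min(box_a[3], box_b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return [x0, y0, x1, y1]


def build_cell_grid(columns, rows):
    """Return cells as a row-major grid: rows top to bottom, columns left to right."""
    rows = sorted(rows, key=lambda box: box[1])        # sort by top edge
    columns = sorted(columns, key=lambda box: box[0])  # sort by left edge
    return [[intersection(col, row) for col in columns] for row in rows]


# Toy 2x2 table with boxes deliberately given out of order.
columns = [[50, 0, 100, 60], [0, 0, 50, 60]]
rows = [[0, 30, 100, 60], [0, 0, 100, 30]]
grid = build_cell_grid(columns, rows)
for grid_row in grid:
    print(grid_row)
```

Each entry `grid[r][c]` can then be passed to `image.crop(...)` and on to an OCR engine to read the cell text.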
Results
The results are quite good.
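Acknowledgments
Thanks to the following articles for inspiration and code references:
- [Introduction to Table Detection and Recognition - My Github Blog](https://percent4.github.io/表格檢測與識別入門/#表格結構識別)
- A First Attempt at Table Detection and Recognition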