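There is not much material online about deploying OmniParser v2 locally, and users behind the Great Firewall in particular will run into a few pitfalls.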
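1. Installation steps
Two CSDN blog posts are useful references:
(1) "Large Models in Practice: OmniParser-V2 Local Deployment and Installation" (link)
(2) "Microsoft's Open-Source Tool OmniParser-v2.0: Local Deployment Tutorial" (link)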
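2. Troubleshooting
(1) If microsoft/Florence-2-base or any other model weights are missing, they can all be downloaded from ModelScope. The site lists the download commands.
(2) Message: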
To use Transformers in an offline or firewalled environment requires the downloaded and cached files ahead of time.
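Following the official documentation, edit util/utils.py so that the processor is loaded from a local directory:
processor = AutoProcessor.from_pretrained("/home/xxxxxxx/OmniParser/microsoft/Florence-2-base", local_files_only=True, trust_remote_code=True)
The key change is replacing the model name microsoft/Florence-2-base with a concrete local path and setting trust_remote_code=True.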
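A wrong local path tends to surface as the same generic offline message, so it can help to validate the directory before loading. The sketch below is only an illustration: the helper name offline_load_kwargs and the choice of config.json as the file to check are assumptions, not part of OmniParser.

```python
from pathlib import Path
from typing import Dict, Sequence


def offline_load_kwargs(model_dir: str,
                        required: Sequence[str] = ("config.json",)) -> Dict[str, object]:
    """Check a local model directory and build from_pretrained keyword arguments.

    Hypothetical helper: `required` lists files we expect in the directory;
    config.json is used here only as a minimal sanity check.
    """
    path = Path(model_dir).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"model directory not found: {path}")
    missing = [name for name in required if not (path / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{path} is missing: {', '.join(missing)}")
    # Same arguments as the edited line in util/utils.py.
    return {
        "pretrained_model_name_or_path": str(path),
        "local_files_only": True,
        "trust_remote_code": True,
    }
```

With this in place, the line in util/utils.py could become processor = AutoProcessor.from_pretrained(**offline_load_kwargs("/home/xxxxxxx/OmniParser/microsoft/Florence-2-base")).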
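(3) Switch mirrors (optional)
The post mentioned earlier, "Microsoft's Open-Source Tool OmniParser-v2.0: Local Deployment Tutorial" (link), talks about switching mirrors, but readers who are new to this may not know what the author means. To fill in the gap:
It actually refers to switching the Hugging Face mirror server. The files involved are:
Location: transformers/constants.py
(e.g. ~/.local/lib/python3.10/site-packages/transformers/constants.py)
Location: huggingface_hub/constants.py
(e.g. ~/.local/lib/python3.10/site-packages/huggingface_hub/constants.py)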
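As an alternative to editing files inside site-packages, huggingface_hub also reads its endpoint from the HF_ENDPOINT environment variable. The sketch below sets it from Python. The mirror address https://hf-mirror.com is only an example and an assumption here; use whichever mirror the tutorial you follow recommends.

```python
import os

# Assumption: example mirror address only; substitute the mirror you actually use.
DEFAULT_MIRROR = "https://hf-mirror.com"


def use_hf_mirror(endpoint: str = DEFAULT_MIRROR) -> str:
    """Point huggingface_hub at a mirror via the HF_ENDPOINT variable.

    Call this before huggingface_hub or transformers is imported,
    because the endpoint is read when huggingface_hub is first imported.
    """
    endpoint = endpoint.rstrip("/")
    if not endpoint.startswith(("http://", "https://")):
        raise ValueError(f"endpoint must be an http(s) URL, got {endpoint!r}")
    os.environ["HF_ENDPOINT"] = endpoint
    return endpoint


if __name__ == "__main__":
    print(use_hf_mirror())
```

The same effect can be had by exporting HF_ENDPOINT in the shell before starting Python, which survives package upgrades that would overwrite a hand-edited constants.py.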
(4) Error:
Please check your internet connection. This can happen if your antivirus software blocks the download of this file. You can install manually by following these steps:
1. Download this file: https://cdn-media.huggingface.co/frpc-gradio-0.3/frpc_linux_arm64
2. Rename the downloaded file to: frpc_linux_arm64_v0.3
3. Move the file to this location: /home/xxxxxx/miniconda3/envs/omni/lib/python3.12/site-packages/gradio
A copy of frpc_linux_arm64 is available on Gitee. Download it, then rename it, move it into place, and make it executable as the message instructs:
chmod +x /home/xxxx/miniconda3/envs/omni/lib/python3.12/site-packages/gradio/frpc_linux_arm64_v0.3
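The rename, move, and chmod steps can also be done in one go. The following is a minimal sketch; install_frpc is a hypothetical helper name, and the default target name is taken from the error message above.

```python
import shutil
import stat
from pathlib import Path


def install_frpc(downloaded: Path, gradio_dir: Path,
                 target_name: str = "frpc_linux_arm64_v0.3") -> Path:
    """Copy the downloaded frpc binary into gradio's package directory,
    under the name gradio expects, and make it executable."""
    if not downloaded.is_file():
        raise FileNotFoundError(downloaded)
    if not gradio_dir.is_dir():
        raise NotADirectoryError(gradio_dir)
    target = gradio_dir / target_name
    shutil.copyfile(downloaded, target)
    # Equivalent of `chmod +x`: add execute bits for user, group and others.
    mode = target.stat().st_mode
    target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

The gradio directory does not have to be typed by hand: inside the same environment, Path(gradio.__file__).parent gives the package location.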
(5) Message:
TypeError: argument of type 'bool' is not iterable
Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.
Change the following line in gradio_demo.py:
demo.launch(share=False, server_port=7861, server_name='0.0.0.0')
That is, set share=False so that Gradio no longer tries to create a public share link.
(6) Message:
Florence2ForConditionalGeneration.forward() got an unexpected keyword argument 'images'
Locate this line in util/utils.py:
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, torch_dtype=torch.float32, trust_remote_code=True)
and add this line just before it:
model_name_or_path="/home/xxxxxx/OmniParser/microsoft/Florence-2-base-ft"
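This is needed because the code earlier in the file sets a default that points at a BLIP-2 model rather than Florence-2: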
model_name_or_path="Salesforce/blip2-opt-2.7b"
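3. Building a Docker image
(1) First, use Doubao to convert gradio_demo.py into a server that accepts HTTP requests. The Dockerfile in step (2) starts flask_demo.py, so save the result under that name.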
from typing import Tuple
import base64
import io
import os

from flask import Flask, request, jsonify
from PIL import Image
import numpy as np
import torch

from util.utils import check_ocr_box, get_yolo_model, get_caption_model_processor, get_som_labeled_img

app = Flask(__name__)

yolo_model = get_yolo_model(model_path='weights/icon_detect/model.pt')
caption_model_processor = get_caption_model_processor(model_name="florence2", model_name_or_path="weights/icon_caption_florence")
# caption_model_processor = get_caption_model_processor(model_name="blip2", model_name_or_path="weights/icon_caption_blip2")

DEVICE = torch.device('cuda')


def process(image_input, box_threshold, iou_threshold, use_paddleocr, imgsz) -> Tuple[Image.Image, str]:
    # Scale the overlay drawing parameters with the image width
    box_overlay_ratio = image_input.size[0] / 3200
    draw_bbox_config = {
        'text_scale': 0.8 * box_overlay_ratio,
        'text_thickness': max(int(2 * box_overlay_ratio), 1),
        'text_padding': max(int(3 * box_overlay_ratio), 1),
        'thickness': max(int(3 * box_overlay_ratio), 1),
    }
    ocr_bbox_rslt, is_goal_filtered = check_ocr_box(
        image_input, display_img=False, output_bb_format='xyxy', goal_filtering=None,
        easyocr_args={'paragraph': False, 'text_threshold': 0.9}, use_paddleocr=use_paddleocr)
    text, ocr_bbox = ocr_bbox_rslt
    dino_labled_img, label_coordinates, parsed_content_list = get_som_labeled_img(
        image_input, yolo_model, BOX_TRESHOLD=box_threshold, output_coord_in_ratio=True,
        ocr_bbox=ocr_bbox, draw_bbox_config=draw_bbox_config,
        caption_model_processor=caption_model_processor, ocr_text=text,
        iou_threshold=iou_threshold, imgsz=imgsz)
    image = Image.open(io.BytesIO(base64.b64decode(dino_labled_img)))
    print('finish processing')
    parsed_content_list = '\n'.join([f'icon {i}: ' + str(v) for i, v in enumerate(parsed_content_list)])
    return image, str(parsed_content_list)


@app.route('/process_image', methods=['POST'])
def process_image():
    try:
        # Read the uploaded image
        file = request.files['image']
        image = Image.open(file.stream)
        # Read the parameters
        box_threshold = float(request.form.get('box_threshold', 0.05))
        iou_threshold = float(request.form.get('iou_threshold', 0.1))
        # Form values are strings; bool("false") would be True, so compare the text instead
        use_paddleocr = request.form.get('use_paddleocr', 'true').strip().lower() in ('1', 'true', 'yes')
        imgsz = int(request.form.get('imgsz', 640))
        # Process the image
        processed_image, parsed_content = process(image, box_threshold, iou_threshold, use_paddleocr, imgsz)
        # Encode the processed image as base64
        buffered = io.BytesIO()
        processed_image.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
        # Return the result
        return jsonify({'image': img_str, 'parsed_content': parsed_content})
    except Exception as e:
        return jsonify({'error': str(e)}), 500


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=7861)
You can test it with:
curl -X POST \
  -F "image=@path/to/your/image.jpg" \
  -F "box_threshold=0.05" \
  -F "iou_threshold=0.1" \
  -F "use_paddleocr=true" \
  -F "imgsz=640" \
  http://localhost:7861/process_image
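The endpoint returns JSON with a base64-encoded PNG under image and the text under parsed_content, so the curl output is hard to read directly. A Python client along the following lines could decode it. This is a sketch: decode_response and call_server are hypothetical names, and call_server assumes the third-party requests package is installed.

```python
import base64
import json
from typing import List, Tuple


def decode_response(body: str) -> Tuple[bytes, List[str]]:
    """Split the /process_image JSON reply into PNG bytes and parsed lines."""
    payload = json.loads(body)
    if "error" in payload:
        # The server reports failures as {"error": "..."} with status 500.
        raise RuntimeError(payload["error"])
    png_bytes = base64.b64decode(payload["image"])
    lines = payload["parsed_content"].splitlines()
    return png_bytes, lines


def call_server(image_path: str,
                url: str = "http://localhost:7861/process_image") -> Tuple[bytes, List[str]]:
    """POST one image with the same form fields as the curl command."""
    import requests  # third-party; imported lazily so decode_response works without it

    with open(image_path, "rb") as fh:
        resp = requests.post(
            url,
            files={"image": fh},
            data={"box_threshold": "0.05", "iou_threshold": "0.1",
                  "use_paddleocr": "true", "imgsz": "640"},
            timeout=300,
        )
    return decode_response(resp.text)
```

The returned bytes can be written straight to a .png file to inspect the annotated screenshot.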
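(2) Create a Dockerfile (placed in the OmniParser folder)
# Use Python 3.12 as the base image
FROM python:3.12-slim

# Set the working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    curl \
    wget \
    unzip \
    && rm -rf /var/lib/apt/lists/*

# Copy the project files
COPY . /app

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Unpack the weights archive (if present)
RUN if [ -f "omniparse_weights.zip" ]; then unzip omniparse_weights.zip -d weights; fi

# Expose the application port (adjust to match your application)
EXPOSE 7861

# Set environment variables (add more as needed)
ENV PYTHONPATH="/app:$PYTHONPATH"

# Define the startup command (adjust to match your application)
CMD ["python", "flask_demo.py"]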
(3) Running Docker may require configuring a registry mirror
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://docker.xuanyuan.me/"]
}
EOF
Restart the Docker service:
sudo systemctl daemon-reload
sudo systemctl restart docker
Test it:
docker run hello-world
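Note that the tee command in this step replaces the whole of /etc/docker/daemon.json, which would discard any settings already there. If the file already exists, a merge along these lines may be safer. This is a sketch; add_registry_mirrors is a hypothetical helper, and it has to run with enough privileges to write the file.

```python
import json
from pathlib import Path
from typing import List


def add_registry_mirrors(config_path: Path, mirrors: List[str]) -> dict:
    """Merge mirrors into daemon.json, keeping every other existing key."""
    config = {}
    if config_path.exists():
        text = config_path.read_text()
        if text.strip():
            config = json.loads(text)
    current = list(config.get("registry-mirrors", []))
    for mirror in mirrors:
        if mirror not in current:  # avoid duplicate entries on repeated runs
            current.append(mirror)
    config["registry-mirrors"] = current
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    add_registry_mirrors(Path("/etc/docker/daemon.json"),
                         ["https://docker.xuanyuan.me/"])
```

Docker still needs to be restarted afterwards for the change to take effect.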
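(4) Build the Docker image
# Build the image
docker build -t omniparser:latest .
# Run the container (foreground); only port 7861 is mapped because that is the port the server listens on
docker run -it --rm -p 7861:7861 omniparser:latest
# Or run it in the background
docker run -d -p 7861:7861 --name omniparser omniparser:latest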
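In background mode the container returns immediately, while the models may still be loading. A small polling script can tell when the server actually answers. The sketch below treats any HTTP response as "ready", including an error status for a GET on the POST-only /process_image route. The function name wait_for_http and the timeout values are assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll url until the server sends any HTTP response or timeout expires."""
    # Bypass any configured proxy, since the target is a local container.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            opener.open(url, timeout=interval).close()
            return True
        except urllib.error.HTTPError:
            # The server answered with an error status, so it is up.
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    ok = wait_for_http("http://localhost:7861/process_image")
    print("server is up" if ok else "server did not respond in time")
```

An HTTP check is used rather than a bare TCP connect because Docker's port forwarding can accept connections on the host before the application inside the container is listening.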