Using ui-tars and omni-parser

Deploying and training ui-tars

Overview
Quick start
- Environment setup
- ui-tars web inference and training
- ui-tars API deployment
- Using omni-parser

Overview

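The image bundles ui-tars, llama-factory, and omni-parser. It is still under review and is expected to go live tomorrow; once it does, it can be found by searching the community images on auto-dl.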

Quick start

Use the auto-dl image:

https://www.codewithgpu.com/i/hiyouga/LLaMA-Factory/ui-tars_omni-parser_llama-factory

Environment setup

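Copy the model and the omni directory from the system disk to the data disk. Once the copy succeeds, the original files can be deleted.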

cp -r /root/model/UI-TARS-7B-DPO /root/autodl-tmp/
cp -r /root/omni  /root/autodl-tmp/
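Before deleting the originals, it can help to confirm that the copy is complete. The sketch below is one possible way to do that in Python: it copies a directory the same way `cp -r src dst/` would and compares total file sizes. The helper names `dir_size` and `copy_and_verify` are made up for this example and are not part of the image.

```python
import shutil
from pathlib import Path


def dir_size(path: Path) -> int:
    """Return the total size in bytes of all regular files under path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def copy_and_verify(src: str, dst_parent: str) -> Path:
    """Copy src into dst_parent (like `cp -r src dst_parent/`) and compare sizes."""
    src_path = Path(src)
    dst_path = Path(dst_parent) / src_path.name
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run after an interruption.
    shutil.copytree(src_path, dst_path, dirs_exist_ok=True)
    if dir_size(src_path) != dir_size(dst_path):
        raise RuntimeError(f"size mismatch after copying {src_path} to {dst_path}")
    return dst_path
```

With the paths from the commands above, this would be called as `copy_and_verify("/root/model/UI-TARS-7B-DPO", "/root/autodl-tmp")` and `copy_and_verify("/root/omni", "/root/autodl-tmp")`. A size comparison is only a coarse check; it does not detect corrupted file contents.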

ui-tars web inference and training

bash /root/LLaMA-Factory/chuli/one.sh

In the advanced settings, change the prompt template to qwen2_vl; otherwise images cannot be uploaded.
For detailed usage, see the official llama-factory repository:
https://github.com/hiyouga/LLaMA-Factory

ui-tars API deployment

Activate the conda environment:

conda activate llama

-tp (tensor parallel size) is the number of GPUs to use; set it to 1.

python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model /root/autodl-tmp/UI-TARS-7B-DPO --limit-mm-per-prompt image=5 --dtype=half -tp 1
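If the server needs to be started from a script, for example with a different GPU count per machine, the command can be assembled as an argument list. This is only a sketch: the flags are copied verbatim from the command above, while the function name `build_vllm_command` and its parameter names are invented for illustration.

```python
import shlex
from typing import List


def build_vllm_command(model_path: str, served_name: str = "ui-tars",
                       gpus: int = 1, max_images: int = 5) -> List[str]:
    """Assemble the vLLM launch command shown above as an argument list."""
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--served-model-name", served_name,         # name clients pass as "model"
        "--model", model_path,                      # local path to the weights
        "--limit-mm-per-prompt", f"image={max_images}",  # max images per request
        "--dtype=half",
        "-tp", str(gpus),                           # number of GPUs
    ]


command = build_vllm_command("/root/autodl-tmp/UI-TARS-7B-DPO")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command)` inside the activated conda environment.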

Use auto-dl's custom service to forward the port so the API can be called from a local computer:

ssh -CNg -L 8000:127.0.0.1:8000 root@region-9.autodl.pro -p 46525
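Since the ssh command prints nothing while it is running, a quick check that the forwarded port accepts connections can save some guessing. A minimal sketch using only the standard library; `port_is_open` is a hypothetical helper name, and the default port 8000 is the one used in the tunnel above.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Calling `port_is_open()` on the local computer should return True while the tunnel is up. It only shows that something is listening on the port, not that the model has finished loading.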

Example call from the local computer:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ui-tars","messages": [{"role": "user", "content": "我想問你，5的階乘是多少？<think>\n"}]}'

Response (the prompt asks what the factorial of 5 is; the model answers 120):

{"id":"chat-7c8149f008a24adfa451a989ba6256d5","object":"chat.completion","created":1741314705,"model":"ui-tars","choices":[{"index":0,"message":{"role":"assistant","content":"5的階乘是120。階乘運算的數學符號是“!”。在計算機編程語言中，它通常用“ fact”來表示。階乘的定義為：n! = n * (n - 1) * (n - 2) * ... * 2 * 1，其中n是一個正整數。","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":22,"total_tokens":97,"completion_tokens":75},"prompt_logprobs":null}
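The same text-only request can be made from Python without extra dependencies. The following is a sketch under the assumption that the server is reachable at the forwarded address used in the curl call; the function names `build_request`, `extract_reply`, and `ask_text` are chosen for this example only.

```python
import json
import urllib.request

# Forwarded endpoint from the curl example.
API_URL = "http://localhost:8000/v1/chat/completions"


def build_request(question: str, model: str = "ui-tars") -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    payload = {"model": model, "messages": [{"role": "user", "content": question}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of a chat.completion response."""
    return response["choices"][0]["message"]["content"]


def ask_text(question: str) -> str:
    """Send the question to the server and return the reply text."""
    with urllib.request.urlopen(build_request(question), timeout=60) as resp:
        return extract_reply(json.load(resp))
```

`ask_text("What is the factorial of 5?")` needs the server and the tunnel to be running; `extract_reply` can be tried offline on the JSON response shown above.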

test.py

from model import OpenAIModel, print_with_color

# Endpoint and model name of the vLLM server started above.
configs = {
    "DEEPSEEK_API_BASE": "http://localhost:8000/v1/chat/completions",
    "DEEPSEEK_API_MODEL": "ui-tars",
    "MAX_TOKENS": 1024,
    "TEMPERATURE": 0,
    "OPENAI_API_KEY": "",
}


def ask(question: str):
    print_with_color("####################deepseek####################", "magenta")
    print_with_color(f"question: {question}", "yellow")
    mllm = OpenAIModel(
        base_url=configs["DEEPSEEK_API_BASE"],
        api_key=configs["OPENAI_API_KEY"],
        model=configs["DEEPSEEK_API_MODEL"],
        temperature=configs["TEMPERATURE"],
        max_tokens=configs["MAX_TOKENS"],
        disable_proxies=True,
    )
    prompt = question
    images = ["image1.jpg"]  # image sent along with the question
    status, rsp = mllm.get_model_response(prompt, images=images)
    if not status:
        print_with_color(f"Failed: {rsp}", "red")
        return
    print_with_color(f"*********************** rsp:\n{rsp}", "yellow")


ask("Explain what is in the image")

model.py

from abc import ABC, abstractmethod
from typing import List, Optional, Tuple
import base64

import requests
from colorama import Fore, Style

# Colour names accepted by print_with_color.
COLORS = {
    "red": Fore.RED,
    "green": Fore.GREEN,
    "yellow": Fore.YELLOW,
    "blue": Fore.BLUE,
    "magenta": Fore.MAGENTA,
    "cyan": Fore.CYAN,
    "white": Fore.WHITE,
    "black": Fore.BLACK,
}


def encode_image(image_path):
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def print_with_color(text: str, color=""):
    """Print text in the given colour; unknown colours print plain text."""
    print(COLORS.get(color, "") + text)
    print(Style.RESET_ALL)


class BaseModel(ABC):
    def __init__(self):
        pass

    @abstractmethod
    def get_model_response(self, prompt: str, images: List[str]) -> Tuple[bool, str]:
        pass


class OpenAIModel(BaseModel):
    def __init__(self, base_url: str, api_key: str, model: str, temperature: float,
                 max_tokens: int, disable_proxies=False):
        super().__init__()
        self.base_url = base_url
        self.api_key = api_key
        self.model = model
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.disable_proxies = disable_proxies

    def get_model_response(self, prompt: str, images: Optional[List[str]] = None,
                           tools: Optional[List[dict]] = None,
                           history: Optional[List[dict]] = None,
                           role: str = "user") -> Tuple[bool, str]:
        # One text part followed by one image_url part per image.
        content = [{"type": "text", "text": prompt}]
        for img in images or []:
            base64_img = encode_image(img)
            content.append({
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"},
            })
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.api_key}",
        }
        payload = {
            "model": self.model,
            "messages": [{"role": role, "content": content}],
            "temperature": self.temperature,
            "max_tokens": self.max_tokens,
        }
        if tools:
            payload["tools"] = tools
        if history:
            # Append the new message to the earlier turns and send them all.
            history.append(payload["messages"][-1])
            payload["messages"] = history
        if self.disable_proxies:
            # Setting the keys to None overrides proxy environment variables;
            # an empty dict would still let requests pick them up.
            no_proxy = {"http": None, "https": None}
            response = requests.post(self.base_url, headers=headers, json=payload,
                                     proxies=no_proxy).json()
        else:
            response = requests.post(self.base_url, headers=headers, json=payload).json()

        if "error" in str(response):
            print_with_color(f"Request failed, response: {response}", "red")
            return False, response

        if "usage" not in response:
            print_with_color(f"no usage in response: {response}", "red")
        else:
            usage = response["usage"]
            prompt_tokens = usage["prompt_tokens"]
            total_tokens = usage["total_tokens"]
            completion_tokens = usage["completion_tokens"]
            print_with_color(f"total_tokens: {total_tokens}, prompt_tokens: {prompt_tokens}, "
                             f"completion_tokens: {completion_tokens}")
            # Rough cost estimate from hard-coded prices per 1K tokens.
            if self.model == "gpt-4o":
                cost = prompt_tokens / 1000 * 0.005 + completion_tokens / 1000 * 0.015
                print_with_color(f"Request gpt-4o cost is ${cost:.2f}", "yellow")
            else:
                cost = prompt_tokens / 1000 * 0.01 + completion_tokens / 1000 * 0.03
                print_with_color(f"Request cost is ${cost:.2f}", "yellow")

        if tools:
            return True, response["choices"][0]["message"]["tool_calls"]
        return True, response["choices"][0]["message"]["content"]
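One detail in model.py worth noting: every image is labelled `data:image/jpeg` regardless of its actual file type. If PNG screenshots are sent, a variant that derives the MIME type from the file extension may be preferable. The sketch below is an optional alternative, not part of the original code; `image_to_data_url` is a hypothetical helper name.

```python
import base64
import mimetypes


def image_to_data_url(image_path: str) -> str:
    """Encode an image file as a data URL with a MIME type guessed from its extension."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {image_path}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The return value could be used as the `url` field of an `image_url` content part in place of the hard-coded JPEG prefix.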

Using omni-parser

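Method 1: deploy on the server
Enter the omni directory.
Activate the conda environment:
conda activate llama
Start the service (a GPU is recommended):
python server.py
On the local computer, call the parser method in client.py.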

Method 2: run locally
If the local computer has a reasonably powerful GPU, you can call the parser method in omni_parser.py directly.
