LLM學習：大模型基礎——視覺大模型以及autodl使用

1、常見的VLM

在大模型中，VLM 是視覺語言模型（Vision-Language Model）的縮寫，是一種多模態、生成式 AI 模型，能夠理解和處理視頻、圖像和文本。
VLM 通過將大語言模型（LLM）與視覺編碼器相結合構建而成，使 LLM 具有 “看” 的能力，從而可以處理并提供對提示中的視頻、圖像和文本輸入的高級理解，以生成文本響應。與傳統的計算機視覺模型不同，VLM 不受固定類別集或特定任務約束，在大量文本和圖像 / 視頻字幕對的語料上進行重新訓練后，它可以用自然語言進行指導，用于處理許多典型的視覺任務以及新的生成式 AI 任務，例如摘要和視覺問答。

常見的VLM有以下幾個：

        GPT-4V：屬于分析型 VLM，是 OpenAI 開發的強大視覺語言模型，能夠理解和處理圖像與文本的組合輸入，并生成文本響應，在視覺問答、圖像描述等多種任務上表現出色。
        Qwen2.5-VL：是阿里云的旗艦視覺語言模型，有 30 億、70 億和 720 億參數三種規模，使用 ViT 視覺編碼器和 Qwen 2.5 LLM，它可以理解長度為一個小時以上的視頻，并可以瀏覽桌面和智能手機界面。
        Claude 4：也是分析型 VLM 的代表之一，由 Anthropic 公司開發，具備強大的語言理解和生成能力，同時在處理視覺相關任務時也有很好的表現，能夠準確回答關于圖像內容的問題等。

2、qwen-VL圖像理解實例

通過qwen-VL讀取幾張圖片，提示詞和圖片從excel中讀取，將最終的結果也輸出到excel中。

import os
import dashscope
from dashscope.api_entities.dashscope_response import Role
from dashscope import MultiModalConversation
import pandas as pd
dashscope.api_key = os.getenv('DASHSCOPE_API_KEY')absolute_path = os.path.dirname(os.path.abspath(__file__))
def get_response(user_prompt, image_url):# 得到messageslocal_file_path = f'file://{absolute_path}\\{image_url}.jpg'messages = [{'role': 'system','content': [{'text': 'You are a helpful assistant.'}]}, {'role':'user','content': [{'image': f'{local_file_path}'},{'text': f'{user_prompt}.'},]}]print(messages)completion = MultiModalConversation.call(model='qwen-vl-plus', messages=messages)# 檢查API調用是否成功if completion is None:print("API調用返回None，可能請求失敗或網絡問題")return "錯誤：API調用失敗，返回None"if completion.status_code != 200:print(f"API調用失敗: {completion.status_code}, {completion.message}")return f"錯誤: {completion.message}"# 正確處理響應try:response = completion.output.choices[0]['message']['content'][0]['text']print(f'response={response}')return responseexcept Exception as e:print(f"解析響應時出錯: {e}")return f"錯誤：無法解析響應，{str(e)}"df = pd.read_excel(f'{absolute_path}\\prompt_template_cn.xlsx')
df['response'] = ''
for index, row in df.iterrows():user_prompt = row['prompt']image_url = row['image']print(f"user_prompt:{user_prompt}")print(f"image_url:{image_url}")# 得到VLM推理結果result = get_response(user_prompt, image_url)# 檢查返回結果是否為錯誤信息if isinstance(result, str) and result.startswith("錯誤"):response = resultelse:# 如果不是錯誤信息，則嘗試提取響應內容try:response = resultexcept Exception as e:response = f"處理響應時出錯: {str(e)}"print(f"response:{response}")df.loc[index, 'response'] = response#print(f"{index+1} {user_prompt} {image_url}")
df

本文來自互聯網用戶投稿，該文觀點僅代表作者本人，不代表本站立場。本站僅提供信息存儲空間服務，不擁有所有權，不承擔相關法律責任。
如若轉載，請注明出處：http://www.pswp.cn/bicheng/96076.shtml
繁體地址，請注明出處：http://hk.pswp.cn/bicheng/96076.shtml
英文地址，請注明出處：http://en.pswp.cn/bicheng/96076.shtml

如若內容造成侵權/違法違規/事實不符，請聯系多彩編程網進行投訴反饋email:809451989@qq.com，一經查實，立即刪除！