gemma-3n-E2B multimodal model usage examples: text, image, and audio input

Reference:
https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/

Download:
https://modelscope.cn/models/google/gemma-3n-E2B-it (model download)

Example code:
https://github.com/huggingface/huggingface-gemma-recipes

Fine-tuning:
https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-tune#fine-tuning-gemma-3n-with-unsloth

Code

Error: Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

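Fix: disable TorchDynamo so the model runs in eager mode:
import os

# Set these before importing torch so they are visible when torch._dynamo loads.
os.environ["TORCH_COMPILE"] = "0"
os.environ["TORCHDYNAMO_DISABLE"] = "1"
os.environ["DISABLE_TORCH_COMPILE"] = "1"

import torch._dynamo

# Fall back to eager execution instead of raising when compilation fails.
torch._dynamo.config.suppress_errors = True
torch._dynamo.disable()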

Full code

import os

# Set these before importing torch so they are visible when torch._dynamo loads.
os.environ["TORCH_COMPILE"] = "0"
os.environ["TORCHDYNAMO_DISABLE"] = "1"
os.environ["DISABLE_TORCH_COMPILE"] = "1"

import torch
import torch._dynamo
from transformers import AutoProcessor, AutoModelForImageTextToText

# Fall back to eager execution instead of raising when compilation fails.
torch._dynamo.config.suppress_errors = True
torch._dynamo.disable()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_id = "./gemma-3n-E2B-it"  # local download; Hub ID: google/gemma-3n-e2b-it
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id).to(device)


def model_generation(model, messages):
    # Render the chat template and tokenize text, image, and audio parts together.
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
    input_len = inputs["input_ids"].shape[-1]
    inputs = inputs.to(model.device, dtype=model.dtype)
    with torch.inference_mode():
        # max_new_tokens=32 keeps replies short; raise it for longer outputs.
        generation = model.generate(**inputs, max_new_tokens=32)
    # Keep only the newly generated tokens, dropping the prompt.
    generation = generation[:, input_len:]
    decoded = processor.batch_decode(generation, skip_special_tokens=True)
    print(decoded[0])
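
The slice generation[:, input_len:] in model_generation is there because generate returns the prompt tokens followed by the newly generated ones, so decoding the full sequence would echo the prompt back. A minimal sketch of the same idea with plain Python lists follows; the helper name strip_prompt and the token IDs are made up for illustration and do not come from the model's tokenizer.

```python
def strip_prompt(sequences, input_len):
    """Drop the first input_len tokens from every sequence in the batch."""
    return [seq[input_len:] for seq in sequences]


# Hypothetical token IDs, for illustration only.
prompt_ids = [2, 105, 2364]
new_ids = [9259, 4521, 1]

# A batch of one sequence: the prompt followed by the generated tokens.
generated = [prompt_ids + new_ids]

only_new = strip_prompt(generated, len(prompt_ids))
print(only_new)
```

With a real tensor the same slice is applied along the last dimension, which is what generation[:, input_len:] does.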

Text inference

# Text
messages = [
    {"role": "user", "content": [{"type": "text", "text": "Who are you?"}]}
]
model_generation(model, messages)
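
Each message is a role plus a list of typed content parts, and the image and audio examples in this post reuse that structure with extra parts. As a rough sketch, the list can be assembled with a small helper. build_user_message is a hypothetical name, not part of transformers, and it assumes media parts go before the text part, which is only one possible ordering.

```python
def build_user_message(text, image=None, audio=None):
    """Build a single user turn as a list of typed content parts.

    Hypothetical helper for illustration; not a transformers API.
    """
    content = []
    if image is not None:
        content.append({"type": "image", "image": image})
    if audio is not None:
        content.append({"type": "audio", "audio": audio})
    # Assumption: the text instruction comes after any media parts.
    content.append({"type": "text", "text": text})
    return [{"role": "user", "content": content}]


messages = build_user_message("Who are you?")
print(messages)
```

The result can be passed straight to model_generation(model, messages). If the order of parts matters for a prompt, building the list by hand is the safer choice.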

Image + text inference

# Image + text
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "./下載.jpg"},
            {"type": "text", "text": "Describe this image in detail"},
        ],
    }
]
model_generation(model, messages)

Audio + text inference

# Interleaved with audio
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe the following speech segment in English:"},
            {"type": "audio", "audio": "test-16b-caps.wav"},
        ],
    }
]
model_generation(model, messages)
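
If transcription output looks wrong, it can help to check the audio file itself before suspecting the model. The sketch below reads a WAV header using Python's standard wave module. describe_wav is a hypothetical helper, and the 16 kHz mono 16-bit demo file is generated only so that the example runs on its own; this post does not state which audio format the model expects.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic format facts for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


# Demo input: one second of 16-bit mono silence at 16 kHz (illustrative values).
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

info = describe_wav(demo_path)
print(info)
```

Calling describe_wav("test-16b-caps.wav") on the file used above would report its channel count, sample rate, and bit depth in the same way.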
