Table of Contents
Open-source repository:
Model repo download:
Single-image demo:
Multi-image inference demo:
Paper study notes:
Full deployment tutorial:
Fine-tuning tutorial:
Deployment & fine-tuning tutorial, with hands-on video
BitCPM4 technical report
Idea: bake quantization into training?
Open-source repository:
https://github.com/OpenBMB/MiniCPM
openbmb/MiniCPM4-8B-Eagle-vLLM
Model size: 2.29 GB
Model repo download:
modelscope download --model=OpenBMB/MiniCPM-V-2_6 --local_dir ./MiniCPM-V-2_6
Single-file GGUF download:
modelscope download --model=OpenBMB/MiniCPM-V-2_6-gguf --local_dir ./ ggml-model-Q4_K_M.gguf
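If you prefer to stay in Python rather than the CLI, the ModelScope SDK also exposes snapshot_download; a minimal sketch (the download lands in the default ModelScope cache, which matches the auto-download path shown below):

# Download the full repo via the ModelScope Python SDK instead of the CLI.
from modelscope import snapshot_download

model_dir = snapshot_download('OpenBMB/MiniCPM-V-2_6')  # defaults to ~/.cache/modelscope
print(model_dir)  # local path of the downloaded repo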
Single-image demo:
The model downloads automatically to:
C:\Users\xxx\.cache\modelscope\hub\models\OpenBMB\MiniCPM-V-2_6
Requires about 20 GB of VRAM; the model files are about 15 GB.
# test.py
import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True,
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)  # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True)

image = Image.open(r"B:\360MoveData\Users\Administrator\Pictures\liuying\IMG_20150903_123711.jpg").convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': [image, question]}]

res = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(res)

## if you want to use streaming, please make sure sampling=True and stream=True
## the model.chat will return a generator
res = model.chat(image=None, msgs=msgs, tokenizer=tokenizer, sampling=True, stream=True)

generated_text = ""
for new_text in res:
    generated_text += new_text
    print(new_text, flush=True, end='')
Multi-image inference demo:
import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True,
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)  # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True)

image1 = Image.open('image1.jpg').convert('RGB')
image2 = Image.open('image2.jpg').convert('RGB')
question = 'Compare image 1 and image 2, tell me about the differences between image 1 and image 2.'
msgs = [{'role': 'user', 'content': [image1, image2, question]}]

answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
Video understanding demo:
import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer
from decord import VideoReader, cpu  # pip install decord

params = {}

model = AutoModel.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True,
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)  # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True)

MAX_NUM_FRAMES = 64

def encode_video(video_path):
    # Evenly sample n indices from the list l.
    def uniform_sample(l, n):
        gap = len(l) / n
        idxs = [int(i * gap + gap / 2) for i in range(n)]
        return [l[i] for i in idxs]

    vr = VideoReader(video_path, ctx=cpu(0))
    sample_fps = round(vr.get_avg_fps() / 1)  # sample roughly one frame per second
    frame_idx = [i for i in range(0, len(vr), sample_fps)]
    if len(frame_idx) > MAX_NUM_FRAMES:
        frame_idx = uniform_sample(frame_idx, MAX_NUM_FRAMES)
    frames = vr.get_batch(frame_idx).asnumpy()
    frames = [Image.fromarray(v.astype('uint8')) for v in frames]
    print('num frames:', len(frames))
    return frames

video_path = "car.mp4"
frames = encode_video(video_path)
question = "Describe the video"
msgs = [
    {'role': 'user', 'content': frames + [question]},
]

# Set decode params for video
params["use_image_id"] = False
params["max_slice_nums"] = 2  # set to 1 on CUDA OOM when the video resolution exceeds 448*448

answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer, **params)
print(answer)
Paper study notes:
MiniCPM: what makes it attractive enough for Stanford to plagiarize? Let's read the paper together! (Tencent Cloud Developer Community)
Full deployment tutorial:
MiniCPM-V 2.6: exploring the strongest on-device multimodal LLM [a complete guide to inference in practice] (CSDN blog)
Fine-tuning tutorial:
MiniCPM-o 2.6: fine-tuning a multimodal LLM in practice, with full code (CSDN blog)
Deployment & fine-tuning tutorial, with hands-on video
Multi-image and video land on-device for the first time! ModelBest's "little cannon" MiniCPM-V 2.6 is out: hands-on ModelScope inference, fine-tuning, and deployment tutorial (CSDN blog)
BitCPM4 technical report
As for the 43-page technical report just mentioned, I read it once through and think it breaks down into the following:
InfLLM v2: the attention layer only looks at the key context
FR-Spec: the draft stage doesn't write out everything (it only scores a frequent subset of the vocabulary; see the sketch after this list)
BitCPM4: compression is accounted for already during training
CPM.cu + ArkInfer: a custom inference & deployment stack
Wind Tunnel 2.0: experiment on small models first, then train the big one
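To make the FR-Spec item concrete: in speculative decoding, a big share of a small draft model's cost is the LM-head matmul over the full vocabulary, so restricting the draft head to the most frequent tokens shrinks that matmul while the target model still verifies over the full vocabulary. Below is a minimal sketch of the idea, not the MiniCPM4 code; the sizes and the freq_ranked_ids table are placeholders.

import torch

vocab_size, hidden, k = 32000, 512, 8000
lm_head = torch.randn(vocab_size, hidden)        # draft model's full output head

# Hypothetical: the k token ids ranked most frequent in a reference corpus.
freq_ranked_ids = torch.arange(k)
small_head = lm_head[freq_ranked_ids]            # (k, hidden): the reduced draft head

def draft_next_token(h):
    # Score only the frequent subset, then map back to full-vocabulary ids.
    logits = h @ small_head.t()                  # (1, k) instead of (1, vocab_size)
    local_id = logits.argmax(dim=-1)
    return freq_ranked_ids[local_id]

h = torch.randn(1, hidden)                       # stand-in for a draft hidden state
print(draft_next_token(h))                       # token id drawn from the frequent subset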
Idea: bake quantization into training?
MiniCPM 4.0 technical report: the surge of on-device speed is the model's self-RAG | 人人都是產品經理
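The "quantization inside training" idea can be pictured with a minimal quantization-aware-training sketch: the forward pass runs on quantized weights while gradients update the full-precision master weights through a straight-through estimator. This is not BitCPM4's implementation, only an illustration of the pattern it builds on, here with ternary ({-1, 0, +1}) weights; all names below are made up.

import torch
import torch.nn as nn

class TernaryLinear(nn.Module):
    # Forward pass uses ternary-quantized weights; gradients flow to the
    # full-precision weights via a straight-through estimator (STE).
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x):
        w = self.weight
        scale = w.abs().mean()                                        # per-tensor scale
        w_q = torch.clamp(torch.round(w / (scale + 1e-8)), -1, 1) * scale  # {-1,0,+1} * scale
        w_ste = w + (w_q - w).detach()                                # forward: w_q, backward: identity
        return x @ w_ste.t()

layer = TernaryLinear(16, 8)
out = layer(torch.randn(4, 16))
out.sum().backward()
print(layer.weight.grad.shape)  # gradients land on the full-precision weights: torch.Size([8, 16])

Because the model experiences quantization noise throughout training, it can adapt to the low-bit weight grid instead of being quantized only after the fact, which is the advantage the heading hints at.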