Table of Contents
Open-source repository:
Model repo download:
Single-image demo:
Multi-image inference demo:
Paper study notes:
Full deployment tutorial:
Fine-tuning tutorial:
Deployment & fine-tuning tutorial, with hands-on video
BitCPM4 technical report
Idea: bake quantization into training?
Open-source repository:
https://github.com/OpenBMB/MiniCPM
openbmb/MiniCPM4-8B-Eagle-vLLM
Model size: 2.29 GB
Model repo download:
modelscope download --model=OpenBMB/MiniCPM-V-2_6 --local_dir ./MiniCPM-V-2_6
Single-file GGUF download:
modelscope download --model=OpenBMB/MiniCPM-V-2_6-gguf --local_dir ./ ggml-model-Q4_K_M.gguf
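If you prefer to stay in Python rather than the CLI, the ModelScope SDK also exposes snapshot_download; a minimal sketch (the download lands in the default ModelScope cache, which matches the auto-download path shown below):

# Download the full repo via the ModelScope Python SDK instead of the CLI.
from modelscope import snapshot_download

model_dir = snapshot_download('OpenBMB/MiniCPM-V-2_6')  # defaults to ~/.cache/modelscope
print(model_dir)  # local path of the downloaded repo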
Single-image demo:
The model downloads automatically to:
C:\Users\xxx\.cache\modelscope\hub\models\OpenBMB\MiniCPM-V-2_6
Requires about 20 GB of VRAM; the model files are about 15 GB.
# test.py
import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True,
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)  # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True)

image = Image.open(r"B:\360MoveData\Users\Administrator\Pictures\liuying\IMG_20150903_123711.jpg").convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': [image, question]}]

res = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(res)

## if you want to use streaming, please make sure sampling=True and stream=True
## the model.chat will return a generator
res = model.chat(image=None, msgs=msgs, tokenizer=tokenizer, sampling=True, stream=True)

generated_text = ""
for new_text in res:
    generated_text += new_text
    print(new_text, flush=True, end='')
Multi-image inference demo:
import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True,
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)  # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True)

image1 = Image.open('image1.jpg').convert('RGB')
image2 = Image.open('image2.jpg').convert('RGB')
question = 'Compare image 1 and image 2, tell me about the differences between image 1 and image 2.'
msgs = [{'role': 'user', 'content': [image1, image2, question]}]

answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
Video understanding demo:
import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer
from decord import VideoReader, cpu  # pip install decord

params = {}

model = AutoModel.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True,
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)  # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True)

MAX_NUM_FRAMES = 64

def encode_video(video_path):
    # Evenly sample n indices from the list l.
    def uniform_sample(l, n):
        gap = len(l) / n
        idxs = [int(i * gap + gap / 2) for i in range(n)]
        return [l[i] for i in idxs]

    vr = VideoReader(video_path, ctx=cpu(0))
    sample_fps = round(vr.get_avg_fps() / 1)  # sample roughly one frame per second
    frame_idx = [i for i in range(0, len(vr), sample_fps)]
    if len(frame_idx) > MAX_NUM_FRAMES:
        frame_idx = uniform_sample(frame_idx, MAX_NUM_FRAMES)
    frames = vr.get_batch(frame_idx).asnumpy()
    frames = [Image.fromarray(v.astype('uint8')) for v in frames]
    print('num frames:', len(frames))
    return frames

video_path = "car.mp4"
frames = encode_video(video_path)
question = "Describe the video"
msgs = [
    {'role': 'user', 'content': frames + [question]},
]

# Set decode params for video
params["use_image_id"] = False
params["max_slice_nums"] = 2  # set to 1 on CUDA OOM when the video resolution exceeds 448*448

answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer, **params)
print(answer)
Paper study notes:
MiniCPM: what makes it attractive enough for Stanford to plagiarize? Let's read the paper together! (Tencent Cloud Developer Community)
Full deployment tutorial:
MiniCPM-V 2.6: exploring the strongest on-device multimodal LLM [a complete guide to inference in practice] (CSDN blog)
Fine-tuning tutorial:
MiniCPM-o 2.6: fine-tuning a multimodal LLM in practice, with full code (CSDN blog)
Deployment & fine-tuning tutorial, with hands-on video
Multi-image and video land on-device for the first time! ModelBest's "little cannon" MiniCPM-V 2.6 is out: hands-on ModelScope inference, fine-tuning, and deployment tutorial (CSDN blog)
BitCPM4 technical report
As for the 43-page technical report just mentioned, I read it once through and think it breaks down into the following:
InfLLM v2: the attention layer only looks at the key context
FR-Spec: the draft stage doesn't write out everything (it only scores a frequent subset of the vocabulary; see the sketch after this list)
BitCPM4: compression is accounted for already during training
CPM.cu + ArkInfer: a custom inference & deployment stack
Wind Tunnel 2.0: experiment on small models first, then train the big one
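To make the FR-Spec item concrete: in speculative decoding, a big share of a small draft model's cost is the LM-head matmul over the full vocabulary, so restricting the draft head to the most frequent tokens shrinks that matmul while the target model still verifies over the full vocabulary. Below is a minimal sketch of the idea, not the MiniCPM4 code; the sizes and the freq_ranked_ids table are placeholders.

import torch

vocab_size, hidden, k = 32000, 512, 8000
lm_head = torch.randn(vocab_size, hidden)        # draft model's full output head

# Hypothetical: the k token ids ranked most frequent in a reference corpus.
freq_ranked_ids = torch.arange(k)
small_head = lm_head[freq_ranked_ids]            # (k, hidden): the reduced draft head

def draft_next_token(h):
    # Score only the frequent subset, then map back to full-vocabulary ids.
    logits = h @ small_head.t()                  # (1, k) instead of (1, vocab_size)
    local_id = logits.argmax(dim=-1)
    return freq_ranked_ids[local_id]

h = torch.randn(1, hidden)                       # stand-in for a draft hidden state
print(draft_next_token(h))                       # token id drawn from the frequent subset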
Idea: bake quantization into training?
MiniCPM 4.0 technical report: the surge of on-device speed is the model's self-RAG | 人人都是產品經理
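The "quantization inside training" idea can be pictured with a minimal quantization-aware-training sketch: the forward pass runs on quantized weights while gradients update the full-precision master weights through a straight-through estimator. This is not BitCPM4's implementation, only an illustration of the pattern it builds on, here with ternary ({-1, 0, +1}) weights; all names below are made up.

import torch
import torch.nn as nn

class TernaryLinear(nn.Module):
    # Forward pass uses ternary-quantized weights; gradients flow to the
    # full-precision weights via a straight-through estimator (STE).
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x):
        w = self.weight
        scale = w.abs().mean()                                        # per-tensor scale
        w_q = torch.clamp(torch.round(w / (scale + 1e-8)), -1, 1) * scale  # {-1,0,+1} * scale
        w_ste = w + (w_q - w).detach()                                # forward: w_q, backward: identity
        return x @ w_ste.t()

layer = TernaryLinear(16, 8)
out = layer(torch.randn(4, 16))
out.sum().backward()
print(layer.weight.grad.shape)  # gradients land on the full-precision weights: torch.Size([8, 16])

Because the model experiences quantization noise throughout training, it can adapt to the low-bit weight grid instead of being quantized only after the fact, which is the advantage the heading hints at.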