On August 14, 2025, Meta released DINOv3.
Homepage: https://ai.meta.com/dinov3/
Paper: DINOv3
Hugging Face: https://huggingface.co/collections/facebook/dinov3-68924841bd6b561778e31009
Official blog: https://ai.meta.com/blog/dinov3-self-supervised-vision-model/
Code: https://github.com/facebookresearch/dinov3
The figure above illustrates DINOv3's high-resolution dense features: the authors visualize the cosine similarity between the patch marked with a red cross and every other patch, computed from DINOv3's output features.
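A visualization like the one described above can be sketched as follows. This is a minimal, self-contained illustration: it uses random tensors as a stand-in for the patch tokens a real DINOv3 ViT-B/16 would produce for a 224x224 input (a 14x14 grid of 768-dim features, taken from the model's last_hidden_state); the patch index and grid size are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

# Stand-in for DINOv3 patch features: 14x14 = 196 patches, 768-dim each
# (the shape a ViT-B/16 backbone would yield for a 224x224 image).
# In practice these would be the patch tokens from last_hidden_state.
patch_features = torch.randn(196, 768)

# Index of the "red cross" patch (hypothetical choice for the sketch).
marked = 100

# Cosine similarity between the marked patch and all patches at once.
sims = F.cosine_similarity(patch_features[marked].unsqueeze(0), patch_features, dim=-1)

# Reshape to the 14x14 patch grid so it can be rendered as a heatmap.
heatmap = sims.reshape(14, 14)
print(heatmap.shape)  # torch.Size([14, 14])
```

The similarity of the marked patch with itself is 1 by construction; nearby semantically similar regions light up in the heatmap when real features are used.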
The release of DINOv3 marks a breakthrough in large-scale self-supervised learning (SSL), demonstrating that a single frozen SSL backbone can serve as a general-purpose vision encoder.
How do you use DINOv3? The official release provides two ways, both of them simple.
from transformers import pipeline
from transformers.image_utils import load_image

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
image = load_image(url)

feature_extractor = pipeline(
    model="facebook/dinov3-vitb16-pretrain-lvd1689m",
    task="image-feature-extraction",
)
features = feature_extractor(image)
import torch
from transformers import AutoImageProcessor, AutoModel
from transformers.image_utils import load_image

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = load_image(url)

pretrained_model_name = "facebook/dinov3-vitb16-pretrain-lvd1689m"
processor = AutoImageProcessor.from_pretrained(pretrained_model_name)
model = AutoModel.from_pretrained(pretrained_model_name, device_map="auto")

inputs = processor(images=image, return_tensors="pt").to(model.device)
with torch.inference_mode():
    outputs = model(**inputs)
pooled_output = outputs.pooler_output

print("Pooled output shape:", pooled_output.shape)
If you have downloaded the model to a local directory, you can load it directly from that path, as shown below:
import torch
from transformers import AutoImageProcessor, AutoModel
from transformers.image_utils import load_image

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = load_image(url)

pretrained_model_name = "../model/dinov3-vitl16-pretrain-lvd1689m"  # local large model
processor = AutoImageProcessor.from_pretrained(pretrained_model_name, use_safetensors=True)
model = AutoModel.from_pretrained(pretrained_model_name, device_map="auto")

inputs = processor(images=image, return_tensors="pt").to(model.device)
with torch.inference_mode():
    outputs = model(**inputs)
pooled_output = outputs.pooler_output

print("Pooled output shape:", pooled_output.shape)
The output is:
Pooled output shape: torch.Size([1, 1024])
The pipeline() method accepts a local path in the same way.
Regarding the dimensionality of the model's output features: as the output above shows, the large model produces 1024-dim features.
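As a rough guide, the pooled-output width follows the backbone size. Only the ViT-L/16 value (1024) is confirmed by the run above; the other widths below are assumptions based on standard ViT configurations and should be verified against each model's config:

```python
# Pooled-output dimension per DINOv3 ViT backbone.
# ViT-L/16 -> 1024 matches the printed shape above; the rest are
# assumed standard ViT widths (verify against the model configs).
EMBED_DIM = {
    "dinov3-vits16": 384,   # assumption: standard ViT-S width
    "dinov3-vitb16": 768,   # assumption: standard ViT-B width
    "dinov3-vitl16": 1024,  # confirmed by the output above
}

def feature_dim(model_name: str) -> int:
    """Look up the embedding width from a checkpoint name."""
    for key, dim in EMBED_DIM.items():
        if key in model_name:
            return dim
    raise ValueError(f"unknown backbone: {model_name}")

print(feature_dim("facebook/dinov3-vitl16-pretrain-lvd1689m"))  # 1024
```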
To be updated...