Visual Analysis of May 2025 Research Hotspots in the Top Computer Vision Journal International Journal of Computer Vision

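Tracking research hotspots in computer vision is key to understanding where the technology is heading and to moving innovations into practice. Analyzing these hotspots reveals technical trends and offers a useful reference for choosing research topics and for engineering work. This article presents a visual analysis of the research hotspots in the May 2025 issue of the top computer vision journal International Journal of Computer Vision. Feel free to read and share.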

This article was written by 韓煦 and reviewed by 鄧鏑.

I. About the Journal

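The International Journal of Computer Vision (IJCV) is a top journal in the field of computer vision. It is currently published monthly (12 issues per year) and publishes high-quality, original research papers that advance the science and engineering of computer vision. Its impact factor is 11.6 (2023), its 5-year impact factor is 14.5 (2023), and the median time from submission to first decision is 96 days. Table 1 shows the number of articles IJCV published and its impact factor (IF) for each year from 2019 to 2023.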

Table 1. Number of articles and impact factor of IJCV by year

Year	Articles per year	IF
2023	198	11.6
2022	187	19.5
2021	130	13.3
2020	187	7.4
2019	90	5.7

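The journal's scope centers on computer vision. Specifically, it covers image formation, processing, analysis, and interpretation; machine learning techniques; statistical methods; sensor technology; image-based rendering, computer graphics, robotics, photo interpretation, image retrieval, video analysis and annotation, and multimedia; and computational models of vision and the visual architecture of the human brain.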

Journal website: https://link.springer.com/journal/11263

II. Hotspot Analysis

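Table 2. High-frequency topic terms in paper titles

High-frequency topic	Chinese translation	Occurrences	Core directions
Generation	生成	8	Story / image / video generation
Consistency	一致性	6	Multi-view, cross-modal, and character identity consistency
Re-identification	重識別	4	Person / video re-identification
Semantic Segmentation	語義分割	4	Weakly supervised / cross-modal / medical scenarios
Diffusion Models	擴散模型	3	Dynamic tracking, long video generation
3D Reconstruction	3D 重建	3	Neural scenes, shape representation
Self-Supervised	自監督學習	3	Complex tasks with no or limited supervision
Multi-modal	多模態	3	Vision-language, cross-modal distillation
Medical Image	醫學影像	2	Segmentation, tumor prediction
Adversarial Learning	對抗學習	2	Quality assessment, attack and defense
Multi-view	多視圖	2	SLIDE (multi-view consistency), multi-view stereo network (depth estimation)
Unsupervised	無監督	2	Cross-modal distillation for semantic segmentation
Semi-supervised	半監督	2	Medical image segmentation, federated semi-supervised learning
DeepFake Detection	DeepFake 檢測	2	Robust sequential detection, dual-level adapter detection
Cross-Modal	跨模態	2	Cross-modal distillation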

Figure 1. Word cloud of research hotspots

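Table 2 lists the 15 high-frequency topic terms found in the titles of the 38 papers published in this issue. Figure 1 shows a word cloud generated from the IJCV research hotspots, covering areas such as semantic segmentation, diffusion models, and consistency. Table 3 lists the papers published in this issue of IJCV.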
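The article does not say how the counts in Table 2 were produced. As a rough illustration only, the sketch below shows one plausible way to tally topic terms over paper titles, using a case-insensitive substring match. The function name `count_topic_terms`, the sample lists, and the matching rule are all assumptions made for this example; the counts it prints are not meant to reproduce Table 2.

```python
from collections import Counter


def count_topic_terms(titles, terms):
    """Count how many titles mention each term.

    Assumption: a term "occurs" in a title when it appears as a
    case-insensitive substring. The article's own method is not stated.
    """
    counts = Counter()
    for title in titles:
        lowered = title.lower()
        for term in terms:
            if term.lower() in lowered:
                counts[term] += 1
    return counts


# A handful of titles copied from Table 3 (illustrative subset only).
SAMPLE_TITLES = [
    "Robust Sequential DeepFake Detection",
    "DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection",
    "Dual-Space Video Person Re-identification",
    "CSFRNet: Integrating Clothing Status Awareness for Long-Term Person Re-identification",
    "PICK: Predict and Mask for Semi-supervised Medical Image Segmentation",
]

# Hypothetical term list, taken from a few rows of Table 2.
SAMPLE_TERMS = [
    "DeepFake Detection",
    "Re-identification",
    "Semi-supervised",
    "Medical Image",
]

if __name__ == "__main__":
    # Print terms from most to least frequent, tab-separated like the tables here.
    for term, n in count_topic_terms(SAMPLE_TITLES, SAMPLE_TERMS).most_common():
        print(f"{term}\t{n}")
```

A term-to-count mapping like this could also serve as input to a word cloud tool; for instance, the third-party `wordcloud` package accepts such a mapping through `WordCloud.generate_from_frequencies`. Whether Figure 1 was produced that way is not stated in the article.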

Table 3. Papers published in IJCV in May 2025

Title	Chinese translation
AutoStory: Generating Diverse Storytelling Images with Minimal Human Efforts	AutoStory：以最小人力生成多樣化故事圖像
SLIDE: A Unified Mesh and Texture Generation Framework with Enhanced Geometric Control and Multi-view Consistency	SLIDE：具有增強幾何控制與多視角一致性的統一網格與紋理生成框架
Exploring Homogeneous and Heterogeneous Consistent Label Associations for Unsupervised Visible-Infrared Person ReID	探索同質與異質一致標簽關聯的無監督可見光–紅外行人重識別
AniClipart: Clipart Animation with Text-to-Video Priors	AniClipart：基于文本到視頻先驗的剪貼畫動畫
Combating Label Noise with a General Surrogate Model for Sample Selection	使用通用替代模型進行樣本選擇以對抗標簽噪聲
CSFRNet: Integrating Clothing Status Awareness for Long-Term Person Re-identification	CSFRNet：融合服裝狀態感知的長時跨度行人重識別網絡
Pseudo-Plane Regularized Signed Distance Field for Neural Indoor Scene Reconstruction	偽平面正則化符號距離場用于神經室內場景重建
RepSNet: A Nucleus Instance Segmentation Model Based on Boundary Regression and Structural Re-Parameterization	RepSNet：基于邊界回歸與結構重參數化的細胞核實例分割模型
Blind Image Quality Assessment: Exploring Content Fidelity Perceptibility via Quality Adversarial Learning	盲圖像質量評估：通過質量對抗學習探索內容保真性感知
HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning	HUPE：基于語義協同學習的啟發式水下感知增強
Robust Sequential DeepFake Detection	強健序列化 DeepFake 檢測
PICK: Predict and Mask for Semi-supervised Medical Image Segmentation	PICK：用于半監督醫學圖像分割的預測與掩碼方法
Relation-Guided Versatile Regularization for Federated Semi-Supervised Learning	基于關系引導的聯邦半監督學習通用正則化
General Class-Balanced Multicentric Dynamic Prototype Pseudo-Labeling	通用類平衡多中心動態原型偽標簽
Diving Deep into Simplicity Bias for Long-Tailed Image Recognition	深入探討長尾圖像識別中的簡單性偏差
Context-Aware Multi-view Stereo Network for Efficient Edge-Preserving Depth Estimation	面向高效邊緣保留深度估計的上下文感知多視角立體網絡
LDTrack: Dynamic People Tracking by Service Robots Using Diffusion Models	LDTrack：服務機器人基于擴散模型的動態人群跟蹤
Learning Meshing from Delaunay Triangulation for 3D Shape Representation	從 Delaunay 三角化學習網格以進行三維形狀表示
RIGID: Recurrent GAN Inversion and Editing of Real Face Videos and Beyond	RIGID：真實人臉視頻的循環 GAN 反演與編輯
UniCanvas: Affordance-Aware Unified Real Image Editing via Customized Text-to-Image Generation	UniCanvas：通過定制文本到圖像生成功能感知的統一真實圖像編輯
Generalized Robot Vision-Language Model via Linguistic Foreground-Aware Contrast	通過語言前景感知對比的通用機器人視覺-語言模型
Rethinking Generalizability and Discriminability of Self-Supervised Learning from Evolutionary Game Theory Perspective	從進化博弈論視角重新思考自監督學習的泛化性與判別性
Pre-trained Trojan Attacks for Visual Recognition	預訓練木馬攻擊用于視覺識別
GL-MCM: Global and Local Maximum Concept Matching for Zero-Shot Out-of-Distribution Detection	GL-MCM：用于零樣本分布外檢測的全局與局部最大概念匹配
A Mutual Supervision Framework for Referring Expression Segmentation and Generation	一種用于指代表達式分割與生成的互監督框架
DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection	DeepFake-Adapter：用于 DeepFake 檢測的雙層適配器
MoonShot: Towards Controllable Video Generation and Editing with Motion-Aware Multimodal Conditions	MoonShot：面向可控視頻生成與編輯的運動感知多模態條件
SeaFormer++: Squeeze-Enhanced Axial Transformer for Mobile Visual Recognition	SeaFormer++：用于移動視覺識別的壓縮增強軸向 Transformer
Dual-Space Video Person Re-identification	雙空間視頻行人重識別
Image Synthesis Under Limited Data: A Survey and Taxonomy	有限數據條件下的圖像合成：綜述與分類體系
Sample-Cohesive Pose-Aware Contrastive Facial Representation Learning	基于樣本內聚性與姿態感知的對比人臉表征學習
Learning with Enriched Inductive Biases for Vision-Language Models	面向視覺-語言模型的富歸納偏置學習
Self-supervised Shutter Unrolling with Events	基于事件的自監督快門反展開
TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On	TryOn-Adapter：用于高保真虛擬試穿的高效細粒度服裝身份適配
Correction: CMAE-3D: Contrastive Masked AutoEncoders for Self-Supervised 3D Object Detection	勘誤：CMAE-3D：用于自監督三維目標檢測的對比掩碼自編碼器
Correction: Deep Attention Learning for Pre-operative Lymph Node Metastasis Prediction in Pancreatic Cancer via Multi-object Relationship Modeling	勘誤：基于多目標關系建模的胰腺癌術前淋巴結轉移預測深度注意力學習
Correction: Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes	勘誤：基于少量標注像素與點云的駕駛場景弱監督語義分割

The topics of the published papers show that the research hotspots in this issue concentrate on the following directions:

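Image/video generation and editing: this includes story image generation (AutoStory), text-to-video and text-to-image generation (AniClipart, UniCanvas, MoonShot), and work built on generative models such as diffusion models and GANs (LDTrack, which applies diffusion models to people tracking, and RIGID, which performs GAN inversion and editing of face videos). This direction covers two main themes: content creation under multimodal conditions and motion-aware controllable editing.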
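Consistency modeling and person re-identification: this covers multi-view consistency (SLIDE), consistent label associations between visible and infrared images (Unsupervised Visible-Infrared Person ReID), and long-term re-identification with clothing status awareness (CSFRNet). The focus is on consistency constraints and feature alignment across views and modalities.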
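Semantic segmentation and 3D reconstruction: this includes neural signed distance field reconstruction (Pseudo-Plane Regularized SDF), mesh reconstruction from Delaunay triangulation (Learning Meshing from Delaunay Triangulation), nucleus instance segmentation (RepSNet), and weakly supervised and semi-supervised segmentation (PICK, Few Annotated Pixels). The work spans several 3D representations, such as planes, voxels, and point clouds, as well as fine-grained segmentation tasks.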