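Paper link: CLIP in medical imaging: A survey - ScienceDirect
Project page: github.com
The English is typed entirely by hand! It summarizes and paraphrases the original paper. Some spelling and grammar mistakes may be unavoidable; if you spot any, corrections in the comments are welcome! This post is closer to personal notes, so read with care.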
Contents
1. Thoughts
2. Paragraph-by-paragraph reading of the paper
2.1. Abstract
2.2. Introduction
2.3. Background
2.3.1. Contrastive language-image pre-training
2.3.2. Variants of CLIP
2.3.3. Medical image–text dataset
2.4. CLIP in medical image–text pre-training
2.4.1. Challenges of CLIP pre-training
2.4.2. Multi-scale contrast
2.4.3. Data-efficient contrast
2.4.4. Explicit knowledge enhancement
2.4.5. Others
2.4.6. Summary
2.5. CLIP-driven applications
2.5.1. Classification
2.5.2. Dense prediction
2.5.3. Cross-modal tasks
2.5.4. Summary
2.6. Comparative analysis
2.7. Discussions and future directions
2.8. Conclusion
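1. Thoughts
(1) I will probably only record the parts of this paper that stand out; the basics of CLIP and of medical imaging are not covered here, so see the original paper for those. Mostly because it is very long and there is no point copying all of it.
(2) Why do the figures throughout the paper have such different styles? Did each author draw one and then they stitched them together?
(3) These notes lean toward record-keeping; the paper introduces a great many different models.
2. Paragraph-by-paragraph reading of the paper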
2.1. Abstract
①Basically says CLIP is meaningful for medical imaging and sets out to survey it
2.2. Introduction
①Limitation: poor out-of-distribution performance
②Trend of CLIP-related papers (left) and of the medical-imaging papers among them (right):
③How CLIP is used:
2.3. Background
2.3.1. Contrastive language-image pre-training
①How CLIP works (if you haven't read it, look up the original CLIP paper; it is very clear and easy to follow):
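For reference, CLIP's core training objective fits in a few lines of NumPy: the embeddings of N matched image–text pairs are L2-normalized, an N×N cosine-similarity matrix is divided by a temperature, and cross-entropy is applied in both directions with the matching pairs on the diagonal. This is a minimal sketch, not OpenAI's implementation; the fixed temperature of 0.07 is an assumption (CLIP learns the temperature, starting from that value), and random vectors stand in for real encoder outputs.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Unit-length embeddings, so dot products are cosine similarities
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss for N matched image-text pairs.

    Row i of image_emb and row i of text_emb are a positive pair;
    every other image-text combination in the batch is a negative.
    """
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature  # (N, N)
    idx = np.arange(logits.shape[0])                            # positives sit on the diagonal
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()    # each image picks its text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()    # each text picks its image
    return (loss_i2t + loss_t2i) / 2

# Toy check with random vectors standing in for encoder outputs
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))
aligned = clip_loss(text + 0.01 * rng.normal(size=(4, 8)), text)  # images match their texts
mismatched = clip_loss(text[::-1], text)                            # pairs scrambled
print(aligned, mismatched)  # the aligned batch gets the lower loss
```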
②Performance of CLIP in the medical field:
2.3.2. Variants of CLIP
①Introduces several variants, but without figures it is hard to remember them or see at a glance how they differ
2.3.3. Medical image–text dataset
①Open medical datasets:
2.4. CLIP in medical image–text pre-training
①Representative CLIP-based medical models:
2.4.1. Challenges of CLIP pre-training
①Challenges of CLIP in the medical imaging field:
Modality-dependent; both local and global image/text analysis are needed
Scarce data (didn't they just say zero-shot generalization is already good? Why is data scarcity a problem now?)
Specialized knowledge is required
2.4.2. Multi-scale contrast
①GLoRIA matches words in the report with sub-regions of the image:
②LoVT further assigns different weights to different sentences
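GLoRIA's word-to-region matching can be sketched roughly like this: every word attends over the image sub-region features, the attention-weighted visual context is compared with the word itself, and the per-word scores are merged with a smooth maximum. This is a simplified sketch under assumptions: the scale values 4.0 and 5.0, the function name `local_similarity`, and the random inputs are illustrative, and the full method trains such a score with a contrastive loss over the batch alongside a global image–report loss. LoVT's sentence weighting is not reproduced here.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_similarity(words, regions, attn_scale=4.0, agg_scale=5.0):
    """GLoRIA-style local matching score between one report and one image.

    words:   (T, D) word embeddings of the report
    regions: (R, D) image sub-region features (e.g. a flattened CNN feature map)
    """
    w = l2_normalize(words)
    v = l2_normalize(regions)
    attn = softmax(attn_scale * (w @ v.T), axis=1)  # (T, R): each word attends over regions
    context = l2_normalize(attn @ v)                 # (T, D): word-specific visual context
    word_scores = (context * w).sum(axis=1)          # cosine(context_i, word_i) per word
    # Smooth maximum over words: (1/agg_scale) * log(sum(exp(agg_scale * score)))
    m = word_scores.max()
    return m + np.log(np.exp(agg_scale * (word_scores - m)).sum()) / agg_scale

rng = np.random.default_rng(0)
words = rng.normal(size=(5, 32))
matched = np.vstack([words, rng.normal(size=(3, 32))])  # regions that "contain" each word
unrelated = rng.normal(size=(8, 32))
print(local_similarity(words, matched), local_similarity(words, unrelated))
```

The smooth maximum lets the best-matching words dominate the pair score while every word still contributes a little.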
2.4.3. Data-efficient contrast
①Blindly pushing all negative pairs apart may destroy the relevance between similar diseases (e.g., two reports describing the same disease are still treated as a negative pair):
②Augment the text by adding descriptions or shuffling sentences
③Use medical image videos
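One way to soften the "push every negative away" problem is to replace CLIP's one-hot targets with soft targets derived from label similarity, so that two pairs sharing a disease are not treated as pure negatives (MedCLIP's semantic matching loss follows this idea). The sketch below assumes each pair comes with a hypothetical multi-hot disease label vector; building the targets by row-normalizing the label cosine-similarity matrix is a simplification chosen for illustration, not the exact formulation of any paper. When all labels are distinct one-hot vectors, the targets collapse to the identity and the loss reduces to the standard CLIP loss.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(x, axis=-1):
    s = x - x.max(axis=axis, keepdims=True)
    return s - np.log(np.exp(s).sum(axis=axis, keepdims=True))

def soft_clip_loss(image_emb, text_emb, labels, temperature=0.07):
    """CLIP-style loss with soft targets from (hypothetical) multi-hot disease labels.

    labels: (N, K) findings of the N pairs; image i and text i share labels[i].
    """
    n = labels.shape[0]
    lab = l2_normalize(labels.astype(float))
    sim = np.maximum(lab @ lab.T, np.eye(n))        # label cosine; the true pair always counts
    targets = sim / sim.sum(axis=1, keepdims=True)  # sim is symmetric, so this serves both directions
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature
    loss_i2t = -(targets * log_softmax(logits, axis=1)).sum(axis=1).mean()
    loss_t2i = -(targets * log_softmax(logits.T, axis=1)).sum(axis=1).mean()
    return (loss_i2t + loss_t2i) / 2

# Pairs 0 and 1 share a finding; pair 2 has a different one
labels = np.array([[1, 0], [1, 0], [0, 1]])
rng = np.random.default_rng(0)
img, txt = rng.normal(size=(3, 8)), rng.normal(size=(3, 8))
print(soft_clip_loss(img, txt, labels))
```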
2.4.4. Explicit knowledge enhancement
①Combine with graphs or knowledge graphs (KGs):
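The notes don't say how the knowledge is injected. Purely as an illustration, one simple option is to verbalize knowledge-graph triples into sentences and append them to the report before it goes through the text encoder. Everything here is hypothetical: the triples, the relation name and the helper functions are made up for the example, and the surveyed methods typically use curated medical knowledge sources (e.g. UMLS) and often dedicated graph or knowledge encoders rather than plain string concatenation.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Hypothetical toy knowledge graph; a real system would query a curated source such as UMLS
TOY_KG: List[Triple] = [
    ("pneumothorax", "refers_to", "air in the pleural space"),
    ("cardiomegaly", "refers_to", "an enlarged heart"),
]

def verbalize(triple: Triple) -> str:
    """Turn one (head, relation, tail) triple into a plain sentence."""
    head, relation, tail = triple
    return f"{head} {relation.replace('_', ' ')} {tail}."

def knowledge_augmented_text(report: str, entities: List[str], kg: List[Triple]) -> str:
    """Append verbalized facts about the given entities to the report (illustrative helper)."""
    facts = [verbalize(t) for t in kg if t[0] in entities]
    return " ".join([report] + facts)

print(knowledge_augmented_text("Mild cardiomegaly.", ["cardiomegaly"], TOY_KG))
```

Here the entity list is passed in explicitly; detecting entities in free-text reports (and handling negations such as "no pneumothorax") is a separate problem this sketch ignores.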
2.4.5. Others
~
2.4.6. Summary
~
2.5. CLIP-driven applications
2.5.1. Classification
①CLIP-based models for image classification:
(1)Zero-shot classification
①Diagnosis example (whoa, you can do it like this, as binary classification):
②How Xplainer works (whoa, impressive; so this is how people play with CLIP now):
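The prompt-based zero-shot trick can be sketched like this: compare the image embedding with a positive prompt (e.g. "There is pneumonia") and a negative prompt ("There is no pneumonia"), and take the softmax probability of the positive one; Xplainer applies the same idea to several observable descriptors and then combines the descriptor probabilities into a diagnosis. Assumptions: the embeddings would come from a CLIP-style image and text encoder (random vectors stand in here), the temperature is fixed at 0.07, and averaging the descriptor probabilities is a simplifying choice for this sketch rather than a confirmed detail of Xplainer.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x) + eps)

def positive_prob(image_emb, pos_prompt_emb, neg_prompt_emb, temperature=0.07):
    """Softmax probability of the positive prompt, e.g. "There is X" vs "There is no X"."""
    img = l2_normalize(image_emb)
    logits = np.array([img @ l2_normalize(pos_prompt_emb),
                       img @ l2_normalize(neg_prompt_emb)]) / temperature
    e = np.exp(logits - logits.max())
    return float(e[0] / e.sum())

def descriptor_diagnosis(image_emb, descriptor_prompts, temperature=0.07):
    """Xplainer-style idea: score each observable descriptor, then combine.

    descriptor_prompts: list of (positive_emb, negative_emb) pairs, one per descriptor.
    Averaging the descriptor probabilities is an assumption made for this sketch.
    """
    probs = [positive_prob(image_emb, p, n, temperature) for p, n in descriptor_prompts]
    return float(np.mean(probs)), probs

# Random vectors stand in for CLIP-style image and prompt embeddings
rng = np.random.default_rng(0)
image = rng.normal(size=16)
prompts = [(rng.normal(size=16), rng.normal(size=16)) for _ in range(3)]
score, per_descriptor = descriptor_diagnosis(image, prompts)
print(score, per_descriptor)
```

Keeping the per-descriptor probabilities around is what makes the prediction explainable: each one says how strongly a specific observation was detected.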
(2)Context optimization
①Example of context optimization:
There is hardly any explanation here, so it won't help anyone get started quickly, haha
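Since the paper barely explains context optimization, here is a toy sketch of the idea as introduced by CoOp: the hand-written prompt words (such as "a photo of a") are replaced by M learnable context vectors placed before the class-name token, and only those vectors are trained with a classification loss while both encoders stay frozen. To keep the example self-contained and runnable, the "text encoder" is a hypothetical stand-in (mean pooling, tanh and a fixed linear map) with hand-written gradients, the images are random pre-computed features, and the context is initialized at zero; a real implementation would backpropagate through CLIP's Transformer text encoder with an autodiff framework.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, C, N = 16, 4, 3, 12  # embedding dim, context tokens, classes, training images

# Frozen pieces: toy stand-ins, NOT CLIP's real encoders
W_text = rng.normal(size=(D, D)) / np.sqrt(D)  # toy text-encoder projection
class_tok = rng.normal(size=(C, D))            # class-name token embeddings
images = rng.normal(size=(N, D))               # pre-computed image features
labels = rng.integers(0, C, size=N)

def encode_prompts(ctx):
    """Prompt for class c = [ctx_1 .. ctx_M, class_c]; mean-pool, tanh, then project."""
    hidden = np.tanh((ctx.sum(axis=0) + class_tok) / (M + 1))  # (C, D)
    return hidden @ W_text.T, hidden

def loss_and_grad(ctx):
    """Cross-entropy over class prompts and its gradient w.r.t. the context vectors only."""
    text_feat, hidden = encode_prompts(ctx)
    logits = images @ text_feat.T                 # (N, C)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    rows = np.arange(N)
    loss = -np.log(probs[rows, labels]).mean()
    dlogits = probs.copy()
    dlogits[rows, labels] -= 1.0
    dlogits /= N                                   # gradient of the mean loss
    dhidden = (dlogits.T @ images) @ W_text        # back through logits and projection
    dpooled = dhidden * (1.0 - hidden ** 2)        # back through tanh
    # Each context vector enters every class prompt with weight 1/(M+1), so with
    # mean pooling all context vectors receive the same gradient
    dctx_row = dpooled.sum(axis=0) / (M + 1)
    return loss, np.tile(dctx_row, (M, 1))

ctx = np.zeros((M, D))        # learnable context (zero init is an assumption)
initial_loss, _ = loss_and_grad(ctx)
for _ in range(100):
    _, grad = loss_and_grad(ctx)
    ctx -= 0.1 * grad         # only ctx changes; encoder, class tokens and images stay frozen
final_loss, _ = loss_and_grad(ctx)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

The identical gradients across context vectors are an artifact of the toy mean-pooling encoder; a real Transformer text encoder is position-aware, so each learned context token can end up different.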
2.5.2. Dense prediction
①Methods:
(1)Detection
①Lists relevant models
(2)2D medical image segmentation
①Fine-tune CLIP on 2D medical image datasets
(3)3D medical image segmentation
①Examples:
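Much of the CLIP-driven dense prediction work shares one basic idea: compare dense (per-pixel or per-voxel) image features with text embeddings of class names such as organ or lesion names. Below is a minimal sketch of that idea, assuming the dense features and the text embeddings already live in the same space; the classes and random data are hypothetical, and published methods are considerably more elaborate (for example, some use the text embedding to generate the parameters of a segmentation head instead of comparing features directly). The same function handles 2D (H, W, D) and 3D (H, W, Z, D) feature maps because NumPy's matrix product broadcasts over the leading axes.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def text_driven_segmentation(dense_feats, class_text_emb):
    """Assign each pixel/voxel to the class whose text embedding it is most similar to.

    dense_feats:    (H, W, D) for 2D or (H, W, Z, D) for 3D dense image features
    class_text_emb: (C, D) embeddings of class prompts, one per class (hypothetical)
    Returns the label map and the per-class cosine-similarity maps (..., C).
    """
    sims = l2_normalize(dense_feats) @ l2_normalize(class_text_emb).T
    return sims.argmax(axis=-1), sims

# Toy 3D volume whose voxel features are noisy copies of the class text embeddings
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(3, 8))             # three hypothetical classes
truth = rng.integers(0, 3, size=(4, 5, 6))
volume = text_emb[truth] + 0.01 * rng.normal(size=(4, 5, 6, 8))
mask, sims = text_driven_segmentation(volume, text_emb)
print(mask.shape, (mask == truth).mean())
```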
(4)Others
2.5.3. Cross-modal tasks
①Representative models:
(1)Generation
①Automatically generate medical reports or medical images
(2)Medical visual question answering
①Example (this architecture is rather odd):
(3)Image–text retrieval
①Current models focus on global image features
②X-TRA:
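For comparison with X-TRA, the global-feature baseline boils down to ranking the gallery by cosine similarity between one global embedding per image and one per text, usually evaluated with Recall@K. A minimal sketch, assuming the embeddings come from a CLIP-style encoder pair and that query i matches gallery item i; it does not reproduce X-TRA itself.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve(query_emb, gallery_emb, k=5):
    """Indices of the k gallery items most similar to one query (global embeddings), best first."""
    sims = l2_normalize(gallery_emb) @ l2_normalize(query_emb)
    return np.argsort(-sims)[:k]

def recall_at_k(query_embs, gallery_embs, k=5):
    """Fraction of queries whose true match (same row index) is among the top-k results."""
    sims = l2_normalize(query_embs) @ l2_normalize(gallery_embs).T  # (Q, G)
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = (topk == np.arange(len(query_embs))[:, None]).any(axis=1)
    return float(hits.mean())

# Random vectors stand in for report embeddings (queries) and image embeddings (gallery)
rng = np.random.default_rng(0)
reports = rng.normal(size=(10, 32))
images = reports + 0.5 * rng.normal(size=(10, 32))  # noisy "matching" images
print(retrieve(reports[0], images, k=3), recall_at_k(reports, images, k=1))
```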
2.5.4. Summary
~
2.6. Comparative analysis
①How Multi-modality Large Language Models (MLLMs) differ from CLIP:
②Performance of CLIP on different image datasets:
2.7. Discussions and future directions
①Inter-disease similarity:
②Challenges: inconsistency between pre-training and application, incomprehensive evaluation of refined pre-training, challenges of volumetric imaging, limited scope of refined CLIP pre-training, debiasing in CLIP models, enhancing adversarial robustness of CLIP, exploring the potential of metadata, incorporation of high-order correlations, beyond image–text alignment
2.8. Conclusion
~