Title
End-to-end breast cancer radiotherapy planning via LMMs with consistency embedding
Introduction
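In recent years, a new generation of AI models inspired by large language models (LLMs), known as foundation models, has emerged, marking a clear departure from earlier paradigms (Moor et al., 2023). These models are large in scale and versatile, properties that stem from self-supervised training on diverse data. Foundation models now achieve state-of-the-art (SOTA) performance in many domains, including multimodal reasoning, image-text generation, image captioning, and text-guided image segmentation (Bubeck et al., 2023; Dai et al., 2024; Driess et al., 2023; Li et al., 2023b; Liu et al., 2024; Lai et al., 2024). These properties suggest a possible paradigm shift in how AI is integrated into medical practice, which inherently relies on multimodal information to reach comprehensive clinical decisions. They also offer an opportunity to overcome the limitations of the more than 500 FDA-approved AI models currently available, most of which target a single task and rely on unimodal information (Joshi et al., 2024). Unlike these unimodal models, generalist medical AI built on foundation models can understand the clinical workflow as a whole and process many kinds of medical data, including imaging modalities, electronic health records, laboratory results, genomics, and even clinical reports (Singhal et al., 2023; Rajpurkar and Lungren, 2023; Wu et al., 2023b; Moor et al., 2023; Tu et al., 2024). By understanding different types of data and their interrelationships, multimodal AI can provide a comprehensive view of patient data, supporting more accurate diagnosis, personalized treatment planning, and fewer medical errors.
This paper focuses on radiation oncology, where multimodal integration is critical, making it one of the most important clinical fields for evaluating the potential of foundation models. We therefore introduce RO-LMM, a prototype large multimodal model (LMM) designed to support the clinical workflow in radiation oncology. This study substantially extends our earlier work, LLMSeg (Oh et al., 2024), which focused on multimodal segmentation. More specifically, RO-LMM broadens the scope of LLMSeg by handling a wider range of clinical tasks in radiation oncology: (1) it efficiently summarizes extensive patient histories and examination results into concise yet informative clinical notes; (2) it proposes an appropriate radiotherapy strategy from a clinical expert's perspective; and (3) it delineates, on three-dimensional (3D) computed tomography (CT) images, radiation target volumes consistent with the proposed radiotherapy strategy. This multifaceted functionality represents notable progress in supporting the specialized work of clinical professionals.
When training an LLM to perform a series of consecutive tasks, from radiotherapy strategy suggestion to target volume segmentation, we found that errors can accumulate at each task, which may significantly degrade end-to-end performance. Another important contribution of this study is therefore to build on and extend Noisy Embedding Fine-Tuning (NEFTune) (Jain et al., 2024), which injects uniform noise into the embeddings during training for each target task. More specifically, to further improve the model's applicability, we develop a novel Consistency Embedding Fine-Tuning (CEFTune) technique, which adds a regularization loss to enforce consistent predictions for noisy and clean inputs. Going beyond text-related tasks, we also apply these ideas to 3D segmentation, yielding the novel Noisy Embedding Segmentation (NESEG) and Consistency Embedding Segmentation (CESEG) techniques. These advances prevent error propagation between consecutive tasks and together significantly improve the generalization of the end-to-end model in both internal and external validation.
As a proof-of-concept study, the RO-LMM framework is applied to breast cancer, a highly prevalent cancer for which radiotherapy is relatively standardized and can be planned from CT imaging alone.
Our contributions are summarized as follows:
- We propose a comprehensive framework, called RO-LMM, in which an LMM supports the broad workflow of breast cancer radiotherapy. To the best of our knowledge, this prototype is the first model to support the comprehensive workflow of radiation oncology.
- To prevent potential error accumulation across consecutive clinical tasks (clinical context summarization, radiotherapy strategy suggestion, and plan-guided target volume segmentation), we explore noise augmentation and consistency methods and propose novel training methods (CEFTune, NESEG, and CESEG) that significantly improve robustness.
- Through experiments under multiple validation scenarios on real clinical data from breast cancer patients, we show that RO-LMM outperforms conventional methods.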
Abstract
Recent advances in AI foundation models have significant potential to lighten the clinical workload by mimicking the comprehensive, multi-faceted approaches used by medical professionals. In radiation oncology, the integration of multiple modalities is of great importance, so opportunities for foundation models are abundant. Inspired by this, we present RO-LMM, a multi-purpose, comprehensive large multimodal model (LMM) tailored for radiation oncology. Leveraging the capabilities of LMMs, the model handles a series of tasks within the clinical workflow: clinical context summarization, radiotherapy strategy suggestion, and plan-guided target volume segmentation. In particular, to perform consecutive clinical tasks without error accumulation, we present a novel Consistency Embedding Fine-Tuning (CEFTune) technique, which boosts the LMM's robustness to noisy inputs while preserving consistent handling of clean inputs. We further extend this concept to an LMM-driven segmentation framework, leading to a novel Consistency Embedding Segmentation (CESEG) technique. Experimental results, including multi-center validation, confirm that RO-LMM with CEFTune and CESEG achieves promising performance on multiple clinical tasks with generalization capabilities.
Method
In this section, we describe in detail our proposed approach for sequential text generation tasks (summarization and suggestion) and for text-driven image segmentation, whose robustness is improved by consistency embedding fine-tuning. The overall framework is illustrated in Fig. 2.
3.1. Consistency embedding fine-tuning for clinical LMM
To realize a multi-purpose LMM with expertise in clinical report summarization and radiotherapy strategy suggestion, we conduct instruction fine-tuning of LLaMA2 (Touvron et al., 2023). Considering the nuanced differences in the intended objective of each task, we adopt separate training strategies to acquire task-specific expertise, namely RO-LMM-S (summary expert) and RO-LMM-P (plan expert).
Specifically, we train the summary expert on collected raw clinical reports and summary notes. During inference, the summary expert receives raw clinical reports, just as in training. For the plan expert, however, the training and inference scenarios differ. The training set consists of collected summary notes rather than notes generated by the summary expert, mainly because of cost concerns and the inherent nature of our framework, which generates output sequentially as illustrated in Fig. 1. At inference, by contrast, the plan expert takes the notes generated by the trained summary expert.
To deal with this difference between the training-time and inference-time input domains, Noisy Embedding Fine-Tuning (NEFTune) (Jain et al., 2024), which injects uniform noise into the embeddings, could be an effective naive solution for handling noisy inputs. However, a crucial consideration arises from the nature of the generated notes: some of them may lie close to clean inputs (collected notes), while others deviate towards noisy inputs. It is therefore essential to train the model to handle both clean and noisy inputs. To preserve the robustness provided by NEFTune while enforcing consistency between the predictions for clean and noisy inputs, we introduce Consistency Embedding Fine-Tuning (CEFTune), resulting in RO-LMM-P++ (see Note 2).
Note 2: "+" and "++" denote the use of NEFTune and CEFTune, respectively.
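The idea described above (uniform noise on the embeddings, plus a regularizer that keeps clean and noisy predictions consistent) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The noise scale `alpha / sqrt(L * d)` follows the NEFTune paper; the KL-divergence consistency term, the weight `lam`, the linear `head` standing in for the language model, and the function name `ceftune_style_loss` are all assumptions made for illustration, since the text only states that a regularization loss enforces consistency.

```python
import numpy as np


def neftune_noise(embeddings, alpha, rng):
    """Add uniform noise in [-1, 1], scaled by alpha / sqrt(L * d) as in NEFTune."""
    seq_len, dim = embeddings.shape
    scale = alpha / np.sqrt(seq_len * dim)
    noise = rng.uniform(-1.0, 1.0, size=embeddings.shape)
    return embeddings + scale * noise


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def cross_entropy(logits, targets):
    """Mean negative log-likelihood of the target token at each position."""
    logp = log_softmax(logits)
    return -logp[np.arange(len(targets)), targets].mean()


def kl_divergence(clean_logits, noisy_logits):
    """KL(p_clean || p_noisy), averaged over positions (assumed consistency term)."""
    logp = log_softmax(clean_logits)
    logq = log_softmax(noisy_logits)
    return (np.exp(logp) * (logp - logq)).sum(axis=-1).mean()


def ceftune_style_loss(embeddings, targets, head, alpha, lam, rng):
    """Task loss on noisy embeddings plus a clean-vs-noisy consistency penalty.

    `head` is a hypothetical linear layer used here in place of the real model.
    """
    noisy = neftune_noise(embeddings, alpha, rng)
    clean_logits = embeddings @ head
    noisy_logits = noisy @ head
    task = cross_entropy(noisy_logits, targets)
    consistency = kl_divergence(clean_logits, noisy_logits)
    return task + lam * consistency, task, consistency


# Toy usage: 8 token positions, embedding size 16, vocabulary of 5 tokens.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 16))
head = rng.normal(size=(16, 5))
targets = rng.integers(0, 5, size=8)
total, task, consistency = ceftune_style_loss(
    embeddings, targets, head, alpha=5.0, lam=1.0, rng=rng
)
print(f"task={task:.4f} consistency={consistency:.4f} total={total:.4f}")
```

With `lam = 0` the sketch reduces to plain noisy-embedding training; raising `lam` penalizes the model more strongly when its predictions on noisy embeddings drift from those on clean ones. In a real training loop the clean branch would typically be treated as a fixed target, which is a further design choice not specified in the text.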
Conclusion
In this work, we introduce RO-LMM, a multi-purpose, comprehensive foundation model tailored for radiation oncology. Addressing the limitation that current medical AI models are confined to specific tasks, RO-LMM demonstrates proficiency in diverse tasks spanning the overall workflow of radiation oncology: clinical report summarization, radiotherapy strategy suggestion, and plan-guided 3D target volume segmentation. Another key contribution of this work is the introduction of a consistency technique into both text and segmentation tasks. Results from multi-center cohort datasets confirm RO-LMM's promising performance and noteworthy generalization capabilities across diverse tasks. These findings mark a significant stride towards a versatile AI model and hint at the potential of a multi-purpose medical AI model in radiation oncology.
Results
5.1. Clinical report summarization
Table 2 presents the performance of our model on the clinical report summarization task, along with confidence intervals for each method. Our fine-tuned RO-LMM-S model demonstrates significant improvements over the Defaults, with consistent margins in all metrics and confidence intervals. Notably, RO-LMM-S outperforms ChatGPT with few-shot in-context learning.
Moreover, two clinical experts evaluated the generated summaries using expertise-based rubrics, and we compare the scores with those of the Defaults, including ChatGPT and LLaMA2. As shown in Table 3, RO-LMM-S significantly outperforms all Defaults in both internal and external validation, thanks to its domain-specific knowledge. Additionally, Pearson correlation analysis reveals strong positive inter-clinician correlations (> 0.85 and > 0.95 for internal and external validation, respectively), confirming the reliability of our rubrics and the clinical relevance of RO-LMM-S. RO-LMM-S therefore provides practical and meaningful summaries that can assist in the field of radiation oncology.
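For readers unfamiliar with the inter-clinician agreement measure mentioned above, the following is a minimal sketch of how a Pearson correlation between two raters' rubric scores can be computed. The score lists are made-up placeholder values, not data from the study, and the helper name `pearson_r` is hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


# Placeholder rubric scores for six summaries (illustrative only).
clinician_1 = [4, 5, 3, 4, 2, 5]
clinician_2 = [4, 4, 3, 5, 2, 5]
print(f"inter-clinician Pearson r = {pearson_r(clinician_1, clinician_2):.3f}")
```

A value close to 1 means the two clinicians rank the summaries similarly, which is the sense in which the reported correlations support the reliability of the rubrics.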
Figure
Fig. 1. RO-LMM as an assistant large multimodal model (LMM) in the field of radiation oncology. The model seamlessly covers various tasks such as clinical report summarization, radiotherapy strategy suggestion, and 3D target volume segmentation.
Fig. 2. Schematics of RO-LMM training for three different tasks. (a) RO-LMM-S for clinical note summarization. (b) RO-LMM-P++ for radiotherapy strategy suggestion. (c) RO-LMM-SEG++ for plan-guided target volume segmentation.
Fig. 3. Schematics of RO-LMM-SEG++ for the plan-guided 3D target volume segmentation task, which is composed of (a) an image module and (b) a text module. The outputs of these modules are aligned through (c) a multimodal alignment module.
Fig. 4. Qualitative comparison on the 3D target volume segmentation task. Red arrows indicate errors.
Table
Table 1 Training data details. CRS: Clinical Report Summarization. RSS: Radiotherapy Strategy Suggestion. PTS: Plan-guided Target Segmentation. US: Ultrasound. Path: Pathology.
Table 2 Quantitative comparison for clinical note summarization. Vanilla: instruction fine-tuning. CI: confidence interval.
Table 3 Clinical expert analysis for report summarization. R#: each rubric. C#: each clinical expert.
Table 4 Clinical expert analysis for radiotherapy strategy suggestion. R#: each rubric. C#: each clinical expert.
Table 5 Comparison of 3D target volume segmentation performance.
Table 6 Comparison of 3D target segmentation performance for overall and specific patient types.
Table 7 Quantitative comparison results for RO-LMM's clinical report summarization and radiotherapy strategy suggestion performance on the publicly available dataset.
Table 8 Ablation study on adopting separate expertise for each textual task versus a unified strategy.
Table 9 Ablation study on CESEG for target segmentation performance with input text variation.
Table 10 Component analysis of our proposed method on radiotherapy strategy suggestion.
Table 11 Inference computational complexity. External validation (N
Table A.1 The proposed expertise-based rubrics for assessing the performance of clinical report summarization.
Table A.2 Score rubrics for radiotherapy strategy suggestion.