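End-to-end breast cancer radiotherapy planning via large multimodal models (LMMs) with consistency embedding | Literature Express: Medical Imaging Algorithm Literature Sharing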

Title

End-to-end breast cancer radiotherapy planning via LMMs with consistency embedding

Introduction

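In recent years, the emergence of a new generation of AI models inspired by large language models (LLMs), namely foundation models, has marked a significant departure from previous paradigms (Moor et al., 2023). These models are large and versatile, properties that stem from self-supervised training on diverse data. Foundation models now achieve state-of-the-art (SOTA) performance in many domains, including multimodal reasoning, image-text generation, image captioning, and text-guided image segmentation (Bubeck et al., 2023; Dai et al., 2024; Driess et al., 2023; Li et al., 2023b; Liu et al., 2024; Lai et al., 2024). These properties suggest a possible paradigm shift in how AI is integrated into medical practice, which itself relies on multimodal information to reach comprehensive clinical decisions. They also offer an opportunity to overcome the limitations of the more than 500 FDA-approved AI models, most of which are task-specific and rely on a single modality (Joshi et al., 2024). Unlike such unimodal AI, generalist medical AI built on foundation models can understand the clinical workflow as a whole and accept many kinds of medical data, including imaging modalities, electronic health records, laboratory results, genomic data, and even clinical reports (Singhal et al., 2023; Rajpurkar and Lungren, 2023; Wu et al., 2023b; Moor et al., 2023; Tu et al., 2024). By understanding each data type and how they relate, multimodal AI can provide a holistic view of the patient, supporting more accurate diagnosis and personalized treatment planning and reducing medical errors.

Radiation oncology, the focus of this paper, is a field in which multimodal integration is essential, which makes it one of the most important clinical areas for evaluating the potential of foundation models. We therefore introduce RO-LMM, a prototype large multimodal model (LMM) designed to support the clinical workflow of radiation oncology. This work substantially extends our previous work, LLMSeg (Oh et al., 2024), which focused on multimodal segmentation. RO-LMM broadens the scope of LLMSeg to a wider range of clinical tasks in radiation oncology: (1) it efficiently condenses extensive patient histories and examination results into concise, informative clinical notes; (2) it suggests appropriate radiotherapy strategies from the perspective of a clinical expert; and (3) it delineates radiotherapy target volumes on three-dimensional (3D) computed tomography (CT) images in line with the suggested strategy. Together, these capabilities represent a significant advance in supporting the specialized work of clinical professionals.

When training an LLM to perform a chain of consecutive tasks, from radiotherapy strategy suggestion to target volume segmentation, we found that errors can accumulate at each step, which can substantially degrade end-to-end performance. Another important contribution of this work is therefore to adopt and extend noisy embedding fine-tuning (NEFTune) (Jain et al., 2024), which injects uniform noise into the embeddings while training on each target task. To further improve applicability, we develop a novel consistency embedding fine-tuning (CEFTune) technique, which adds a regularization loss that forces the model's predictions on noisy and clean inputs to agree. Going beyond text tasks, we also apply these ideas to 3D segmentation, proposing novel noisy embedding segmentation (NESEG) and consistency embedding segmentation (CESEG) techniques. These advances prevent errors from propagating between consecutive tasks and together substantially improve the generalization of the end-to-end model in both internal and external validation. As a proof-of-concept study, we apply the RO-LMM framework to breast cancer, a highly prevalent cancer whose radiotherapy is relatively standardized and based on CT imaging alone.

Our contributions can be summarized as follows:

- We propose RO-LMM, a comprehensive framework in which an LMM supports the broad workflow of breast cancer radiotherapy. To the best of our knowledge, this prototype is the first model to support the comprehensive workflow of radiation oncology.
- To prevent error accumulation across consecutive clinical tasks (clinical context summarization, radiotherapy strategy suggestion, and plan-guided target volume segmentation), we explore noise augmentation and consistency approaches and propose novel training methods (CEFTune, NESEG, and CESEG) that substantially improve the robustness of our method.
- Through experiments on real-world clinical data from breast cancer patients under multiple validation settings, we show that RO-LMM outperforms conventional methods.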
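The noise-injection and consistency ideas above can be illustrated with a minimal, framework-free Python sketch. `neftune_noise` follows the scaling described in the NEFTune paper: i.i.d. Uniform(-1, 1) noise scaled by α/√(Ld) for a sequence of L tokens with embedding dimension d. `ceftune_loss` is only a guess at what a CEFTune-style objective could look like: the KL-divergence consistency term, the weight `lam`, the default `alpha=5.0`, and all function names are assumptions for illustration, not the authors' implementation. In a real training loop these would operate on framework tensors, and the noise would be applied only during training.

```python
import math
import random


def neftune_noise(embeddings, alpha=5.0, rng=None):
    """Return a noise-perturbed copy of a sequence of token embeddings.

    embeddings: L token vectors, each a list of d floats.
    Every entry receives i.i.d. Uniform(-1, 1) noise scaled by alpha / sqrt(L * d),
    the scaling described for NEFTune. Intended for training only.
    """
    rng = rng or random.Random()
    seq_len, dim = len(embeddings), len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[x + scale * rng.uniform(-1.0, 1.0) for x in vec] for vec in embeddings]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    return -math.log(softmax(logits)[target])


def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two discrete distributions over the vocabulary.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def ceftune_loss(clean_logits, noisy_logits, targets, lam=1.0):
    """Hypothetical CEFTune-style objective for one token sequence.

    Task loss on the noisy pass (as in NEFTune) plus a consistency term that
    pulls noisy-input predictions toward clean-input predictions.
    The KL form and the weight `lam` are assumptions, not the paper's choice.
    """
    task = sum(cross_entropy(z, t) for z, t in zip(noisy_logits, targets))
    consistency = sum(
        kl_divergence(softmax(c), softmax(n))
        for c, n in zip(clean_logits, noisy_logits)
    )
    n = len(targets)
    return task / n + lam * consistency / n


if __name__ == "__main__":
    emb = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
    noisy_emb = neftune_noise(emb, alpha=5.0, rng=random.Random(0))
    clean_logits = [[2.0, 0.0], [0.0, 2.0]]  # toy outputs of the clean pass
    noisy_logits = [[1.5, 0.3], [0.2, 1.8]]  # toy outputs of the noisy pass
    print(ceftune_loss(clean_logits, noisy_logits, targets=[0, 1], lam=1.0))
```

When the clean and noisy predictions coincide, the consistency term vanishes and the objective reduces to the plain NEFTune task loss; the larger the disagreement, the larger the penalty.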

Abstract

Recent advances in AI foundation models have significant potential to lighten the clinical workload by mimicking the comprehensive and multi-faceted approaches used by medical professionals. In radiation oncology, the integration of multiple modalities is of great importance, so the opportunities for foundation models are abundant. Inspired by this, we present RO-LMM, a multi-purpose, comprehensive large multimodal model (LMM) tailored for radiation oncology. By leveraging the capabilities of LMMs, the model effectively manages a series of tasks within the clinical workflow, including clinical context summarization, radiotherapy strategy suggestion, and plan-guided target volume segmentation. In particular, to perform consecutive clinical tasks without error accumulation, we present a novel Consistency Embedding Fine-Tuning (CEFTune) technique, which boosts the LMM's robustness to noisy inputs while preserving consistency in handling clean inputs. We further extend this concept to an LMM-driven segmentation framework, leading to a novel Consistency Embedding Segmentation (CESEG) technique. Experimental results, including multi-center validation, confirm that RO-LMM with CEFTune and CESEG achieves promising performance on multiple clinical tasks and demonstrates generalization capabilities.

Method

In this section, we provide a detailed description of our proposed approach designed for sequential text generation tasks, including summarization and suggestion, as well as text-driven image segmentation, whose robustness is improved by consistency embedding fine-tuning. The overall framework is illustrated in Fig. 2.
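For the text-driven segmentation branch (NESEG/CESEG), one plausible reading is that noise is injected into the text embedding that conditions the segmentation network, and the masks predicted under clean and noisy conditioning are pulled together. The sketch below assumes exactly that. The soft Dice task loss, the mean-squared consistency term, the weight `lam`, and the names `soft_dice` and `ceseg_style_loss` are hypothetical choices for illustration, not the method as published.

```python
def soft_dice(pred, target, eps=1e-6):
    """Soft Dice coefficient between two flattened voxel probability maps."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return (2.0 * inter + eps) / (denom + eps)


def ceseg_style_loss(pred_clean, pred_noisy, gt_mask, lam=1.0):
    """Hypothetical consistency objective for text-guided 3D segmentation.

    pred_clean: foreground probabilities predicted with the clean text embedding.
    pred_noisy: probabilities predicted with a noise-perturbed text embedding.
    gt_mask:    binary target-volume mask. All three are flattened voxel lists.
    """
    # Task term: the noisy-conditioned prediction should still match the label.
    seg_loss = 1.0 - soft_dice(pred_noisy, gt_mask)
    # Consistency term: noisy and clean predictions should agree voxel by voxel.
    consistency = sum((a - b) ** 2 for a, b in zip(pred_clean, pred_noisy)) / len(gt_mask)
    return seg_loss + lam * consistency


# Toy 4-voxel example (illustrative values only).
gt = [1.0, 0.0, 1.0, 0.0]
print(ceseg_style_loss([0.9, 0.1, 0.8, 0.2], [0.7, 0.3, 0.6, 0.4], gt))
```

In a real framework, one design option (again an assumption here) is to treat the clean prediction as a fixed target, so that only the noisy branch is pushed toward it by the consistency term.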

Conclusion

In this work, we introduce RO-LMM, a multi-purpose, comprehensive foundation model tailored for radiation oncology. Addressing the limitations of current medical AI models confined to specific tasks, RO-LMM demonstrates proficiency in diverse tasks spanning the overall workflow of radiation oncology: clinical report summarization, radiotherapy strategy suggestion, and plan-guided 3D target volume segmentation. Another key contribution of this work is the introduction of consistency techniques into both text and segmentation tasks. Results from multi-center cohort datasets confirm RO-LMM's promising performance and noteworthy generalization capabilities across diverse tasks. These findings mark a significant stride towards developing a versatile AI model, hinting at the potential for a multi-purpose medical AI model in radiation oncology.

Results

5.1. Clinical report summarization

We present the performance of our model on the clinical report summarization task, along with confidence intervals for each method, in Table 2. Our fine-tuned model, RO-LMM-S, demonstrates significant improvements over the Defaults, with consistent margins across all metrics and confidence intervals. Notably, RO-LMM-S outperforms ChatGPT with few-shot in-context learning.

Moreover, two clinical experts evaluated the generated summaries using expertise-based rubrics, and we compared them with those of the Defaults, including ChatGPT and LLaMa-2. As shown in Table 3, RO-LMM-S significantly outperforms all Defaults in both internal and external validation, thanks to its domain-specific knowledge. Additionally, Pearson correlation (𝑟) analysis reveals strong positive inter-clinician correlations (> 0.85 and > 0.95 for internal and external validation, respectively), confirming the reliability of our rubrics and the clinical relevance of RO-LMM-S. RO-LMM-S therefore provides practical and meaningful summaries that can assist clinicians in radiation oncology.
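The inter-clinician agreement above is a Pearson correlation between two clinicians' rubric scores for the same set of summaries. A minimal sketch of that computation follows; the function name `pearson_r` and the example scores are hypothetical and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one rater gives constant scores")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical rubric scores from two clinicians for the same five summaries.
clinician_1 = [4, 5, 3, 4, 2]
clinician_2 = [4, 5, 3, 5, 2]
print(f"inter-clinician r = {pearson_r(clinician_1, clinician_2):.3f}")
```

On Python 3.10 and later, `statistics.correlation(x, y)` from the standard library computes the same coefficient; the explicit version is shown to make the formula visible.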

Figure

Fig. 1. RO-LMM as an assistant large multimodal model (LMM) in the field of radiation oncology. The model seamlessly covers various tasks such as clinical report summarization, radiotherapy strategy suggestion, and 3D target volume segmentation.

Fig. 2. Schematics of RO-LMM training for three different tasks. (a) RO-LMM-S for clinical note summarization. (b) RO-LMM-P++ for radiotherapy strategy suggestion. (c) RO-LMM-SEG++ for plan-guided target volume segmentation.

Fig. 3. Schematic of RO-LMM-SEG++ for the plan-guided 3D target volume segmentation task, which is composed of (a) an image module and (b) a text module. The outputs of these modules are aligned through (c) a multimodal alignment module.

Fig. 4. Qualitative comparison on 3D target volume segmentation task. Red arrows indicate errors.

Table

Table 1. Training data details. CRS: Clinical Report Summarization. RSS: Radiotherapy Strategy Suggestion. PTS: Plan-guided Target Segmentation. US: Ultrasound. Path: Pathology.

Table 2. Quantitative comparison for clinical note summarization. Vanilla: instruction fine-tuning. CI: confidence interval.

Table 3. Clinical expert analysis for report summarization. R#: each rubric. C#: each clinical expert.

Table 4. Clinical expert analysis for radiotherapy strategy suggestion. R#: each rubric. C#: each clinical expert.

Table 5. Comparison of 3D target volume segmentation performance.

Table 6. Comparison of 3D target segmentation performance for overall and specific patient types.

Table 7. Quantitative comparison results for our RO-LMM's clinical report summarization and radiotherapy strategy suggestion performance on the publicly available dataset.
