Title
Deep learning detection of acute and sub-acute lesion activity from single-timepoint conventional brain MRI in multiple sclerosis
Introduction
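Multiple sclerosis (MS) is a chronic inflammatory and neurodegenerative disease of the central nervous system. Its pathological hallmark is the formation of demyelinating lesions, which appear on MRI as T2-weighted (T2w) hyperintensities relative to the surrounding normal-appearing white matter (NAWM) (Traboulsee and Li, 2006). In longitudinal studies with sparse follow-up, the appearance of new lesions can be detected by comparing consecutive T2w scans, and it serves as a practical marker of acute inflammatory disease activity (Altay et al., 2013). Measures of new lesion activity are therefore commonly used as endpoints in clinical trials (Calabresi et al., 2014) and to guide treatment decisions and prognosis in clinical settings (Wattjes et al., 2021).
The early period of new lesion formation can be divided into two phases (Rovira et al., 2013): acute and sub-acute. The initial acute phase is characterized by a transient breakdown of the blood-brain barrier, visible as focal areas of contrast enhancement on T1w scans after intravenous injection of a gadolinium-based contrast agent (Kappos et al., 1999), and lasts 3-6 weeks on average (Cotton et al., 2003). The subsequent sub-acute phase shows marked changes in lesion size and signal intensity on unenhanced T1w and T2w images and typically lasts 3-6 months (Rovira et al., 2013), possibly reflecting a complex balance between the resorption of inflammatory edema and non-inflammatory processes including repair. The sub-acute phase eventually gives way to the chronic phase (Rovira et al., 2013).
Importantly, lesion evolution is a continuous and heterogeneous process. Although rapid changes in lesion size or signal intensity shortly after blood-brain barrier breakdown may reflect sub-acute inflammatory activity and the gradual resolution of acute pathology, such changes can also occur during the so-called chronic lesion phase, including in the slowly expanding chronic active phenotype. The commonly used 6-month threshold adopted in this study provides a pragmatic definition of the sub-acute versus chronic state and should be regarded as an approximation of an unknown biological ground truth.
While acute-phase lesions can be identified from a single scan (contrast-enhanced T1w), detecting sub-acute lesions currently requires comparing two timepoints no more than 6 months apart, which poses specific challenges. In the clinical management of MS patients, the need for a recent prior reference scan may delay treatment. In clinical trials, patient screening is usually performed at a single timepoint with no prior reference, so acute lesion detection is limited to gadolinium-enhancing (GdE) lesions. This reduces trial eligibility for patients with active MS.
To address these challenges, we propose a set of deep learning (DL) methods to quantify brain-level acute inflammatory lesion activity from a single-timepoint MRI. Our contributions are as follows:
1. New task definition: quantifying brain-level acute and sub-acute MS lesion activity from a single-timepoint MRI. This task goes beyond traditional GdE lesion detection and aims to detect all lesions less than 24 weeks old, whether or not they show gadolinium enhancement. We demonstrate the clinical utility of the new task in the context of predicting acute lesion activity over the next 6 months.
2. Multi-sequence MRI modeling: we develop and evaluate DL models on MRI data from several conventional sequences (pre- and post-contrast T1-weighted, T2-weighted and proton density-weighted). By comparing different models, we establish a strong benchmark for the new task and point to the potential benefit of integrating whole-brain features, suggesting directions for future research.
3. Clinical application: our best-performing model (2D-UNet) estimates acute lesion activity from a single scan and, when combined with traditional measures of GdE lesion activity, significantly improves the prognostication of future acute lesion activity, demonstrating its clinical utility.
The rest of the paper is organized as follows: Section 2 summarizes prior work, Section 3 describes the materials and methods (i.e., data and models), Section 4 presents the model benchmarking results, and Section 5 discusses the strengths and limitations of each model.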
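The acute, sub-acute and chronic staging described in this introduction can be written down as a small rule set. The sketch below is only an illustration of the definitions: the names `Stage`, `stage_lesion` and `brain_has_recent_lesion` are hypothetical, lesion age is assumed to be known in weeks (which in practice requires longitudinal data), and the 24-week cut-off is the pragmatic threshold mentioned in the text, not a biological boundary.

```python
from enum import Enum

RECENT_LIMIT_WEEKS = 24  # pragmatic threshold from the text (about 6 months)


class Stage(Enum):
    ACUTE = "acute"         # currently gadolinium-enhancing
    SUBACUTE = "sub-acute"  # no longer enhancing, but under 24 weeks old
    CHRONIC = "chronic"     # not enhancing and 24 weeks or older


def stage_lesion(age_weeks, gd_enhancing):
    """Assign a stage to one lesion from its age and enhancement status."""
    if gd_enhancing:
        return Stage.ACUTE
    if age_weeks < RECENT_LIMIT_WEEKS:
        return Stage.SUBACUTE
    return Stage.CHRONIC


def brain_has_recent_lesion(lesions):
    """Task label: does the brain contain any lesion under 24 weeks old,
    whether or not it enhances? `lesions` is a list of
    (age_weeks, gd_enhancing) pairs (hypothetical representation)."""
    return any(age < RECENT_LIMIT_WEEKS for age, _ in lesions)


# A lesion that appeared 20 weeks ago and no longer enhances is sub-acute.
print(stage_lesion(20, False))                              # Stage.SUBACUTE
print(brain_has_recent_lesion([(60, False), (20, False)]))  # True
```

A detector that relies on gadolinium enhancement alone would miss the second brain above, which is the gap the new task is meant to close.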
Abstract
Multiple sclerosis (MS) is a chronic inflammatory disease characterized by demyelinating lesions in the central nervous system. Cross-sectional measurements of acute inflammatory lesion activity are typically obtained by detecting the presence of gadolinium enhancement in lesions, which typically lasts 3-6 weeks. We formulate the novel and clinically relevant task of quantification of recent acute lesion activity from the past 24 weeks (6 months) using single-timepoint conventional brain magnetic resonance imaging (MRI). We develop and compare several deep learning (DL) methods for estimating this brain-level acuteness score and show that a 2D-UNet can accurately predict acute disease activity at the patient-level while outperforming transformers and ensemble approaches. In the context of identifying subjects with acute (less than 6 months old) lesion activity, our 2D-UNet achieves an area under the receiver-operating curve in the range 80–84% on independent relapsing-remitting MS cohorts. When used in conjunction with measurements of gadolinium-enhancing lesion activity, our model significantly improves the prognostication of future acute lesion activity (over the next 6 months). This model could thus be leveraged for population recruitment in clinical trials to identify a higher number of patients with acute inflammatory activity than current standard approaches (e.g., gadolinium positivity) with a predictable precision/recall trade-off.
Method
3.1. Population overview
Brain MRI scans from 3 phase-III pivotal trials were retrospectively analyzed: ADVANCE (Calabresi et al., 2014) (1512 subjects with relapsing-remitting MS; NCT00906399), ASCEND (Polman et al., 2006) (886 subjects with secondary progressive MS; NCT01416181) and DECIDE (Kappos et al., 2015) (1841 subjects with relapsing-remitting MS; NCT01064401). Inclusion criteria and MRI acquisition protocols have been previously described; Table 1 summarizes relevant statistics. Briefly, patients participated in a baseline session and a series of follow-ups, during which the following MRI modalities were acquired: T1w (pre- and post-gadolinium injection), T2w and proton density-weighted (PDw). Follow-up MRI scans were acquired 24, 48 and 96 weeks post-baseline in ADVANCE, after 24, 48, 72, 96, 108 and 156 weeks in ASCEND, and after 24 and 96 weeks in DECIDE. Patient inclusion in this study was constrained by MRI data availability.
Conclusion
In this paper, we have focused on the estimation of acute MS lesion activity from single-timepoint conventional MRI. While this task has traditionally been tackled via the detection of GdE lesions, we have proposed the new task of recent (less than 24-weeks old) T2 lesion detection. We have demonstrated the clinical relevance of this newly proposed task by showing that quantification of recent MS lesion activity that is no longer GdE improves the prediction of future inflammatory lesion activity, when used in conjunction with traditional measures of GdE lesion activity. Moreover, we have developed and validated several models to establish a benchmark of achievable performances on our newly proposed task. In particular, our UNet-2D can identify gadolinium-negative subjects with recent (less than 24-weeks) acute inflammatory lesion activity with high accuracy (ROC AUC in the range 80–84% on independent RRMS datasets) and improves the prognostication of future inflammatory lesion activity. This could be exploited as a population enrichment strategy for identifying patients most or least likely to show disease activity during the course of a clinical trial.
In the wake of recent major advances of the MS therapy armamentarium where 3 FDA-approved anti-CD20 treatments are associated with a complete silencing of acute inflammation, the next generation of drug development need is turning towards CNS pathways susceptible to causally drive disability progression independent of acute inflammation, also called smoldering-associated worsening. As such, methods able to identify subjects at risk of future acute inflammatory disease activity (such as models proposed in this paper) are likely to be valuable in designing inclusion/exclusion criteria and/or enrichment strategies for future trials (e.g., via exclusion of subjects at risk of acute inflammation). In this respect, the output of our models could be used as a drug development tool as defined by the U.S. Food and Drug Administration (2024), i.e. methods, materials, or measures that have the potential to facilitate drug development, such as a biomarker used for better patient stratification at baseline or trial enrichment. Such models may prove to be a necessary step to augment and de-risk the probability of technical and regulatory success for novel drugs to be evaluated as to their capacity to stop the ‘true MS progression’ related to intra-CNS compartmentalized pathobiology, unconfounded by a potential drug effect on peripheral acute inflammation.
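The enrichment idea mentioned here, together with the "predictable precision/recall trade-off" from the abstract, can be made concrete with a threshold sweep over the brain-level score. The sketch below is an assumption about how such a selection rule might be chosen, not the procedure used in the paper; `precision_recall` and `best_threshold` are hypothetical helpers, and the labels and scores are made-up toy values.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when subjects with score >= threshold are selected."""
    selected = [y for y, s in zip(labels, scores) if s >= threshold]
    true_pos = sum(selected)
    n_pos = sum(labels)
    precision = true_pos / len(selected) if selected else None
    recall = true_pos / n_pos if n_pos else None
    return precision, recall


def best_threshold(labels, scores, min_precision):
    """Among thresholds whose precision meets `min_precision`, return the
    (threshold, precision, recall) triple with the highest recall, or None."""
    best = None
    for t in sorted(set(scores), reverse=True):
        precision, recall = precision_recall(labels, scores, t)
        if precision is None or recall is None or precision < min_precision:
            continue
        if best is None or recall > best[2]:
            best = (t, precision, recall)
    return best


# Toy data (invented): 1 = subject has a lesion under 24 weeks old.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(best_threshold(labels, scores, min_precision=0.7))  # (0.6, 0.75, 0.75)
```

For an exclusion criterion (screening out subjects at risk of acute inflammation), the same sweep would be run with the roles of the two classes swapped.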
Results
4.1. Performance evaluation
We report subject-level classification results stratified by trial using predictions from the testing set, on the task of identifying subjects with acute MS lesions, defined as new T2 lesions that are less than 24 weeks old. Since GdE lesions are clearly visible on post-contrast T1w MRI, all models achieved near-perfect classification on GdE-positive subjects, with sensitivity exceeding 95%. We thus focus on subjects that did not show GdE lesions, so as to investigate the ability of our proposed models to specifically detect sub-acute MS lesion activity. Classification metrics are presented in Fig. 6 and Receiver Operating Characteristic (ROC) curves are shown in Fig. 7.
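As an illustration of this evaluation setup, the sketch below restricts a set of subjects to the gadolinium-negative subgroup and computes a rank-based ROC AUC together with sensitivity, specificity and balanced accuracy at one threshold. It is a minimal pure-Python sketch: the subject records, field names and the 0.5 threshold are invented for the example, and the paper's own evaluation code is not shown in the source.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: probability that a random positive scores higher
    than a random negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative subject")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when score >= threshold means 'acute'."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy subjects (invented values); "recent" = has a lesion under 24 weeks old.
subjects = [
    {"gd_positive": True,  "recent": 1, "score": 0.97},
    {"gd_positive": False, "recent": 1, "score": 0.80},
    {"gd_positive": False, "recent": 0, "score": 0.30},
    {"gd_positive": False, "recent": 1, "score": 0.40},
    {"gd_positive": False, "recent": 0, "score": 0.55},
    {"gd_positive": False, "recent": 0, "score": 0.10},
]

# Keep only gadolinium-negative subjects, as in the analysis described above.
gd_neg = [s for s in subjects if not s["gd_positive"]]
labels = [s["recent"] for s in gd_neg]
scores = [s["score"] for s in gd_neg]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
balanced_accuracy = (sens + spec) / 2
print(round(auc, 3), sens, round(spec, 3), round(balanced_accuracy, 3))
```

On the toy data, 5 of the 6 positive/negative pairs are ranked correctly, so the AUC is 5/6; stratifying by trial would simply repeat the same computation per cohort.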
Figure
Fig. 1. Illustration of contrast-enhanced T1w and T2w brain MRI scans acquired at baseline, week 4 and week 24 from a subject from ADVANCE. The cross-hair shows a T2 lesion that is new at week 24 relative to baseline and which only shows gadolinium enhancement at week 4. This lesion is considered sub-acute at week 24 (less than 24 weeks old yet no longer GdE).
Fig. 2. Example of brain MRI scans and lesion segmentation masks for a representative participant of the ADVANCE cohort.
Fig. 3. Task overview: predict whole-brain acuteness score from single-timepoint MRI.
Fig. 4. Network architecture of the full-brain transformer. It is composed of one ResNet down-sampling block followed by 10 ResNet blocks. A class attention layer aggregates information across patches and a fully-connected layer generates the final score.
Fig. 5. Architecture of the lesion-based transformer. Patches are extracted around each lesion bounding box and resized to a target shape. A ResNet-like network produces patch embeddings. Dependencies between patches are modeled through a transformer block. A classifier head produces scores at the patch-level, while a class attention transformer block combines the patch embeddings to generate a prediction at the brain-level. The figure is represented in 2D here for clarity, but the volumes are processed in 3D in practice.
Fig. 6. Classification metrics for single-timepoint identification of patients with acute lesions (<24 weeks old) for the subset of patients without GdE lesions. The ensemble combines the naive classifier, the UNet-2D, the full-brain transformer and the lesion-based transformer. Error bars show the 95% confidence interval across the 10 folds. See 11 for results of the lesion-based radiomics classifier.
Fig. 7. ROC curves for different models, computed among gadolinium-negative subjects from the test set and grouped by trial. The area under the ROC curve (AUC) of the best model (UNet-2D) is 84% in ADVANCE, 73% in the placebo arm of ASCEND, and 80% in DECIDE.
Fig. 8. Balanced accuracy (and 95% CI) among gadolinium-negative subjects from DECIDE using various combinations of input sequences. Results are given for the first of the 10 folds. Abbreviations: WMH: white matter hyperintensity map (i.e., T2 lesions); T1w: T1-weighted MRI; T2w: T2-weighted MRI; T1c: T1-weighted MRI post-gadolinium; PDw: proton density-weighted MRI.
Fig. 9. Visual interpretability maps, showing the segmentation map from the UNet-2D as well as the class attention maps and local predicted acuteness scores from the full-brain and lesion-based transformer models. Case 6 illustrates the robustness of the lesion-based transformer against local false positives: while several lesions were incorrectly predicted as acute, attention mechanisms were able to ignore incorrect local predictions and correctly classify the brain as non-acute.
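The behaviour described for Case 6 can be illustrated with a toy attention-pooling computation. This is a deliberately simplified sketch, not the class attention block of the paper: it pools scalar patch-level scores rather than patch embeddings, uses a single set of attention logits with no learned projections, and all numbers are invented. The point is only that a softmax-weighted aggregate can stay low even when a few local scores are high, provided those patches receive little attention.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(local_scores, attention_logits):
    """Brain-level score as an attention-weighted average of local scores.
    Returns the pooled score and the attention weights."""
    weights = softmax(attention_logits)
    pooled = sum(w * s for w, s in zip(weights, local_scores))
    return pooled, weights


# Toy example (invented): two lesions wrongly scored as acute (0.9, 0.8)
# receive low attention, two correctly scored lesions receive high attention.
local_scores = [0.9, 0.8, 0.1, 0.2]
attention_logits = [-2.0, -2.0, 1.0, 1.0]

pooled, weights = attention_pool(local_scores, attention_logits)
print(round(pooled, 3))   # well below 0.5, so the brain is called non-acute
print(max(local_scores))  # max-pooling would instead flag the brain as acute
```

In contrast, taking the maximum or the plain mean of the local scores (0.9 and 0.5 here) would let the two false positives dominate the brain-level decision.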
Table
Table 1. Demographics and lesion statistics of the participants from the ADVANCE, ASCEND and DECIDE cohorts analyzed in this paper. For the age and the brain volumes, we report the mean ± standard deviation computed across subjects and visits, respectively.
Table 2. Classification results for the 6 cases shown in Fig. 9. TP, TN, FP and FN stand for true positive, true negative, false positive and false negative, respectively.
Table 3. Regression results on the full model of predicting the NET2 lesion count at week 48 relative to week 24 with both GdE and non-enhancing portion of the predicted NET2 lesion volume as a predictor.