Prompt-based polyp segmentation during endoscopy | Literature Digest: Latest Deep Learning Medical AI Papers

Title

Prompt-based polyp segmentation during endoscopy

Introduction

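The incidence of gastrointestinal cancers is rising, with a trend toward younger patients (Bray et al., 2018). Screening for early-stage cancer of the digestive tract is therefore crucial. The survival rate of colorectal cancer (CRC) patients exceeds 95% at stage one but drops sharply to below 35% at stages four and five (Bernal et al., 2012). Endoscopy is now widely used in clinical practice and has become the standard method for screening digestive tract diseases (Alfarone et al., 2022).

Some polyps are easily overlooked by endoscopists, with serious consequences for patients. A recent meta-analysis (Zhao et al., 2019) showed that 26% of adenomas are missed during colonoscopy. Although some adenomas are missed because the mucosal surface is insufficiently exposed (East et al., 2007), an image-based retrospective study (Yamada et al., 2019) found that 14% of adenomas went unrecognized even when they were visible. The polyp miss rate in colonoscopy is typically 17%-28% (Yamada et al., 2019). Untreated polyps tend to progress to gastrointestinal tumors and cancer, so accurate, real-time polyp segmentation during diagnosis is essential. Because polyps vary in size, appearance, and location, accurate diagnosis depends on the endoscopist's experience and remains challenging. Koh et al. (2023) showed that artificial intelligence (AI) can raise the polyp detection rate in endoscopic diagnosis and treatment. Methods such as PraNet, FCBFormer, and HarDNet (Fan et al., 2020; Sanderson and Matuszewski, 2022; Huang et al., 2021) have achieved notable results in polyp segmentation. However, relying on AI-based segmentation alone still tends to miss some ulcers, erosions, and early malignant tumors. The segment anything model (SAM) (Kirillov et al., 2023) has demonstrated an excellent ability to segment objects in natural landscape images from user-provided prompts such as points or bounding boxes. Polyp segmentation that combines AI with prompts from endoscopists is therefore an important research direction.

In this study, we propose a novel prompt-based polyp segmentation method (PPSM) that segments polyps precisely and assists early cancer diagnosis under endoscopy. During colonoscopy, endoscopists naturally focus their attention on suspicious lesion areas (Wallace and Kiesslich, 2010). PPSM therefore takes the endoscopist's ocular attention, non-uniform dot matrices, and polyp features as prompts to imitate the endoscopist's diagnostic process. First, we propose a prompt-based polyp segmentation network (PPSN) composed of a prompt encoding module (PEM), a feature extraction encoding module (FEEM), and a mask decoding module (MDM). The PEM encodes the prompts to guide feature extraction in the FEEM and mask generation in the MDM, which lets PPSN segment polyps efficiently. Second, endoscopists' attention data are used as prompts, which both provides a practical way to obtain prompt data in real-world settings and improves PPSN's segmentation accuracy. To make PPSN more stable, we generate non-uniform dot matrix prompts that compensate for frame loss during eye tracking. In addition, we introduce a SAM-based data augmentation method that enriches the prompt dataset and improves PPSN's adaptability. On the Kvasir-SEG (Jha et al., 2020), CVC-ClinicDB (Bernal et al., 2015), SUN-SEG (Misawa et al., 2021), and PolypGen (Ali et al., 2023) datasets, PPSM achieves accuracy scores of 0.952, 0.991, 0.993, and 0.987, respectively, which indicates that it segments polyps accurately. PPSM also reaches a maximum frame rate of 233 frames per second. Cross-training and cross-testing on the four datasets show that PPSM generalizes well. Part of the code and the method for generating the prompt dataset are available at https://github.com/XinZhenRen/PPSM. In summary, the contributions of this study are:

1. An accurate, real-time prompt-based polyp segmentation network that generalizes well. The masks generated by PPSN have clear boundaries and no internal holes, and they are only slightly affected by the endoscope's working environment. Five kinds of prompts markedly improve PPSN's segmentation performance, simplify training, and reduce the need for large amounts of eye-movement data.

2. To improve accuracy, the endoscopist's experience is built into PPSN by using ocular attention data as prompts that guide segmentation. Compared with prompt input methods such as mouse clicks, this approach is more real-time and more practical during endoscopy. It also accounts for missing ocular attention data. To the best of our knowledge, PPSM is the first polyp segmentation method that combines AI with prompts from endoscopists.

3. A SAM-based data augmentation method that enriches the prompt dataset and improves PPSN's adaptability. The SAM-generated prompts include positive and negative samples, which helps PPSN reject false polyps and suppress interference from the endoscope's working environment.

4. A disposable electronic endoscope and an image processor with real-time auxiliary diagnosis for early cancer.

The rest of the paper is organized as follows: Section 2 discusses related work; Section 3.1 presents the overall architecture of PPSM; Section 3.2 describes PPSN in detail; Section 3.3 discusses the proposed SAM-based data augmentation method; Sections 3.4 and 3.5 present the real-time prompt strategy; Section 4 reports the experimental results.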

Abstract

Accurate judgment and identification of polyp size is crucial in endoscopic diagnosis. However, the indistinct boundaries of polyps lead to missegmentation and missed cancer diagnoses. In this paper, a prompt-based polyp segmentation method (PPSM) is proposed to assist in early-stage cancer diagnosis during endoscopy. It combines endoscopists' experience and artificial intelligence technology. Firstly, a prompt-based polyp segmentation network (PPSN) is presented, which contains the prompt encoding module (PEM), the feature extraction encoding module (FEEM), and the mask decoding module (MDM). The PEM encodes prompts to guide the FEEM in feature extraction and the MDM in mask generation, so that PPSN can segment polyps efficiently. Secondly, endoscopists' ocular attention data (gazes) are used as prompts, which enhances PPSN's accuracy in segmenting polyps and makes prompt data easy to obtain in real-world settings. To reinforce the PPSN's stability, non-uniform dot matrix prompts are generated to compensate for frame loss during eye tracking. Moreover, a data augmentation method based on the segment anything model (SAM) is introduced to enrich the prompt dataset and improve the PPSN's adaptability. Experiments demonstrate the PPSM's accuracy and real-time capability. The results from cross-training and cross-testing on four datasets show the PPSM's generalization. Based on the research results, a disposable electronic endoscope with a real-time auxiliary diagnosis function for early cancer and an image processor have been developed.

Method

3.1. Overall architecture

Fig. 1 is the schematic drawing of PPSM. The prompt-based polyp segmentation network processes the endoscope images guided by endoscopists' ocular attention prompts. This process generates masks for potential lesion areas. Then, the masks are overlaid on the original endoscopic images for auxiliary diagnosis. The original endoscopic images and the auxiliary diagnosis images are displayed on two medical displays.

Specifically, the PPSM is divided into four parts: data acquisition, data augmentation, polyp segmentation, and auxiliary diagnosis.

1. Data acquisition: a disposable electronic endoscope for capturing images, an eye tracker for capturing the endoscopist's ocular data, and public datasets.
2. Data augmentation: prompt data augmentation, the real-time prompt strategy, prompts, and an endoscope mainframe (image processor).
3. Polyp segmentation: the prompt-based polyp segmentation network (PPSN).
4. Auxiliary diagnosis: the original endoscope image and the auxiliary diagnosis image are displayed on two monitors to assist endoscopists.

Algorithms and software are all integrated into the endoscope mainframe. PPSN is the core of PPSM. Prompts, prompt data augmentation, the real-time prompt strategy, and hardware devices assist PPSN in segmenting polyps. Prompts are divided into training prompts and diagnostic prompts according to the task they serve:

- Training prompts such as circumcircles, polygons, scribbles, and points are generated from the publicly available datasets.
- To enhance the diversity of training prompts, a SAM-based augmentation method is proposed. It employs masks generated by SAM as a novel type of prompt.
- In practical applications, endoscopists' ocular attention or non-uniform dot matrices generated from the probability of polyps appearing are employed as diagnostic prompts.

A disposable electronic endoscope, an image processor, and an eye-tracking device (eye tracker) with clinical utility have been developed. Fig. 2 illustrates the overall architecture of the PPSM, with yellow denoting prompts, purple representing devices, blue signifying modules, red representing output results, and green and gray representing algorithms/strategies.

3.2. Prompt-based polyp segmentation network

The proposed PPSN consists of the PEM, the FEEM, and the MDM:

- PEM: receives prompts related to polyps as inputs, encodes them, and transmits this information to the FEEM and the MDM.
- FEEM: takes the original endoscopic image as input and extracts lesion features with the prompts from the PEM.
- MDM: purposefully generates the polyp mask from the features extracted by the FEEM and the prompts from the PEM.

The three modules interact through uncomplicated information exchanges, so that feature extraction and mask generation stay guided by the prompts.

Conclusion

This paper has developed a prompt-based polyp segmentation method that demonstrates promising performance in polyp segmentation. Suspicious lesion areas tend to draw more attention from endoscopists, so the proposed PPSM takes the endoscopist's ocular attention, non-uniform dot matrix, or polyp features as prompts to assist in early-stage cancer diagnosis during endoscopy. To a certain extent, it addresses the subjectivity of endoscopist-only diagnosis and the missegmentation issues of AI-only approaches. The proposed PPSN achieves excellent performance and strong adaptability on four datasets because the PEM encodes the prompts, guiding the FEEM to extract features in a targeted manner and instructing the MDM to generate masks directionally. Endoscopists' ocular attention data and the non-uniform dot matrix prompts are incorporated as inputs to the PEM, which addresses the challenge of insufficient prompt data in real-world scenarios and enhances stability in practical applications. The data augmentation method based on the SAM enriches the prompt dataset and enhances the adaptability and generalization of PPSN. Based on these results, a disposable electronic endoscope with a real-time auxiliary diagnosis function for early cancer and an image processor have been developed for endoscopy.

During the experiments, the movement of endoscopists during actual endoscopy reduced the effectiveness of our eye tracking, which in turn affected the performance of the PPSN. Developing more stable eye trackers and better eye-tracking algorithms is therefore a key direction for optimizing the PPSM. We plan to extend the capabilities of the PPSM to other medical tasks, such as assisting doctors in the detection of cervical intraepithelial neoplasia, carcinoma of the urinary bladder, and respiratory tract lesions. Additionally, we are developing disposable cell (microscopic) colposcopes, disposable electronic bronchoendoscopes, and disposable electronic pyeloscopes.

In the future, using text prompts to train networks with diagnostic capabilities similar to those of endoscopists is a promising direction. This method could assist in analyzing lesion locations and providing accurate diagnoses. Furthermore, recording the patient's preoperative information, intraoperative video, postoperative outcomes, and follow-up results, and generating a retrospective report from them, is highly significant in the medical field.

In summary, the prompt-based segmentation method combines eye-movement data with an AI model and may improve the accuracy and efficiency of early cancer diagnosis during endoscopy. Future work will further optimize the hardware and algorithms and extend the method to a wider range of medical scenarios.

Figure

Fig. 1. Schematic drawing. The prompt-based polyp segmentation network processes the endoscope images guided by endoscopists' ocular attention prompts. This process generates masks for potential lesion areas. Then, the masks are overlaid on the original endoscopic images for auxiliary diagnosis. The original endoscopic images and the auxiliary diagnosis images are displayed on two medical displays.
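The overlay step in Fig. 1 can be illustrated with a minimal sketch. It assumes the mask is a binary array aligned with the frame. The function name `overlay_mask`, the highlight color, and the blending weight are hypothetical choices for illustration and do not come from the paper.

```python
import numpy as np

def overlay_mask(image, mask, color=(0, 255, 0), alpha=0.4):
    """Blend a binary mask onto an RGB frame.

    image: uint8 array of shape (H, W, 3); mask: bool or 0/1 array of shape (H, W).
    color and alpha are illustrative assumptions, not values from the paper.
    """
    image = np.asarray(image, dtype=np.float32)
    region = np.asarray(mask).astype(bool)
    out = image.copy()
    # Alpha-blend the highlight color only where the mask is set.
    out[region] = (1.0 - alpha) * image[region] + alpha * np.asarray(color, dtype=np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)

frame = np.full((352, 352, 3), 100, dtype=np.uint8)  # stand-in endoscopic frame
mask = np.zeros((352, 352), dtype=bool)
mask[100:200, 120:220] = True                         # stand-in polyp mask
assisted = overlay_mask(frame, mask)                  # auxiliary diagnosis image
```

In a two-display setup like the one described, `frame` would go to one monitor unchanged and `assisted` to the other.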

Fig. 2. The architecture of the proposed prompt-based polyp segmentation method. Yellow denotes prompts, purple represents devices, blue signifies modules, red represents output results, and green and gray represent algorithms/strategies. Specifically, the PPSM is divided into four parts: data acquisition, data augmentation, polyp segmentation, and auxiliary diagnosis. In the first part, there is a disposable electronic endoscope for capturing images, an eye tracker for capturing the endoscopist's ocular data, and public datasets. In the second part, there is prompt data augmentation, the real-time prompt strategy, prompts, and an endoscope mainframe (image processor). In the third part, there is a prompt-based polyp segmentation network (PPSN). In the fourth part, the original endoscope image and the auxiliary diagnosis image are displayed on two monitors to assist endoscopists. Algorithms and software are all integrated into the endoscope mainframe. PPSN is the core of PPSM. Prompts, prompt data augmentation, the real-time prompt strategy, and hardware devices assist PPSN in segmenting polyps. Prompts can be divided into training prompts and diagnostic prompts to complete the corresponding tasks. Training prompts such as circumcircles, polygons, scribbles, and points are generated from the publicly available datasets. To enhance the diversity of training prompts, a SAM-based augmentation method is proposed. This augmentation method employs masks generated by SAM as a novel type of prompt. In practical applications, endoscopists' ocular attention or non-uniform dot matrices generated from the probability of polyps appearing are employed as prompts. A disposable electronic endoscope, an image processor, and an eye-tracking device (eye tracker) with clinical utility have been developed.

Fig. 3. The framework of the prompt polyp segmentation network. The proposed PPSN consists of the PEM, the FEEM, and the MDM. The PEM receives prompts related to polyps as inputs. It encodes prompts and transmits this information to the FEEM and the MDM. There are uncomplicated information interactions among the PEM, the FEEM, and the MDM. The FEEM takes the original images captured by the endoscope as its input. It extracts features from the original image with the prompts from the PEM. Finally, the MDM purposefully generates the masks according to the features from the FEEM and the prompts from the PEM. The PEM is designed to guide the FEEM towards purposeful feature extraction and the MDM towards mask generation.

Fig. 4. The framework of the prompt encoding module. The input of the PEM is a prompt tensor with dimensions H × W × C = 352 × 352 × 5, specifically related to polyps. The PEM consists of convolutional layers, normalization layers, activation function layers, and pooling layers. The outputs at three different layers of the PEM (referred to here as the first, second, and third PEM outputs) are transmitted to the FEEM and the MDM.
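As a rough illustration of the 352 × 352 × 5 prompt tensor, the sketch below stacks one map per prompt type into five channels. The mapping of the five channels to the five training prompt types, the channel order, and the name `build_prompt_tensor` are assumptions made for this example. The source only states the tensor size.

```python
import numpy as np

H, W = 352, 352
# Assumed channel order; the source does not say how the five channels are used.
PROMPT_TYPES = ("circumcircle", "polygon", "scribble", "point", "sam_mask")

def build_prompt_tensor(prompts, height=H, width=W):
    """Stack per-type prompt maps into one (H, W, 5) tensor; missing types stay all-zero."""
    tensor = np.zeros((height, width, len(PROMPT_TYPES)), dtype=np.float32)
    for channel, name in enumerate(PROMPT_TYPES):
        if name in prompts:
            plane = np.asarray(prompts[name], dtype=np.float32)
            if plane.shape != (height, width):
                raise ValueError(f"{name} prompt must have shape {(height, width)}")
            tensor[..., channel] = plane
    return tensor

point_map = np.zeros((H, W), dtype=np.float32)
point_map[176, 176] = 1.0                      # a single point prompt
prompt_tensor = build_prompt_tensor({"point": point_map})
```

Leaving unused channels at zero is one simple way to feed a single prompt type through a fixed five-channel input. Whether the authors do this is not stated.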

Fig. 5. The final state of the prompt dataset. The dataset consists of two parts, training and practice, and the prompts for the PEM differ between them.
- Training part: circumcircle prompts, polygon prompts, scribble prompts, point prompts, and SAM's mask result prompts. The SAM's mask result prompts are generated according to the SAM results and the criteria of Table 1. The other prompts are generated according to the ground truth.
- Practical part: prompts generated from endoscopists' ocular attention, used directly for real-time diagnosis.

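Fig. 6. The types of prompts.
- Original: original endoscopic image.
- (a) Circumcircle prompt: a circle circumscribing the polyp, with a random function added for diversity.
- (b) Polygon prompt: the largest enclosing polygon of the polyp, with random offsets that make its vertices and edges more varied.
- (c) Scribble prompt: curves that mark the polyp foreground (inside) and background (outside), separating the lesion from normal tissue.
- (d) Point prompt: an arbitrary point inside the polyp (not necessarily the center) that indicates the suspicious region.
- (e) SAM's mask result prompt: masks generated by SAM and converted into prompts, including positive and negative samples (for example, regions inside and outside the polyp).
- (f) Endoscopists' ocular attention prompt: gaze points captured in real time and mapped to the attended regions of the image.
- (g) Non-uniform dot matrix prompt: a dot matrix generated from the probability distribution of polyp occurrence, which compensates for frame loss during eye tracking.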
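To make prompt types (a) and (d) concrete, here is a minimal sketch of how they could be derived from a ground-truth mask. The enclosing circle is centered on the mask centroid and reaches the farthest mask pixel, so it encloses the polyp but is not the minimal circumcircle. The random enlargement factor stands in for the "random function" mentioned above. Function names and the `max_jitter` parameter are hypothetical.

```python
import numpy as np

def point_prompt(gt_mask, rng):
    """Pick one random pixel inside the ground-truth mask (not necessarily the center)."""
    ys, xs = np.nonzero(gt_mask)
    if ys.size == 0:
        raise ValueError("ground-truth mask is empty")
    i = rng.integers(ys.size)
    prompt = np.zeros(gt_mask.shape, dtype=np.uint8)
    prompt[ys[i], xs[i]] = 1
    return prompt

def circumcircle_prompt(gt_mask, rng, max_jitter=0.1):
    """Filled circle enclosing the mask, enlarged by a random factor (an assumption)."""
    ys, xs = np.nonzero(gt_mask)
    if ys.size == 0:
        raise ValueError("ground-truth mask is empty")
    cy, cx = ys.mean(), xs.mean()
    # Squared distance from the centroid to the farthest mask pixel.
    r2 = ((ys - cy) ** 2 + (xs - cx) ** 2).max()
    r2 *= (1.0 + rng.uniform(0.0, max_jitter)) ** 2
    yy, xx = np.ogrid[:gt_mask.shape[0], :gt_mask.shape[1]]
    return (((yy - cy) ** 2 + (xx - cx) ** 2) <= r2).astype(np.uint8)

rng = np.random.default_rng(0)
gt = np.zeros((64, 64), dtype=np.uint8)
gt[20:40, 25:50] = 1                      # stand-in ground-truth polyp
point = point_prompt(gt, rng)
circle = circumcircle_prompt(gt, rng)
```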

Fig. 7. The framework of the feature extraction encoding module. The input of the FEEM is an original endoscopic image tensor with dimensions H × W × C = 352 × 352 × 3. The FEEM comprises convolutional layers, normalization layers, activation function layers, and pooling layers. The first, second, and third PEM outputs are concatenated with the image feature maps at various layers of the FEEM. The output of the FEEM is then transmitted to the MDM.

Fig. 8. The framework of the mask decoder module. The input of the MDM is the output of the FEEM. The input is first convolved and then concatenated with the third PEM output, followed by further convolution, upsampling, and an additional convolution. The intermediate output is then combined with the second PEM output using the same process. After concatenation with the first PEM output, the upsampling operation is replaced by a convolution operation that generates the mask. The output of the MDM is a mask.
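The decoding order in Fig. 8 can be followed with a shape-level sketch. Everything numeric here is an assumption: the spatial sizes (88, 176, 352), the channel counts, and the use of random 1×1 channel mixing as a stand-in for the real convolution blocks. The sketch only shows how concatenation and upsampling could line up so that the final mask matches the 352 × 352 input. It is not the authors' network.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_channels):
    """Stand-in for a convolution block: random 1x1 channel mixing followed by ReLU."""
    weights = rng.standard_normal((x.shape[-1], out_channels)) / np.sqrt(x.shape[-1])
    return np.maximum(x @ weights, 0.0)

def upsample2x(x):
    """Nearest-neighbor upsampling by a factor of two along height and width."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def decode(feem_out, p1, p2, p3):
    """Follow the Fig. 8 order: third, then second, then first PEM output."""
    x = conv1x1(feem_out, 64)
    x = np.concatenate([x, p3], axis=-1)      # combine with third PEM output
    x = conv1x1(upsample2x(conv1x1(x, 64)), 32)
    x = np.concatenate([x, p2], axis=-1)      # combine with second PEM output
    x = conv1x1(upsample2x(conv1x1(x, 32)), 16)
    x = np.concatenate([x, p1], axis=-1)      # combine with first PEM output
    x = conv1x1(x, 16)
    # A final convolution replaces upsampling and produces one-channel logits.
    logits = x @ rng.standard_normal((16, 1))
    return 1.0 / (1.0 + np.exp(-logits[..., 0]))

# Assumed shapes for the FEEM output and the three PEM outputs.
feem_out = rng.standard_normal((88, 88, 128))
p3 = rng.standard_normal((88, 88, 64))
p2 = rng.standard_normal((176, 176, 32))
p1 = rng.standard_normal((352, 352, 16))
pred_mask = decode(feem_out, p1, p2, p3)
```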

Fig. 9. Segmentation results of SAM on landscape images. The SAM showcases its excellent capability to segment objects in natural landscape images. While the SAM cannot be directly applied to endoscopic scenes, the masks it produces can serve as a novel type of prompt. A SAM-based data augmentation method is proposed to enrich the prompt dataset.

Fig. 11. The areas that endoscopists focus on when viewing the endoscopic images. The images in the first row are original endoscopic images. The images in the second row highlight the areas that endoscopists focus on, such as polyps or suspicious lesions.

Fig. 12. The hardware equipment used for the experiment. A disposable electronic endoscope, an image processor, and an eye-tracking device (eye tracker) have been developed. The disposable electronic endoscope collects image signals and transmits them to the image processor. The eye tracker captures the endoscopist's ocular data and transmits them to the image processor. The image processor processes the image signals and the endoscopist's ocular data, and transmits the original endoscopic image and the auxiliary diagnosis image to two displays, respectively.

Table

Table 1. Screening criteria for SAM-generated masks. The SAM-generated masks are divided into four types: (1) inside the GT, (2) outside the GT, (3) across the GT, and (4) containing and contained.
- Retained: masks of types (1) and (2), which lie entirely inside or entirely outside the GT.
- Deleted: masks of type (3), which cover both GT and background and would introduce confusion.
- Type (4): the internal masks are retained and the external masks are deleted.
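The screening rules in Table 1 can be sketched as simple set tests on boolean masks. This is one reading of the criteria, not the authors' code. In particular, type (4) is interpreted here as "one retained SAM mask strictly contains another", in which case the outer mask is dropped. The function names are hypothetical.

```python
import numpy as np

def classify_against_gt(sam_mask, gt_mask):
    """Return 'inside', 'outside', or 'across' for one SAM mask relative to the GT."""
    sam = np.asarray(sam_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    overlap = np.logical_and(sam, gt).sum()
    if overlap == 0:
        return "outside"
    if overlap == sam.sum():
        return "inside"
    return "across"

def screen_sam_masks(sam_masks, gt_mask):
    """Keep inside/outside masks, drop 'across' masks, and drop masks that contain another kept mask."""
    kept = []
    for m in sam_masks:
        m = np.asarray(m, dtype=bool)
        if not m.any():
            continue  # ignore empty masks
        label = classify_against_gt(m, gt_mask)
        if label != "across":
            kept.append((label, m))
    result = []
    for i, (label, outer) in enumerate(kept):
        # Assumed reading of type (4): an outer mask that strictly contains another kept mask.
        contains_other = any(
            j != i and inner.sum() < outer.sum() and not np.any(inner & ~outer)
            for j, (_, inner) in enumerate(kept)
        )
        if not contains_other:
            result.append((label, outer))
    return result

def box(y0, y1, x0, x1, shape=(40, 40)):
    m = np.zeros(shape, dtype=bool)
    m[y0:y1, x0:x1] = True
    return m

gt = box(10, 30, 10, 30)
candidates = [box(12, 16, 12, 16), box(11, 20, 11, 20), box(32, 38, 32, 38), box(25, 35, 25, 35)]
screened = screen_sam_masks(candidates, gt)
labels = [label for label, _ in screened]
```

Under this reading, masks labeled `inside` could serve as positive prompts and masks labeled `outside` as negative prompts, which matches the source's remark that SAM-generated prompts include both kinds.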

Table 2. The dot matrix prompts. The dot matrix prompts are generated according to the frequency of polyps to enhance the network's stability in the absence of the endoscopist's ocular data. The images in the first row are heatmaps of the frequency of polyps in different datasets. The images in the other rows are the dot matrix prompts generated at different sampling frequencies.
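One way to produce a non-uniform dot matrix of this kind is to average the ground-truth masks into a frequency map and then sample dot positions in proportion to that frequency. The sketch below assumes all masks share one size and treats the number of dots as the knob for "sampling frequency". Both function names and that interpretation are assumptions.

```python
import numpy as np

def polyp_frequency_map(gt_masks):
    """Average same-sized binary ground-truth masks into a per-pixel polyp frequency map."""
    stack = np.stack([np.asarray(m, dtype=np.float64) for m in gt_masks])
    return stack.mean(axis=0)

def dot_matrix_prompt(freq_map, n_dots, rng):
    """Place n_dots distinct dots, each pixel chosen with probability proportional to its frequency."""
    weights = freq_map.ravel()
    total = weights.sum()
    if total <= 0:
        raise ValueError("frequency map has no polyp pixels")
    # Sampling without replacement cannot return more dots than there are nonzero pixels.
    n_dots = min(n_dots, int(np.count_nonzero(weights)))
    idx = rng.choice(weights.size, size=n_dots, replace=False, p=weights / total)
    prompt = np.zeros(freq_map.shape, dtype=np.uint8)
    prompt.flat[idx] = 1
    return prompt

rng = np.random.default_rng(0)
masks = []
for shift in range(5):                      # stand-in dataset of five masks
    m = np.zeros((64, 64), dtype=np.uint8)
    m[20 + shift:40 + shift, 24:44] = 1
    masks.append(m)
freq = polyp_frequency_map(masks)
dots = dot_matrix_prompt(freq, n_dots=50, rng=rng)
```

Dots cluster where polyps appear most often in the dataset, so the prompt still points at likely regions when gaze data drop out.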

Table 3. Statistics and characteristics of the enhanced datasets. One marker in the table denotes "Include" and another denotes "Generate". Due to the varying image sizes across the datasets, all images are resized to 400 × 400 pixels, and the pixel data are subsequently processed and analyzed.