Source Code Walkthrough (Part 2): nnUNet


System Framework


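- Experiment planning: analyze dataset properties and generate the pipeline configuration
- Preprocessing: prepare the training dataset (normalization, resampling, etc.)
- Model training: train the model with the generated configuration
- Model evaluation: compute performance metrics for the trained models
- Best configuration selection: determine the best single model or ensemble
- Inference: apply the selected model to new data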

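Installation

Hardware requirements
- CPU
  - Modern multi-core processor
- RAM
  - Minimum: 16 GB
  - Recommended: 32 GB or more (especially for 3D datasets with large images)
- GPU
  - For training: NVIDIA GPU with at least 11 GB VRAM (RTX 2080 Ti, 3090, 4090, A5000, or better)
  - For inference only: NVIDIA GPU with 8 GB+ VRAM, or CPU only (noticeably slower)
- Storage
  - At least 100 GB of free space (varies with dataset size)

Software requirements
- Python
  - 3.10 or later
- Operating system
  - Linux (recommended, especially Ubuntu)
  - Windows 10/11
  - macOS (limited GPU support via MPS, or CPU only)
- CUDA and cuDNN
  - Required for GPU acceleration (compatible with PyTorch 2.1.2+)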

Installation methods

Installing via pip

The simplest way to install nnU-Net v2 is with pip:
```
pip install nnunetv2
```
This automatically installs nnU-Net v2 together with all of its dependencies, as declared in pyproject.toml (lines 32-55).

Installing from the GitHub repository

For the latest development version, or if you want to contribute to the codebase:
```
git clone https://github.com/MIC-DKFZ/nnUNet
cd nnUNet
pip install -e .
```

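Core Module Code Walkthrough

1. Training Data Preprocessing

The preprocessing system in nnU-Net v2 converts raw medical imaging data into standardized input suitable for neural network training and inference.

At its core is the **DefaultPreprocessor** class, which coordinates all preprocessing operations.

Source file: preprocessing/preprocessors/default_preprocessor.py

a. Image loading and transposition
The first step loads the images with the reader specified in the plans and applies an axis transpose so that all cases share a consistent orientation: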

```python
data, data_properties = rw.read_images(image_files)
data = data.transpose([0, *[i + 1 for i in plans_manager.transpose_forward]])
```
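To make the index arithmetic in the transpose call concrete, here is a minimal sketch in plain Python that builds the same axis permutation and applies it to a shape tuple. The `transpose_forward` value and the example shape are made up for illustration, and `invert_permutation` and `permute` are hypothetical helpers, not nnU-Net functions.

```python
def channel_preserving_axes(transpose_forward):
    """Shift spatial axes by one so that axis 0 (channels) stays in place."""
    return [0, *[i + 1 for i in transpose_forward]]


def invert_permutation(perm):
    """Return the permutation that undoes `perm` (hypothetical helper)."""
    inverse = [0] * len(perm)
    for new_axis, old_axis in enumerate(perm):
        inverse[old_axis] = new_axis
    return inverse


def permute(shape, axes):
    """Shape that an array of `shape` would have after transpose(axes)."""
    return tuple(shape[a] for a in axes)


transpose_forward = (2, 0, 1)   # made-up example value
axes = channel_preserving_axes(transpose_forward)
shape = (1, 512, 512, 40)       # (channels, x, y, z), made-up example
transposed_shape = permute(shape, axes)
restored_shape = permute(transposed_shape, invert_permutation(axes))
```

Here the short axis moves to the front of the spatial dimensions while the channel axis is untouched, and applying the inverse permutation recovers the original layout, which is what has to happen when predictions are exported.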

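b. Cropping

Cropping reduces memory requirements by removing background regions that carry no relevant information. nnU-Net records the bounding box used for cropping so that the operation can be reversed during inference: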

```python
shape_before_cropping = data.shape[1:]
properties['shape_before_cropping'] = shape_before_cropping
data, seg, bbox = crop_to_nonzero(data, seg)
properties['bbox_used_for_cropping'] = bbox
```
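The idea behind the crop and its reversal can be sketched with NumPy as follows. This is a simplified illustration, not the library's `crop_to_nonzero`: the names `crop_to_nonzero_sketch` and `paste_back` are hypothetical, and any extra handling the real function performs (for example on the mask or the segmentation) is left out.

```python
import numpy as np


def crop_to_nonzero_sketch(data, seg=None):
    """Crop (C, X, Y, Z) data to the bounding box of voxels that are nonzero in any channel."""
    nonzero_mask = np.any(data != 0, axis=0)      # collapse the channel axis
    coords = np.argwhere(nonzero_mask)            # (N, ndim) spatial coordinates
    if coords.size == 0:                          # empty image: keep everything
        bbox = [[0, s] for s in data.shape[1:]]
    else:
        lower = coords.min(axis=0)
        upper = coords.max(axis=0) + 1            # exclusive upper bound
        bbox = [[int(lo), int(hi)] for lo, hi in zip(lower, upper)]
    slicer = (slice(None), *[slice(lo, hi) for lo, hi in bbox])
    cropped_seg = seg[slicer] if seg is not None else None
    return data[slicer], cropped_seg, bbox


def paste_back(cropped, bbox, shape_before_cropping, fill_value=0):
    """Undo the crop for one spatial array by writing it into a full-size canvas."""
    canvas = np.full(shape_before_cropping, fill_value, dtype=cropped.dtype)
    canvas[tuple(slice(lo, hi) for lo, hi in bbox)] = cropped
    return canvas


data = np.zeros((1, 6, 6, 6), dtype=np.float32)
data[0, 1:4, 2:5, 3:6] = 1.0
cropped, _, bbox = crop_to_nonzero_sketch(data)
restored = paste_back(cropped[0], bbox, data.shape[1:])
```

Storing only `bbox` and the shape before cropping is enough to place a prediction made on the cropped volume back into the original image grid.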

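c. Normalization

Normalization standardizes intensity values across images, which makes network training more stable. It is applied before resampling so that interpolation operates on normalized values: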

```python
data = self._normalize(data, seg, configuration_manager,
                       plans_manager.foreground_intensity_properties_per_channel)
```
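Two common normalization schemes can be sketched with the standard library alone: a per-image z-score, and a CT-style scheme that clips to dataset-wide percentiles before standardizing with dataset-wide statistics. The function names are hypothetical, the property keys are assumed to mirror what the plans file stores per channel, and all numbers are made up.

```python
import statistics


def zscore_normalize(values, eps=1e-8):
    """Subtract the image mean and divide by the image standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / max(std, eps) for v in values]


def ct_normalize(values, props, eps=1e-8):
    """Clip to dataset-wide percentiles, then standardize with dataset-wide mean and std."""
    lower, upper = props["percentile_00_5"], props["percentile_99_5"]
    clipped = [min(max(v, lower), upper) for v in values]
    return [(v - props["mean"]) / max(props["std"], eps) for v in clipped]


z_scores = zscore_normalize([2.0, 4.0, 6.0])

# Made-up foreground intensity properties for a single CT channel
props = {"mean": 100.0, "std": 50.0, "percentile_00_5": -100.0, "percentile_99_5": 300.0}
ct_values = ct_normalize([-1000.0, 100.0, 2000.0], props)
```

The CT variant uses statistics collected over the whole dataset because CT intensities are physically meaningful, whereas the z-score variant adapts to each image separately.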

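d. Resampling (handling anisotropy)

Resampling adjusts the voxel spacing to the target spacing specified in the configuration.

The resampling step consists of:

- Computing the new shape from the original and target spacing
- Applying suitable interpolation (different for images and segmentations)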

```python
# /preprocessing/resampling/default_resampling.py
new_shape = compute_new_shape(data.shape[1:], original_spacing, target_spacing)
data = configuration_manager.resampling_fn_data(data, new_shape, original_spacing, target_spacing)
seg = configuration_manager.resampling_fn_seg(seg, new_shape, original_spacing, target_spacing)
```
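The shape computation itself is simple arithmetic: the physical extent along each axis (number of voxels times spacing) must stay the same, so the new voxel count is the old count scaled by the ratio of old to new spacing. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def compute_new_shape_sketch(old_shape, old_spacing, new_spacing):
    """Scale each axis so that shape * spacing (the physical size) is preserved."""
    assert len(old_shape) == len(old_spacing) == len(new_spacing)
    return tuple(
        int(round(size * old / new))
        for size, old, new in zip(old_shape, old_spacing, new_spacing)
    )


# 100 slices at 2.5 mm resampled to 1.0 mm; in-plane spacing unchanged
new_shape = compute_new_shape_sketch((100, 512, 512), (2.5, 0.8, 0.8), (1.0, 0.8, 0.8))
```

In this example only the coarsely sampled axis grows, from 100 to 250 slices, while the in-plane size stays at 512.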

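e. Foreground sampling for efficient training

For segmentation tasks, nnU-Net records a sample of foreground voxel locations so that training can draw patches in a class-balanced way without rescanning the segmentation: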

```python
properties['class_locations'] = self._sample_foreground_locations(seg, collect_for_this, verbose=self.verbose)
```
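A toy version of this bookkeeping on a 2D label map might look like the sketch below. It assumes that a capped random subset of coordinates is stored per class; the function names, the cap of 3, and the label map are all illustrative and not taken from nnU-Net.

```python
import random


def sample_foreground_locations_sketch(seg, classes, max_per_class=3, seed=1234):
    """Collect up to `max_per_class` voxel coordinates for each class in a 2D label map."""
    rng = random.Random(seed)
    locations = {}
    for c in classes:
        coords = [(y, x) for y, row in enumerate(seg) for x, label in enumerate(row) if label == c]
        if len(coords) > max_per_class:
            coords = rng.sample(coords, max_per_class)   # keep a random subset only
        locations[c] = coords
    return locations


def pick_patch_center(locations, rng):
    """Pick a class that is present, then one of its stored voxels, as a patch center."""
    present = [c for c, coords in locations.items() if coords]
    if not present:
        return None
    return rng.choice(locations[rng.choice(present)])


seg = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 2, 0],
    [0, 0, 0, 0, 0],
]
class_locations = sample_foreground_locations_sketch(seg, classes=[1, 2, 3])
center = pick_patch_center(class_locations, random.Random(0))
```

Choosing the class first and the voxel second gives small structures the same chance of being sampled as large ones, which is the point of storing locations per class.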

2. Inference Preprocessing

During inference, preprocessing is performed on the fly.

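3. Preprocessing a Single User-Supplied Case

```python
from nnunetv2.preprocessing.preprocessors.default_preprocessor import DefaultPreprocessor

# The arguments are assumed to have been prepared by the caller
preprocessor = DefaultPreprocessor()
data, seg, properties = preprocessor.run_case(input_images, seg_file, plans_manager,
                                              configuration_manager, dataset_json)
```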

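Automated Experiment Configuration

Design philosophy

nnU-Net's "no configuration" philosophy means that the network architecture and training parameters are designed and configured automatically from dataset properties, balancing performance against hardware limits. The experiment planning system analyzes dataset properties and hardware constraints to generate suitable settings for preprocessing, training, and inference.

Experiment planning analyzes dataset characteristics to determine:

- The target spacing for resampling
- The network architecture and topology (pooling operations, kernel sizes)
- Memory-efficient patch and batch sizes
- Appropriate data augmentation and preprocessing strategies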
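As an illustration of the first item, a natural starting point for the target spacing is the per-axis median of the spacings found in the training cases. The sketch below assumes exactly that and nothing more; the real planner applies further handling (for example for strongly anisotropic datasets) that is omitted here, and the spacings are made up.

```python
import statistics


def median_target_spacing(spacings):
    """Per-axis median over the spacings of all training cases."""
    return tuple(statistics.median(axis_values) for axis_values in zip(*spacings))


# Made-up (z, y, x) spacings in mm for three cases
spacings = [(5.0, 0.7, 0.7), (2.5, 0.8, 0.8), (3.0, 0.9, 0.9)]
target_spacing = median_target_spacing(spacings)
```

Using the median rather than the mean keeps a few unusually coarse or fine scans from dragging the target away from what most of the dataset looks like.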

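Source file:

nnunetv2/experiment_planning/experiment_planners/default_experiment_planner.py

The **ExperimentPlanner** class in this file has several main responsibilities:

- Reading dataset properties: analyzes the dataset fingerprint to understand image characteristics such as shape, spacing, and intensity distribution.
- Network architecture configuration: selects an appropriate network depth, kernel sizes, and feature map counts from the dataset properties.
- Hardware-aware optimization: estimates GPU memory requirements and adjusts patch and batch size to fit the available resources.
- Multi-configuration planning: creates plans for the 2D, 3D full-resolution, and 3D low-resolution approaches.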

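Configuration types

Configuration	Dimensionality	Resolution	Use case
2d	2D	Full resolution	Fast to train; suited to highly anisotropic data
3d_fullres	3D	Full resolution	Suited to medium-sized 3D volumes
3d_lowres	3D	Reduced resolution	For very large 3D volumes
3d_cascade_fullres	3D	Full resolution	Second stage after 3d_lowres, for large datasets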

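The system can also size the ResEncUNet automatically according to the available GPU memory (VRAM):

Planner	Target GPU memory	Typical GPUs
nnUNetPlannerResEncM	8 GB	RTX 2080 Ti, 1080 Ti, etc.
nnUNetPlannerResEncL	24 GB	RTX 3090, RTX 4090, A5000
nnUNetPlannerResEncXL	40 GB	A100 40GB, A6000, etc.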

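Model Training

Training framework

The training module is a core component of nnU-Net. It is responsible for model training and coordinates the whole process from data loading to model validation.

Source file:

nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py

At the heart of the training system is the **nnUNetTrainer class**, which coordinates the entire training process. It provides a flexible framework that can be extended to support different training strategies.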


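Main responsibilities:

- Initializing and managing the network architecture
- Configuring the optimizer and loss function
- Managing data loaders and the augmentation pipeline
- Running the training and validation loops
- Implementing checkpointing and logging

All files related to model training live under the nnUNetTrainer folder, including optimizers, network definitions, and the training loop.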

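Model architecture

The network architecture is built dynamically from the plans and configuration. The default architecture is U-Net-like but can be customized. The method **build_network_architecture** is responsible for building the appropriate network as specified in the plans.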

```python
# Implementation in nnUNetTrainer class
def build_network_architecture(architecture_class_name: str,
                               arch_init_kwargs: dict,
                               arch_init_kwargs_req_import: Union[List[str], Tuple[str, ...]],
                               num_input_channels: int,
                               num_output_channels: int,
                               enable_deep_supervision: bool = True) -> nn.Module:
    ...  # body omitted in this excerpt
```
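The signature hints at how dynamic construction works: the class arrives as a dotted name, and some keyword arguments are themselves dotted names that must be resolved before the constructor is called. One way to do this with the standard library is `pydoc.locate`. The sketch below illustrates the mechanism with stand-in classes; `build_from_class_name` is a hypothetical helper, and `types.SimpleNamespace` and `fractions.Fraction` merely play the roles of a network class and an importable argument.

```python
import pydoc


def build_from_class_name(class_name, init_kwargs, kwargs_requiring_import=()):
    """Instantiate a class given its dotted import path and keyword arguments."""
    cls = pydoc.locate(class_name)
    if cls is None:
        raise ImportError(f"Could not locate class {class_name!r}")
    kwargs = dict(init_kwargs)
    # Some values are dotted paths themselves (think of a norm or activation class)
    # and have to be turned into real objects before the constructor sees them.
    for key in kwargs_requiring_import:
        if kwargs[key] is not None:
            kwargs[key] = pydoc.locate(kwargs[key])
    return cls(**kwargs)


# Stand-in example that only needs the standard library
config = build_from_class_name(
    "types.SimpleNamespace",
    {"depth": 4, "norm_op": "fractions.Fraction"},
    kwargs_requiring_import=("norm_op",),
)
```

Because everything is expressed as strings and plain values, the whole architecture description can live in a JSON plans file and still produce a concrete object at training time.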

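Data augmentation

nnU-Net applies extensive data augmentation to improve generalization:

Augmentation type	Examples
Spatial	Rotation, scaling, elastic deformation, mirroring
Intensity	Brightness, contrast, gamma correction
Noise	Gaussian noise, Gaussian blur
Other	Simulated low resolution, random binary operators

The augmentation pipeline is configured from the dataset properties: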

```python
def get_training_transforms(patch_size, rotation_for_DA, deep_supervision_scales, mirror_axes,
                            do_dummy_2d_data_aug, use_mask_for_norm, is_cascaded,
                            foreground_labels, regions, ignore_label):
    # Configures the training augmentation pipeline
    ...  # body omitted in this excerpt
```

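Loss function

nnU-Net combines Dice loss with cross-entropy (or with BCE for region-based segmentation). The loss is built according to the task type (label-based or region-based segmentation) and whether deep supervision is enabled: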

```python
def _build_loss(self):
    if self.label_manager.has_regions:
        loss = DC_and_BCE_loss({},
                               {'batch_dice': self.configuration_manager.batch_dice,
                                'do_bg': True, 'smooth': 1e-5, 'ddp': self.is_ddp},
                               use_ignore_label=self.label_manager.ignore_label is not None,
                               dice_class=MemoryEfficientSoftDiceLoss)
    else:
        loss = DC_and_CE_loss({'batch_dice': self.configuration_manager.batch_dice,
                               'smooth': 1e-5, 'do_bg': False, 'ddp': self.is_ddp}, {},
                              weight_ce=1, weight_dice=1,
                              ignore_label=self.label_manager.ignore_label,
                              dice_class=MemoryEfficientSoftDiceLoss)

    if self._do_i_compile():
        loss.dc = torch.compile(loss.dc)

    # we give each output a weight which decreases exponentially (division by 2) as the resolution decreases
    # this gives higher resolution outputs more weight in the loss
    if self.enable_deep_supervision:
        deep_supervision_scales = self._get_deep_supervision_scales()
        weights = np.array([1 / (2 ** i) for i in range(len(deep_supervision_scales))])
        if self.is_ddp and not self._do_i_compile():
            # very strange and stupid interaction. DDP crashes and complains about unused parameters due to
            # weights[-1] = 0. Interestingly this crash doesn't happen with torch.compile enabled. Strange stuff.
            # Anywho, the simple fix is to set a very low weight to this.
            weights[-1] = 1e-6
        else:
            weights[-1] = 0

        # we don't use the lowest 2 outputs. Normalize weights so that they sum to 1
        weights = weights / weights.sum()
        # now wrap the loss
        loss = DeepSupervisionWrapper(loss, weights)

    return loss
```
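The deep supervision weighting in the code above is easy to check by hand. The following sketch reproduces just that arithmetic in plain Python; the function names are hypothetical and the choice of five outputs is only an example.

```python
def deep_supervision_weights(num_outputs, last_weight=0.0):
    """Halve the weight at every lower resolution, override the last one, normalize to sum 1."""
    weights = [1 / (2 ** i) for i in range(num_outputs)]
    weights[-1] = last_weight
    total = sum(weights)
    return [w / total for w in weights]


def weighted_total(per_scale_losses, weights):
    """Combine the per-resolution losses into one scalar."""
    return sum(w * loss for w, loss in zip(weights, per_scale_losses))


weights = deep_supervision_weights(5)
total = weighted_total([1.0, 1.0, 1.0, 1.0, 1.0], weights)
```

With five outputs the raw weights 1, 1/2, 1/4, 1/8, 0 sum to 1.875, so the full-resolution output ends up with 8/15 of the total weight and the coarsest output contributes nothing.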

Hyperparameter settings

By default, nnU-Net optimizes with SGD with Nesterov momentum and a polynomial learning-rate decay:

```python
def configure_optimizers(self):
    optimizer = torch.optim.SGD(self.network.parameters(), self.initial_lr, weight_decay=self.weight_decay,
                                momentum=0.99, nesterov=True)
    lr_scheduler = PolyLRScheduler(optimizer, self.initial_lr, self.num_epochs)
    return optimizer, lr_scheduler
```
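A polynomial decay schedule has a simple closed form. The sketch below assumes an exponent of 0.9, an initial learning rate of 0.01, and 1000 epochs; treat these as illustrative defaults rather than values confirmed by the snippet above.

```python
def poly_lr(initial_lr, current_step, max_steps, exponent=0.9):
    """Polynomial decay: starts at initial_lr and reaches 0 at max_steps."""
    return initial_lr * (1 - current_step / max_steps) ** exponent


# Learning rate at a few epochs of an assumed 1000-epoch run
schedule = [poly_lr(0.01, epoch, 1000) for epoch in (0, 250, 500, 750, 999)]
final_lr = poly_lr(0.01, 1000, 1000)
```

With an exponent just below 1 the curve is close to linear for most of training and only falls off more steeply near the end.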

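Alternative optimizers are available through trainer variants:

- nnUNetTrainerAdam: uses the Adam or AdamW optimizer
- nnUNetTrainerAdan: uses the Adan optimizer (requires adan-pytorch to be installed)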

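Multi-GPU training

nnU-Net supports distributed training with PyTorch's DistributedDataParallel (DDP). When several GPUs are used, the batch is divided among the workers and the system handles gradient synchronization across devices.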
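Dividing a global batch among workers can be sketched as follows. This is a generic even split and only an assumption about how the bookkeeping might look; nnU-Net's own implementation may differ in details, and `split_batch_across_workers` is a hypothetical name.

```python
def split_batch_across_workers(global_batch_size, world_size):
    """Give every rank floor(B / W) samples and hand the remainder to the first ranks."""
    if global_batch_size < world_size:
        raise ValueError("Need at least one sample per worker")
    base, remainder = divmod(global_batch_size, world_size)
    return [base + 1 if rank < remainder else base for rank in range(world_size)]


even_split = split_batch_across_workers(12, 4)
uneven_split = split_batch_across_workers(10, 4)
```

Each rank then runs forward and backward passes on its own share, and DDP averages the gradients so that the update corresponds to the global batch.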

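Trainer variants

nnU-Net provides several variants of the base trainer to support different use cases:

Trainer variant	Purpose
nnUNetTrainerNoDeepSupervision	Training without deep supervision
nnUNetTrainerAdan	Uses the Adan optimizer
nnUNetTrainerAdam	Uses the Adam optimizer
nnUNetTrainer_Xepochs	Trains for a specified number of epochs
nnUNetTrainerBenchmark_5epochs	For benchmarking performance

The variant system makes customization easy without modifying the core trainer. The variant below, for example, only sets a boolean flag: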

```python
class nnUNetTrainerNoDeepSupervision(nnUNetTrainer):
    def __init__(self, plans, configuration, fold, dataset_json, device):
        super().__init__(plans, configuration, fold, dataset_json, device)
        self.enable_deep_supervision = False
```

Starting a training run

Training is usually launched from the command line:

```
nnUNetv2_train DATASET_NAME_OR_ID CONFIGURATION FOLD [-tr TRAINER] [-p PLANS]
               [-pretrained_weights PATH] [-num_gpus NUM] [--npz] [--c] [--val]
               [--val_best] [--disable_checkpointing] [-device DEVICE]
```

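Key parameters:

- DATASET_NAME_OR_ID: the dataset to train on
- CONFIGURATION: the configuration to use (e.g. `2d`, `3d_fullres`)
- FOLD: the cross-validation fold (0-4 or `all`)
- -tr: custom trainer class (default: `nnUNetTrainer`)
- -p: plans identifier (default: `nnUNetPlans`)
- -num_gpus: number of GPUs to use for training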

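Model Inference

The nnU-Net inference system applies trained models to new medical images, turning raw input data into segmentation maps.

Its core class is **nnUNetPredictor**, which coordinates the entire prediction process. It manages:

- Model initialization: loading the network architecture and weights
- Preprocessing coordination: making sure images are prepared correctly
- Prediction execution: running the sliding-window algorithm
- Result processing: converting logits into the final segmentation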

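Initialization and configuration

**nnUNetPredictor** must be initialized before use, typically with the following parameters:

Parameter	Default	Description
`tile_step_size`	0.5	How far the sliding window moves, as a fraction of the patch size (0.5 = 50% overlap)
`use_gaussian`	True	Whether to apply Gaussian weighting when blending windows
`use_mirroring`	True	Whether to use test-time augmentation via mirroring
`perform_everything_on_device`	True	Whether to keep data on the GPU during processing
`device`	CUDA	Compute device (CUDA recommended)

After the predictor has been created, it must be configured with a trained model: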

```python
predictor.initialize_from_trained_model_folder(
    model_folder,  # Path to trained model folder
    use_folds=(0,),  # Which folds to use (can combine multiple)
    checkpoint_name='checkpoint_final.pth'  # Which checkpoint to use
)
```

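Inference methods

Source file: nnunetv2/inference/predict_from_raw_data.py

The inference system offers several prediction methods, each suited to a different use case:

Method	Use case	Advantages	Drawbacks
`predict_from_files()`	Batch prediction of images stored as files	Best memory efficiency, parallel processing	Requires files on disk
`predict_from_list_of_npy_arrays()`	Several images already loaded as arrays	No file I/O needed	Higher memory usage
`predict_single_npy_array()`	Prediction of a single image	Simplest API	Slowest, no parallelization
`predict_from_data_iterator()`	Custom data loading schemes	Maximum flexibility	More complex to implement

A key component of the inference system is the sliding-window prediction mechanism, which handles large medical images that may not fit into GPU memory in one pass.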

```python
# nnunetv2/inference/sliding_window_prediction.py#L10-L54
from typing import List, Tuple, Union

import numpy as np
import torch
from scipy.ndimage import gaussian_filter


def compute_gaussian(tile_size: Union[Tuple[int, ...], List[int]], sigma_scale: float = 1. / 8,
                     value_scaling_factor: float = 1, dtype=torch.float16, device=torch.device('cuda', 0)) \
        -> torch.Tensor:
    tmp = np.zeros(tile_size)
    center_coords = [i // 2 for i in tile_size]
    sigmas = [i * sigma_scale for i in tile_size]
    tmp[tuple(center_coords)] = 1
    gaussian_importance_map = gaussian_filter(tmp, sigmas, 0, mode='constant', cval=0)

    gaussian_importance_map = torch.from_numpy(gaussian_importance_map)

    gaussian_importance_map /= (torch.max(gaussian_importance_map) / value_scaling_factor)
    gaussian_importance_map = gaussian_importance_map.to(device=device, dtype=dtype)
    # gaussian_importance_map cannot be 0, otherwise we may end up with nans!
    mask = gaussian_importance_map == 0
    gaussian_importance_map[mask] = torch.min(gaussian_importance_map[~mask])
    return gaussian_importance_map


def compute_steps_for_sliding_window(image_size: Tuple[int, ...], tile_size: Tuple[int, ...],
                                     tile_step_size: float) -> List[List[int]]:
    # all(...) so that the check is actually enforced (a non-empty list is always truthy)
    assert all(i >= j for i, j in zip(image_size, tile_size)), \
        "image size must be as large or larger than patch_size"
    assert 0 < tile_step_size <= 1, 'step_size must be larger than 0 and smaller or equal to 1'

    # our step width is patch_size*step_size at most, but can be narrower. For example if we have image size of
    # 110, patch size of 64 and step_size of 0.5, then we want to make 3 steps starting at coordinate 0, 23, 46
    target_step_sizes_in_voxels = [i * tile_step_size for i in tile_size]

    num_steps = [int(np.ceil((i - k) / j)) + 1 for i, j, k in zip(image_size, target_step_sizes_in_voxels, tile_size)]

    steps = []
    for dim in range(len(tile_size)):
        # the highest step value for this dimension is
        max_step_value = image_size[dim] - tile_size[dim]
        if num_steps[dim] > 1:
            actual_step_size = max_step_value / (num_steps[dim] - 1)
        else:
            actual_step_size = 99999999999  # does not matter because there is only one step at 0

        steps_here = [int(np.round(actual_step_size * i)) for i in range(num_steps[dim])]

        steps.append(steps_here)

    return steps
```
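How the window positions and the importance map work together can be sketched in one dimension with the standard library only. This is an illustration under simplifying assumptions: `steps_1d` mirrors the step computation for a single axis, `triangular_weights` is a hypothetical stand-in for the Gaussian importance map, and `predict_tile` is a dummy "network" that returns its input.

```python
import math


def steps_1d(image_size, tile_size, tile_step_size):
    """Window start positions along one axis, spread evenly over the image."""
    target_step = tile_size * tile_step_size
    num_steps = math.ceil((image_size - tile_size) / target_step) + 1
    max_start = image_size - tile_size
    step = max_start / (num_steps - 1) if num_steps > 1 else 0
    return [round(step * i) for i in range(num_steps)]


def triangular_weights(tile_size):
    """Stand-in for the Gaussian map: highest in the tile center, lower at the borders."""
    center = (tile_size - 1) / 2
    return [1.0 - abs(i - center) / tile_size for i in range(tile_size)]


def blend_tiles(image, tile_size, tile_step_size, predict_tile):
    """Run `predict_tile` on every window and merge the results by weighted averaging."""
    weights = triangular_weights(tile_size)
    accumulated = [0.0] * len(image)
    weight_sum = [0.0] * len(image)
    for start in steps_1d(len(image), tile_size, tile_step_size):
        prediction = predict_tile(image[start:start + tile_size])
        for offset, (p, w) in enumerate(zip(prediction, weights)):
            accumulated[start + offset] += p * w
            weight_sum[start + offset] += w
    return [a / s for a, s in zip(accumulated, weight_sum)]


starts = steps_1d(110, 64, 0.5)
image = [float(i) for i in range(110)]
blended = blend_tiles(image, 64, 0.5, predict_tile=lambda tile: list(tile))
```

For an image of size 110 and a tile of 64 the windows start at 0, 23 and 46, matching the example in the code comment. Dividing by the accumulated weights means that voxels covered by several windows are averaged, with predictions near a tile center counting more than predictions near its border.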

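Multi-fold ensemble prediction

The inference system supports ensembling several networks (typically from different cross-validation folds) by averaging their predictions:

Initialize with multiple folds: use_folds=(0, 1, 2, 3, 4)
For each input:
- Loop over all model weights
- Generate a prediction from each model
- Average the predictions

This usually gives more robust results than a single fold, at the cost of longer prediction time. The averaging happens at the logits level (before softmax/argmax), which is mathematically better founded than averaging segmentations.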
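The difference between averaging logits and combining hard segmentations shows up as soon as the folds disagree. The sketch below works on a single voxel with two classes; the logits are made up, and `majority_vote` is included only as a contrast, not as something nnU-Net does.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def ensemble_logits(per_fold_logits):
    """Average the class logits of one voxel over all folds."""
    n = len(per_fold_logits)
    num_classes = len(per_fold_logits[0])
    return [sum(fold[c] for fold in per_fold_logits) / n for c in range(num_classes)]


def majority_vote(per_fold_logits):
    """Contrast case: each fold casts one hard vote."""
    votes = [argmax(fold) for fold in per_fold_logits]
    return max(set(votes), key=votes.count)


# Made-up logits: two folds barely prefer class 1, one fold strongly prefers class 0
folds = [[0.2, 0.3], [0.1, 0.2], [3.0, -1.0]]
mean_logits = ensemble_logits(folds)
probabilities = softmax(mean_logits)
label_from_mean = argmax(mean_logits)
label_from_vote = majority_vote(folds)
```

Averaging before the argmax keeps each model's confidence, so one confident fold can outweigh two nearly undecided ones, whereas hard voting discards that information.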

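Model Evaluation and Best Configuration Selection

The evaluation and model selection system in nnU-Net v2 provides a framework for assessing the performance of trained segmentation models, selecting the best configuration, and improving results through postprocessing and ensembling.

Evaluation metrics

nnU-Net's evaluation system uses the Dice coefficient as its primary performance metric and reports the following:

- Dice coefficient: measures the spatial overlap between prediction and ground truth
- Intersection over Union (IoU): an alternative overlap measure
- True positives (TP), false positives (FP), false negatives (FN), true negatives (TN)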
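Both overlap metrics follow directly from the confusion counts: Dice = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN). A minimal sketch on flat label lists, with hypothetical function names and made-up labels; returning NaN when a class is absent from both prediction and reference is a choice of this sketch.

```python
def confusion_counts(prediction, reference, label):
    """Count TP, FP, FN, TN for one label over two equally long label sequences."""
    tp = fp = fn = tn = 0
    for p, r in zip(prediction, reference):
        if p == label and r == label:
            tp += 1
        elif p == label:
            fp += 1
        elif r == label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def dice(tp, fp, fn):
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else float("nan")


def iou(tp, fp, fn):
    denominator = tp + fp + fn
    return tp / denominator if denominator else float("nan")


prediction = [0, 1, 1, 1, 0, 0]
reference = [0, 1, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(prediction, reference, label=1)
dice_score = dice(tp, fp, fn)
iou_score = iou(tp, fp, fn)
```

The two scores are monotonically related through Dice = 2 IoU / (1 + IoU), so they always rank models the same way on a single case, although averages over many cases can differ.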

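Best model selection

nnU-Net trains several model configurations per dataset and automatically selects the best-performing one. The selection process evaluates single models as well as ensembles to determine which achieves the highest Dice score.

By default, nnU-Net considers the following configurations:

Configuration	Description
2d	2D U-Net
3d_fullres	Full-resolution 3D U-Net
3d_lowres	Low-resolution 3D U-Net
3d_cascade_fullres	3D U-Net cascade (low resolution → full resolution)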

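Collecting cross-validation results

Before model selection, the results of all cross-validation folds are collected and merged:

- Copy the validation predictions of every fold into one common folder
- Evaluate the collected predictions against the ground truth
- Generate a summary of the cross-validation performance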

```python
# Example of accumulating cross-validation results
merged_output_folder = join(output_folder, f'crossval_results_folds_{folds_tuple_to_string(folds)}')
accumulate_cv_results(output_folder, merged_output_folder, folds, num_processes, overwrite)
```