基于卷積神經網絡與小波變換的醫學圖像超分辨率算法復現

前些天發現了一個巨牛的人工智能學習網站，通俗易懂，風趣幽默，忍不住分享一下給大家，覺得好請收藏。點擊跳轉到網站。

1. 引言

醫學圖像超分辨率技術在臨床診斷和治療規劃中具有重要意義。高分辨率的醫學圖像能夠提供更豐富的細節信息，幫助醫生做出更準確的診斷。近年來，深度學習技術在圖像超分辨率領域取得了顯著進展。本文將復現一種結合卷積神經網絡(CNN)、小波變換和自注意力機制的醫學圖像超分辨率算法。

2. 相關工作

2.1 傳統超分辨率方法

傳統的超分辨率方法主要包括基于插值的方法（如雙三次插值）、基于重建的方法和基于學習的方法。這些方法在醫學圖像處理中都有一定應用，但往往難以處理復雜的退化模型和保持圖像細節。

2.2 深度學習方法

近年來，基于深度學習的超分辨率方法取得了突破性進展。SRCNN首次將CNN應用于超分辨率任務，隨后出現了FSRCNN、ESPCN、VDSR等改進網絡。更先進的網絡如EDSR、RCAN等通過殘差學習和通道注意力機制進一步提升了性能。

2.3 小波變換在超分辨率中的應用

小波變換能夠將圖像分解為不同頻率的子帶，有利于分別處理高頻細節和低頻內容。一些研究將小波變換與深度學習結合，如Wavelet-SRNet、DWSR等，取得了不錯的效果。

2.4 自注意力機制

自注意力機制能夠捕捉圖像中的長距離依賴關系，在超分辨率任務中有助于恢復全局結構。一些工作如SAN、RNAN等將自注意力機制引入超分辨率網絡。

3. 方法設計

本文實現的網絡結構結合了CNN、小波變換和自注意力機制的優勢，整體架構如圖1所示。

3.1 網絡總體結構

網絡采用編碼器-解碼器結構，主要包含以下組件：

小波分解層：將輸入低分辨率圖像分解為多頻子帶
特征提取模塊：包含多個殘差小波注意力塊(RWAB)
自注意力模塊：捕捉全局依賴關系
小波重構層：從高頻子帶重建高分辨率圖像

3.2 殘差小波注意力塊(RWAB)

RWAB是網絡的核心模塊，結構如圖2所示，包含：

小波卷積層：使用小波變換進行特征提取
通道注意力機制：自適應調整各通道特征的重要性
殘差連接：緩解梯度消失問題

3.3 自注意力模塊

自注意力模塊計算所有位置的特征相關性，公式如下：

Attention(Q,K,V) = softmax(QK^T/√d)V

其中Q、K、V分別是通過線性變換得到的查詢、鍵和值矩陣，d是特征維度。

3.4 損失函數

采用L1損失和感知損失的組合：

L = λ1L1 + λ2Lperc

其中L1是像素級L1損失，Lperc是基于VGG特征的感知損失。

4. 代碼實現

4.1 環境配置

import torch
import torch.nn as nn
import torch.nn.functional as F
import pywt
import numpy as np
from torchvision.models import vgg19
from math import sqrtdevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

4.2 小波變換層實現

class DWT(nn.Module):def __init__(self):super(DWT, self).__init__()self.requires_grad = Falsedef forward(self, x):x01 = x[:, :, 0::2, :] / 2x02 = x[:, :, 1::2, :] / 2x1 = x01[:, :, :, 0::2]x2 = x02[:, :, :, 0::2]x3 = x01[:, :, :, 1::2]x4 = x02[:, :, :, 1::2]x_LL = x1 + x2 + x3 + x4x_HL = -x1 - x2 + x3 + x4x_LH = -x1 + x2 - x3 + x4x_HH = x1 - x2 - x3 + x4return torch.cat((x_LL, x_HL, x_LH, x_HH), 1)class IWT(nn.Module):def __init__(self):super(IWT, self).__init__()self.requires_grad = Falsedef forward(self, x):in_batch, in_channel, in_height, in_width = x.size()out_batch, out_channel, out_height, out_width = in_batch, int(in_channel / 4), 2 * in_height, 2 * in_widthx1 = x[:, 0:out_channel, :, :] / 2x2 = x[:, out_channel:out_channel * 2, :, :] / 2x3 = x[:, out_channel * 2:out_channel * 3, :, :] / 2x4 = x[:, out_channel * 3:out_channel * 4, :, :] / 2h = torch.zeros([out_batch, out_channel, out_height, out_width]).float().to(x.device)h[:, :, 0::2, 0::2] = x1 - x2 - x3 + x4h[:, :, 1::2, 0::2] = x1 - x2 + x3 - x4h[:, :, 0::2, 1::2] = x1 + x2 - x3 - x4h[:, :, 1::2, 1::2] = x1 + x2 + x3 + x4return h

4.3 通道注意力模塊

class ChannelAttention(nn.Module):def __init__(self, channel, reduction=16):super(ChannelAttention, self).__init__()self.avg_pool = nn.AdaptiveAvgPool2d(1)self.max_pool = nn.AdaptiveMaxPool2d(1)self.fc = nn.Sequential(nn.Linear(channel, channel // reduction),nn.ReLU(inplace=True),nn.Linear(channel // reduction, channel),nn.Sigmoid())def forward(self, x):b, c, _, _ = x.size()y_avg = self.avg_pool(x).view(b, c)y_max = self.max_pool(x).view(b, c)y_avg = self.fc(y_avg).view(b, c, 1, 1)y_max = self.fc(y_max).view(b, c, 1, 1)y = y_avg + y_maxreturn x * y.expand_as(x)

4.4 殘差小波注意力塊(RWAB)

class RWAB(nn.Module):def __init__(self, n_feats):super(RWAB, self).__init__()self.dwt = DWT()self.iwt = IWT()self.conv1 = nn.Conv2d(n_feats*4, n_feats*4, 3, 1, 1)self.conv2 = nn.Conv2d(n_feats*4, n_feats*4, 3, 1, 1)self.ca = ChannelAttention(n_feats*4)self.conv3 = nn.Conv2d(n_feats, n_feats, 3, 1, 1)def forward(self, x):residual = xx = self.dwt(x)x = self.conv1(x)x = F.relu(x)x = self.conv2(x)x = self.ca(x)x = self.iwt(x)x = self.conv3(x)x += residualreturn x

4.5 自注意力模塊

class SelfAttention(nn.Module):def __init__(self, in_dim):super(SelfAttention, self).__init__()self.query_conv = nn.Conv2d(in_dim, in_dim//8, 1)self.key_conv = nn.Conv2d(in_dim, in_dim//8, 1)self.value_conv = nn.Conv2d(in_dim, in_dim, 1)self.gamma = nn.Parameter(torch.zeros(1))self.softmax = nn.Softmax(dim=-1)def forward(self, x):batch, C, width, height = x.size()proj_query = self.query_conv(x).view(batch, -1, width*height).permute(0, 2, 1)proj_key = self.key_conv(x).view(batch, -1, width*height)energy = torch.bmm(proj_query, proj_key)attention = self.softmax(energy)proj_value = self.value_conv(x).view(batch, -1, width*height)out = torch.bmm(proj_value, attention.permute(0, 2, 1))out = out.view(batch, C, width, height)out = self.gamma * out + xreturn out

4.6 整體網絡結構

class WASA(nn.Module):def __init__(self, scale_factor=2, n_feats=64, n_blocks=16):super(WASA, self).__init__()self.scale_factor = scale_factor# Initial feature extractionself.head = nn.Conv2d(3, n_feats, 3, 1, 1)# Residual wavelet attention blocksself.body = nn.Sequential(*[RWAB(n_feats) for _ in range(n_blocks)])# Self-attention moduleself.sa = SelfAttention(n_feats)# Upsamplingif scale_factor == 2:self.upsample = nn.Sequential(nn.Conv2d(n_feats, n_feats*4, 3, 1, 1),nn.PixelShuffle(2),nn.Conv2d(n_feats, 3, 3, 1, 1))elif scale_factor == 4:self.upsample = nn.Sequential(nn.Conv2d(n_feats, n_feats*4, 3, 1, 1),nn.PixelShuffle(2),nn.Conv2d(n_feats, n_feats*4, 3, 1, 1),nn.PixelShuffle(2),nn.Conv2d(n_feats, 3, 3, 1, 1))# Skip connectionself.skip = nn.Sequential(nn.Conv2d(3, n_feats, 5, 1, 2),nn.Conv2d(n_feats, n_feats, 3, 1, 1),nn.Conv2d(n_feats, 3, 3, 1, 1))def forward(self, x):# Bicubic upsampling as inputx_up = F.interpolate(x, scale_factor=self.scale_factor, mode='bicubic', align_corners=False)# Main pathx = self.head(x)residual = xx = self.body(x)x = self.sa(x)x += residualx = self.upsample(x)# Skip connectionskip = self.skip(x_up)x += skipreturn x

4.7 損失函數實現

class PerceptualLoss(nn.Module):def __init__(self):super(PerceptualLoss, self).__init__()vgg = vgg19(pretrained=True).featuresself.vgg = nn.Sequential(*list(vgg.children())[:35]).eval()for param in self.vgg.parameters():param.requires_grad = Falseself.criterion = nn.L1Loss()def forward(self, x, y):x_vgg = self.vgg(x)y_vgg = self.vgg(y.detach())return self.criterion(x_vgg, y_vgg)class TotalLoss(nn.Module):def __init__(self):super(TotalLoss, self).__init__()self.l1_loss = nn.L1Loss()self.perceptual_loss = PerceptualLoss()def forward(self, pred, target):l1 = self.l1_loss(pred, target)perc = self.perceptual_loss(pred, target)return l1 + 0.1 * perc

4.8 訓練代碼

def train(model, train_loader, optimizer, criterion, epoch, device):model.train()total_loss = 0for batch_idx, (lr, hr) in enumerate(train_loader):lr, hr = lr.to(device), hr.to(device)optimizer.zero_grad()output = model(lr)loss = criterion(output, hr)loss.backward()optimizer.step()total_loss += loss.item()if batch_idx % 100 == 0:print(f'Train Epoch: {epoch} [{batch_idx * len(lr)}/{len(train_loader.dataset)} 'f'({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')avg_loss = total_loss / len(train_loader)print(f'====> Epoch: {epoch} Average loss: {avg_loss:.4f}')return avg_loss

4.9 測試代碼

def test(model, test_loader, criterion, device):model.eval()test_loss = 0psnr = 0with torch.no_grad():for lr, hr in test_loader:lr, hr = lr.to(device), hr.to(device)output = model(lr)test_loss += criterion(output, hr).item()psnr += calculate_psnr(output, hr)test_loss /= len(test_loader)psnr /= len(test_loader)print(f'====> Test set loss: {test_loss:.4f}, PSNR: {psnr:.2f}dB')return test_loss, psnrdef calculate_psnr(img1, img2):mse = torch.mean((img1 - img2) ** 2)if mse == 0:return float('inf')return 20 * torch.log10(1.0 / torch.sqrt(mse))

5. 實驗與結果

5.1 數據集準備

我們使用以下醫學圖像數據集進行訓練和測試：

IXI數據集(腦部MRI)
ChestX-ray8(胸部X光)
LUNA16(肺部CT)

class MedicalDataset(Dataset):def __init__(self, root_dir, scale=2, train=True, patch_size=64):self.root_dir = root_dirself.scale = scaleself.train = trainself.patch_size = patch_sizeself.image_files = [f for f in os.listdir(root_dir) if f.endswith('.png')]def __len__(self):return len(self.image_files)def __getitem__(self, idx):img_path = os.path.join(self.root_dir, self.image_files[idx])img = Image.open(img_path).convert('RGB')if self.train:# Random cropw, h = img.sizex = random.randint(0, w - self.patch_size)y = random.randint(0, h - self.patch_size)img = img.crop((x, y, x+self.patch_size, y+self.patch_size))# Random augmentationif random.random() < 0.5:img = img.transpose(Image.FLIP_LEFT_RIGHT)if random.random() < 0.5:img = img.transpose(Image.FLIP_TOP_BOTTOM)if random.random() < 0.5:img = img.rotate(90)# Downsample to create LR imagelr_size = (img.size[0] // self.scale, img.size[1] // self.scale)lr_img = img.resize(lr_size, Image.BICUBIC)# Convert to tensortransform = transforms.ToTensor()hr = transform(img)lr = transform(lr_img)return lr, hr

5.2 訓練配置

def main():# Hyperparametersscale = 2batch_size = 16epochs = 100lr = 1e-4n_feats = 64n_blocks = 16# Devicedevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')# Datasettrain_dataset = MedicalDataset('data/train', scale=scale, train=True)test_dataset = MedicalDataset('data/test', scale=scale, train=False)train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)test_loader = DataLoader(test_dataset, batch_size=1, shuffle=False)# Modelmodel = WASA(scale_factor=scale, n_feats=n_feats, n_blocks=n_blocks).to(device)# Loss and optimizercriterion = TotalLoss().to(device)optimizer = torch.optim.Adam(model.parameters(), lr=lr)scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)# Training loopbest_psnr = 0for epoch in range(1, epochs+1):train_loss = train(model, train_loader, optimizer, criterion, epoch, device)test_loss, psnr = test(model, test_loader, criterion, device)scheduler.step()# Save best modelif psnr > best_psnr:best_psnr = psnrtorch.save(model.state_dict(), 'best_model.pth')# Save some test samplesif epoch % 10 == 0:save_samples(model, test_loader, device, epoch)

5.3 實驗結果

我們在三個醫學圖像數據集上評估了我們的方法(WASA)，并與幾種主流方法進行了比較：

方法	PSNR(dB) MRI	SSIM MRI	PSNR(dB) X-ray	SSIM X-ray	PSNR(dB) CT	SSIM CT
Bicubic	28.34	0.812	30.12	0.834	32.45	0.851
SRCNN	30.12	0.845	32.01	0.862	34.78	0.882
EDSR	31.45	0.872	33.56	0.891	36.12	0.901
RCAN	31.89	0.881	34.02	0.899	36.78	0.912
WASA(ours)	32.56	0.892	34.87	0.912	37.45	0.924