🍨 This article is a learning-record blog post for the 🔗365天深度學習訓練營 (365-Day Deep Learning Training Camp)
🍖 Original author: K同學啊
My environment
Language: Python 3.8
Editor: Jupyter Lab
Deep learning environment: PyTorch 1.12.1+cu113
torchvision 0.13.1+cu113
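Before going further, it is worth confirming that the installed packages actually match the versions listed above. A minimal check (nothing here beyond standard PyTorch calls):

import torch
import torchvision

print(torch.__version__)         # expected: 1.12.1+cu113
print(torchvision.__version__)   # expected: 0.13.1+cu113
print(torch.cuda.is_available()) # True if the CUDA build can see a GPU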
1. Preparation
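The preparation code is not reproduced in this excerpt. The usual first step in this series is choosing a compute device, roughly as follows (a minimal sketch; the print is just for confirmation):

import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)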
2. Importing the Data
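The data-loading code is likewise omitted here. A sketch of the typical pattern in this series, using torchvision's ImageFolder; the "./data/" path and the ImageNet normalization statistics are assumptions, not taken from the original:

import torchvision.transforms as transforms
from torchvision import datasets

train_transforms = transforms.Compose([
    transforms.Resize([224, 224]),                      # ResNet-50 expects 224x224 inputs
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet statistics (assumed)
                         std=[0.229, 0.224, 0.225]),
])
total_data = datasets.ImageFolder("./data/", transform=train_transforms)  # assumed path
print(total_data.class_to_idx)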
3. Splitting the Dataset
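The splitting code is also not shown in this excerpt. A common sketch using torch.utils.data.random_split, reusing the total_data dataset from the sketch above; the 80/20 split and batch size of 32 are assumed values:

import torch

train_size = int(0.8 * len(total_data))    # assumed 80/20 split
test_size = len(total_data) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(
    total_data, [train_size, test_size])

batch_size = 32                            # assumed batch size
train_dl = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_dl = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size)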
4. Building the Network
import torch
import torch.nn as nn
from torchsummary import summary          # optional: prints a layer-by-layer summary
import torchvision.models as models


class IdentityBlock(nn.Module):
    """Identity block of ResNet: the input joins the output unchanged via the shortcut.

    filters is a [f1, f2, f3] list giving the widths of the 1x1, 3x3, 1x1 convolutions.
    """
    def __init__(self, in_channels, kernel_size, filters):
        super().__init__()
        f1, f2, f3 = filters
        self.conv1 = nn.Conv2d(in_channels, f1, 1, bias=False)   # 1x1: reduce channels
        self.bn1 = nn.BatchNorm2d(f1)
        self.conv2 = nn.Conv2d(f1, f2, kernel_size, padding=kernel_size // 2, bias=False)
        self.bn2 = nn.BatchNorm2d(f2)
        self.conv3 = nn.Conv2d(f2, f3, 1, bias=False)            # 1x1: restore channels
        self.bn3 = nn.BatchNorm2d(f3)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)                         # skip connection


class ConvBlock(nn.Module):
    """Conv block of ResNet: a 1x1 convolution on the shortcut matches channels and stride."""
    def __init__(self, in_channels, kernel_size, filters, strides=(2, 2)):
        super().__init__()
        f1, f2, f3 = filters
        # main path
        self.conv1 = nn.Conv2d(in_channels, f1, 1, stride=strides, bias=False)
        self.bn1 = nn.BatchNorm2d(f1)
        self.conv2 = nn.Conv2d(f1, f2, kernel_size, padding=kernel_size // 2, bias=False)
        self.bn2 = nn.BatchNorm2d(f2)
        self.conv3 = nn.Conv2d(f2, f3, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(f3)
        # shortcut path (projection)
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, f3, 1, stride=strides, bias=False),
            nn.BatchNorm2d(f3),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        shortcut = self.shortcut(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + shortcut)                         # skip connection


class ResNet50(nn.Module):
    """ResNet-50: a 7x7 stem followed by four stages of (3, 4, 6, 3) bottleneck blocks."""
    def __init__(self, num_classes=1000):
        super().__init__()
        # initial conv block; padding=3 replaces the separate ZeroPadding2D of the Keras original
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),
        )
        self.stage2 = nn.Sequential(
            ConvBlock(64, 3, [64, 64, 256], strides=(1, 1)),     # no downsampling in stage 2
            IdentityBlock(256, 3, [64, 64, 256]),
            IdentityBlock(256, 3, [64, 64, 256]),
        )
        self.stage3 = nn.Sequential(
            ConvBlock(256, 3, [128, 128, 512]),
            *[IdentityBlock(512, 3, [128, 128, 512]) for _ in range(3)],
        )
        self.stage4 = nn.Sequential(
            ConvBlock(512, 3, [256, 256, 1024]),
            *[IdentityBlock(1024, 3, [256, 256, 1024]) for _ in range(5)],
        )
        self.stage5 = nn.Sequential(
            ConvBlock(1024, 3, [512, 512, 2048]),
            *[IdentityBlock(2048, 3, [512, 512, 2048]) for _ in range(2)],
        )
        # classification head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(2048, num_classes)

    def forward(self, x):
        x = self.stem(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)
        x = self.stage5(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)


# Locally saved weights can be restored with PyTorch's native mechanism
# (load_state_dict replaces Keras's load_weights):
# model = ResNet50(); model.load_state_dict(torch.load("resnet50_pretrained.pth"))

# Here we use torchvision's reference implementation instead:
model = models.resnet50().to(device)
model
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer2): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (3): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer3): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (3): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (4): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (5): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer4): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=1000, bias=True)
)
5. Writing the Training and Test Functions
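The function bodies are not included in this excerpt. A sketch of the train/test pair this series typically uses; it assumes the device, model, and dataloaders defined earlier in the post:

import torch


def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.train()
    train_loss, train_acc = 0, 0
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        pred = model(X)                      # forward pass
        loss = loss_fn(pred, y)
        optimizer.zero_grad()                # backward pass and parameter update
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
    return train_acc / size, train_loss / num_batches


def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, test_acc = 0, 0
    with torch.no_grad():                    # no gradients needed for evaluation
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            test_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
    return test_acc / size, test_loss / num_batches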
6. Setting Hyperparameters
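The concrete values do not appear in this excerpt, so the following is a hedged sketch: the Adam optimizer, the 1e-4 learning rate, and the epoch count of 20 are assumptions; only the cross-entropy loss is the standard choice for this kind of classification task:

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()   # standard classification loss
learn_rate = 1e-4                 # assumed value
opt = torch.optim.Adam(model.parameters(), lr=learn_rate)
epochs = 20                       # assumed value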
7. Training
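A sketch of the main loop, wiring together the functions and hyperparameters sketched above and recording the per-epoch history for the visualization step:

train_loss, train_acc = [], []
test_loss, test_acc = [], []

for epoch in range(epochs):
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, opt)
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)
    print(f"Epoch {epoch + 1:2d}: "
          f"train_acc {epoch_train_acc * 100:.1f}%, train_loss {epoch_train_loss:.3f}, "
          f"test_acc {epoch_test_acc * 100:.1f}%, test_loss {epoch_test_loss:.3f}")
print("Done")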
8. Visualization
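A sketch of the usual side-by-side accuracy and loss curves with matplotlib, assuming the four history lists recorded during training:

import matplotlib.pyplot as plt

epochs_range = range(epochs)
plt.figure(figsize=(12, 3))

plt.subplot(1, 2, 1)
plt.plot(epochs_range, train_acc, label='Training Accuracy')
plt.plot(epochs_range, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label='Training Loss')
plt.plot(epochs_range, test_loss, label='Test Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()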
9. Summary
ResNet was proposed by Microsoft Research and took first place in the 2015 ImageNet (ILSVRC) competition. Its core concept is the residual block: shortcut connections pass the input straight through to later layers, which makes deep architectures far easier to train. So how, concretely, are the 50 layers of ResNet-50 composed?
ResNet comes in several depths (ResNet-18, 34, 50, 101, and 152), where the number is the count of weighted layers. ResNet-50 is deeper and structurally more complex than the smaller variants, but like them it is built by stacking residual blocks, each containing a few convolutional layers.
ResNet uses two kinds of residual blocks: the basic block, used in shallower networks such as ResNet-34, and the bottleneck block, used in deeper networks such as ResNet-50. A bottleneck first shrinks the channel count with a 1x1 convolution, applies the expensive 3x3 convolution in that narrowed space, then restores the channel count with a second 1x1 convolution. This greatly reduces computation while keeping the network deep.
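To make the saving concrete, here is a small sketch (illustrative layer widths, not taken from the original) comparing the parameter count of a 256-channel bottleneck against a pair of full-width 3x3 convolutions:

import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# Bottleneck: 1x1 reduce -> 3x3 -> 1x1 restore (256 -> 64 -> 64 -> 256 channels)
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, 1, bias=False),
    nn.Conv2d(64, 64, 3, padding=1, bias=False),
    nn.Conv2d(64, 256, 1, bias=False),
)
# A basic-style pair of full-width 3x3 convolutions at 256 channels
plain = nn.Sequential(
    nn.Conv2d(256, 256, 3, padding=1, bias=False),
    nn.Conv2d(256, 256, 3, padding=1, bias=False),
)
print(n_params(bottleneck), n_params(plain))  # 69632 vs 1179648, roughly 17x fewer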
ResNet-50 is therefore a stack of such bottleneck blocks, organized into four stages. The first stage works on the largest feature maps; each later stage downsamples with a stride-2 convolution, halving the feature-map size while increasing the channel count.
Concretely, the network opens with a 7x7 convolution and a max-pooling layer, followed by four stages containing 3, 4, 6, and 3 residual blocks respectively. Each bottleneck block holds three convolutions (1x1, 3x3, 1x1), so the stages contribute (3 + 4 + 6 + 3) × 3 = 48 layers. Adding the initial 7x7 convolution and the final fully connected layer gives exactly 50 weighted layers (pooling layers have no weights and are not counted), which is where the name comes from.
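This arithmetic can be checked directly against torchvision's implementation; note that the four 1x1 projection convolutions on the shortcut paths are extra weighted layers that the traditional count of 50 does not include:

import torch.nn as nn
import torchvision.models as models

m = models.resnet50()
convs = sum(1 for mod in m.modules() if isinstance(mod, nn.Conv2d))
fcs = sum(1 for mod in m.modules() if isinstance(mod, nn.Linear))
print(convs, fcs)  # 53 1 -> 53 convolutions and 1 fully connected layer
# 53 = 1 (stem) + 48 (16 bottlenecks x 3) + 4 projection shortcuts;
# the "50" in the name counts 1 + 48 + 1 (fc) and excludes the shortcuts.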
As for the shortcut connections, when a block's input and output disagree in channel count or spatial size, a plain identity cannot be added to the main path. The first block of each stage therefore applies a 1x1 convolution on the shortcut, with stride 2 where downsampling is needed, to match both the channel count and the resolution; this is the downsample branch visible in the printed model above.
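For instance, a 1x1 convolution with stride 2 doubles the channels and halves the spatial resolution in one step, which is exactly what the shortcut needs at a stage boundary (illustrative shapes):

import torch
import torch.nn as nn

x = torch.randn(1, 256, 56, 56)   # e.g. the output of stage 2 in ResNet-50
proj = nn.Conv2d(256, 512, kernel_size=1, stride=2, bias=False)
print(proj(x).shape)              # torch.Size([1, 512, 28, 28])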
The key innovations of ResNet-50 are thus residual learning and the bottleneck design, which let the network grow much deeper without running into the vanishing-gradient and degradation problems. In a plain deep network, accuracy saturates and then drops as layers are added; with shortcut connections, each block only needs to learn a residual on top of an identity mapping, which makes training far deeper networks practical.