In PyTorch, convolutional layers are implemented mainly by `torch.nn.Conv1d`, `torch.nn.Conv2d`, and `torch.nn.Conv3d`, corresponding to one-, two-, and three-dimensional convolutions respectively. Details below:
1. 2-D convolution (Conv2d) - the most common
```python
import torch.nn as nn

# Basic parameters
conv = nn.Conv2d(
    in_channels=3,    # input channels (e.g. 3 for an RGB image)
    out_channels=16,  # output channels / number of kernels
    kernel_size=3,    # kernel size (int or tuple, e.g. (3, 3))
    stride=1,         # stride (default 1)
    padding=1,        # zero-padding (default 0)
    dilation=1,       # dilation for atrous convolution (default 1)
    groups=1,         # grouped convolution (default 1)
    bias=True         # whether to use a bias term (default True)
)
```
Computing the output size (for the height; the width is analogous):

H_out = floor((H_in + 2 × padding − dilation × (kernel_size − 1) − 1) / stride) + 1
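This formula can be checked without instantiating any layers; below is a small pure-Python helper (the name `conv_out_len` is just for illustration) that evaluates it for one spatial axis:

```python
import math

def conv_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    # PyTorch's output-size formula for one spatial axis of Conv1d/2d/3d
    return math.floor(
        (length + 2 * padding - dilation * (kernel_size - 1) - 1) / stride
    ) + 1

# Conv2d with kernel_size=3, padding=1 preserves a 32-pixel axis:
print(conv_out_len(32, 3, padding=1))  # 32
# Without padding, a 3-tap kernel trims 2 pixels:
print(conv_out_len(32, 3))             # 30
```

The same function applies to the height, width, depth, or sequence-length axes of the 1-D/2-D/3-D layers shown later.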
2. Usage example
```python
import torch

# Input tensor (batch_size=4, channels=3, height=32, width=32)
x = torch.randn(4, 3, 32, 32)

# Convolution layer
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
output = conv(x)
print(output.shape)  # torch.Size([4, 16, 32, 32])
```
3. Special convolution types
(1) Dilated convolution

```python
nn.Conv2d(3, 16, kernel_size=3, dilation=2)  # enlarges the receptive field
```
(2) Grouped convolution

```python
nn.Conv2d(16, 32, kernel_size=3, groups=4)  # splits input/output channels into 4 groups
```
(3) Depthwise separable convolution

```python
# Depthwise step: groups equal to in_channels
depthwise = nn.Conv2d(16, 16, kernel_size=3, groups=16)
pointwise = nn.Conv2d(16, 32, kernel_size=1)  # 1x1 convolution
```
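The saving from the depthwise-separable factorization can be verified by counting parameters directly; the counts below follow from the Conv2d weight shape `(out_channels, in_channels // groups, kH, kW)` plus one bias per output channel (the helper name is illustrative):

```python
def conv2d_params(in_ch, out_ch, k, groups=1, bias=True):
    # Weight shape: (out_ch, in_ch // groups, k, k), plus an optional bias of size out_ch
    return out_ch * (in_ch // groups) * k * k + (out_ch if bias else 0)

# Standard 3x3 convolution, 16 -> 32 channels
standard = conv2d_params(16, 32, 3)
# Depthwise (groups=16) 3x3 followed by pointwise 1x1
separable = conv2d_params(16, 16, 3, groups=16) + conv2d_params(16, 32, 1)

print(standard)   # 4640
print(separable)  # 704 - roughly 6.6x fewer parameters
```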
4. 1-D and 3-D convolutions
Conv1d (sequential data / text)

```python
conv1d = nn.Conv1d(in_channels=256, out_channels=100, kernel_size=3)
# Input shape: (batch, channels, sequence_length)
```
Conv3d (video / volumetric data)

```python
conv3d = nn.Conv3d(1, 32, kernel_size=(3, 3, 3))
# Input shape: (batch, channels, depth, height, width)
```
5. Transposed convolution (deconvolution)

```python
nn.ConvTranspose2d(16, 8, kernel_size=2, stride=2)  # commonly used for upsampling
```
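The transposed convolution inverts the size arithmetic of a regular convolution; along one axis its output length is (L − 1) × stride − 2 × padding + dilation × (kernel_size − 1) + output_padding + 1. A quick check that the layer above doubles spatial resolution (helper name is illustrative):

```python
def convtranspose_out_len(length, kernel_size, stride=1, padding=0,
                          output_padding=0, dilation=1):
    # PyTorch's output-size formula for one spatial axis of ConvTranspose1d/2d/3d
    return ((length - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)

# ConvTranspose2d(16, 8, kernel_size=2, stride=2) on a 16x16 feature map:
print(convtranspose_out_len(16, kernel_size=2, stride=2))  # 32
```

With kernel_size equal to stride and no padding, the output is exactly stride times larger, which is why this configuration is popular for upsampling.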
6. Weight initialization

```python
# Common initialization methods
nn.init.kaiming_normal_(conv.weight, mode='fan_out')
nn.init.constant_(conv.bias, 0.1)
```
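For intuition, `kaiming_normal_` draws weights from a zero-mean normal with standard deviation gain / sqrt(fan); with `mode='fan_out'` on a Conv2d, fan is out_channels × kernel_height × kernel_width, and the default gain is sqrt(2) (suited to ReLU-family activations). A small sketch of the scale computation (helper name is illustrative):

```python
import math

def kaiming_std(fan, gain=math.sqrt(2.0)):
    # Std of Kaiming/He normal init: gain / sqrt(fan)
    return gain / math.sqrt(fan)

# Conv2d(3, 16, kernel_size=3) with mode='fan_out': fan = 16 * 3 * 3 = 144
print(round(kaiming_std(16 * 3 * 3), 4))  # 0.1179
```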
7. Visualizing the kernels

```python
import matplotlib.pyplot as plt

weights = conv.weight.detach().cpu()
plt.figure(figsize=(10, 5))
for i in range(16):
    plt.subplot(4, 4, i + 1)
    plt.imshow(weights[i, 0], cmap='gray')
plt.show()
```
8. Summary:
- Kernel parameters are shared across spatial positions, greatly reducing the parameter count
- `padding='same'` keeps the output the same size as the input (PyTorch 1.9+)
- Usually combined with `BatchNorm` and `ReLU`
- `print(conv)` shows the layer's configuration
In practice, convolutional layers are typically alternated with pooling layers to build CNN architectures, e.g.:

```python
self.conv_block = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(2)
)
```