Neural networks: torch.nn Convolution Layers

torch.nn — PyTorch 2.3 documentation

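torch.nn - PyTorch Chinese documentation (pytorch-cn.readthedocs.io)

The difference between torch.nn and torch.nn.functional

torch.nn is a wrapper around torch.nn.functional that makes the functions in torch.nn.functional more convenient to use.
torch.nn builds on torch.nn.functional: the layers in torch.nn hold their own parameters (weight, bias) and call the functional versions internally. As an analogy, torch.nn.functional is like the gears turning inside a car, while torch.nn packages those gears up and hands us a steering wheel.
For simple applications, knowing torch.nn is enough. To understand the convolution operation in detail, you need to dig into torch.nn.functional.
The official torch.nn.functional documentation lists many convolution-related operations: torch.nn.functional - PyTorch 2.3 documentation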
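
The relationship between the two APIs can be sketched as follows. This is a minimal illustration with a random input; the tensor sizes are arbitrary choices made for the example, not values from the post. It builds an nn.Conv2d layer, then calls F.conv2d by hand with that layer's own weight and bias, and checks that both give the same result.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8)  # (N, C_in, H, W), arbitrary example size

# Module API: the layer creates and owns its weight and bias.
conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3)
y_module = conv(x)

# Functional API: the same weight and bias are passed in explicitly.
y_functional = F.conv2d(x, conv.weight, conv.bias, stride=1, padding=0)

print(y_module.shape)
print(torch.allclose(y_module, y_functional))
```

The module form is the convenient one when the weights are to be trained; the functional form is handy when you want to supply a specific kernel yourself, as the hand-written kernel example in this post does.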

Convolution Layers in torch.nn

1D convolution layer: torch.nn.Conv1d
2D convolution layer: torch.nn.Conv2d
3D convolution layer: torch.nn.Conv3d

1D convolution layer: torch.nn.Conv1d

class torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

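A 1D convolution layer. For an input of size (N, C_in, L) and an output of size (N, C_out, L_out), the output is computed as:

$$\mathrm{out}(N_i, C_{\mathrm{out}_j}) = \mathrm{bias}(C_{\mathrm{out}_j}) + \sum_{k=0}^{C_{\mathrm{in}}-1} \mathrm{weight}(C_{\mathrm{out}_j}, k) \star \mathrm{input}(N_i, k)$$

Notes

⋆ (written as \bigotimes in older versions of the docs): the cross-correlation operator
stride: the step size of the cross-correlation
dilation: the spacing between kernel points; a detailed description is linked here
groups: controls the connections between inputs and outputs. With groups=1, every output is a convolution over all inputs. With groups=2, the layer behaves like two convolution layers side by side: each one sees half of the input channels and produces half of the output channels, and the two outputs are then concatenated.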
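
The groups=2 case can be checked with a short sketch. The channel counts and the helper layers `left` and `right` below are hypothetical, chosen only for illustration: they reuse the grouped layer's weights to build the two side-by-side convolutions described above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 4, 10)  # (N, C_in=4, L), arbitrary example size

# One grouped layer: 4 input channels, 6 output channels, 2 groups.
grouped = nn.Conv1d(4, 6, kernel_size=3, groups=2, bias=False)
print(grouped.weight.shape)  # (out_channels, in_channels // groups, kernel_size)

# Two ordinary layers, each handling half the input and half the output channels.
left = nn.Conv1d(2, 3, kernel_size=3, bias=False)
right = nn.Conv1d(2, 3, kernel_size=3, bias=False)
with torch.no_grad():
    left.weight.copy_(grouped.weight[:3])   # kernels for output channels 0..2
    right.weight.copy_(grouped.weight[3:])  # kernels for output channels 3..5

y_grouped = grouped(x)
y_manual = torch.cat([left(x[:, :2]), right(x[:, 2:])], dim=1)
print(torch.allclose(y_grouped, y_manual, atol=1e-6))
```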

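Parameters:

in_channels (int) – number of channels in the input signal
out_channels (int) – number of channels produced by the convolution
kernel_size (int or tuple) – size of the convolution kernel
stride (int or tuple, optional) – stride of the convolution
padding (int or tuple, optional) – number of zeros padded onto each side of the input
dilation (int or tuple, optional) – spacing between kernel elements
groups (int, optional) – number of blocked connections from input channels to output channels
bias (bool, optional) – if bias=True, a learnable bias is added to the output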

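Shape:
Input: (N, C_in, L_in)
Output: (N, C_out, L_out)
L_out is computed from L_in as:

$$L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1 \right\rfloor$$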
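
As a quick check of the output length, the sketch below evaluates the formula given in the PyTorch Conv1d documentation and compares it with what a real layer returns. The helper name `conv1d_out_len` and the layer settings are illustrative assumptions, not part of the original post.

```python
import torch
import torch.nn as nn

def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    # L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

conv = nn.Conv1d(16, 33, kernel_size=3, stride=2, padding=1, dilation=2)
x = torch.randn(20, 16, 50)  # (N, C_in, L_in)
y = conv(x)

predicted = conv1d_out_len(50, kernel_size=3, stride=2, padding=1, dilation=2)
print(predicted, y.shape)
```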

Variables:
weight (tensor) – the learnable weights of the convolution, of shape (out_channels, in_channels / groups, kernel_size)
bias (tensor) – the learnable bias of the convolution, of shape (out_channels)

2D convolution layer

1. torch.nn.functional.conv2d

torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)

Applies a 2D convolution over an input signal composed of several input planes.

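Parameters:

input: the input tensor, of shape (minibatch, in_channels, iH, iW)
weight: the weights, more precisely the convolution kernels, of shape (out_channels, in_channels / groups, kH, kW); groups is usually 1
bias: optional bias tensor of shape (out_channels). Default: None
stride: the stride of the convolution kernel; a single number or a tuple (sH, sW). Default: 1
padding: implicit zero padding on the input; a single number or a tuple (padH, padW). Default: 0
groups: splits the input into groups; in_channels must be divisible by the number of groups. Default: 1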

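Example: the stride parameter

Take a 5×5 input image, where each number is the value of a pixel, and a 3×3 convolution kernel.

stride takes either a single number or a tuple (sH, sW). With a single number such as stride=1, the kernel moves 1 position at a time, both vertically and horizontally. With a tuple, the first value is the vertical step and the second is the horizontal step: stride=(2, 3) moves the kernel 2 rows at a time going down and 3 columns at a time going right.
This example uses stride=1.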
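
To see which axis each value of a stride tuple applies to, here is a small sketch. It assumes a 5×5 input filled with 0..24 and an all-ones 3×3 kernel, both made up for the example, and compares stride=(2, 3) with the stride=1 result sampled every 2 rows and every 3 columns.

```python
import torch
import torch.nn.functional as F

x = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)
k = torch.ones(1, 1, 3, 3)

dense = F.conv2d(x, k, stride=1)       # kernel visits every position
rect = F.conv2d(x, k, stride=(2, 3))   # (sH, sW): 2 rows down, 3 columns right

print(dense.shape)  # 3 rows x 3 columns of positions
print(rect.shape)   # 2 rows x 1 column of positions

# The strided result keeps every 2nd row and every 3rd column of the dense result.
print(torch.allclose(rect, dense[:, :, ::2, ::3]))
```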

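First position:

With these settings, the kernel is first aligned with the first three rows and first three columns of the image:

Once aligned, the convolution is computed by multiplying the overlapping entries and adding them up, i.e.

1×1 + 2×2 + 0×1 + 0×0 + 1×1 + 2×0 + 1×2 + 2×1 + 1×0 = 10

The resulting 10 is written into the output matrix as its first entry.

Next, the kernel shifts across the image. It could move 1 or 2 positions sideways, as in the figure below (a shift of 2 to the right); how far it moves is set by the stride parameter, so stride=2 means a shift of 2. This example uses stride=1.

Second position:

Shift one position to the right and compute again:

2×1 + 0×2 + 3×1 + 1×0 + 2×1 + 3×0 + 2×2 + 1×1 + 0×0 = 12

Continuing this way across the whole image gives the output matrix shown in the figure below, with rows [10, 12, 12], [18, 16, 16] and [13, 9, 3]. This matrix is the output of the convolution.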
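
The sliding-window procedure above can be written out by hand. The sketch below uses the 5×5 image and 3×3 kernel from this post's program code; the helper `slide` is a hypothetical name introduced for illustration. It multiplies each window by the kernel, sums the products, and compares the result with F.conv2d.

```python
import torch
import torch.nn.functional as F

image = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]], dtype=torch.float32)
kernel = torch.tensor([[1, 2, 1],
                       [0, 1, 0],
                       [2, 1, 0]], dtype=torch.float32)

def slide(image, kernel, stride=1):
    """Move the kernel over the image; multiply element-wise and sum at each stop."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = image[top:top + kh, left:left + kw]
            out[i, j] = (window * kernel).sum()
    return out

manual = slide(image, kernel, stride=1)
builtin = F.conv2d(image.reshape(1, 1, 5, 5), kernel.reshape(1, 1, 3, 3))[0, 0]

print(manual)
print(torch.allclose(manual, builtin))
```

The explicit loops are only for reading; F.conv2d computes the same values far more efficiently.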

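Example: the padding parameter

padding pads the borders of the input image on all four sides, and its value sets how much is added. It takes an integer or a tuple (padH, padW), where padH is the padding along the height (top and bottom) and padW is the padding along the width (left and right). The default is padding=0, i.e. no padding.

Using the same 5×5 image with padding=1, the input becomes the image below: it is extended by one pixel on the top, bottom, left and right, giving 7×7, and the new pixels are filled with 0 by default.

Computing the convolution in the same order as before, the first window sits on the top-left 3×3 region of the padded image, and the calculation becomes:

0×1 + 0×2 + 0×1 + 0×0 + 1×1 + 2×0 + 0×2 + 0×1 + 1×0 = 1

Continuing in the same way completes the convolution and gives the output matrix.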
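
One way to convince yourself of what padding=1 does is to pad the image explicitly and then convolve without padding. The sketch below does that with F.pad, reusing the image and kernel values from this post's program code (stored as floats here, which is an illustrative choice).

```python
import torch
import torch.nn.functional as F

image = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]], dtype=torch.float32).reshape(1, 1, 5, 5)
kernel = torch.tensor([[1, 2, 1],
                       [0, 1, 0],
                       [2, 1, 0]], dtype=torch.float32).reshape(1, 1, 3, 3)

# Pad one pixel of zeros on the left, right, top and bottom: 5x5 becomes 7x7.
padded = F.pad(image, (1, 1, 1, 1))
explicit = F.conv2d(padded, kernel)            # no padding argument
implicit = F.conv2d(image, kernel, padding=1)  # padding handled by conv2d

print(padded.shape)
print(explicit[0, 0, 0, 0].item())  # value at the first (top-left) position
print(torch.allclose(explicit, implicit))
```

With a 3×3 kernel and stride 1, padding=1 keeps the output the same size as the input (5×5 here).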

Program code

import torch
import torch.nn.functional as F

input = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]])

kernel = torch.tensor([[1, 2, 1],
                       [0, 1, 0],
                       [2, 1, 0]])

# conv2d expects 4-D tensors: (minibatch, channels, height, width)
input = torch.reshape(input, (1, 1, 5, 5))
kernel = torch.reshape(kernel, (1, 1, 3, 3))

print(input.shape)
print(kernel.shape)

# stride=1
output = F.conv2d(input, kernel, stride=1)
print(output)

# stride=2
output2 = F.conv2d(input, kernel, stride=2)
print(output2)

# padding=1
output3 = F.conv2d(input, kernel, stride=1, padding=1)
print(output3)

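Output

The two shapes printed first are torch.Size([1, 1, 5, 5]) and torch.Size([1, 1, 3, 3]). The three convolutions then return tensors of shape (1, 1, 3, 3) for stride=1, (1, 1, 2, 2) for stride=2, and (1, 1, 5, 5) for padding=1.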

2. torch.nn.Conv2d

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

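Parameters:

in_channels (int) – number of channels in the input signal, i.e. the number of channels of the input image; typically 3 for a colour image (RGB)
out_channels (int) – number of channels produced by the convolution, i.e. the number of output channels
kernel_size (int or tuple) – size of the convolution kernel, given as a single number or a tuple. kernel_size=3 defines a 3×3 kernel; kernel_size=(1, 2) defines a 1×2 kernel.
stride (int or tuple, optional) – stride of the convolution, i.e. the step of the kernel in the vertical and horizontal directions. Default: 1
padding (int or tuple, optional) – amount of padding added around the edges of the image. Default: 0
dilation (int or tuple, optional) – spacing between kernel elements. Default: 1. Values above 1 give what is called dilated (atrous) convolution, which is used less often.
groups (int, optional) – number of blocked connections from input channels to output channels. Default: 1. Other values give grouped convolution; it is usually left at 1.
bias (bool, optional) – whether a learnable constant, one per output channel, is added to the result of the convolution. Default: True, and it is usually left that way.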
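
Since dilation is the least intuitive of these parameters, here is a small sketch of what it means. The tensor sizes are arbitrary and the tensor `spread` is a hypothetical helper: a 3×3 kernel with dilation=2 touches the same pixels as a 5×5 kernel whose entries are the original nine values separated by zeros.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 7, 7)
k = torch.randn(1, 1, 3, 3)

# Dilated convolution: kernel taps are 2 pixels apart.
dilated = F.conv2d(x, k, dilation=2)

# Equivalent ordinary convolution: spread the 3x3 values over a 5x5 kernel.
spread = torch.zeros(1, 1, 5, 5)
spread[..., ::2, ::2] = k
ordinary = F.conv2d(x, spread)

print(dilated.shape)
print(torch.allclose(dilated, ordinary, atol=1e-5))
```

The dilated version covers a 5×5 area while still learning only nine weights.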

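A 2D convolution layer. The input has size (N, C_in, H, W) and the output has size (N, C_out, H_out, W_out).

The official documentation describes the convolution operation as follows:

$$\mathrm{out}(N_i, C_{\mathrm{out}_j}) = \mathrm{bias}(C_{\mathrm{out}_j}) + \sum_{k=0}^{C_{\mathrm{in}}-1} \mathrm{weight}(C_{\mathrm{out}_j}, k) \star \mathrm{input}(N_i, k)$$

Formula for converting input size to output size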

Symbols:

N: the batch size
C: the number of channels
H: the height of the image
W: the width of the image

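Calculation

Shape:
Input: (N, C_in, H_in, W_in)
Output: (N, C_out, H_out, W_out) or (C_out, H_out, W_out)

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$

$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$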

When a paper does not state a parameter such as padding, you can use this formula to work it out from the input and output sizes.
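
As an illustration of recovering a missing parameter, the sketch below assumes a hypothetical layer that maps a 32×32 input to a 32×32 output with a 5×5 kernel and stride 1, and searches for the padding that makes the size formula from the PyTorch Conv2d documentation hold. The helper name `conv2d_out_size` is made up for the example.

```python
import torch
import torch.nn as nn

def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Try small paddings and keep those that preserve the 32x32 size.
candidates = [p for p in range(5) if conv2d_out_size(32, kernel_size=5, padding=p) == 32]
print(candidates)

# Confirm with a real layer (8 output channels is an arbitrary choice).
conv = nn.Conv2d(3, 8, kernel_size=5, stride=1, padding=candidates[0])
y = conv(torch.randn(1, 3, 32, 32))
print(y.shape)
```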

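Variables:
weight (tensor) – the learnable weights of the convolution, of shape (out_channels, in_channels / groups, kernel_size[0], kernel_size[1])
bias (tensor) – the learnable bias of the convolution, of shape (out_channels)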

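About the kernel_size parameter

kernel_size sets the size of the convolution kernel; given a kernel_size, the model creates kernels of that size.
The values in the kernel are not set by hand: they are initialised by sampling from a random distribution.
They are then adjusted continually during training.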
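
A minimal sketch of both points, assuming a toy setup: random input, random target, and a single SGD step, none of which come from the post. The layer's kernel has the shape implied by kernel_size, starts with random values, and changes after one training step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv = nn.Conv2d(1, 1, kernel_size=3)
print(conv.weight.shape)  # one 3x3 kernel: (out_channels, in_channels, 3, 3)

before = conv.weight.detach().clone()  # randomly initialised values

# One toy training step on made-up data.
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)
x = torch.randn(4, 1, 8, 8)
target = torch.randn(4, 1, 6, 6)
loss = F.mse_loss(conv(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

after = conv.weight.detach().clone()
print(torch.equal(before, after))  # the kernel values have been updated
```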

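About the out_channels parameter

If the input image has in_channels=1 and there is a single convolution kernel, the output of the convolution has out_channels=1.
If the input image has in_channels=2 and there are two convolution kernels, each kernel spans both input channels and produces one matrix; the two matrices are stacked into a single output, so out_channels=2. In general, out_channels equals the number of kernels.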
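
The link between the number of kernels and out_channels can be sketched with the functional API. The sizes and random values here are illustrative assumptions: two kernels applied to a single-channel input give a two-channel output, and each output channel is just the result of one kernel on its own.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 5, 5)        # in_channels = 1
kernels = torch.randn(2, 1, 3, 3)  # two kernels -> out_channels = 2

y = F.conv2d(x, kernels)
print(y.shape)

# Applying each kernel separately and stacking gives the same output.
y0 = F.conv2d(x, kernels[0:1])
y1 = F.conv2d(x, kernels[1:2])
print(torch.allclose(y, torch.cat([y0, y1], dim=1), atol=1e-6))
```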

Program code

Conv2d is demonstrated here using image data from CIFAR10.

import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# CIFAR10 test split, with images converted to tensors
dataset = torchvision.datasets.CIFAR10(root='./dataset', train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=True)
dataloader = DataLoader(dataset, batch_size=64)


class Tudui(nn.Module):
    def __init__(self):
        super(Tudui, self).__init__()
        # 3 input channels (RGB), 6 output channels, 3x3 kernel
        self.conv1 = nn.Conv2d(3, 6, 3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x


tudui = Tudui()
print(tudui)

writer = SummaryWriter('./logs')
step = 0
for data in dataloader:
    imgs, targets = data
    outputs = tudui(imgs)
    print(imgs.shape)     # torch.Size([64, 3, 32, 32])
    print(outputs.shape)  # torch.Size([64, 6, 30, 30])
    writer.add_images("input", imgs, step)
    # A 6-channel tensor cannot be displayed as an image, so fold the extra
    # channels into the batch dimension:
    # torch.Size([64, 6, 30, 30]) -> torch.Size([128, 3, 30, 30])
    # The first dimension is not known in advance, so pass -1 and let
    # reshape work it out from the remaining values.
    output = torch.reshape(outputs, [-1, 3, 30, 30])
    writer.add_images("output", output, step)
    step += 1

writer.close()
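
The reason for passing -1 to reshape can be seen without downloading the dataset. The sketch below is an assumption-labelled stand-in: it uses random tensors in place of CIFAR10 batches, one full batch of 64 and one of 16, which is what is left over when the 10000 test images are split into batches of 64.

```python
import torch
from torch import nn

conv = nn.Conv2d(3, 6, 3, stride=1, padding=0)

full_batch = torch.randn(64, 3, 32, 32)  # stand-in for a full batch
last_batch = torch.randn(16, 3, 32, 32)  # stand-in for the final, smaller batch

shapes = []
for imgs in (full_batch, last_batch):
    outputs = conv(imgs)                              # (N, 6, 30, 30)
    folded = torch.reshape(outputs, [-1, 3, 30, 30])  # (2N, 3, 30, 30)
    shapes.append(tuple(folded.shape))
    print(tuple(outputs.shape), "->", tuple(folded.shape))
```

Hard-coding 128 as the first dimension would fail on the last batch; -1 adapts to both cases.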