Implementing Convolution by Hand with Matrix Multiplication

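Convolution is a fundamental operation in signal processing and image processing, and it is widely used in deep learning, especially in convolutional neural networks (CNNs). The core idea is to scan an input signal or image with a kernel (also called a filter) and extract local features. In signal processing, convolution can be viewed as a measure of how two functions or signals overlap as one is shifted across the other. In image processing, convolution is the core operation behind image filtering: filters for edge detection, blurring, and sharpening are all implemented as convolutions.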

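1. The mathematical definition of convolution

One-dimensional discrete convolution

Given two discrete signals f and g, their convolution (f * g) is defined as: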

$(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m] \cdot g[n - m]$
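For finite-length signals the infinite sum reduces to a finite one, because every term that falls outside either signal is zero. The following is a minimal plain-Python sketch of the definition above, assuming both signals are lists indexed from 0; the function name conv1d_full is chosen here purely for illustration.

```python
def conv1d_full(f, g):
    """Full 1-D discrete convolution of two finite sequences."""
    out = [0] * (len(f) + len(g) - 1)
    for n in range(len(out)):
        for m in range(len(f)):
            # g[n - m] only contributes when the index is inside g
            if 0 <= n - m < len(g):
                out[n] += f[m] * g[n - m]
    return out


print(conv1d_full([1, 2, 3], [1, 1]))  # [1, 3, 5, 3]
```

Swapping the two arguments gives the same result, which reflects the commutativity of convolution.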

Two-dimensional discrete convolution

For a two-dimensional signal such as an image, convolution is defined as:

$(f * g)[m, n] = \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty} f[k_1, k_2] \cdot g[m - k_1, n - k_2]$
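One point worth noting: the definition above flips the kernel (the indices m - k_1 and n - k_2 run backwards), whereas the sliding-window operation used in CNNs, including torch.nn.functional.conv2d, is technically a cross-correlation and applies the kernel without flipping. The two coincide when the kernel is unchanged by a 180-degree rotation. The plain-Python sketch below illustrates the relationship, restricted to positions where the kernel fits entirely inside the input; all helper names are illustrative.

```python
def correlate2d_valid(x, k):
    """Slide k over x without flipping; sum of element-wise products."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]


def flip2d(k):
    """Rotate the kernel by 180 degrees."""
    return [row[::-1] for row in k[::-1]]


def convolve2d_valid(x, k):
    """Convolution in the mathematical sense: correlate with the flipped kernel."""
    return correlate2d_valid(x, flip2d(k))


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[1, 0], [0, 2]]  # deliberately asymmetric
print(correlate2d_valid(x, k))  # [[11, 14], [20, 23]]
print(convolve2d_valid(x, k))   # [[7, 10], [16, 19]]
```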

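2. An intuitive view of convolution

A convolution can be understood in three steps:

Sliding window: the kernel slides across the input signal or image.
Dot product: at each position, the kernel is multiplied element-wise with the local region of the input that it covers, and the products are summed.
Feature extraction: the resulting values capture local features of the input.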

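3. Convolution parameters

In deep learning, a convolution is typically described by the following parameters:

Input: the input signal or image, with shape (batch_size, channels, height, width).
Kernel: the filter, with shape (out_channels, in_channels, kernel_height, kernel_width).
Stride: the step size with which the kernel slides; it affects the output size.
Padding: values (such as 0) added around the border of the input; it also affects the output size.
Output: the result of the convolution, with shape (batch_size, out_channels, output_height, output_width).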
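To make stride and padding concrete, here is a small plain-Python sketch for a single 2-D channel. The helper names pad2d and window_origins are hypothetical and exist only for this illustration: the first surrounds the input with zeros, the second lists the top-left offsets the kernel visits along one axis.

```python
def pad2d(x, p):
    """Surround a 2-D list with p rows/columns of zeros on every side."""
    width = len(x[0]) + 2 * p
    top_bottom = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in x]
    return top_bottom + middle + [[0] * width for _ in range(p)]


def window_origins(size, kernel, stride):
    """Offsets along one axis at which the kernel is placed."""
    return list(range(0, size - kernel + 1, stride))


print(pad2d([[1, 2], [3, 4]], 1))
# [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
print(window_origins(5, 3, 1))  # [0, 1, 2]
print(window_origins(7, 3, 2))  # [0, 2, 4]
```

The number of offsets along an axis is the output size along that axis, so a larger stride shrinks the output and padding enlarges it.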

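4. Output size of a convolution

The output size of a convolution can be computed with the following formula:

$\text{output\_height} = \left\lfloor \frac{\text{input\_height} + 2 \cdot \text{padding} - \text{kernel\_height}}{\text{stride}} \right\rfloor + 1$

The output width is computed in the same way from input_width and kernel_width. Here:

input_height: the height of the input signal or image.
kernel_height: the height of the kernel.
padding: the amount of padding added on each side.
stride: the stride.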
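The formula translates directly into integer arithmetic. A minimal sketch follows; the function name conv_output_size is an assumption made for this example, and the same function applies to height and width alike.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output length along one axis: floor((in + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(5, 3))                       # 3
print(conv_output_size(5, 3, padding=1))            # 5
print(conv_output_size(7, 3, padding=1, stride=2))  # 4
```

With a 3×3 kernel, padding=1 and stride=1 keep the output the same size as the input, which is why that combination is used in the final test of this article.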

5. Computing a convolution

1. Single input channel, single kernel

The pixel values of the input image are:

$\begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 \end{bmatrix}$

The kernel is:

$\begin{bmatrix} 1 & 0 &1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \\ \end{bmatrix}$

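Compute the sum of the element-wise products between the first sub-region and the kernel, as shown in the figure below:

Cov_feature[0,0] = 1×1 + 1×0 + 1×1 + 0×0 + 1×1 + 1×0 + 0×1 + 0×0 + 1×1 = 4

Next, compute the sum of the element-wise products between the second sub-region and the kernel, as shown in the figure below:

Cov_feature[0,1] = 1×1 + 1×0 + 0×1 + 1×0 + 1×1 + 1×0 + 0×1 + 1×0 + 1×1 = 3

...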
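Continuing in the same way for all nine positions gives the full 3×3 feature map. The plain-Python sketch below reproduces the two values computed above and fills in the rest; the helper name correlate2d_valid is illustrative and does not come from any library.

```python
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]


def correlate2d_valid(x, k):
    """Slide k over x and sum the element-wise products at each position."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]


cov_feature = correlate2d_valid(image, kernel)
for row in cov_feature:
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```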

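2. Multiple input channels, single kernel

If the input has multiple channels, then for a given kernel a feature map is computed for each channel separately, using the kernel's slice for that channel, and the per-channel feature maps are summed position by position to give the final feature map, as shown in the figure below: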
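The per-channel computation followed by a position-wise sum can be sketched in plain Python as follows. The two-channel input and kernel values are made up for illustration, and the function names are not from any library.

```python
def correlate2d_valid(x, k):
    """Slide k over x and sum the element-wise products at each position."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]


def conv_multi_channel(x, k):
    """x: list of C channels; k: list of C kernel slices, one per channel."""
    maps = [correlate2d_valid(xc, kc) for xc, kc in zip(x, k)]
    oh, ow = len(maps[0]), len(maps[0][0])
    # add the per-channel feature maps position by position
    return [[sum(m[i][j] for m in maps) for j in range(ow)] for i in range(oh)]


x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # channel 0
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],  # channel 1
]
k = [
    [[1, 0], [0, 1]],  # kernel slice for channel 0
    [[0, 1], [1, 0]],  # kernel slice for channel 1
]
print(conv_multi_channel(x, k))  # [[6, 10], [14, 14]]
```

However many input channels there are, a single kernel always yields a single feature map.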

3. Multiple kernels
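The following is only a sketch based on the shapes listed in section 3, not on any detail given under this heading: each of the out_channels kernels is applied to the input independently, and the resulting feature maps are stacked as the output channels. A single input channel is assumed for brevity, and the values are made up.

```python
def correlate2d_valid(x, k):
    """Slide k over x and sum the element-wise products at each position."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]


def conv_multi_kernel(x, kernels):
    """One feature map per kernel; the list index plays the role of out_channels."""
    return [correlate2d_valid(x, k) for k in kernels]


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [
    [[1, 0], [0, 1]],  # kernel 0
    [[1, 1], [1, 1]],  # kernel 1
]
out = conv_multi_kernel(x, kernels)
print(len(out))  # 2
print(out[0])    # [[6, 8], [12, 14]]
print(out[1])    # [[12, 16], [24, 28]]
```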

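6. Implementing convolution in code

1. A simple convolution (without batch_size or channels):

import torch


def matrix_muti_for_cov(x, kernel, stride=1):
    # x.shape -> (h, w), kernel.shape -> (h, w)
    output_h = (x.shape[0] - kernel.shape[0]) // stride + 1  # output height
    output_w = (x.shape[1] - kernel.shape[1]) // stride + 1  # output width
    output = torch.zeros(output_h, output_w)  # initialize an (output_h, output_w) matrix
    for i in range(0, x.shape[0] - kernel.shape[0] + 1, stride):  # iterate over the height dimension
        for j in range(0, x.shape[1] - kernel.shape[1] + 1, stride):  # iterate over the width dimension
            area = x[i:i + kernel.shape[0], j:j + kernel.shape[1]]  # region currently covered by the kernel
            # i and j are offsets into the input; divide by stride to get the output index
            output[i // stride, j // stride] = torch.sum(area * kernel)
    return output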

Call the function to compute the convolution:

input = torch.randn(5, 5)
kernel = torch.randn(3, 3)
output = matrix_muti_for_cov(input, kernel)
print(output)

The output is shown below; the values differ from run to run because the input and kernel are random:

tensor([[-2.0837, -1.1043,  3.2571],
        [-1.1638,  0.7576,  3.2776],
        [ 0.3669,  0.4015,  0.9808]])

Use torch.nn.functional.conv2d(input, kernel) to check the result.

The conv2d function requires the following shapes:

input.shape: (batch_size, in_channels, height, width)

kernel.shape: (out_channels, in_channels, kernel_height, kernel_width)

import torch.nn.functional as F

input = input.reshape((1, 1, input.shape[0], input.shape[1]))
kernel = kernel.reshape((1, 1, kernel.shape[0], kernel.shape[1]))
cov_out = F.conv2d(input, kernel)
print(cov_out.squeeze(0).squeeze(0))

The output is:

tensor([[-2.0837, -1.1043,  3.2571],
        [-1.1638,  0.7576,  3.2776],
        [ 0.3669,  0.4015,  0.9808]])

cov_out.squeeze(0).squeeze(0) removes the batch_size and channels dimensions so that the result has the same shape as output above.

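The code above can be upgraded so that the convolution is carried out as a single matrix multiplication: each region covered by the kernel is flattened into one row of a matrix, and that matrix is multiplied by the flattened kernel.

def matrix_muti_for_cov(x, kernel, stride=1, padding=0):
    # x.shape -> (h, w), kernel.shape -> (h, w)
    if padding > 0:
        # zero-pad both spatial dimensions of the input
        x = torch.nn.functional.pad(x, (padding, padding, padding, padding))
    output_h = (x.shape[0] - kernel.shape[0]) // stride + 1
    output_w = (x.shape[1] - kernel.shape[1]) // stride + 1
    area_matrix = torch.zeros(output_h * output_w, kernel.numel())  # one row per kernel position
    kernel_matrix = kernel.reshape(kernel.numel(), 1)  # kernel as a column vector
    row = 0
    for i in range(0, x.shape[0] - kernel.shape[0] + 1, stride):
        for j in range(0, x.shape[1] - kernel.shape[1] + 1, stride):
            area = x[i:i + kernel.shape[0], j:j + kernel.shape[1]]
            area_matrix[row] = torch.flatten(area)  # flatten the region into a row
            row += 1
    output_matrix = area_matrix @ kernel_matrix  # shape (output_h * output_w, 1)
    output = output_matrix.reshape(output_h, output_w)
    return output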
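The same idea can be traced by hand on the 5×5 example from section 5. The sketch below uses plain Python lists instead of tensors, so it only illustrates the technique rather than replacing the PyTorch code; the name im2col is the conventional term for this flattening step and is used here as an illustrative function name. Because rows are appended in scan order, output position (r, c) corresponds to row r * output_w + c of the patch matrix.

```python
def im2col(x, kh, kw, stride=1):
    """Flatten every kh x kw region of x into one row, in scan order."""
    rows = []
    for i in range(0, len(x) - kh + 1, stride):
        for j in range(0, len(x[0]) - kw + 1, stride):
            rows.append([x[i + a][j + b] for a in range(kh) for b in range(kw)])
    return rows


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]

patches = im2col(image, 3, 3)                     # 9 rows x 9 columns
flat_kernel = [v for row in kernel for v in row]  # length-9 vector
# matrix-vector product: one dot product per row of the patch matrix
flat_out = [sum(p * w for p, w in zip(patch, flat_kernel)) for patch in patches]
out = [flat_out[r * 3:(r + 1) * 3] for r in range(3)]  # reshape to 3 x 3

print(patches[0])  # [1, 1, 1, 0, 1, 1, 0, 0, 1]
print(out)         # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```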

2. A simple but complete convolution (with batch_size, channels, stride, and padding):

import math


def matrix_muti_for_cov2(input, kernel, stride=1, padding=1):
    # input.shape -> [batch_size, channels, height, width]
    batch, channel, x_h, x_w = input.shape
    # kernel.shape -> [out_channels, in_channels, kernel_height, kernel_width]
    channel_out, channels_in, kernel_h, kernel_w = kernel.shape
    # math.floor() rounds down: it returns the largest integer less than or equal to its argument
    output_h = math.floor((x_h + 2 * padding - kernel_h) / stride) + 1
    output_w = math.floor((x_w + 2 * padding - kernel_w) / stride) + 1
    output = torch.zeros(batch, channel_out, output_h, output_w)  # initialize the output
    # zero padding: allocate a larger zero tensor and copy input into its center
    input_padded = torch.zeros(batch, channel, x_h + 2 * padding, x_w + 2 * padding)
    input_padded[:, :, padding:x_h + padding, padding:x_w + padding] = input
    for b in range(batch):  # iterate over the batch dimension
        for c_out in range(channel_out):  # iterate over the output channels
            for i in range(output_h):  # iterate over output rows
                for j in range(output_w):  # iterate over output columns
                    # (i, j) index the output; the window starts at (i * stride, j * stride) in the padded input
                    top, left = i * stride, j * stride
                    area = input_padded[b, :, top:top + kernel_h, left:left + kernel_w]
                    output[b, c_out, i, j] = torch.sum(area * kernel[c_out])
    return output

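Call the function and test the result:

cov_out = matrix_muti_for_cov2(input, kernel)
# print(cov_out)
cov_out2 = F.conv2d(input, kernel, padding=1)
# print(cov_out2)
if torch.allclose(cov_out, cov_out2, rtol=1e-05, atol=1e-08):
    print("The two convolution results are approximately equal.")
else:
    print("The two convolution results are not equal.")
    print("Maximum absolute error:", torch.max(torch.abs(cov_out - cov_out2)))

The output is "The two convolution results are approximately equal."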
