YOLOv5 Basics: Reading the YOLOv5 Source Code (common.py)

🍨 This post is a study log from the 🔗365天深度學習訓練營 (365-Day Deep Learning Training Camp)
🍖 Original author: K同學啊

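About the author: a hard-working undergraduate (class of 2022) 🌟, exploring AI algorithms, C++, and Go, and looking for light amid the confusion 🌸
Blog homepage: 羊小豬~~-CSDN博客
Summary: a walkthrough of common.py, the file that holds the concrete implementation of the YOLO network modules 🙏 🙏 🙏
Previous post, on the YOLOv5 network structure: 深度學習基礎–yolov5網絡結構簡介，C3模塊構建_yolov5的c3模塊結構-CSDN博客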

Contents

1. Imports
2. Basic components
- autopad
- Conv
- Focus
- Bottleneck
- BottleneckCSP
- C3
- SPP
- Concat
- Contract, Expand

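📖 Preface:

common.py is where the individual modules of the YOLO algorithm are implemented. To change a module (C3, for example), you edit its definition in this file. The *.yaml files describe how the YOLO network is assembled, while common.py contains the concrete implementation of each network module.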
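To make the split between the *.yaml files and common.py concrete, here is a minimal sketch of how one yaml entry could be turned into a module. It assumes the `[from, number, module, args]` entry layout used in YOLOv5 model files. The `MODULES` table and the `build_layer` helper are hypothetical names invented for this example, the `Conv` class is a trimmed stand-in, and the real parser in yolo.py does considerably more (channel scaling, repeats, multiple inputs).

```python
import torch
import torch.nn as nn


class Conv(nn.Module):
    """Trimmed stand-in for the Conv block in common.py (Conv2d + BN + SiLU)."""

    def __init__(self, c1, c2, k=1, s=1, p=None):
        super().__init__()
        p = k // 2 if p is None else p
        self.conv = nn.Conv2d(c1, c2, k, s, p, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


# Hypothetical lookup table from the module name in the yaml to its class.
MODULES = {"Conv": Conv}


def build_layer(c_in, entry):
    """Build one layer from a [from, number, module, args] entry (simplified: number must be 1)."""
    _from, number, name, args = entry
    assert number == 1, "this sketch does not handle repeated modules"
    c_out = args[0]  # by convention the first argument is the output channel count
    return MODULES[name](c_in, c_out, *args[1:]), c_out


entry = [-1, 1, "Conv", [64, 6, 2, 2]]  # illustrative entry: 64 channels, k=6, s=2, p=2
layer, c_out = build_layer(3, entry)
y = layer(torch.zeros(1, 3, 64, 64))
print(tuple(y.shape))
```

The yaml only names the module and its arguments; everything about what `Conv` actually computes lives in the class definition, which is why edits to a module's behaviour belong in common.py.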

YOLOv5 network structure:

[Figure: YOLOv5 network structure]

1. Imports

This part imports the packages and libraries that the file needs.

import ast
import contextlib
import json
import math
import platform
import warnings
import zipfile
from collections import OrderedDict, namedtuple
from copy import copy
from pathlib import Path
from urllib.parse import urlparse

import cv2
import numpy as np
import pandas as pd
import requests
import torch
import torch.nn as nn
from PIL import Image
from torch.cuda import amp

2. Basic components

autopad

This function computes the padding automatically, so that the convolution pads to a 'same' output shape.

def autopad(k, p=None, d=1):
    """
    Pads kernel to 'same' output shape, adjusting for optional dilation; returns padding size.

    `k`: kernel, `p`: padding, `d`: dilation.
    """
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p
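A quick way to see what autopad returns is to feed its result into the standard convolution output-size formula. The sketch below repeats the function so that it runs on its own; `conv_out_size` is a helper written for this example and is not part of common.py.

```python
def autopad(k, p=None, d=1):
    """Copy of the function above, repeated so the example runs on its own."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p


def conv_out_size(n, k, s=1, p=0, d=1):
    """Standard convolution output-size formula for one spatial dimension."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


pad_3 = autopad(3)            # 3x3 kernel
pad_3_d2 = autopad(3, d=2)    # 3x3 kernel with dilation 2, effective size 5
pad_rect = autopad((3, 5))    # rectangular kernel, one padding value per dimension

size_same = conv_out_size(80, 3, s=1, p=pad_3)
size_same_d2 = conv_out_size(80, 3, s=1, p=pad_3_d2, d=2)
size_half = conv_out_size(80, 3, s=2, p=pad_3)

print(pad_3, pad_3_d2, pad_rect)           # 1 2 [1, 2]
print(size_same, size_same_d2, size_half)  # 80 80 40
```

With stride 1 the spatial size is preserved, and with stride 2 it is halved, which is the behaviour the rest of the modules rely on.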

Conv

This module is the most basic building block of the whole network. It consists of a convolution layer, a BN layer, and a SiLU activation.

class Conv(nn.Module):
    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """
        Initializes a standard convolution layer with optional batch normalization and activation.

        c1:  number of input channels
        c2:  number of output channels
        k:   convolution kernel size
        s:   convolution stride
        p:   convolution padding; usually None, in which case autopad computes it
        g:   number of groups; 1 is an ordinary convolution, >1 is a grouped convolution
             (depthwise when g equals the number of channels)
        d:   convolution dilation
        act: activation; True uses SiLU() (Swish), False uses no activation,
             and an nn.Module instance is used as passed in
        """
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Applies a convolution followed by batch normalization and an activation function to the input tensor `x`."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Applies a fused convolution and activation function to the input tensor `x`."""
        return self.act(self.conv(x))
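forward_fuse applies the convolution and the activation without self.bn. That only matches forward once the BatchNorm statistics have been folded into the convolution's weights and bias. The sketch below shows one way to do that folding in eval mode; `fuse_conv_bn` is a hypothetical helper written for this example, not code quoted from common.py.

```python
import torch
import torch.nn as nn


def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return a new Conv2d whose output equals bn(conv(x)) in eval mode (hypothetical helper)."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    with torch.no_grad():
        # Per-output-channel scale: gamma / sqrt(running_var + eps)
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
        fused.weight.copy_(conv.weight * scale.view(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
        fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused


torch.manual_seed(0)
conv = nn.Conv2d(3, 8, 3, 1, 1, bias=False)
bn = nn.BatchNorm2d(8)
with torch.no_grad():  # give BN non-trivial statistics so the check is meaningful
    bn.running_mean.uniform_(-1, 1)
    bn.running_var.uniform_(0.5, 2)
    bn.weight.uniform_(0.5, 1.5)
    bn.bias.uniform_(-1, 1)
conv.eval()
bn.eval()

x = torch.randn(2, 3, 16, 16)
with torch.no_grad():
    ref = bn(conv(x))                 # what forward computes before the activation
    out = fuse_conv_bn(conv, bn)(x)   # what forward_fuse would compute after fusing
max_err = (ref - out).abs().max().item()
print(max_err)
```

The difference should be at the level of floating-point rounding. Fusing removes one layer per Conv block at inference time, which is why a separate forward path exists.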

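Focus

This module was designed by the YOLOv5 author. Its goal is to reduce FLOPs and speed up computation while losing as little of the original information as possible.

📘 Essence: slice the image along its width and height, then stack the slices in the channel dimension. This downsamples the image while keeping every pixel available for feature extraction.

Implementation: strided slicing, concatenation along the channel dimension, then a Conv.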

class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):
        """
        Initializes Focus module to concentrate width-height info into channel space with configurable convolution
        parameters.

        Called from parse_model in yolo.py.

        Idea: periodically sample pixels from the high-resolution image and rearrange them into a lower-resolution
        one, i.e. stack the four pixels of every 2x2 neighbourhood along the channel axis. This moves w/h
        information into the channel dimension, enlarges the receptive field of each point, and limits the loss of
        original information. The module is mainly designed to cut computation and increase speed.

        Steps: take 4 slices, concat them, then apply a Conv.
            slice:  (b, c1, w, h) -> 4 slices, each (b, c1, w/2, h/2)
            concat: 4 x (b, c1, w/2, h/2) -> (b, 4*c1, w/2, h/2)   (dim=1)
            conv:   (b, 4*c1, w/2, h/2) -> (b, c2, w/2, h/2)

        :params c1: input channels (each slice keeps c1 channels)
        :params c2: output channels of Focus
        :params k: kernel size of the final convolution
        :params s: stride of the final convolution
        :params p: padding of the final convolution
        :params g: groups of the final convolution; 1 is an ordinary convolution, >1 is a grouped convolution
        :params act: activation; True (default) uses SiLU()/Swish, False uses no activation
        """
        super().__init__()
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act=act)
        # self.contract = Contract(gain=2)

    def forward(self, x):
        """Processes input through Focus mechanism, reshaping (b,c,w,h) to (b,4c,w/2,h/2) then applies convolution."""
        return self.conv(torch.cat((x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]), 1))
        # return self.conv(self.contract(x))
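The slicing step is easiest to follow on a tiny tensor. The sketch below applies the same four strided slices as Focus.forward to a single 4x4 channel holding the values 0 to 15, and leaves out the final Conv.

```python
import torch

x = torch.arange(16.0).view(1, 1, 4, 4)  # one 4x4 single-channel "image" with values 0..15

slices = (
    x[..., ::2, ::2],    # even rows, even columns
    x[..., 1::2, ::2],   # odd rows, even columns
    x[..., ::2, 1::2],   # even rows, odd columns
    x[..., 1::2, 1::2],  # odd rows, odd columns
)
y = torch.cat(slices, 1)  # stack the four half-resolution slices as channels

print(tuple(y.shape))    # (1, 4, 2, 2)
print(y[0, 0].tolist())  # [[0.0, 2.0], [8.0, 10.0]]
print(y[0, 3].tolist())  # [[5.0, 7.0], [13.0, 15.0]]
```

Every input value appears exactly once in the output, so the spatial size is halved without discarding any pixel.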

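Bottleneck

Model structure:

[Figure: Bottleneck structure]

Purpose: uses a residual (shortcut) connection to mitigate the vanishing-gradient problem.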

class Bottleneck(nn.Module):
    # Standard bottleneck
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):
        """
        Standard bottleneck: Conv + Conv + shortcut. Used by BottleneckCSP and by parse_model in yolo.py.

        :params c1: input channels of the first convolution
        :params c2: output channels of the second convolution
        :params shortcut: bool, whether to add the shortcut connection (default True)
        :params g: number of convolution groups; 1 is an ordinary convolution, >1 is a grouped convolution
        :params e: expansion ratio; e*c2 is the output channel count of the first convolution, which is also the
                   input channel count of the second
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Processes input through two convolutions, optionally adds shortcut if channel dimensions match; input is a
        tensor.
        """
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
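The effect of the shortcut can be checked directly. The sketch below uses `TinyBottleneck`, a hypothetical stand-in with plain Conv2d layers (no BN or activation), and zeroes the second convolution so the convolution branch contributes nothing. The output then equals the input and the gradient still reaches x through the identity path.

```python
import torch
import torch.nn as nn


class TinyBottleneck(nn.Module):
    """Hypothetical stand-in for Bottleneck: plain Conv2d layers, no BN or activation."""

    def __init__(self, c1, c2, shortcut=True, e=0.5):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = nn.Conv2d(c1, c_, 1, bias=False)
        self.cv2 = nn.Conv2d(c_, c2, 3, padding=1, bias=False)
        self.add = shortcut and c1 == c2  # same rule as in Bottleneck

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y


block = TinyBottleneck(8, 8)
nn.init.zeros_(block.cv2.weight)  # make the convolution branch output exactly zero

x = torch.randn(1, 8, 5, 5, requires_grad=True)
out = block(x)
out.sum().backward()

same_as_input = torch.equal(out, x)                    # identity path carries x through
grad_is_one = torch.equal(x.grad, torch.ones_like(x))  # gradient reaches x despite the dead branch
no_shortcut = TinyBottleneck(8, 16).add                # c1 != c2, so the shortcut is disabled
print(same_as_input, grad_is_one, no_shortcut)
```

Note the `c1 == c2` condition: when the channel counts differ, the addition is not defined, so the block silently falls back to the plain two-convolution path.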

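BottleneckCSP

This module is built from Bottleneck modules and the CSP structure. It serves the same purpose as C3, but YOLOv5 generally uses C3.

Network structure:

[Figure: BottleneckCSP structure]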

class BottleneckCSP(nn.Module):
    # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        """
        CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks. Used by parse_model in yolo.py.

        :params c1: input channels of the whole BottleneckCSP
        :params c2: output channels of the whole BottleneckCSP
        :params n: number of Bottleneck modules
        :params shortcut: bool, whether the Bottlenecks use a shortcut (default True)
        :params g: groups of the 3x3 convolution inside each Bottleneck; 1 is ordinary, >1 is grouped
        :params e: expansion ratio; c2*e is the channel count used by the intermediate layers
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = nn.Conv2d(c1, c_, 1, 1, bias=False)
        self.cv3 = nn.Conv2d(c_, c_, 1, 1, bias=False)
        self.cv4 = Conv(2 * c_, c2, 1, 1)
        self.bn = nn.BatchNorm2d(2 * c_)  # applied to cat(cv2, cv3)
        self.act = nn.SiLU()
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))

    def forward(self, x):
        """Performs forward pass by applying layers, activation, and concatenation on input x, returning feature-
        enhanced output.
        """
        y1 = self.cv3(self.m(self.cv1(x)))
        y2 = self.cv2(x)
        return self.cv4(self.act(self.bn(torch.cat((y1, y2), 1))))

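C3

This is a simplified version of BottleneckCSP and the more commonly used of the two. It helps mitigate vanishing gradients and extracts features.

👓 Network structure:

[Figure: C3 structure]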

class C3(nn.Module):
    # CSP Bottleneck with 3 convolutions
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        """
        CSP Bottleneck with 3 convolutions. Used by C3TR and by parse_model in yolo.py.

        :params c1: input channels of the whole C3
        :params c2: output channels of the whole C3
        :params n: number of Bottleneck modules
        :params shortcut: bool, whether the Bottlenecks use a shortcut (default True)
        :params g: groups of the 3x3 convolution inside each Bottleneck; 1 is ordinary, >1 is grouped
        :params e: expansion ratio; c2*e is the channel count used by the intermediate layers
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))

    def forward(self, x):
        """Performs forward propagation using concatenated outputs from two convolutions and a Bottleneck sequence."""
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))
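To see in what sense C3 is the simpler of the two, the sketch below rebuilds both modules with trimmed copies of Conv and Bottleneck (no groups, dilation, or configurable activation, which is an assumption made to keep the example short) and compares output shapes and parameter counts for the same settings.

```python
import torch
import torch.nn as nn


class Conv(nn.Module):
    """Trimmed Conv: Conv2d + BatchNorm2d + SiLU."""
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class Bottleneck(nn.Module):
    def __init__(self, c1, c2, shortcut=True, e=0.5):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1, self.cv2 = Conv(c1, c_, 1), Conv(c_, c2, 3)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y


class BottleneckCSP(nn.Module):
    def __init__(self, c1, c2, n=1, e=0.5):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = Conv(c1, c_, 1)
        self.cv2 = nn.Conv2d(c1, c_, 1, bias=False)
        self.cv3 = nn.Conv2d(c_, c_, 1, bias=False)
        self.cv4 = Conv(2 * c_, c2, 1)
        self.bn = nn.BatchNorm2d(2 * c_)
        self.act = nn.SiLU()
        self.m = nn.Sequential(*(Bottleneck(c_, c_, e=1.0) for _ in range(n)))

    def forward(self, x):
        y1, y2 = self.cv3(self.m(self.cv1(x))), self.cv2(x)
        return self.cv4(self.act(self.bn(torch.cat((y1, y2), 1))))


class C3(nn.Module):
    def __init__(self, c1, c2, n=1, e=0.5):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1, self.cv2 = Conv(c1, c_, 1), Conv(c1, c_, 1)
        self.cv3 = Conv(2 * c_, c2, 1)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, e=1.0) for _ in range(n)))

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))


def count_params(module):
    return sum(p.numel() for p in module.parameters())


x = torch.randn(1, 64, 20, 20)
csp, c3 = BottleneckCSP(64, 64, n=3).eval(), C3(64, 64, n=3).eval()
with torch.no_grad():
    out_csp, out_c3 = csp(x), c3(x)
n_csp, n_c3 = count_params(csp), count_params(c3)
print(tuple(out_csp.shape), tuple(out_c3.shape), n_csp, n_c3)
```

Both modules map the input to the same output shape. C3 drops the extra 1x1 convolution after the Bottleneck stack and the separate BN and activation applied to the concatenation, so it ends up with fewer layers and slightly fewer parameters.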

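SPP

Spatial Pyramid Pooling (SPP) is one of the key techniques in object detection for pooling high-level features at multiple scales to enlarge the receptive field.

The classic SPP module first divides the input convolutional feature map at several scales, then extracts a fixed-dimensional feature at each scale, and finally concatenates these features into a vector of fixed length.

As the figure shows:

For an input convolutional feature map of size (w, h), the first pyramid level divides the map with a 4x4 grid into 16 blocks, each of size (w/4, h/4);
the second pyramid level uses a 2x2 grid, giving 4 blocks, each of size (w/2, h/2);
the third pyramid level treats the whole feature map as a single block. Extracting one feature per block yields a final feature vector of 16 + 4 + 1 = 21 dimensions per channel.

[Figure: spatial pyramid pooling]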
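The three-level pyramid described above can be sketched with adaptive pooling. This is a minimal illustration that assumes max pooling as the per-block feature; `classic_spp` is a name chosen for this example. Adaptive pooling picks the block boundaries itself, so the blocks are only approximately (w/4, h/4) when the size is not divisible by 4.

```python
import torch
import torch.nn as nn


def classic_spp(x, levels=(4, 2, 1)):
    """Pool x on each grid size and concatenate into one fixed-length vector per sample."""
    pooled = [nn.AdaptiveMaxPool2d(n)(x).flatten(1) for n in levels]  # (b, c*n*n) per level
    return torch.cat(pooled, 1)


c = 8
small = classic_spp(torch.randn(1, c, 13, 17))
large = classic_spp(torch.randn(1, c, 40, 64))
print(tuple(small.shape), tuple(large.shape))  # (1, 168) (1, 168)
```

Both inputs produce a vector of length 8 * 21 = 168, regardless of their spatial size. That fixed length is the point of the classic design.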

SPP module:

[Figure: SPP module structure]

class SPP(nn.Module):
    # Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729
    def __init__(self, c1, c2, k=(5, 9, 13)):
        """Initializes SPP layer with Spatial Pyramid Pooling, ref: https://arxiv.org/abs/1406.4729, args: c1 (input
        channels), c2 (output channels), k (kernel sizes).
        """
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        """Applies convolution and max pooling layers to the input tensor `x`, concatenates results, and returns output
        tensor.
        """
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # suppress torch 1.9.0 max_pool2d() warning
            return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))
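Unlike the classic version, this SPP keeps the spatial size: each max pool uses stride 1 and padding k // 2, so the pooled maps can be concatenated with the input along the channel axis. The sketch below checks only that pooling and concatenation step, on a random tensor standing in for the output of cv1.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 20, 20)  # stands in for the output of cv1, with c_ = 16
k = (5, 9, 13)

pools = [nn.MaxPool2d(kernel_size=n, stride=1, padding=n // 2) for n in k]
branches = [x] + [m(x) for m in pools]  # the input plus three pooled copies
y = torch.cat(branches, 1)

print([tuple(b.shape) for b in branches])
print(tuple(y.shape))  # (1, 64, 20, 20)
```

The concatenation has c_ * (len(k) + 1) channels, which is exactly the input channel count that cv2 is built with.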

Concat

This module concatenates a list of tensors along a given dimension (usually the channel dimension) to form a new feature map.

class Concat(nn.Module):
    # Concatenate a list of tensors along dimension
    def __init__(self, dimension=1):
        """
        Concatenate a list of tensors along dimension. Called from parse_model in yolo.py.

        :params dimension: the dimension along which to concatenate
        """
        super().__init__()
        self.d = dimension

    def forward(self, x):
        """Concatenates a list of tensors along a specified dimension; `x` is a list of tensors, `dimension` is an
        int.
        """
        return torch.cat(x, self.d)
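A short shape check shows what channel-wise concatenation requires. The tensor names and sizes below are made up for illustration.

```python
import torch

upsampled = torch.randn(1, 128, 40, 40)  # e.g. a feature map coming from an upsampling layer
skip = torch.randn(1, 256, 40, 40)       # e.g. a feature map from an earlier layer

merged = torch.cat([upsampled, skip], 1)  # what Concat(dimension=1) computes
print(tuple(merged.shape))                # (1, 384, 40, 40)

# All dimensions other than the concat dimension must match.
mismatched = torch.randn(1, 256, 20, 20)
try:
    torch.cat([upsampled, mismatched], 1)
    raised = False
except RuntimeError:
    raised = True
print(raised)  # True
```

Channel counts add up, while batch size, height, and width have to agree across all inputs.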

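Contract, Expand

These two modules change the dimensions of a feature map.

Contract reshapes the input by moving data from the w and h dimensions of the feature map (which shrink) into the channel dimension (which grows), e.g. x(1,64,80,80) to x(1,256,40,40).
Expand also reshapes the input, but in the opposite direction: it moves data from the channel dimension (which shrinks) into the W and H dimensions (which grow), e.g. x(1,64,80,80) to x(1,16,160,160).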

class Contract(nn.Module):
    # Contract width-height into channels, i.e. x(1,64,80,80) to x(1,256,40,40)
    def __init__(self, gain=2):
        """Initializes a layer to contract spatial dimensions (width-height) into channels, e.g., input shape
        (1,64,80,80) to (1,256,40,40).
        """
        super().__init__()
        self.gain = gain

    def forward(self, x):
        """Processes input tensor to expand channel dimensions by contracting spatial dimensions, yielding output shape
        `(b, c*s*s, h//s, w//s)`.
        """
        b, c, h, w = x.size()  # assert (h % s == 0) and (w % s == 0), 'Indivisible gain'
        s = self.gain
        x = x.view(b, c, h // s, s, w // s, s)  # x(1,64,40,2,40,2)
        x = x.permute(0, 3, 5, 1, 2, 4).contiguous()  # x(1,2,2,64,40,40)
        return x.view(b, c * s * s, h // s, w // s)  # x(1,256,40,40)


class Expand(nn.Module):
    # Expand channels into width-height, i.e. x(1,64,80,80) to x(1,16,160,160)
    def __init__(self, gain=2):
        """
        Initializes the Expand module to increase spatial dimensions by redistributing channels, with an optional gain
        factor.

        Example: x(1,64,80,80) to x(1,16,160,160).
        """
        super().__init__()
        self.gain = gain

    def forward(self, x):
        """Processes input tensor x to expand spatial dimensions by redistributing channels, requiring C % gain^2 ==
        0.
        """
        b, c, h, w = x.size()  # assert c % s ** 2 == 0, 'Indivisible gain'
        s = self.gain
        x = x.view(b, s, s, c // s**2, h, w)  # x(1,2,2,16,80,80)
        x = x.permute(0, 3, 4, 1, 5, 2).contiguous()  # x(1,16,80,2,80,2)
        return x.view(b, c // s**2, h * s, w * s)  # x(1,16,160,160)
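Since both modules only rearrange values, applying Expand after Contract with the same gain should give back the original tensor. The sketch below checks that with function versions of the two forward methods; `contract` and `expand` are names chosen for this example.

```python
import torch


def contract(x, s=2):
    """Function form of Contract.forward: move s x s spatial blocks into channels."""
    b, c, h, w = x.size()
    x = x.view(b, c, h // s, s, w // s, s)
    x = x.permute(0, 3, 5, 1, 2, 4).contiguous()
    return x.view(b, c * s * s, h // s, w // s)


def expand(x, s=2):
    """Function form of Expand.forward: move groups of s*s channels back into space."""
    b, c, h, w = x.size()
    x = x.view(b, s, s, c // s**2, h, w)
    x = x.permute(0, 3, 4, 1, 5, 2).contiguous()
    return x.view(b, c // s**2, h * s, w * s)


x = torch.arange(1 * 64 * 80 * 80, dtype=torch.float32).view(1, 64, 80, 80)
y = contract(x)  # (1, 256, 40, 40)
z = expand(y)    # back to (1, 64, 80, 80)
e = expand(x)    # (1, 16, 160, 160)

print(tuple(y.shape), tuple(z.shape), tuple(e.shape))
print(torch.equal(z, x))  # True
```

The first 64 output channels of `contract` are the even-row, even-column samples of the input, so Contract performs the same kind of rearrangement as the Focus slicing, with a different channel order.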