YOLOv8 Explained and in Practice: Deep Learning Object Detection

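Table of Contents
Summary
Model Details
C2f Module
Loss
Head
Model in Practice
Training on the COCO Dataset
Downloading the Dataset
Converting COCO to a YOLO-Format Dataset (for V4, V5, V6, V7, V8)
Setting Up the YOLOv8 Environment
Training
Testing
Training on a Custom Dataset
Labelme Dataset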

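Summary

YOLOv8 is the next major update to YOLOv5, open-sourced by Ultralytics on January 10, 2023. It currently supports image classification, object detection, and instance segmentation. Given how well YOLOv5 performed, YOLOv8 drew wide attention from users even before it was open-sourced. The overall architecture of YOLOv8 is shown below:

[Figure: overall architecture of YOLOv8]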

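YOLOv8 improves on earlier versions in the following areas:

Backbone: Still built on the CSP idea. The C3 module of YOLOv5 is replaced with the C2f module to make the network more lightweight, and the SPPF module used in YOLOv5 and similar architectures is retained.
PAN-FPN: YOLOv8 still follows the PAN idea, but it removes the convolution in the upsampling stage of YOLOv5's PAN-FPN and also replaces the C3 module with C2f.
Decoupled-Head: Borrowed from YOLOX. The classification head and the regression head no longer share parameters.
Anchor-Free: YOLOv8 drops the anchor-based approach of earlier versions and adopts an anchor-free design.
Loss function: YOLOv8 uses VFL Loss as the classification loss and DFL Loss + CIoU Loss as the regression loss.
Sample assignment: YOLOv8 drops the earlier IoU matching and one-sided ratio assignment and uses the Task-Aligned Assigner instead.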
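To make the Task-Aligned Assigner item more concrete, here is a minimal sketch of its alignment metric, t = s^alpha * u^beta, where s is the predicted score for the ground-truth class and u is the IoU between the predicted box and the ground-truth box. The function names, the values alpha=0.5, beta=6.0, and k=10, and the candidate tuple layout are assumptions made for illustration. A real assigner also restricts candidates to anchor points that fall inside the ground-truth box, which is omitted here.

```python
def alignment_metric(cls_score, iou, alpha=0.5, beta=6.0):
    """Alignment metric t = s**alpha * u**beta for one prediction / ground-truth pair.

    alpha and beta are illustrative values, not taken from the article.
    """
    return (cls_score ** alpha) * (iou ** beta)


def select_topk(candidates, k=10):
    """Pick the k best-aligned candidates for one ground-truth box.

    candidates: list of (anchor_index, cls_score, iou) tuples (hypothetical layout).
    Returns anchor indices ordered from best to worst alignment.
    """
    scored = [(alignment_metric(s, u), idx) for idx, s, u in candidates]
    scored.sort(reverse=True)
    return [idx for t, idx in scored[:k] if t > 0]


# A confident but badly localized prediction (index 0) loses to
# better-localized ones because IoU is raised to a higher power.
cands = [(0, 0.9, 0.3), (1, 0.5, 0.8), (2, 0.2, 0.9)]
print(select_topk(cands, k=2))
```

Because beta is much larger than alpha in this sketch, localization quality dominates the ranking, which is the behaviour the assigner relies on to pick positives that are good at both tasks.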
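YOLOv8 is a family of models. From smallest to largest it includes yolov8n, yolov8s, yolov8m, yolov8l, and yolov8x. Accuracy, inference speed, and parameter counts are listed in the table below:

[Table: YOLOv8 model variants with accuracy, speed, and parameter count]

A comparison with YOLOv5 is shown in the table below:

[Table: YOLOv8 compared with YOLOv5]

Both mAP and parameter count rise noticeably compared with YOLOv5. How that translates in practice is best judged by trying the models yourself.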
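This article first explains YOLOv8 in detail, then trains and tests it on the COCO dataset, and finally trains and tests it on a custom dataset.

I hope it helps!

Segmentation results:

[Figure: segmentation results]

Classification results:

[Figure: classification results]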

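Model Details

C2f Module
YOLOv8 replaces the C3 module of YOLOv5 with the C2f module. Let us first look at the C3 module, shown below:

[Figure: structure of the C3 module]

The C3 module borrows the split-and-merge idea of CSPNet and combines it with residual connections to form the so-called C3 block. The main gradient branch of the CSP structure is a stack of Bottleneck modules. The number of stacked modules is controlled by the parameter n, which differs from one model size to another. The PyTorch code for C3 is as follows: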
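import torch
import torch.nn as nn
from ultralytics.nn.modules import Conv, Bottleneck  # building blocks from the ultralytics package (import path assumed)


class C3(nn.Module):
    # CSP Bottleneck with 3 convolutions
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))

    def forward(self, x):
        # Concatenate the Bottleneck branch with the shortcut branch, then fuse them with cv3
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))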
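Now let us look at the C2f module. The input first passes through a Conv. The chunk function then splits the output evenly into two tensors along the channel dimension, and both are stored in a list. The second half is fed into the Bottleneck block, which contains n Bottlenecks, and the output of each Bottleneck is appended to the list, adding n more entries. After concatenation the feature map therefore has 0.5 × (n + 2) × c_out channels. A final Conv then produces an output of size h × w × c_out. See the figure below:

[Figure: structure of the C2f module]

If this is still hard to follow, the figure below substitutes concrete numbers into the diagram:

[Figure: C2f module with concrete tensor sizes]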
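The channel arithmetic above can be checked with a few lines of plain Python. This is only bookkeeping, not a network layer: the function name and the returned dictionary keys are made up for illustration, and the expansion ratio e=0.5 is assumed to match the 0.5 factor in the text.

```python
def c2f_channels(c_out, n, e=0.5):
    """Channel bookkeeping for a C2f-style block (no tensors involved)."""
    c = int(c_out * e)          # width of each chunk, 0.5 * c_out by default
    first_conv = 2 * c          # output of the first Conv, split into two chunks of c
    parts = 2 + n               # the two chunks plus one output per Bottleneck
    concat = parts * c          # channels entering the final Conv
    return {"chunk": c, "first_conv": first_conv, "concat": concat, "out": c_out}


info = c2f_channels(c_out=128, n=3)
print(info)  # concat is 0.5 * (3 + 2) * 128 = 320
```

With c_out=128 and n=3, the list holds five tensors of 64 channels each, so the final Conv maps 320 channels back to 128.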

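Loss

For YOLOv8, the classification loss is VFL Loss and the regression loss takes the form CIoU Loss + DFL, where reg_max defaults to 16.
The main improvement of VFL is its asymmetric weighting; FL and QFL are both symmetric. The idea of asymmetric weighting comes from the PISA paper, which points out that, beyond the imbalance between positive and negative samples, the positive samples themselves should not be weighted equally, because the mAP calculation is dominated by the prime positive samples.

[Figure: VFL formula]

Here q is the label. For a positive sample, q is the IoU between the predicted bbox and the ground truth; for a negative sample, q = 0. For positive samples FL is not actually used: the loss is plain BCE with an extra adaptive IoU weight that emphasizes the prime samples. For negative samples the loss is the standard FL. VFL is clearly simpler than QFL, and its key features are the asymmetric weighting of positive and negative samples and the emphasis on positive samples as prime samples.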
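The description above can be written out for a single prediction as a short sketch. It follows the Varifocal Loss as defined in the VarifocalNet paper; the function name, the clamping constant eps, and the values alpha=0.75 and gamma=2.0 are assumptions for illustration rather than settings taken from this article.

```python
import math


def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-9):
    """VFL for one predicted score p in (0, 1) and one target q in [0, 1].

    q > 0  : positive sample, q is the IoU with the ground truth.
    q == 0 : negative sample.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp so the logarithms stay finite
    if q > 0:
        # Positive: BCE against the soft target q, weighted by q itself,
        # so high-IoU positives contribute more.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative: focal-style down-weighting by alpha * p**gamma.
    return -alpha * (p ** gamma) * math.log(1.0 - p)


print(varifocal_loss(0.1, 0.9))  # poorly scored, well-localized positive: large loss
print(varifocal_loss(0.1, 0.3))  # poorly scored, weakly localized positive: smaller loss
print(varifocal_loss(0.1, 0.0))  # easy negative: strongly down-weighted
```

The asymmetry is visible in the two branches: only negatives get the p**gamma factor, while positives are scaled by their own IoU target.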
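DFL (Distribution Focal Loss) models the box position as a general distribution, so that the network quickly focuses on the distribution of positions close to the target location.

[Figure: DFL formula]

DFL lets the network focus more quickly on the values near the target y and increases their probabilities.
In other words, DFL uses a cross-entropy form to optimize the probabilities of the two positions closest to the label y, one on its left and one on its right, so that the network focuses faster on the distribution around the target position. The learned distribution should in theory lie close to the true floating-point coordinate, and the weights of the left and right integer coordinates are obtained by linear interpolation.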
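The left/right interpolation described above can be sketched for one side of one box as follows. The function names are hypothetical, and the use of 16 bins in the example is an assumption that mirrors reg_max = 16; the target y is assumed to lie between 0 and the last bin index.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def distribution_focal_loss(logits, y):
    """DFL for one box side.

    logits: one raw score per integer bin (0 .. len(logits) - 1).
    y:      float regression target, assumed to lie inside the bin range.
    """
    probs = softmax(logits)
    left = min(int(math.floor(y)), len(logits) - 2)  # nearest bin on the left
    right = left + 1                                 # nearest bin on the right
    w_left = right - y                               # linear interpolation weights
    w_right = y - left
    return -(w_left * math.log(probs[left]) + w_right * math.log(probs[right]))


flat = [0.0] * 16                 # uniform distribution over 16 bins
peaked = [0.0] * 16
peaked[3] = peaked[4] = 5.0       # mass concentrated around the target
print(distribution_focal_loss(flat, 3.3))
print(distribution_focal_loss(peaked, 3.3))
```

The peaked distribution gives the lower loss, which is the sense in which DFL pushes probability mass toward the two bins next to y.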
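Head
Compared with YOLOv5, YOLOv8 replaces the C3 modules in the head with C2f, removes the 1×1 convolution before upsampling, and feeds the features output by the different backbone stages directly into the upsampling operation. The difference can be seen in the comparison below:

[Figure: comparison of the YOLOv5 and YOLOv8 head structures]

YOLOv8 also uses a Decoupled-Head. Because it adopts the DFL idea, the number of channels in the regression head becomes 4*reg_max:

[Figure: YOLOv8 decoupled head]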
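To show why the regression head needs 4*reg_max channels, the sketch below decodes one anchor point's output into a box: each side gets reg_max logits, a softmax turns them into a distribution, and the expected bin index is the predicted distance. The left, top, right, bottom ordering, the anchor coordinates in grid units, and the function names are assumptions for illustration.

```python
import math


def expected_bin(logits):
    """Softmax over the bins, then the expectation sum(i * p_i)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return sum(i * e / total for i, e in enumerate(exps))


def decode_box(reg_output, anchor_xy, stride, reg_max=16):
    """Decode 4 * reg_max logits into (x_min, y_min, x_max, y_max) in pixels.

    reg_output: flat list ordered left, top, right, bottom (assumed layout).
    anchor_xy:  anchor point in grid units, e.g. (10.5, 10.5).
    stride:     downsampling factor of the feature map.
    """
    assert len(reg_output) == 4 * reg_max
    sides = [reg_output[i * reg_max:(i + 1) * reg_max] for i in range(4)]
    left, top, right, bottom = (expected_bin(s) for s in sides)
    ax, ay = anchor_xy
    # Distances are in grid units, so multiply by the stride to get pixels.
    return ((ax - left) * stride, (ay - top) * stride,
            (ax + right) * stride, (ay + bottom) * stride)


# Every side is sharply peaked at bin 2, so each distance decodes to 2 grid cells.
side = [-1000.0] * 16
side[2] = 0.0
print(decode_box(side * 4, anchor_xy=(10.5, 10.5), stride=8))
```

With reg_max = 16 the head therefore emits 64 regression channels per location, four groups of 16 bins.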

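Model in Practice

Training on the COCO Dataset
This section uses the 2017 release of the COCO dataset as an example to show how to train and predict with YOLOv8.
Downloading the Dataset
Images:
2017 Train images [118K/18GB]: http://images.cocodataset.org/zips/train2017.zip
2017 Val images [5K/1GB]: http://images.cocodataset.org/zips/val2017.zip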
2017 Unlabeled images [123K/19GB]: http://images.cocodataset.org/zips/unlabeled2017.zip
Annotations:
2017 annotations_trainval2017 [241MB]: http://images.cocodataset.org/annotations/annotations_trainval2017.zip
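If you prefer to script the download, the following is a minimal sketch using only the Python standard library. It covers the three archives that the conversion step needs (train images, val images, annotations). The target folders are an assumption chosen so that the extracted files match the paths used by the conversion script later in this article (coco/train2017, coco/val2017, and coco/annotations_trainval2017/annotations); the function names and the downloads cache folder are hypothetical.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL -> folder to extract into (folder choice is an assumption, see the text above)
COCO_ARCHIVES = {
    "http://images.cocodataset.org/zips/train2017.zip": "coco",
    "http://images.cocodataset.org/zips/val2017.zip": "coco",
    "http://images.cocodataset.org/annotations/annotations_trainval2017.zip": "coco/annotations_trainval2017",
}


def archive_name(url):
    """Return the file name part of a download URL."""
    return url.rsplit("/", 1)[-1]


def download_and_extract(url, target_dir, cache_dir="downloads"):
    """Download one archive (unless it is already cached) and unzip it."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    zip_path = cache / archive_name(url)
    if not zip_path.exists():  # skip archives that are already on disk
        urllib.request.urlretrieve(url, str(zip_path))
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target_dir)


def fetch_all():
    """Fetch every archive listed in COCO_ARCHIVES (roughly 19 GB in total)."""
    for url, target in COCO_ARCHIVES.items():
        download_and_extract(url, target)
```

Calling fetch_all() starts the download. Nothing runs on import, so the helpers can be tried on a single archive first, for example the annotations file.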
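Converting COCO to a YOLO-Format Dataset (for V4, V5, V6, V7, V8)
The original research paper defines 91 object categories for COCO. However, the first release in 2014 published labeled and segmented images for only 80 of those categories. A follow-up release came in 2017. The categories are listed in detail below: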
ID | OBJECT (PAPER) | OBJECT (2014 REL.) | OBJECT (2017 REL.) | SUPER CATEGORY
1 | person | person | person | person
2 | bicycle | bicycle | bicycle | vehicle
3 | car | car | car | vehicle
4 | motorcycle | motorcycle | motorcycle | vehicle
5 | airplane | airplane | airplane | vehicle
6 | bus | bus | bus | vehicle
7 | train | train | train | vehicle
8 | truck | truck | truck | vehicle
9 | boat | boat | boat | vehicle
10 | traffic light | traffic light | traffic light | outdoor
11 | fire hydrant | fire hydrant | fire hydrant | outdoor
12 | street sign | - | - | outdoor
13 | stop sign | stop sign | stop sign | outdoor
14 | parking meter | parking meter | parking meter | outdoor
15 | bench | bench | bench | outdoor
16 | bird | bird | bird | animal
17 | cat | cat | cat | animal
18 | dog | dog | dog | animal
19 | horse | horse | horse | animal
20 | sheep | sheep | sheep | animal
21 | cow | cow | cow | animal
22 | elephant | elephant | elephant | animal
23 | bear | bear | bear | animal
24 | zebra | zebra | zebra | animal
25 | giraffe | giraffe | giraffe | animal
26 | hat | - | - | accessory
27 | backpack | backpack | backpack | accessory
28 | umbrella | umbrella | umbrella | accessory
29 | shoe | - | - | accessory
30 | eye glasses | - | - | accessory
31 | handbag | handbag | handbag | accessory
32 | tie | tie | tie | accessory
33 | suitcase | suitcase | suitcase | accessory
34 | frisbee | frisbee | frisbee | sports
35 | skis | skis | skis | sports
36 | snowboard | snowboard | snowboard | sports
37 | sports ball | sports ball | sports ball | sports
38 | kite | kite | kite | sports
39 | baseball bat | baseball bat | baseball bat | sports
40 | baseball glove | baseball glove | baseball glove | sports
41 | skateboard | skateboard | skateboard | sports
42 | surfboard | surfboard | surfboard | sports
43 | tennis racket | tennis racket | tennis racket | sports
44 | bottle | bottle | bottle | kitchen
45 | plate | - | - | kitchen
46 | wine glass | wine glass | wine glass | kitchen
47 | cup | cup | cup | kitchen
48 | fork | fork | fork | kitchen
49 | knife | knife | knife | kitchen
50 | spoon | spoon | spoon | kitchen
51 | bowl | bowl | bowl | kitchen
52 | banana | banana | banana | food
53 | apple | apple | apple | food
54 | sandwich | sandwich | sandwich | food
55 | orange | orange | orange | food
56 | broccoli | broccoli | broccoli | food
57 | carrot | carrot | carrot | food
58 | hot dog | hot dog | hot dog | food
59 | pizza | pizza | pizza | food
60 | donut | donut | donut | food
61 | cake | cake | cake | food
62 | chair | chair | chair | furniture
63 | couch | couch | couch | furniture
64 | potted plant | potted plant | potted plant | furniture
65 | bed | bed | bed | furniture
66 | mirror | - | - | furniture
67 | dining table | dining table | dining table | furniture
68 | window | - | - | furniture
69 | desk | - | - | furniture
70 | toilet | toilet | toilet | furniture
71 | door | - | - | furniture
72 | tv | tv | tv | electronic
73 | laptop | laptop | laptop | electronic
74 | mouse | mouse | mouse | electronic
75 | remote | remote | remote | electronic
76 | keyboard | keyboard | keyboard | electronic
77 | cell phone | cell phone | cell phone | electronic
78 | microwave | microwave | microwave | appliance
79 | oven | oven | oven | appliance
80 | toaster | toaster | toaster | appliance
81 | sink | sink | sink | appliance
82 | refrigerator | refrigerator | refrigerator | appliance
83 | blender | - | - | appliance
84 | book | book | book | indoor
85 | clock | clock | clock | indoor
86 | vase | vase | vase | indoor
87 | scissors | scissors | scissors | indoor
88 | teddy bear | teddy bear | teddy bear | indoor
89 | hair drier | hair drier | hair drier | indoor
90 | toothbrush | toothbrush | toothbrush | indoor
91 | hair brush | - | - | indoor
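As the table shows, the object lists released in 2014 and 2017 are identical, and they cover 80 of the 91 object categories originally listed in the paper. During conversion the category IDs therefore have to be remapped. The mapping function is as follows: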
def coco91_to_coco80_class():
    # Maps the 91-index category IDs (paper) to the 80-index IDs (2014/2017 releases).
    # Entry i corresponds to paper category ID i + 1; None marks a category that was never released.
    # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
    # a = np.loadtxt('data/coco.names', dtype='str', delimiter='\n')
    # b = np.loadtxt('data/coco_paper.names', dtype='str', delimiter='\n')
    # x1 = [list(a[i] == b).index(True) + 1 for i in range(80)]  # darknet to coco
    # x2 = [list(b[i] == a).index(True) if any(b[i] == a) else None for i in range(91)]  # coco to darknet
    x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, None, 24, 25, None,
         None, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, None, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
         51, 52, 53, 54, 55, 56, 57, 58, 59, None, 60, None, None, 61, None, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
         None, 73, 74, 75, 76, 77, 78, 79, None]
    return x
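As a cross-check on the hard-coded list, the same mapping can be derived from the category table: every paper ID without a 2014/2017 entry maps to None, and the remaining IDs are numbered consecutively from 0. The names UNRELEASED_IDS and build_coco91_to_coco80 are hypothetical helpers introduced for this sketch.

```python
# Paper IDs that have no entry in the 2014/2017 release columns of the table
UNRELEASED_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83, 91}


def build_coco91_to_coco80():
    """Return a 91-entry list: paper ID i + 1 -> 0-based YOLO class index, or None."""
    mapping, next_index = [], 0
    for paper_id in range(1, 92):
        if paper_id in UNRELEASED_IDS:
            mapping.append(None)        # category was never released
        else:
            mapping.append(next_index)  # next consecutive class index
            next_index += 1
    return mapping


mapping = build_coco91_to_coco80()
print(mapping[13 - 1])  # stop sign, paper ID 13 -> 11
print(mapping[90 - 1])  # toothbrush, paper ID 90 -> 79
```

The lookup uses category_id - 1 as the list index, which is the same indexing the conversion script applies to each annotation.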
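Next, start the format conversion. The project directory is laid out as follows:

[Figure: project directory layout]

-coco: stores the extracted dataset.
-out: stores the output.
-coco2yolo.py: the conversion script.
The conversion code is as follows: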
import json
import glob
import os
import shutil
from pathlib import Path

import numpy as np
from tqdm import tqdm


def make_folders(path='…/out/'):
    # Create folders
    if os.path.exists(path):
        shutil.rmtree(path)  # delete output folder
    os.makedirs(path)  # make new output folder
    os.makedirs(path + os.sep + 'labels')  # make new labels folder
    os.makedirs(path + os.sep + 'images')  # make new images folder
    return path


def convert_coco_json(json_dir='./coco/annotations_trainval2017/annotations/'):
    # Only the instances_*.json files contain bounding boxes;
    # the captions and keypoints files in the same folder are skipped.
    jsons = glob.glob(json_dir + 'instances_*.json')
    coco80 = coco91_to_coco80_class()

    # Import json
    for json_file in sorted(jsons):
        split = Path(json_file).stem.replace('instances_', '')  # e.g. train2017 or val2017
        fn = 'out/labels/%s/' % split  # label folder
        fn_images = 'out/images/%s/' % split  # image folder
        os.makedirs(fn, exist_ok=True)
        os.makedirs(fn_images, exist_ok=True)
        with open(json_file) as f:
            data = json.load(f)
        print(fn)

        # Create image dict keyed by image id
        images = {x['id']: x for x in data['images']}

        # Write labels file (opened in append mode, so clear out/ before re-running)
        for x in tqdm(data['annotations'], desc='Annotations %s' % json_file):
            if x['iscrowd']:
                continue

            img = images[x['image_id']]
            h, w, f = img['height'], img['width'], img['file_name']
            file_path = 'coco/' + split + '/' + f

            # The COCO bounding box format is [top left x, top left y, width, height]
            box = np.array(x['bbox'], dtype=np.float64)
            box[:2] += box[2:] / 2  # xy top-left corner to center
            box[[0, 2]] /= w  # normalize x
            box[[1, 3]] /= h  # normalize y

            if (box[2] > 0.) and (box[3] > 0.):  # if w > 0 and h > 0
                with open(fn + Path(f).stem + '.txt', 'a') as file:
                    file.write('%g %.6f %.6f %.6f %.6f\n' % (coco80[x['category_id'] - 1], *box))
                file_path_t = fn_images + f
                if not os.path.exists(file_path_t):  # copy each image only once
                    print(file_path, file_path_t)
                    shutil.copy(file_path, file_path_t)


def coco91_to_coco80_class():
    # Maps the 91-index category IDs (paper) to the 80-index IDs (2014/2017 releases).
    # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
    # a = np.loadtxt('data/coco.names', dtype='str', delimiter='\n')
    # b = np.loadtxt('data/coco_paper.names', dtype='str', delimiter='\n')
    # x1 = [list(a[i] == b).index(True) + 1 for i in range(80)]  # darknet to coco
    # x2 = [list(b[i] == a).index(True) if any(b[i] == a) else None for i in range(91)]  # coco to darknet
    x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, None, 24, 25, None,
         None, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, None, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
         51, 52, 53, 54, 55, 56, 57, 58, 59, None, 60, None, None, 61, None, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
         None, 73, 74, 75, 76, 77, 78, 79, None]
    return x


if __name__ == '__main__':
    convert_coco_json()
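The core of the script is the box conversion, which is easier to follow on a single example without NumPy. The sketch below converts one COCO box to the normalized YOLO form and back; the function names and the sample box and image size are made up for illustration.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """COCO [x_min, y_min, w, h] in pixels -> YOLO (cx, cy, w, h) normalized to [0, 1]."""
    x_min, y_min, w, h = bbox
    return ((x_min + w / 2) / img_w,   # box center x, normalized by image width
            (y_min + h / 2) / img_h,   # box center y, normalized by image height
            w / img_w,
            h / img_h)


def yolo_to_corners(box, img_w, img_h):
    """YOLO (cx, cy, w, h) normalized -> (x_min, y_min, x_max, y_max) in pixels."""
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)


yolo = coco_to_yolo([100, 50, 200, 100], img_w=640, img_h=480)
print("%.6f %.6f %.6f %.6f" % yolo)       # the four numbers written after the class index
print(yolo_to_corners(yolo, 640, 480))    # back to pixel corners, up to float rounding
```

A 200×100 box whose top-left corner is at (100, 50) in a 640×480 image has its center at (200, 100), so the normalized center is (0.3125, 0.208333).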
Run the script:

[Figure: console output of the conversion script]

After the conversion finishes, verify the result:
import os

import cv2


def draw_box_in_single_image(image_path, txt_path, out_dir='./Data/see_images'):
    # Read the image
    image = cv2.imread(image_path)

    # Read the label txt file: one "class cx cy w h" row per line
    def read_list(txt_path):
        pos = []
        with open(txt_path, 'r') as file_to_read:
            for line in file_to_read:
                if not line.strip():
                    continue
                # split() without arguments splits on whitespace; pass ',' for comma-separated data
                p_tmp = [float(i) for i in line.split()]
                pos.append(p_tmp)  # append the newly read row
        return pos

    # Convert a normalized txt row to a pixel box
    def convert(size, box):
        xmin = (box[1] - box[3] / 2.) * size[1]
        xmax = (box[1] + box[3] / 2.) * size[1]
        ymin = (box[2] - box[4] / 2.) * size[0]
        ymax = (box[2] + box[4] / 2.) * size[0]
        return (int(xmin), int(ymin), int(xmax), int(ymax))

    pos = read_list(txt_path)
    print(pos)
    for i in range(len(pos)):
        label = str(int(pos[i][0]))
        print('label is ' + label)
        box = convert(image.shape, pos[i])
        image = cv2.rectangle(image, (box[0], box[1]), (box[2], box[3]), (0, 0, 255), 2)
        cv2.putText(image, label, (box[0], box[1] - 2), 0, 1, [0, 0, 255], thickness=2, lineType=cv2.LINE_AA)

    if pos:
        name = os.path.splitext(os.path.basename(image_path))[0]
        cv2.imwrite(os.path.join(out_dir, '{}.png'.format(name)), image)
    else:
        print('None')


img_folder = "./out/images/val2017"
label_folder = "./out/labels/val2017"
out_dir = './Data/see_images'
os.makedirs(out_dir, exist_ok=True)

for img_name in sorted(os.listdir(img_folder)):
    image_path = os.path.join(img_folder, img_name)
    # Pair each image with the label file that shares its file stem
    txt_path = os.path.join(label_folder, os.path.splitext(img_name)[0] + '.txt')
    if os.path.exists(txt_path):
        draw_box_in_single_image(image_path, txt_path)
Results:

[Figure: validation images with the converted boxes drawn on them]
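Alongside the visual check, a lightweight text check can catch malformed label lines without opening any images. The sketch below validates a single line of a YOLO label file; the function name, the messages, and the default of 80 classes (matching the COCO mapping used here) are assumptions for illustration.

```python
def check_label_line(line, num_classes=80):
    """Return a list of problems found in one YOLO label line (empty if it looks valid)."""
    parts = line.split()
    if len(parts) != 5:
        return ["expected 5 fields, got %d" % len(parts)]
    try:
        cls = int(parts[0])
        coords = [float(v) for v in parts[1:]]
    except ValueError:
        return ["non-numeric field"]

    problems = []
    if not 0 <= cls < num_classes:
        problems.append("class index %d out of range" % cls)
    if not all(0.0 <= v <= 1.0 for v in coords):
        problems.append("coordinate outside [0, 1]")
    if coords[2] <= 0 or coords[3] <= 0:
        problems.append("non-positive width or height")
    return problems


print(check_label_line("11 0.312500 0.208333 0.312500 0.208333"))  # valid line
print(check_label_line("80 0.5 0.5 0.1 0.1"))                      # class index too large
```

Running this over every .txt file in out/labels and printing only the non-empty results gives a quick summary of anything the conversion got wrong.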