Liu S, Zha J, Sun J, et al. EdgeYOLO: An edge-real-time object detector[C]//2023 42nd Chinese Control Conference (CCC). IEEE, 2023: 7507-7512.
CCC-2023
Source code: https://github.com/LSH9832/edgeyolo
Paper: https://arxiv.org/pdf/2302.07483
Table of Contents
- 1、Background and Motivation
- 2、Related Work
- 3、Advantages / Contributions
- 4、Method
- 4.1、Enhanced-Mosaic & Mixup
- 4.2、Lite-Decoupled Head
- 4.3、Staged Loss Function
- 5、Experiments
- 5.1、Datasets and Metrics
- 5.2、Results & Comparison
- 5.3、Ablation Study
- 5.4、Tricks for Edge Computing Devices
- 6、Conclusion(own) / Future work
1、Background and Motivation
- Growing demand for edge computing devices
- Limitations of existing object detectors: traditional two-stage detectors (the R-CNN series) achieve good accuracy, but their complex structure and heavy computation make real-time operation on edge devices difficult; one-stage detectors built on lightweight backbones such as MobileNet and ShuffleNet can run on edge devices, but usually at the cost of accuracy
- Evolution of the YOLO series: accuracy keeps improving with each version, but real-time performance on edge devices remains hard to guarantee
- The challenge of small-object detection
- When designing and evaluating a detector, the complete detection task should be considered, including pre-processing, model inference and post-processing time, to ensure truly real-time performance on edge devices
This paper proposes an efficient, low-complexity and anchor-free object detector based on the state-of-the-art YOLO framework, which can be implemented in real time on edge computing platforms.
2、Related Work
- Anchor-free Object Detector
- anchor-point-based (this paper's choice)
- keypoint-based
- Data Augmentation
- geometric augmentation
- photometric augmentation (e.g., HSV & brightness adjustment)
- Model Reduction
- lossy reduction (builds smaller networks)
- lossless reduction (e.g., re-parameterization techniques)
- Decoupled Regression
- different tasks can share the same convolution kernel if they are closely related; however, the object's location, confidence and category are not closely related in numerical logic
- Pros: accelerates loss convergence
- Cons: brings extra inference cost
- Small Object Detecting Optimization
- small objects carry limited information
- small objects always account for a smaller proportion of the total loss during training
- Remedies: (1) replication augmentation, (2) zooming (shrinking large objects into small ones, raising the proportion of small objects) and splicing, (3) loss-function design
- Drawback of (1): scale mismatch and background mismatch; this paper explores (2) and (3)
3、Advantages / Contributions
- An anchor-free object detector, EdgeYOLO, is designed
- A more powerful data augmentation method is proposed (ensures the quantity and validity of training data)
- A lightweight decoupled head is designed; structures that can be re-parameterized are used (reducing inference time)
- A loss function is designed to improve the precision on small objects
- Excellent performance is achieved on public datasets
- Code and model weights are open-sourced
- Optimization tricks such as a multi-process / multi-thread computing architecture further improve EdgeYOLO's real-time performance on edge devices
4、Method
4.1、Enhanced-Mosaic & Mixup
Still a mix of Mosaic and Mixup: the author first splits the mosaic outputs into groups and then applies Mixup across them, with group = 2 (the group number can be set according to the richness of the average number of labels in a single picture in the dataset).
I did not fully get the author's point from the paper's description; the example shown only differs in the number of source images.
Does it simply raise the number of images per training sample, e.g. from 4 to 8?
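For what it's worth, here is how I would sketch that reading in Python; `mosaic4` and `sample_fn` are hypothetical helpers, label handling is omitted, and only the default group = 2 is shown:

```python
import numpy as np

def mosaic4(imgs):
    """Hypothetical helper: stitch 4 equally sized images into a 2x2 canvas."""
    h, w = imgs[0].shape[:2]
    canvas = np.zeros((2 * h, 2 * w, 3), dtype=np.float32)
    canvas[:h, :w], canvas[:h, w:] = imgs[0], imgs[1]
    canvas[h:, :w], canvas[h:, w:] = imgs[2], imgs[3]
    return canvas

def enhanced_mosaic_mixup(sample_fn, alpha=32.0):
    """Build group=2 independent mosaics, then Mixup them together.

    This consumes 8 source images per training sample instead of the
    4(+1) used by the plain Mosaic+Mixup combination.
    """
    m1 = mosaic4([sample_fn() for _ in range(4)])
    m2 = mosaic4([sample_fn() for _ in range(4)])
    lam = np.random.beta(alpha, alpha)      # mixup blending ratio
    return lam * m1 + (1.0 - lam) * m2      # labels from both groups are kept
```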
4.2、Lite-Decoupled Head
A lightweight redesign of the FCOS-style decoupled head, introducing re-parameterization (some structures are merged at inference time) and the implicit knowledge technique.
With the method of re-parameterizing, implicit representation layers are integrated into convolutional layers for lower inference costs.
The implicit knowledge technique comes from:
Wang C Y, Yeh I H, Liao H Y M. You only learn one representation: Unified network for multiple tasks[J]. arXiv preprint arXiv:2105.04206, 2021.
YOLOv7 also adopts this technique:
【YOLOv7】《YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors》
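As a minimal sketch of what the fusion means, assuming a YOLOR-style additive implicit layer placed in front of a 1×1 convolution (my own simplified reconstruction, not the authors' code):

```python
import torch
import torch.nn as nn

class ImplicitA(nn.Module):
    """Learnable per-channel vector added to the input feature map (YOLOR-style)."""
    def __init__(self, ch):
        super().__init__()
        self.implicit = nn.Parameter(torch.zeros(1, ch, 1, 1))

    def forward(self, x):
        return x + self.implicit

@torch.no_grad()
def fuse_implicit_a(ia: ImplicitA, conv: nn.Conv2d):
    """Fold an ImplicitA placed before a 1x1 conv into the conv bias:
    conv(x + a) = conv(x) + W @ a, so only the bias changes."""
    w = conv.weight.reshape(conv.out_channels, -1)   # (out, in) for 1x1 kernels
    a = ia.implicit.reshape(-1)                      # (in,)
    conv.bias += w @ a

# sanity check: the fused conv matches ImplicitA followed by conv
conv, ia = nn.Conv2d(64, 255, 1), ImplicitA(64)
nn.init.normal_(ia.implicit, std=0.02)
x = torch.randn(1, 64, 20, 20)
y_ref = conv(ia(x))
fuse_implicit_a(ia, conv)
assert torch.allclose(y_ref, conv(x), atol=1e-5)
```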
4.3、Staged Loss Function
Overall loss structure: $L_{\Delta}$ is the regulation loss.
The loss is split into three stages, each configured differently.
Stage 1
gIOU loss as the IOU loss, Balanced Cross Entropy loss for the classification and object losses; the regulation loss is set to 0
Stage 2
at the last few data-augmentation-enabled epochs
the classification and object losses switch to Hybrid-Random Loss, apparently original to this paper (no reference is listed)
an improvement built on cross-entropy loss
Stage 3
data augmentation is disabled
L1 loss is set as the regulation loss, and gIOU loss is replaced by cIOU loss
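A schematic of the schedule as I understand it; the Balanced-BCE and Hybrid-Random classes below are placeholders since the paper gives no formulas, the stage-2 boundary `hr_epochs` is a guess, and torchvision's box-IoU losses stand in for the paper's:

```python
import torch.nn as nn
from torchvision.ops import complete_box_iou_loss, generalized_box_iou_loss

class BalancedBCELoss(nn.BCEWithLogitsLoss):
    """Placeholder for the paper's Balanced Cross Entropy loss."""

class HybridRandomLoss(nn.BCEWithLogitsLoss):
    """Placeholder: Hybrid-Random loss has no public reference,
    so plain BCE stands in for it here."""

def losses_for_epoch(epoch, max_epoch=300, close_aug_epochs=15, hr_epochs=50):
    """Return (iou_loss, cls/obj_loss, regulation_loss) for one epoch."""
    if epoch >= max_epoch - close_aug_epochs:
        # stage 3: augmentation closed, cIOU replaces gIOU, L1 regulation loss
        return complete_box_iou_loss, HybridRandomLoss(), nn.L1Loss()
    if epoch >= max_epoch - close_aug_epochs - hr_epochs:
        # stage 2: still augmented, cls/obj losses switch to Hybrid-Random loss
        return generalized_box_iou_loss, HybridRandomLoss(), None
    # stage 1: gIOU + Balanced BCE, regulation loss set to 0
    return generalized_box_iou_loss, BalancedBCELoss(), None
```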
5、Experiments
Network configuration parameters used for training.
Default parameters:
# models & weights------------------------------------------------------------------------------------------------------
model_cfg: "params/model/edgeyolo.yaml"          # model structure config file
weights: "output/train/edgeyolo_coco/last.pth"   # contains model_cfg, set null or a no-exist filename if not use it
use_cfg: false                                   # force using model_cfg instead of cfg in weights to build model

# output----------------------------------------------------------------------------------------------------------------
output_dir: "output/train/edgeyolo_coco"         # all train output file will save in this dir
save_checkpoint_for_each_epoch: true             # save models for each epoch (epoch_xxx.pth, not only best/last.pth)
log_file: "log.txt"                              # log file (in output_dir)

# dataset & dataloader--------------------------------------------------------------------------------------------------
dataset_cfg: "params/dataset/coco.yaml"          # dataset config
batch_size_per_gpu: 8                            # batch size for each GPU
loader_num_workers: 4                            # number data loader workers for each GPU
num_threads: 1                                   # pytorch threads number for each GPU

# device & data type----------------------------------------------------------------------------------------------------
device: [0, 1, 2, 3]                             # training device list
fp16: false                                      # train with fp16 precision
cudnn_benchmark: false                           # it's useful when multiscale_range is set zero

# train hyper-params----------------------------------------------------------------------------------------------------
optimizer: "SGD"                                 # or Adam
max_epoch: 300                                   # or 400
close_mosaic_epochs: 15                          # close data augmentation at last several epochs

# learning rate---------------------------------------------------------------------------------------------------------
lr_per_img: 0.00015625                           # total_lr = lr_per_img * batch_size_per_gpu * len(devices)
warmup_epochs: 5                                 # warm-up epochs at the beginning of training
warmup_lr_ratio: 0.0                             # warm-up learning rate start from value warmup_lr_ratio * total_lr
final_lr_ratio: 0.05                             # final_lr_per_img = final_lr_ratio * lr_per_img

# training & dataset augmentation---------------------------------------------------------------------------------------
# [cls_loss, conf_loss, iou_loss]
loss_use: ["bce", "bce", "giou"]                 # bce: BCE loss. bcf: Balanced Focal loss. hyb: HR loss, iou, c/g/s iou is available
input_size: [640, 640]                           # image input size for model
multiscale_range: 5                              # real_input_size = input_size + randint(-multiscale_range, multiscale_range) * 32
weight_decay: 0.0005                             # optimizer weight decay
momentum: 0.9                                    # optimizer momentum
enhance_mosaic: true                             # use enhanced mosaic method
use_ema: true                                    # use EMA method
enable_mixup: true                               # use mixup
mixup_scale: [0.5, 1.5]                          # mixup image scale
mosaic_scale: [0.1, 2.0]                         # mosaic image scale
flip_prob: 0.5                                   # flip image probability
mosaic_prob: 1                                   # mosaic probability
mixup_prob: 1                                    # mixup probability
degrees: 10                                      # maximum rotate degrees
hsv_gain: [0.0138, 0.664, 0.464]                 # hsv gain ratio

# evaluate--------------------------------------------------------------------------------------------------------------
eval_at_start: false                             # evaluate loaded model before training
val_conf_thres: 0.001                            # confidence threshold when doing evaluation
val_nms_thres: 0.65                              # NMS IOU threshold when doing evaluation
eval_only: false                                 # do not train, run evaluation program only for all weights in output_dir
obj_conf_enabled: true                           # use object confidence when doing inference
eval_interval: 1                                 # evaluate interval epochs

# show------------------------------------------------------------------------------------------------------------------
print_interval: 100                              # print result after every $print_interval iterations

# others----------------------------------------------------------------------------------------------------------------
load_optimizer_params: true                      # load optimizer params when resume train, set false if there is an error.
train_backbone: true                             # set false if you only want to train yolo head
train_start_layers: 51                           # if not train_backbone, train from this layer, see params/models/edgeyolo.yaml
force_start_epoch: -1                            # set -1 to disable this option
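To make the commented formulas concrete with these defaults: total_lr = lr_per_img × batch_size_per_gpu × len(device) = 0.00015625 × 8 × 4 = 0.005, and with multiscale_range: 5 the actual input side length varies per iteration between 640 − 5 × 32 = 480 and 640 + 5 × 32 = 800.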
5.1、Datasets and Metrics
- VisDrone2019-DET dataset:https://github.com/VisDrone/VisDrone-Dataset
- MS COCO2017
The metric is COCO-style mAP.
5.2、Results & Comparison
The baseline backbone is YOLOv7's ELAN-Darknet.
The gains from the authors' method are especially pronounced on small objects.
The models evaluated on VisDrone were pre-trained on MS COCO2017-train.
FPS figures were measured on a Jetson AGX Xavier.
5.3、Ablation Study
(1) Decoupled head
The improved head is both faster and more accurate.
(2) Segmentation labels (limited effect)
After rotation augmentation the bboxes can become inaccurate (an axis-aligned box has no angle, so the rotated box must be enlarged to stay parallel to the image edges); the authors use the segmentation labels to regenerate tight boxes after rotation, so the boxes no longer contain more invalid background information.
When the data augmentation is enabled and the loss enters a stable decline phase, using segmentation labels can bring a significant increase of 2%-3% AP.
At the end of training, data augmentation is closed and all labels become accurate again; even if the segmentation labels are not used, the final accuracy decreases only by about 0.04% AP (so are the bbox labels barely less accurate than the segmentation labels after all???)
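To see why rotating an axis-aligned box produces loose labels, a small numpy sketch (`seg_poly` stands for a hypothetical per-object polygon taken from the segmentation labels):

```python
import numpy as np

def rotate_points(pts, deg, center):
    """Rotate an (N, 2) array of xy points around `center` by `deg` degrees."""
    t = np.deg2rad(deg)
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return (pts - center) @ rot.T + center

def enclosing_bbox(pts):
    """Axis-aligned [x0, y0, x1, y1] box that tightly encloses the points."""
    return np.concatenate([pts.min(axis=0), pts.max(axis=0)])

center = np.array([320.0, 320.0])
corners = np.array([[100., 100.], [200., 100.], [200., 160.], [100., 160.]])
# rotating only the 4 bbox corners and re-boxing them yields a loose box
# that includes extra background:
loose = enclosing_bbox(rotate_points(corners, 10, center))
# rotating the segmentation polygon instead stays tight around the object:
# tight = enclosing_bbox(rotate_points(seg_poly, 10, center))
```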
(3) Loss function
To sum up, a better precision can be obtained by using HR loss and cIOU loss in later training stages
5.4、Tricks for Edge Computing Devices
(1) Input size adaptation
Training uses 640×640; at deployment the input size is adapted to the device camera's aspect ratio (4:3 or 16:9), which brings a significant speed-up.
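My guess at the arithmetic, as a sketch (the paper does not list the exact deployment sizes; I assume the shorter side must stay a multiple of the 32-pixel stride):

```python
def adapt_input_size(train_size=640, aspect=(16, 9), stride=32):
    """Keep the longer side at the training resolution and round the
    shorter side to the camera aspect ratio at a multiple of the stride."""
    w = train_size
    h = round(w * aspect[1] / aspect[0] / stride) * stride
    return w, h

print(adapt_input_size(aspect=(16, 9)))  # (640, 352): ~45% fewer pixels than 640x640
print(adapt_input_size(aspect=(4, 3)))   # (640, 480): 25% fewer pixels
```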
(2)Multi-process & multi-thread computing architecture
Multi-threading or multi-processing is used to overlap the three stages of the detection task:
pre-process, model inference and post-process,
achieving an FPS increase of about 8%-14%.
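A toy version of such a pipeline with Python threads (the real process/thread split is in the authors' repo; the three callables here are placeholders):

```python
import queue
import threading

def pipelined_detect(frames, preprocess, infer, postprocess, depth=2):
    """Run the three stages in separate threads linked by bounded queues,
    so pre-processing of frame t+1 overlaps inference of frame t."""
    q_pre, q_inf = queue.Queue(depth), queue.Queue(depth)

    def pre_stage():
        for f in frames:
            q_pre.put(preprocess(f))
        q_pre.put(None)                      # end-of-stream marker

    def infer_stage():
        while (x := q_pre.get()) is not None:
            q_inf.put(infer(x))
        q_inf.put(None)

    workers = [threading.Thread(target=pre_stage),
               threading.Thread(target=infer_stage)]
    for t in workers:
        t.start()
    results = []
    while (y := q_inf.get()) is not None:    # post-process on the main thread
        results.append(postprocess(y))
    for t in workers:
        t.join()
    return results
```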
Qualitative detection results are also visualized.
6、Conclusion (own) / Future work
- pre-process, model inference and post-process
- edge computing device
- time latency in post-processing is almost proportional to the number of anchors of each grid cell
- Decoupled head: relations between the object's location, confidence and category are not close enough in numerical logic
- Multi-process & multi-thread computing architecture
- we believe that the framework can be extended to other pixel level recognition tasks such as instance segmentation
- Jetson AGX Xavier
For more paper notes, see 【Paper Reading】.