ECCV 2016 Workshops
Table of Contents
- 1 Background and Motivation
- 2 Related Work
- 3 Advantages / Contributions
- 4 Method
- 5 Experiments
- 5.1 Datasets and Metrics
- 5.2 The OTB-13 benchmark
- 5.3 The VOT benchmarks
- 5.4 Dataset size
- 6 Conclusion (own) / Future work
1 Background and Motivation
Single-object tracking
track any arbitrary object, it is impossible to have already gathered data and trained a specific detector
Drawbacks of online-learning methods (either apply "shallow" methods (e.g. correlation filters) using the network's internal representation as features, or perform SGD (stochastic gradient descent) to fine-tune multiple layers of the network)
a clear deficiency of using data derived exclusively from the current video is that only comparatively simple models can be learnt.
Real-time performance may also be an issue.
The authors tackle single-object tracking with a fully-convolutional Siamese network, and any video object detection dataset can be used for training (the fairness of training and testing deep models for tracking using videos from the same domain is a point of controversy)
2 Related Work
- train Recurrent Neural Networks (RNNs) for the problem of object tracking
- track objects with a particle filter that uses a learnt distance metric to compare the current appearance to that of the first frame.
- feasibility of fine-tuning from pre-trained parameters at test time
3 Advantages / Contributions
- we equip a basic tracking algorithm with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video
- frame-rates beyond real-time
- achieves state-of-the-art performance in multiple benchmarks
4 Method
$f(z, x) = g(\varphi(z), \varphi(x))$
exemplar image $z$
candidate image $x$
$g$ is a simple distance or similarity metric
$\varphi$ is the shared (Siamese) embedding network; its structure is shown below
How $x$ and $z$ are cropped (details taken from the pysot code)
More concretely, the score map is $f(z, x) = \varphi(z) \star \varphi(x) + b\mathbb{1}$
$b\mathbb{1}$ denotes a signal which takes value $b \in \mathbb{R}$ in every location
(i.e. the bias $b$ is the same at every spatial location)
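A minimal NumPy sketch of this score map, with the embedding $\varphi$ replaced by random feature maps; the 6×6 exemplar and 22×22 search feature sizes follow the paper's architecture, and `b` is the constant bias added at every location:

```python
import numpy as np

def score_map(phi_z, phi_x, b=0.0):
    """Cross-correlate the exemplar embedding phi(z) with the search
    embedding phi(x), then add the scalar bias b at every location.

    phi_z: (C, hz, wz) exemplar features; phi_x: (C, hx, wx) search features.
    Returns an (hx - hz + 1, wx - wz + 1) score map.
    """
    _, hz, wz = phi_z.shape
    _, hx, wx = phi_x.shape
    out = np.empty((hx - hz + 1, wx - wz + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # inner product between phi(z) and the matching window of phi(x)
            out[i, j] = np.sum(phi_z * phi_x[:, i:i + hz, j:j + wz]) + b
    return out

rng = np.random.default_rng(0)
phi_z = rng.standard_normal((256, 6, 6))     # 6x6 exemplar features
phi_x = rng.standard_normal((256, 22, 22))   # 22x22 search features
print(score_map(phi_z, phi_x).shape)         # → (17, 17)
```

In a real implementation this loop is a single grouped convolution (e.g. `F.conv2d` in PyTorch with $\varphi(z)$ as the kernel), which is why the whole thing runs at frame rates beyond real-time.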
Loss function
$y$ is the label, $+1$ or $-1$
$v$ is the real-valued score at a location of the score map (the logistic loss is applied to it directly, so it is not restricted to $(0, 1)$)
$u$ is a spatial location and $D$ is the set of locations in the score map
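The paper's loss is the mean logistic loss $\ell(y, v) = \log(1 + e^{-yv})$ over all locations $u \in D$; a minimal sketch (the paper additionally weights locations to balance positive and negative examples, which is omitted here):

```python
import numpy as np

def siamfc_loss(y, v):
    """Mean logistic loss over the score map D.
    y: label map of +1 / -1; v: real-valued score map (same shape)."""
    # log(1 + exp(-y*v)), computed stably via logaddexp(0, -y*v)
    return float(np.mean(np.logaddexp(0.0, -y * v)))

y = np.array([[1.0, -1.0], [-1.0, 1.0]])
v = np.array([[2.0, -2.0], [0.0, 1.0]])
print(siamfc_loss(y, v))
```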
A score-map location counts as a positive sample when it lies within radius $R$ of the ground-truth bounding-box centre, i.e. $y[u] = +1$ if $k\lVert u - c\rVert \le R$
$c$ is the centre of the GT bbox
$k$ is the stride of the network
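The labelling rule can be sketched as follows; `R=16` pixels and stride `k=8` match the commonly used SiamFC settings (treat them as assumptions if your configuration differs):

```python
import numpy as np

def make_labels(size, R=16, k=8):
    """Label map for a size x size score map: +1 within radius R (measured
    in input pixels) of the centre c, -1 elsewhere; k is the network stride."""
    c = (size - 1) / 2.0                         # centre of the score map
    u = np.arange(size)
    dist = np.hypot(*np.meshgrid(u - c, u - c))  # distance in score-map cells
    return np.where(k * dist <= R, 1.0, -1.0)

labels = make_labels(17)
print(int((labels == 1).sum()))  # → 13 positive locations
```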
SGD is used for optimisation during training
5 Experiments
Training runs for 50 epochs, each consisting of 50,000 sampled pairs
SiamFC (Siamese Fully Convolutional) and SiamFC-3s, which searches over 3 scales instead of 5.
The details of the scale search are not entirely clear to me.
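From public SiamFC-style implementations, scale search amounts to scoring a few rescaled search crops and penalising non-unit scales; a toy sketch (the scale step `1.0375` and penalty `0.97` are assumptions drawn from such code, not from the paper itself):

```python
import numpy as np

def pick_scale(score_maps, scale_factors, penalty=0.97):
    """Multi-scale search sketch: each score map comes from a search crop
    resized by the corresponding factor; non-unit scales are penalised,
    and the best-scoring scale wins."""
    best_i, best_score = 0, -np.inf
    for i, (s, factor) in enumerate(zip(score_maps, scale_factors)):
        peak = s.max() * (1.0 if factor == 1.0 else penalty)
        if peak > best_score:
            best_i, best_score = i, peak
    return best_i

# SiamFC-3s style: 3 scales around the current one
factors = [1.0375 ** e for e in (-1, 0, 1)]
maps = [np.full((17, 17), v) for v in (0.5, 0.6, 0.55)]
print(pick_scale(maps, factors))  # → 1 (the unpenalised middle scale wins)
```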
5.1 Datasets and Metrics
Training set
ImageNet Video for tracking, about 4,500 videos
Test sets
- ALOV
- OTB-13
- VOT-14 / VOT-15 / VOT-16
a tracker is successful in a given frame if the intersection over-union (IoU) between its estimate and the ground-truth is above a certain threshold
Three evaluation protocols commonly used on OTB: TRE, SRE, and OPE
- OPE: one-pass evaluation; the tracker is initialised on the first frame and run once (equivalent to a single run of TRE).
- TRE: temporal robustness evaluation; the sequence is split into 20 segments, and the tracker is initialised at each different starting time and then tracks the target.
- SRE: spatial robustness evaluation; the first-frame target location is perturbed by 10% offsets in 12 different ways, and tracking accuracy is measured for each.
Common metrics
- OP (%): overlap precision; overlap = intersection area / (predicted-box area + ground-truth-box area − intersection area)
- CLE (pixels): center location error = Euclidean distance between the ground-truth centre and the predicted centre
- DP: distance precision
- AUC: area under the success-plot curve
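The overlap and centre-error metrics above are straightforward to compute for boxes given as `(x, y, w, h)`; a minimal sketch:

```python
import numpy as np

def iou(a, b):
    """Overlap (IoU) between two boxes given as (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

def cle(a, b):
    """Centre location error: Euclidean distance between box centres."""
    ca = (a[0] + a[2] / 2.0, a[1] + a[3] / 2.0)
    cb = (b[0] + b[2] / 2.0, b[1] + b[3] / 2.0)
    return float(np.hypot(ca[0] - cb[0], ca[1] - cb[1]))

pred, gt = (0, 0, 10, 10), (5, 0, 10, 10)
print(iou(pred, gt))  # → 0.333... (50 / 150)
print(cle(pred, gt))  # → 5.0
```

A frame counts as a success when `iou` exceeds the chosen threshold; sweeping the threshold from 0 to 1 and averaging the success rate gives the AUC of the success plot.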
Some VOT metrics
- Robustness: the higher the value, the less stable the tracker (more failures).
5.2 The OTB-13 benchmark
5.3 The VOT benchmarks
VOT-14
VOT-15
5.4 Dataset size
A look at the actual results
Drawback: the aspect ratio of the predicted box is fixed
6 Conclusion (own) / Future work
References:
- Visual object tracking: SiamFC
- A survey of single-object tracking papers: SiamFC, the Siam family, GradNet, and more
- [Object Tracking Online Meetup] Episode 15: Pysot experiment summary
- SiamRPN code walkthrough: the proposal selection part
- Single-object tracking: SiamFC
From the paper alone, many implementation details remain unclear to me; I still need to work through the code.
Deep Siamese conv-nets have previously been applied to tasks such as face verification, keypoint descriptor learning and one-shot character recognition