【UAP】《Empirical Upper Bound in Object Detection and More》

Borji A, Iranmanesh S M. Empirical upper bound in object detection and more[J]. arXiv preprint arXiv:1911.12451, 2019.

arXiv-2019

Table of Contents

1、Background and Motivation
2、Related Work
3、Advantages / Contributions
4、Experimental Setup
- 4.1、Benchmarks Datasets and Metrics
- 4.2、Characterizing the Empirical Upper Bound
- 4.3、Error Diagnosis
- 4.4、Invariance Analysis
5、Conclusion（own） / Future work

1、Background and Motivation

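Background

Object detection is an important and challenging problem in computer vision. Although deep learning has brought remarkable progress in recent years, the performance of modern detectors on popular benchmarks has started to saturate, which raises questions about how much potential deep learning tools and methods still have in this area. Specifically, researchers have begun to ask how much further detection performance can improve along the current path, and what the main obstacles to further improvement are.

Motivation

The paper sets out to reveal, through systematic analysis, the empirical upper bound (EUB) in object detection, i.e., the best performance a detector could reach with current technology. The authors also want to identify the bottlenecks in object detectors, to inform the design and optimization of future detection models.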

2、Related Work

Works that strive to understand detection approaches, identify their shortcomings, and pinpoint where more research is needed
- person detectors, PASCAL datasets, ImageNet
Comparing object detection models
- Some works have analyzed and reported statistics and performance over benchmark datasets such as PASCAL VOC, MSCOCO, CityScapes, and Open Images.
- alternative or complementary evaluation measures
Role of context in object detection and recognition

3、Advantages / Contributions

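Determining the empirical upper bound: by analyzing two recent object detection benchmarks and 15 models across four large-scale datasets, the authors systematically determine, for the first time, the empirical upper bound on AP in object detection (Upper Bound AP, UAP). This bound serves as a reference for evaluating existing models and reveals the gap between current models and the upper bound.

Diagnosing error types: the authors characterize the sources of error in object detectors in a novel and intuitive way, and find that classification error (confusion with other classes and misses) is the dominant error type, outweighing localization and duplicate-detection errors.

Invariance analysis: the authors study how invariant models are to various transformations, including removing the context around an object, placing the object on an incongruent background, image blurring, and vertical flipping. The analysis exposes how fragile models are under these transformations and points to directions for improving robustness.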

4、Experimental Setup

4.1、Benchmarks Datasets and Metrics

Benchmarks

MMDetection
Detectron2

Datasets

4 datasets, including PASCAL VOC, our home-brewed FASHION dataset, MSCOCO, and OpenImages

Our FASHION dataset covers 40 categories of clothing items (39 + humans). The trainval and test sets for this dataset contain 206,530 images (776,172 boxes) and 51,650 images (193,689 boxes), respectively.

Metrics

The evaluation metrics are those of the COCO API.

4.2、Characterizing the Empirical Upper Bound

assume that the localization problem is solved and what remains is only object recognition

（1） Utility of the surrounding context

Using only the object region as the recognition input works best.

（2） Searching for the best label

Strategy 1 and Strategy 2 are the two ways of obtaining the UAP.

Strategy 1

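Classify the target boxes directly with the best classifier.

First, a trained best classifier (ResNet152 in this study) classifies the target boxes (ground-truth bounding boxes).
The classification score is used directly as the detection score, from which AP is computed.
Because the boxes themselves are the ground-truth boxes, this approach effectively assumes that localization is solved and looks only at object recognition.

Characteristics:

The UAP is the same at every IOU threshold, because the detection boxes are the target boxes themselves.
This gives an upper bound: the best AP that recognition alone can reach when localization is perfect.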
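
As a rough illustration of Strategy 1, the sketch below turns classifier outputs on ground-truth boxes into an AP value. It is not the authors' code: `probs` (one row of class probabilities per target box) and `gt_labels` are hypothetical inputs, the top-1 label and its score are assumed to form the detection, and AP uses simple all-point interpolation rather than the COCO API's 101-point version. Since every box coincides with a target box, a detection counts as a true positive exactly when its predicted label matches.

```python
import numpy as np

def average_precision(is_tp, n_gt):
    """All-point interpolated AP; is_tp must be sorted by descending score."""
    is_tp = np.asarray(is_tp, dtype=float)
    if n_gt == 0 or is_tp.size == 0:
        return 0.0
    tp = np.cumsum(is_tp)
    fp = np.cumsum(1.0 - is_tp)
    recall = tp / n_gt
    precision = tp / (tp + fp)
    # Make precision non-increasing as recall grows (standard interpolation).
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    prev_recall = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - prev_recall) * precision))

def uap_strategy1(probs, gt_labels):
    """Mean AP over classes when every detection box is a ground-truth box."""
    probs = np.asarray(probs, dtype=float)
    gt_labels = np.asarray(gt_labels)
    pred = probs.argmax(axis=1)    # assumed: top-1 label is the detection label
    score = probs.max(axis=1)      # assumed: top-1 probability is the score
    aps = []
    for c in np.unique(gt_labels):
        n_gt = int(np.sum(gt_labels == c))
        idx = np.where(pred == c)[0]
        order = idx[np.argsort(-score[idx], kind="stable")]
        # IOU with the target box is 1, so only the label decides TP vs FP.
        aps.append(average_precision(gt_labels[order] == c, n_gt))
    return float(np.mean(aps))
```

Because no IOU test appears anywhere in the computation, the result cannot change with the IOU threshold, which matches the flat UAP noted above.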

Strategy 2

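Sample candidate boxes near the target box and pick the best classification (the sampling scheme is shown in Fig. 3):

Sample several candidate boxes around the target box (with IOU above a threshold γ) and classify them with the same classifier.
Take the label and confidence of the highest-scoring candidate as the label and confidence of the target box, or take the most frequent label.
The aim is to find a better classification result at less-than-perfect IOU.

Characteristics:

In theory, this can raise AP at less-than-perfect IOU, because searching the surrounding candidates may turn up a box that is easier to classify.
In the actual experiments, however, it does not raise the UAP noticeably, except in a few cases (medium and small objects on FASHION, and small objects on COCO).
The authors attribute the failure of Strategy 2 to the extra visual content that surrounding candidates may contain, which can introduce label noise and lower classification accuracy.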
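
The two label-selection rules of Strategy 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: `candidate_probs` is a hypothetical array with one row of class probabilities per sampled box, and the confidence returned for the majority vote (mean score of the boxes that voted for the winner) is an assumption, since the notes do not say how it is computed.

```python
import numpy as np

def best_label(candidate_probs, mode="max_score"):
    """Pick a label and confidence for a target box from its sampled candidates."""
    probs = np.asarray(candidate_probs, dtype=float)
    labels = probs.argmax(axis=1)   # top-1 label of each candidate box
    scores = probs.max(axis=1)      # top-1 score of each candidate box
    if mode == "max_score":
        # Take the label and score of the most confident candidate.
        i = int(scores.argmax())
        return int(labels[i]), float(scores[i])
    if mode == "majority":
        # Take the most frequent label among the candidates.
        counts = np.bincount(labels, minlength=probs.shape[1])
        winner = int(counts.argmax())
        # Assumed confidence: mean score of the candidates that voted for it.
        return winner, float(scores[labels == winner].mean())
    raise ValueError(f"unknown mode: {mode}")
```

The two rules can disagree: a single confident candidate that happens to include distracting content wins under `max_score` but can be outvoted under `majority`.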

Sampling boxes with IOU above a threshold

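Fig. 3 (A): GT is the black box, R2 is the area of GT, and R1 is the intersection of a sampled box with GT.

Why does the IOU denominator contain twice R2? Because the authors make the following assumption:

we assume all boxes have the same width and height as the target box (the sampled box, red dashed, has the same area as the GT box, black solid)

The union of the two boxes is therefore R2 + R2 - R1, so IOU = R1 / (2·R2 - R1).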

Solving IOU ≥ γ under this assumption then gives a lower bound on the intersection area R1 in terms of γ and R2.
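
Under the equal-size assumption, the union of a sampled box and the target box is 2·R2 - R1, so IOU = R1 / (2·R2 - R1) ≥ γ rearranges to R1 ≥ 2γ/(1+γ)·R2. The sketch below checks this numerically and uses it for sampling. It is only an illustration: the rejection sampling, the `(x1, y1, x2, y2)` box format, and the function names are assumptions rather than the paper's procedure.

```python
import numpy as np

def iou_same_size(dx, dy, w, h):
    """IOU of two w-by-h boxes whose positions differ by (dx, dy)."""
    r1 = max(0.0, w - abs(dx)) * max(0.0, h - abs(dy))   # intersection R1
    r2 = w * h                                           # area of each box R2
    return r1 / (2.0 * r2 - r1)

def sample_boxes(gt_box, gamma, n, seed=0):
    """Rejection-sample n same-size boxes around gt_box with IOU >= gamma."""
    x1, y1, x2, y2 = gt_box
    w, h = x2 - x1, y2 - y1
    rng = np.random.default_rng(seed)
    frac = 2.0 * gamma / (1.0 + gamma)   # required R1 / R2
    min_r1 = frac * w * h
    # Each side of the overlap must be at least frac of the box side,
    # so shifts larger than (1 - frac) of the side can never qualify.
    max_dx, max_dy = (1.0 - frac) * w, (1.0 - frac) * h
    boxes = []
    while len(boxes) < n:
        dx = rng.uniform(-max_dx, max_dx)
        dy = rng.uniform(-max_dy, max_dy)
        if (w - abs(dx)) * (h - abs(dy)) >= min_r1:
            boxes.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return boxes
```

For example, `sample_boxes((10, 20, 50, 80), gamma=0.7, n=20)` returns 20 shifted copies of the target box, each overlapping it with IOU of at least 0.7.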

Fig. 3 (B): the horizontal and vertical coordinates corresponding to the differently colored regions.

（3）Upper bound results

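The UAP curve (red dashed) that is a flat line should be Strategy 1; the one that fluctuates is Strategy 2.

The first two columns of the first row should be the PASCAL VOC dataset, and the last two columns of the first row the FASHION dataset.

The first two columns of the second row are the MSCOCO dataset; the last two columns add HTC on top of the first two.

The lines in other colors should be the ordinary results of trained networks: FCOS for VOC and FASHION, and Mask R-CNN for COCO.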

Chen K, Pang J, Wang J, et al. Hybrid task cascade for instance segmentation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 4974-4983.

Strategy 2 turns out to be mediocre, so the rest of the discussion uses Strategy 1.

VOC
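For the UAP on VOC, the left plot uses the VOC metric and the right plot the COCO metric.

Under the COCO metric (right plot), the best model, FCOS, reaches an AP of only 47.9, a very large gap from the UAP of 91.6.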

FASHION

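UAP on the FASHION dataset.

The gap between UAP and model AP here, however, is much smaller than VOC.

At AP50, the model AP on FASHION comes close to the UAP.

UAPs of 5 FASHION categories fall below the best model AP (the roles are reversed). "Looking at the classification scores, we find that they have a low accuracy." In other words, on these categories the so-called best classifier (with GT boxes as input) does worse than the classifier trained directly inside the detector.

Note that the UAP here comes from Strategy 1. It is a bit like the martial arts exam in the film 武狀元蘇乞兒 (King of Beggars): everything has been prepared for you, and you still lose once you step up, haha.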

COCO

The first row shows AP at different IOU thresholds; the second row shows AP by object size.

The gap between the best model AP and UAP is above 30

The gap is much smaller for AP at IOU=0.5 which is about 10

The UAP is much lower over small objects than UAP over large objects

A further COCO figure uses results from the Detectron2 benchmark.

OpenImages

The UAP reaches 58.9.

We are not aware of any model scores on this set of OpenImages V4.

（4）AP vs. classification accuracy

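We found that there is a linear positive correlation (R² = 0.81 on COCO) between the UAP and the classification accuracy

That a higher accuracy gives a higher UAP is reasonable, since Strategy 1 is used; what the authors find is that the relationship is linear (under the Strategy 1 assumption there seems to be nothing else to interfere).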
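
For readers who want to reproduce this kind of check on their own numbers, the R² of a straight-line fit can be computed as below. This is a generic least-squares sketch, not the authors' script; `x` would be classification accuracies and `y` the matching UAP values, both supplied by the reader.

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)     # degree-1 (linear) fit
    residual = y - (slope * x + intercept)
    ss_res = float(np.sum(residual ** 2))      # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total variation
    return 1.0 - ss_res / ss_tot
```

A value near 1 means the line explains almost all of the variation in UAP; a value near 0 means accuracy tells you little about UAP.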

4.3、Error Diagnosis

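Four error types are defined.

Classification error comes in two kinds:

confusion with the background (Type I): false detections; confusion with other classes can also be counted as Type I
misses (Type II): missed objects

The other two are localization errors and duplicate detections. The authors fix these error types one by one, until AP reaches 1, to see how much each type affects AP.

we argue that correcting the mislocalized predictions is more effective than removing them because it can reveal other sources of weakness in a model. (This differs from the method of Hoiem et al., Fig. 10.)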

Confusion with the background (and other classes; see above) has the highest contribution to the overall error, across all models.

False detections are the most serious.

The second most important error type is misses.

Missed detections come next.

The authors also run the analysis with Hoiem's method.

Hoiem, Derek, Yodsawalai Chodpathumwan, and Qieyun Dai. “Diagnosing error in object detectors.” European conference on computer vision. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012.

Classification error Type I (Sim, Oth, and BG in Fig. 10) accounts for the largest fraction of errors, followed by misses (FN) and localization (Loc) errors: red, green, and purple make up the majority.

4.4、Invariance Analysis

（1）Analysis of context

A white background and a noise background both do worse than objects only.

They are hindered much more on small objects than medium or large ones, which shows how critical context is for recognition and detection of small objects

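The way performance changes across models once context is removed suggests that some models (such as FCOS) depend less on context, while others (such as FasterRCNN and SSD512) may rely on it more for accurate detection.

The original results should have been shown as well, for a direct comparison with the unmodified images; FCOS seems to do even better on objects only than on the original images.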
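
To make the white-background and noise-background conditions concrete, here is one way such test images could be produced. It is a sketch under assumptions: objects are kept by their bounding boxes (the paper may use a different region), images are `uint8` arrays of shape `(H, W, 3)`, boxes are `(x1, y1, x2, y2)` in pixels, and the function name is made up.

```python
import numpy as np

def replace_background(image, boxes, mode="white", seed=0):
    """Keep the pixels inside the boxes and overwrite everything else."""
    h, w = image.shape[:2]
    keep = np.zeros((h, w), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        keep[int(y1):int(y2), int(x1):int(x2)] = True   # object region
    if mode == "white":
        background = np.full_like(image, 255)
    elif mode == "noise":
        rng = np.random.default_rng(seed)
        background = rng.integers(0, 256, size=image.shape, dtype=np.uint8)
    else:
        raise ValueError(f"unknown mode: {mode}")
    out = background.copy()
    out[keep] = image[keep]   # paste the original object pixels back
    return out
```

Running a detector on the original image and on both variants would give the side-by-side comparison that the notes above ask for.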

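The figure shows the difference in distribution of predicted boxes and distribution of ground truth boxes.

I do not fully follow this one. The impression is that MaskRCNN fires a saturation barrage with a low hit rate, while FCOS is a sharpshooter with a high hit rate.

Fig. 5 tests how incongruent contexts affect different models.

The same goes for this table: results on the original input images should be included as one more comparison.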

（2）Robustness to image transformations

Poor performance here demonstrates how sensitive models are to object scale and that they lack robustness to object appearance.

Cropped-out objects are hard to recognize, especially small ones.

RetinaNet and FCOS outperform other models here.
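
Two of the transformations studied in the paper, Gaussian blur and vertical flip, are easy to reproduce when probing a detector's robustness. The sketch below is a plain NumPy version under assumptions: images are `uint8` arrays of shape `(H, W, 3)`, boxes are `(x1, y1, x2, y2)` in pixels, and `sigma` is a free parameter (the value used in the paper is not given in these notes).

```python
import numpy as np

def vertical_flip(image, boxes):
    """Flip the image top to bottom and remap the boxes to match."""
    h = image.shape[0]
    flipped = image[::-1].copy()
    # y-coordinates mirror around the image height; x-coordinates are unchanged.
    new_boxes = [(x1, h - y2, x2, h - y1) for x1, y1, x2, y2 in boxes]
    return flipped, new_boxes

def gaussian_blur(image, sigma=2.0):
    """Separable Gaussian blur with edge padding; boxes need no change."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    out = image.astype(float)
    for axis in (0, 1):                       # rows, then columns
        pad = [(0, 0)] * out.ndim
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="edge")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
        )
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Evaluating the same detector on the original and transformed copies, with the remapped boxes as ground truth, gives a simple robustness comparison.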

（3）Analysis of errors

Misses are the most frequent error under Gaussian blur and vertical flip.

With objects only, class confusion is small, because there is no background to interfere.

5、Conclusion（own） / Future work

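Reference: https://zhuanlan.zhihu.com/p/94990078
upper bound AP (UAP)
I feel the subsection Characterizing the Empirical Upper Bound should come after the Error Diagnosis subsection. Assuming up front that localization is not a problem is confusing; it would be more logical to first show that localization errors account for less than recognition errors, and only then assume the locations are GT.
The authors' UAP is really the ceiling of the current classifier (otherwise, with GT, it would be 100%); the classification branch of object detectors still has a lot of room to improve.
The importance of context for small objects is confirmed once again.
We did not find a significant contribution from the surrounding context of a target or its nearby overlapping boxes to better classify it.
To evaluate the recognition component of a model, one can feed the target boxes to a model and collect its decisions on them.
Classification remains the major bottleneck.
Classification error (confusion with other classes and misses) weighs more than localization and duplicate errors.
The authors ran the analysis on object detection; the same approach carries over to semantic and instance segmentation.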