Neural Network Uncertainty Survey (Part I): A survey of uncertainty in deep neural networks

Related links:

Neural Network Uncertainty Survey (Part I): A survey of uncertainty in deep neural networks

Neural Network Uncertainty Survey (Part II): Uncertainty estimation_Single deterministic methods

Neural Network Uncertainty Survey (Part III): Uncertainty estimation_Bayesian neural networks

Neural Network Uncertainty Survey (Part IV): Uncertainty estimation_Ensemble methods&Test-time augmentation

Neural Network Uncertainty Survey (Part V): Uncertainty measures and quality

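0. Overview

As neural networks are deployed ever more widely in the real world, confidence in their predictions becomes increasingly important, especially in high-risk domains such as medical image analysis and autonomous driving. However, a basic neural network does not include any confidence estimation and is typically over-confident or under-confident. In response, researchers have turned to quantifying the uncertainty in neural network predictions, defining different types and sources of uncertainty along with techniques for quantifying them.

This article attempts to summarize uncertainty estimation methods for neural networks. It divides the sources of uncertainty into two categories, reducible model uncertainty and irreducible data uncertainty, and introduces approaches to modeling uncertainty based on deterministic neural networks, Bayesian neural networks, ensembles of neural networks, and test-time data augmentation.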

1. Introduction

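Deep neural networks (DNNs) have limitations in mission- and safety-critical real-world applications. Specifically:

The lack of expressiveness and transparency of a DNN's inference model makes its predictions hard to trust (poor interpretability);
DNNs cannot distinguish in-domain from out-of-domain samples and are highly sensitive to domain shifts (poor generalization);
DNNs cannot provide reliable uncertainty estimates and tend to produce over-confident predictions (over-confidence);
Sensitivity to adversarial attacks makes DNNs easy to attack and break (poor system stability).

These phenomena have two main sources. One is uncertainty introduced by the data itself, i.e. data uncertainty; the other is uncertainty caused by insufficient knowledge learned by the network, i.e. model uncertainty. Uncertainty estimation is therefore essential to overcoming these limitations. With uncertainty estimates, human experts can inspect the uncertainty attached to each prediction and disregard predictions whose uncertainty is high.

Uncertainty estimation not only supports safe decision-making in high-risk domains; it is also crucial in domains where the data sources are highly inhomogeneous or labeled data is scarce, and in fields where uncertainty is a key part of the learning technique, such as active learning and reinforcement learning.

In recent years, researchers have shown growing interest in uncertainty estimation for DNNs. The most common way to estimate predictive uncertainty is to separately model the uncertainty caused by the model (epistemic or model uncertainty) and the uncertainty caused by the data (aleatoric or data uncertainty). Model uncertainty is reducible, whereas data uncertainty is not.

The most common approaches to modeling these two kinds of uncertainty are: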

Bayesian inference
Ensemble approaches
Test-time augmentation
Single deterministic networks containing explicit components to represent the model and the data uncertainty.

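However, in high-risk domains it is not enough to merely estimate the predictive uncertainty; one must also verify that the estimated uncertainty is reliable. To this end, researchers have studied the calibration properties of DNNs (the degree of reliability) and proposed re-calibration methods to obtain reliable (well-calibrated) uncertainty estimates.

The following sections cover the sources and types of uncertainty, methods for estimating uncertainty in DNNs, measures for assessing the quality of uncertainty estimates, methods for calibrating DNNs, frequently used evaluation datasets and benchmarks, real-world applications of uncertainty estimation, and the open challenges and outlook for the field.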

2. Uncertainty in deep neural networks

A neural network is a non-linear function $f_\theta$ parameterized by model parameters $\theta$. It maps a measurable input set $\mathbb{X}$ to a measurable output set $\mathbb{Y}$, i.e. $f_\theta:\mathbb{X}\rightarrow\mathbb{Y},\ f_\theta(x)=y$.

In the supervised setting, we are further given a finite training set $\mathcal{D}\subseteq\mathbb{D}=\mathbb{X}\times\mathbb{Y}$ containing $N$ image-label pairs, i.e.

$\mathcal{D}=(\mathcal{X},\mathcal{Y})=\{x_n,y_n\}_{n=1}^N\subseteq\mathbb{D}.$

For a new data sample $x^\ast\in\mathbb{X}$, a neural network trained on $\mathcal{D}$ can then predict the corresponding target,

$f_\theta(x^\ast)=y^\ast.$

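We now consider the whole process from raw information in the environment to the final prediction, with its uncertainty, produced by a neural network. It consists of four steps:

the data acquisition process: some information occurring in the environment (e.g. a bird’s singing) and a measured observation of this information (e.g. an audio record).
the DNN building process: the design and training of the network.
the applied inference model: the model used at inference time.
the prediction’s uncertainty model: the modeling of the uncertainty caused by the neural network and/or by the data.

These four steps contain several potential sources of uncertainty and error, which in turn affect the final prediction of the neural network. The authors consider the five most important factors behind uncertainty in a DNN's predictions to be:

the variability of the real world
the errors inherent to the measurement system
the errors in the DNN's architecture specification, e.g. the number of nodes per layer or the activation functions
the errors in the DNN's training procedure
the errors caused by unknown data

The following subsections describe in detail how these four steps and five error sources give rise to uncertainty.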

2.1 Data acquisition

In supervised learning, data acquisition describes the process by which the measurements $x$ and the target variables $y$ are generated in order to represent a real-world situation $\omega$ from some space $\Omega$.

In the real world, a realization of $\omega$ could for example be a bird, $x$ a picture of this bird, and $y$ a label stating ‘bird’. During the measurement, random noise can occur and information may get lost. We model this randomness in $x$ by

$x|\omega\sim p_{x|\omega}$

Equivalently, the corresponding target variable $y$ is derived, where the description is either based on another measurement or is the result of a labeling process*. For both cases, the description can be affected by noise and errors and we state it as

$y|\omega\sim p_{y|\omega}$

*NOTE: In many cases one can model the labeling process as a mapping from $\mathbb{X}$ to $\mathbb{Y}$, e.g. for speech recognition or various computer vision tasks. For other tasks, such as earth observation, this is not always the case. Data is often labeled based on high-resolution data while low-resolution data is utilized for the prediction task.

A neural network is trained on a finite dataset of realizations of $x|\omega_i$ and $y|\omega_i$ based on $N$ real world situations $\omega_1,\ldots,\omega_N$,

$\mathcal{D}=\{x_i,y_i\}_{i=1}^N$
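The acquisition model above can be sketched as a small simulation. This is only a toy illustration: the dictionary layout of a situation, the Gaussian sensor noise, the binary label flip, and every function and parameter name are assumptions made for this example, not part of the survey.

```python
import random

def acquire(omega, rng, sensor_noise=0.1, label_flip_prob=0.05):
    """Draw one (x, y) pair for a real-world situation omega.

    omega is a hypothetical dict holding a true 'value' and a true binary 'label'.
    """
    # x | omega: the true value corrupted by additive Gaussian sensor noise.
    x = omega["value"] + rng.gauss(0.0, sensor_noise)
    # y | omega: the true label, flipped with a small probability (label noise).
    y = omega["label"]
    if rng.random() < label_flip_prob:
        y = 1 - y
    return x, y

def build_dataset(situations, seed=0, **noise):
    """Build D = {x_i, y_i} from N situations omega_1, ..., omega_N."""
    rng = random.Random(seed)
    return [acquire(omega, rng, **noise) for omega in situations]

# N = 1000 toy situations with alternating true labels.
situations = [{"value": float(i % 2), "label": i % 2} for i in range(1000)]
dataset = build_dataset(situations, seed=0)
```

Setting `sensor_noise=0.0` and `label_flip_prob=0.0` recovers the noise-free situations exactly, which makes it easy to compare a model trained on clean data with one trained on noisy data.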

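When we collect data and use it to train a neural network, two factors cause uncertainty in the network.

The first factor stems from the complexity and variability of real-world situations. For example, a plant looks different after rain than after a drought. During data collection we can only cover as many situations as possible, never all of them, which is why neural networks perform poorly under distribution shifts.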

Factor I: Variability in real-world situations

Most real-world environments are highly variable and almost constantly affected by changes. These changes affect parameters such as temperature, illumination, clutter, and physical objects’ size and shape. Changes in the environment can also affect the expression of objects, such as plants after rain look very different from plants after a drought. When real-world situations change compared to the training set, this is called a distribution shift. Neural networks are sensitive to distribution shifts, which can lead to significant changes in the performance of a neural network.

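The second factor is the measurement system itself, which directly affects the correlation between a sample and its target. The measurement system generates information $x_i$ and $y_i$ that describe $\omega_i$. This means that very different real-world situations can end up with similar measurements or targets. For example, for the two situations city and forest, the temperature measured by the system may well be similar (similar $x$); label noise can likewise make their targets similar, e.g. when both are labeled as forest (similar $y$).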

Factor II: Error and noise in measurement systems

The measurements themselves can be a source of uncertainty on the neural network’s prediction. This can be caused by limited information in the measurements, such as the image resolution. Moreover, it can be caused by noise, for example, sensor noise, by motion, or mechanical stress leading to imprecise measures. Furthermore, false labeling is also a source of uncertainty that can be seen as an error or noise in the measurement system. It is referenced as label noise and affects the model by reducing the confidence on the true class prediction during training. Depending on the intensity, this type of noise and errors can be used to regularize the training process and to improve robustness and generalization.

2.2 Deep neural network design and training

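Designing a neural network involves explicit modeling as well as a stochastic training process. The assumptions about the problem structure induced by the design and training of the network are called the inductive bias. Concretely, the inductive bias can be understood as the strategy the modeler adopts at the modeling stage based on prior knowledge. For the network structure, this covers the number of parameters, the number of layers, the choice of activation functions, and so on; for the training process, it covers the optimization algorithm, regularization, data augmentation, and so on. These choices directly affect the final performance of the model, and the network structure gives rise to the third factor of uncertainty in a neural network's predictions.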

Factor III: Errors in the model structure

The structure of a neural network has a direct effect on its performance and therefore also on the uncertainty of its prediction. For instance, the number of parameters affects the memorization capacity, which can lead to under- or over-fitting on the training data. Regarding uncertainty in neural networks, it is known that deeper networks tend to be overconfident in their soft-max output, meaning that they predict too much probability on the class with the highest probability score.
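To make "too much probability on the class with the highest probability score" concrete, the sketch below computes the soft-max confidence for a set of logits. Multiplying the logits by a constant is used here only as a stand-in for a network that produces larger-magnitude logits; the example values and function names are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable soft-max over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(logits):
    """Return the predicted class and the soft-max probability assigned to it."""
    probs = softmax(logits)
    k = max(range(len(probs)), key=probs.__getitem__)
    return k, probs[k]

logits = [2.0, 1.0, 0.5]
k1, c1 = confidence(logits)
# Same ranking of classes, but three times the logit magnitude.
k2, c2 = confidence([3.0 * z for z in logits])
```

The predicted class is unchanged, but the confidence attached to it grows with the logit magnitude, so accuracy can stay the same while the reported probability becomes less trustworthy.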

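For a given network structure $s$ and training dataset $\mathcal{D}$, training a neural network is a stochastic process, so the resulting network $f_\theta$ is based on a random variable $\theta|D,s\sim p_{\theta|D,s}$. Training involves many sources of randomness, such as stochastic decisions about the data order, random initialization, and stochastic regularization such as augmentation or dropout. Because the loss of a neural network is highly non-linear, this randomness leads to different local optima $\theta^\ast$ and hence to different models. In addition, hyperparameters such as the batch size, learning rate, and number of epochs also influence the outcome of training and produce different models. This sensitivity to the training process gives rise to the fourth factor of uncertainty.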

Factor IV: Errors in the training procedure

The training process of a neural network includes many parameters that have to be defined (batch size, optimizer, learning rate, stopping criteria, regularization, etc.), and also stochastic decisions within the training process (batch generation and weight initialization) take place. All these decisions affect the local optima and it is therefore very unlikely that two training processes deliver the same model parameterization. A training dataset that suffers from imbalance or low coverage of single regions in the data distribution also introduces uncertainties on the network’s learned parameters, as already described in the data acquisition. This might be softened by applying augmentation to increase the variety or by balancing the impact of single classes or regions on the loss function.
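The effect of stochastic training decisions can be seen even on a toy problem. The sketch below trains a one-dimensional logistic regression with SGD under different seeds; the model, the six data points, and the hyperparameter values are all assumptions chosen for illustration, and a real DNN with a non-convex loss would diverge far more between runs.

```python
import math
import random

def train(data, seed, epochs=5, lr=0.1):
    """SGD for a 1-D logistic regression; returns the parameters (w, b).

    The seed controls two stochastic decisions: the weight initialization
    and the order in which samples are visited.
    """
    rng = random.Random(seed)
    w, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # random initialization
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # random sample order in every epoch
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            # Gradient step on the cross-entropy loss of a single sample.
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

data = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 0), (1.0, 1), (2.0, 1)]
# Five training runs that differ only in their seed.
thetas = [train(data, seed=s) for s in range(5)]
```

Each seed is reproducible on its own, yet the runs end at different parameter values, which is the sense in which $\theta$ is a random variable given the data and the structure.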

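Because training is based on the given training dataset $\mathcal{D}$, errors in the data acquisition process (e.g. label noise) also lead to errors in the training process.

2.3 Inference

Inference describes the prediction of an output $y^\ast$ for a new data sample $x^\ast$ by the neural network. At this point the network has been trained for a specific task. Samples that are not valid inputs for that task therefore cause errors and are another source of uncertainty.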

Factor V: Errors caused by unknown data

Especially in classification tasks, a neural network that is trained on samples derived from a world $\mathcal{W}_1$ can also be capable of processing samples derived from a completely different world $\mathcal{W}_2$. This is for example the case when a network trained on images of cats and dogs receives a sample showing a bird. Here, the source of uncertainty does not lie in the data acquisition process, since we assume a world to contain only feasible inputs for a prediction task. Even though the practical result might be equal to too much noise on a sensor or complete failure of a sensor, the data considered here represents a valid sample, but for a different task or domain.

This kind of error is not caused by the data acquisition process but by unknown data.

2.4 Predictive uncertainty model

The uncertainty contained in a neural network's prediction can be divided into three types:

data uncertainty [also statistical or aleatoric uncertainty]
model uncertainty [also systemic or epistemic uncertainty]
distributional uncertainty (caused by examples from a region not covered by the training data)

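2.4.1 Model and data uncertainty

Model uncertainty stems from shortcomings of the model. These may be flaws in the model structure itself, errors introduced during training, or insufficient knowledge in the resulting model due to unknown data or bad coverage of the real world by the training set.

Data uncertainty stems from shortcomings of the data itself. Its root cause is information loss: the data cannot represent the real world perfectly, and a sample does not contain enough information to identify a class with 100% certainty.

Information loss itself shows up in two places. One is the source (data): for example, a low-resolution image loses information about the real world. The other is the target (label): for example, errors made during the labeling process.

Let us now revisit the five factors that lead to uncertain predictions:

Factor I: the variability of the real world
Factor II: the shortcomings of the measurement system
Factor III: the shortcomings of the model structure
Factor IV: the errors in the stochastic training procedure
Factor V: the interference of unknown data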

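Only Factor II belongs to the irreducible aleatoric uncertainty, because it causes shortcomings in the data itself, i.e. insufficient data, which make predictions unreliable. All other factors belong to epistemic uncertainty and are reducible.

In theory, model uncertainty can be reduced by improving the model architecture, the learning process, or the training dataset, whereas data uncertainty cannot be eliminated. For real-world applications it is therefore crucial that a model can remove or quantify the model uncertainty and give a correct prediction of the data uncertainty. (Note the implied notion of supervision here.)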
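The reducible versus irreducible distinction can be illustrated with the simplest possible model, a constant fitted to noisy observations. In this sketch the standard error of the fitted mean plays the role of model uncertainty and the spread of the observations plays the role of data uncertainty; that mapping, the Gaussian noise, and all names are assumptions made for illustration only.

```python
import random
import statistics

def estimate(n, rng, noise_std=1.0):
    """Fit a constant model to n noisy observations of a true value 0.

    Returns (model_unc, data_unc) under the toy mapping described above.
    """
    ys = [rng.gauss(0.0, noise_std) for _ in range(n)]
    data_unc = statistics.stdev(ys)   # spread of the observations themselves
    model_unc = data_unc / n ** 0.5   # standard error of the fitted mean
    return model_unc, data_unc

rng = random.Random(0)
small = estimate(20, rng)      # little training data
large = estimate(20000, rng)   # a thousand times more training data
```

With more data the first value shrinks toward zero, while the second stays near the noise level of the measurements: collecting more samples reduces what the model does not know, but not the noise in each observation.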

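Among the many approaches, the Bayesian framework offers a practical tool for reasoning about uncertainty in deep learning. In the Bayesian framework, model uncertainty is formalized as a probability distribution over the model parameters $\theta$, while data uncertainty is formalized as a probability distribution over the model output $y^\ast$ given a parameterized model $f_\theta$.

The distribution over a prediction $y^\ast$ is

$p(y^*|x^*,D)=\int\underbrace{p(y^*|x^*,\theta)}_{\mathrm{Data}}\underbrace{p(\theta|D)}_{\mathrm{Model}}d\theta.$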

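Here $p(\theta|D)$ is the posterior over the model parameters, which describes the uncertainty in the parameters given the training set $D$. The posterior is generally intractable. To obtain an (approximate) posterior, ensemble approaches learn different parameter settings and average over the resulting models, whereas Bayesian inference uses Bayes' theorem to reformulate it as

$p(\theta|D)=\frac{p(D|\theta)p(\theta)}{p(D)}$

Here $p(\theta)$ takes no information into account other than $\theta$ itself and is therefore called the prior over the model parameters. $p(D|\theta)$ expresses the probability that $D$ is a realization of the output distribution predicted by the model with parameters $\theta$ and is called the likelihood. Many loss functions are motivated by the likelihood: cross-entropy and mean squared error, for example, correspond to maximizing a log-likelihood.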
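The link between likelihood and loss can be made explicit with a short worked derivation. It rests on assumptions that are not stated in the text: the samples in $D$ are independent, the inputs are treated as fixed, and for regression the likelihood is Gaussian with a fixed variance $\sigma^2$ (a symbol introduced here only for illustration).

Under independence the negative log-likelihood is a sum over samples,

$-\log p(D|\theta)=-\sum_{n=1}^N\log p(y_n|x_n,\theta).$

For classification, $p(y_n|x_n,\theta)$ is the soft-max probability that the network assigns to the true class $y_n$, so the sum is exactly the cross-entropy loss with one-hot targets. For regression with $p(y_n|x_n,\theta)=\mathcal{N}(y_n;f_\theta(x_n),\sigma^2)$,

$-\log p(D|\theta)=\frac{1}{2\sigma^2}\sum_{n=1}^N\left(y_n-f_\theta(x_n)\right)^2+\frac{N}{2}\log(2\pi\sigma^2),$

and because the second term does not depend on $\theta$, maximizing the likelihood is equivalent to minimizing the mean squared error.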

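However, even with the posterior reformulated via Bayes' theorem as above, $p(y^\ast|x^\ast,D)$ is still intractable. A range of methods has been proposed to address this problem; they are covered in Section 3.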
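One common way to sidestep the integral is a Monte Carlo average over a finite set of parameter samples, which is also how an ensemble combines its members. The sketch below assumes a hypothetical one-dimensional logistic model and three hand-picked parameter samples; neither comes from the survey.

```python
import math

def predict_proba(theta, x):
    """p(y = 1 | x, theta) for a hypothetical 1-D logistic model theta = (w, b)."""
    w, b = theta
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def predictive(thetas, x):
    """Monte Carlo estimate of p(y = 1 | x, D): the mean over parameter samples."""
    return sum(predict_proba(t, x) for t in thetas) / len(thetas)

# Hypothetical parameter samples, standing in for draws from p(theta | D)
# or for the members of an ensemble.
thetas = [(1.0, 0.0), (2.0, -0.5), (0.5, 0.3)]
p_near = predictive(thetas, 0.0)   # input close to the decision boundaries
p_far = predictive(thetas, 10.0)   # input far on the positive side
```

The spread of the individual `predict_proba` values around their mean gives a rough indication of how much the parameter samples disagree for a given input.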

2.4.2 Distributional uncertainty

The predictive uncertainty can be further divided into three parts, data, model, and distributional uncertainty:

$p(y^*|x^*,D)=\int\int\underbrace{p(y^*|\mu)}_{\mathrm{Data}}\underbrace{p(\mu|x^*,\theta)}_{\text{Distributional}}\underbrace{p(\theta|D)}_{\mathrm{Model}}d\mu d\theta.$

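How should the distributional part be understood? Uncertainty can be described by a distribution; model uncertainty, for example, is captured by the distribution over the model parameters $\theta$. In the same way, distributional uncertainty is captured by a distribution over distributions. For a classification task, for example, $p\left(\mu\middle| x^\ast,\theta\right)$ could be a Dirichlet distribution, i.e. the distribution followed by the categorical distribution that the soft-max outputs. Going one step further, data uncertainty is represented by the distribution of the final prediction $y^\ast$, parameterized by $\mu$.

Under this formulation, distributional uncertainty is the uncertainty introduced by a change in the input-data distribution, and model uncertainty is the uncertainty introduced by the model building and training process. Model uncertainty affects the estimate of the distributional uncertainty, which in turn affects the estimate of the data uncertainty.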
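The "distribution over distributions" idea can be made tangible with two Dirichlet distributions that share the same expected class probabilities but differ in how concentrated they are. The concentration values and function names below are illustrative assumptions; reading the flat case as high distributional uncertainty and the sharp case as high data uncertainty is one common interpretation, offered here only as a sketch.

```python
import random

def dirichlet_mean(alpha):
    """Expected categorical distribution E[mu] under Dirichlet(alpha)."""
    a0 = sum(alpha)
    return [a / a0 for a in alpha]

def dirichlet_variance(alpha):
    """Per-class variance of mu under Dirichlet(alpha)."""
    a0 = sum(alpha)
    return [a * (a0 - a) / (a0 * a0 * (a0 + 1.0)) for a in alpha]

def sample_dirichlet(alpha, rng):
    """Draw one categorical distribution mu ~ Dirichlet(alpha) via gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

flat = [1.0, 1.0, 1.0]         # spread over the whole probability simplex
sharp = [100.0, 100.0, 100.0]  # concentrated around the uniform categorical
rng = random.Random(0)
mu = sample_dirichlet(flat, rng)  # one possible soft-max output
```

Both distributions predict the uniform class probabilities on average, yet samples from `flat` vary widely while samples from `sharp` barely move, so the expected prediction alone cannot tell the two situations apart.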

2.5 Uncertainty classification

On the basis of the input data domain, the predictive uncertainty can also be classified into 3 main classes:

In-domain uncertainty

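In-domain uncertainty represents the uncertainty related to an input drawn from a data distribution assumed to be equal to the training data distribution. It stems from the DNN's inability to explain an in-domain sample due to a lack of in-domain knowledge. From the modeler's point of view, it is caused on the one hand by design errors (model uncertainty) and on the other by the complexity of the task at hand (data uncertainty). Depending on which of the two is the source, in-domain uncertainty can be reduced by improving the quality of the training set or by optimizing the training process.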

Domain-shift uncertainty

Domain-shift uncertainty denotes the uncertainty related to an input drawn from a shifted version of the training distribution. It stems from insufficient coverage by the training dataset, which may be due to improper data collection or to the variability of the real world. From the modeler's point of view, domain-shift uncertainty is caused by external or environmental factors. It can be reduced by covering the shifted domain in the training dataset.

Out-of-domain uncertainty

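Out-of-domain uncertainty represents the uncertainty related to an input drawn from the subspace of unknown data. The distribution of unknown data is far from that of the training dataset, e.g. when a bird is fed to a cat-vs-dog classifier. From the modeler's point of view, out-of-domain uncertainty is caused by the input sample, i.e. by insufficient training data.

All three types of uncertainty are thus rooted in the insufficiency of the training dataset. Returning to model and data uncertainty: the main cause of model uncertainty is likewise an insufficient training dataset, so in-domain, domain-shift, and out-of-domain uncertainty can all give rise to model uncertainty. Data uncertainty, by contrast, is mostly related to in-domain uncertainty, e.g. overlapping samples and systematic label noise, both of which come from ambiguity or noise in in-domain data.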