【SVM】簡單介紹（一）

1、結構風險最小化

我們想要在未知的數據上得到低的錯誤率，這叫做structural risk minimization;相對的，訓練誤差叫做empirical risk minimization

要是我們能有這樣一個式子就好了：
$Test?error?rate?<=train?error?rate?+f(N,h,p)\text { Test error rate }<=\text { train error rate }+f(N, h, p)$
其中，
$N=\mathrm{N}=$ size of training set,
$h=\mathrm{h}=$ measure of the model complexity,
$p =$ the probability that this bound fails

我們需要一個 $p$ 來限制最差的測試集的情況。然后我們就能通過選擇模型復雜度 $h$ 來最小化測試誤差率的上界。

2、VC維（Vapnik-Chervonenkis dimension）

VC維就是衡量模型復雜度的一個重要概念

選擇 $n$ 個樣本點，我們隨機的給它們分配標簽，如果我們的模型都能將它們分開，則樣本的VC維 $>= n$ ,我們不斷地增大 $n$ ，直到模型不能分開為止。模型能分開的最大的 $n$ 值就是模型的VC維。
在這里插入圖片描述
考慮一條二維空間的一條直線的復雜度。如上圖所示，對于三個數據點，無論我們如何分配標簽，直線都能很好的將樣本正確分類。但是對于四個數據點，直線無法處理異或問題。因此，二維空間中一條直線的VC維是3.

二維空間中的超平面的VC維是3。更一般的，k維空間的超平面的VC維是k+1。

超平面的VC維與參數的數目相等。但這只是個巧合，事實上，模型的參數和模型的復雜度并沒有必然的聯系。例如，一個正弦曲線的VC維是正無窮大，但是它的參數只有3個。
$\sin (b x+c)$
在這里插入圖片描述

最近鄰分類器的VC維是無窮大的，因為無論你有多少數據點，你都能在訓練集上得到完美的分類器。而當 $1-NN?→K-NN?\text { 1-NN } \rightarrow \text { K-NN }$ ,相當于減小了VC維。

一般來說，VC維越高，模型的分類能力更強，更flexible。但是VC維更多的是作為一個概念，實際上很難去計算一個模型的VC維。

回到剛才，經過前人推導，我們得到了一個測試集誤差的上界如下：
$Etest?≤Etrain?+(h+hlog?(2N/h)?log?(p/4)N)12E_{\text {test }} \leq E_{\text {train }}+\left(\frac{h+h \log (2 N / h)-\log (p / 4)}{N}\right)^{\frac{1}{2}}$
$N=\mathrm{N}=$ size of training set
$h = V C$ dimension of the model class
$p =$ upper bound on probability that this bound fails

從上式看， $Good?generalization?→large?Nand?small?h\text { Good generalization } \rightarrow \text { large } N \text { and small } h$
這是符合我們的直覺的。上面這個式子的推導很復雜，但是在實際中也沒啥用，因為它給的上界太松了，一個很松的上界又有什么意義呢。

3、Hard-Margin SVM

在這里插入圖片描述
對于上面這個兩類分類問題，我們用直線進行分類，會發現有好多種分類方法。而SVM所選擇的那條直線，是將Margin 最大化的直線。

為什么要最大化Margin?

直覺上感覺這最安全
實際工作中確實不錯
當數據有小波動時錯分的概率小一些
模型只與support vector有關

點到直線的距離：
$d(x)=∣x?w+b∣∥w∥22=∣x?w+b∣∑i=1dwi2d(\mathbf{x})=\frac{|\mathbf{x} \cdot \mathbf{w}+b|}{\sqrt{\|\mathbf{w}\|_2^2}}=\frac{|\mathbf{x} \cdot \mathbf{w}+b|}{\sqrt{\sum_{i=1}^d w_i^2}}$
定義Margin:
$margin?≡arg?min?x∈Dd(x)=arg?min?x∈D∣x?w+b∣∑i=1dwi2\operatorname{margin} \equiv \underset{\mathbf{x} \in D}{\arg \min } \,d(\mathbf{x})=\underset{\mathbf{x} \in D}{\arg \min } \frac{|\mathbf{x} \cdot \mathbf{w}+b|}{\sqrt{\sum_{i=1}^d w_i^2}}$
所以我們就要考慮如何來最大化這個Margin了。

那么我們的問題就可以建模為：
$argmax?w,bmargin?(w,b,D)=arg?max?w,barg?min?xi∈Dd(xi)=arg?max?w,barg?min?xi∈D∣b+xi?w∣∑i=1dwi2\begin{aligned} & \underset{\mathbf{w}, b}{\operatorname{argmax}} \operatorname{margin} (\mathbf{w}, b, D) \\ & =\underset{\mathbf{w}, b}{\arg\max}\, \underset{\mathbf{x}_i \in D}{\arg \min } \,d\left(\mathbf{x}_i\right) \\ & =\underset{\mathbf{w}, b}{\arg\max} \,\underset{\mathbf{x}_i \in D}{\arg \min } \frac{\left|b+\mathbf{x}_i \cdot \mathbf{w}\right|}{\sqrt{\sum_{i=1}^d w_i^2}} \end{aligned}$
如果只是這樣的話，肯定是不行的，想像一下，一條直線離兩類都非常遠，也是符合上式的，因此需要附加條件，也就是直線能正確分類。
$WXi+b≥0iff?yi=1WXi+b≤0iff?yi=?1yi(WXi+b)≥0\begin{gathered} \mathbf{W X _ { i }}+b \geq 0 \text { iff } y_i=1 \\ \mathbf{W X _ { i }}+b \leq 0 \text { iff } y_i=-1 \\ y_i\left(\mathbf{W X _ { i }}+b\right) \geq 0 \end{gathered}$
也就是
$argmax?w,barg?min?xi∈D∣b+xi?w∣∑i=1dwi2subject?to??xi∈D:yi(xi?w+b)≥0\begin{aligned} & \underset{\mathbf{w}, b}{\operatorname{argmax}} \,\underset{\mathbf{x}_i \in D}{\arg \min } \frac{\left|b+\mathbf{x}_i \cdot \mathbf{w}\right|}{\sqrt{\sum_{i=1}^d w_i^2}} \\ & \text { subject to } \forall \mathbf{x}_i \in D: y_i\left(\mathbf{x}_i \cdot \mathbf{w}+b\right) \geq 0 \end{aligned}$
這邊對一個限制條件進行強化，假設
$?xi∈D:∣b+xi?w∣≥1\forall \mathbf{x}_i \in D:\left|b+\mathbf{x}_i \cdot \mathbf{w}\right| \geq 1$
因為咱們是對w進行優化，所以本質上其實是一樣的，就是一個縮放的問題。

那么對里層的優化，有：
$arg?min?xi∈D∣b+xi?w∣∑i=1dwi2≥arg?min?xi∈D1∑i=1dwi2=1∑i=1dwi2\underset{\mathbf{x}_i \in D}{\arg \min } \frac{\left|b+\mathbf{x}_i \cdot \mathbf{w}\right|}{\sqrt{\sum_{i=1}^d w_i^2}} \geq \underset{\mathbf{x}_i \in D}{\arg \min } \frac{1}{\sqrt{\sum_{i=1}^d w_i^2}}=\frac{1}{\sqrt{\sum_{i=1}^d w_i^2}}$

對于外層優化，只需最大化里層的下界就行，于是問題化為：
$argmin?w,b∑i=1dwi2subject?to??xi∈D:yi(xi?w+b)≥1\begin{aligned} & \underset{\mathbf{w}, b}{\operatorname{argmin}} \sum_{i=1}^d w_i^2 \\ & \text { subject to } \forall \mathbf{x}_i \in D: y_i\left(\mathbf{x}_i \cdot \mathbf{w}+b\right) \geq 1 \end{aligned}$

本文來自互聯網用戶投稿，該文觀點僅代表作者本人，不代表本站立場。本站僅提供信息存儲空間服務，不擁有所有權，不承擔相關法律責任。
如若轉載，請注明出處：http://www.pswp.cn/news/258302.shtml
繁體地址，請注明出處：http://hk.pswp.cn/news/258302.shtml
英文地址，請注明出處：http://en.pswp.cn/news/258302.shtml

如若內容造成侵權/違法違規/事實不符，請聯系多彩編程網進行投訴反饋email:809451989@qq.com，一經查實，立即刪除！