Introduction
Gradient boosting is built mainly on a mathematical extremum problem: minimizing an objective function.
Mathematical Description
The objective function is
$$obj(\theta) = \sum_{i=1}^n l(y_i, \hat y_i^{(t)}) + \sum_{k=1}^t w(f_k)$$
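where $t$ is the number of trees in the ensemble, $l$ is the loss function, $w(f_k)$ is the regularization term of the $k$-th tree, and the prediction is built additively: $\hat y_i^{(t)} = \hat y_i^{(t - 1)} + f_t(x_i)$.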
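The additive update $\hat y_i^{(t)} = \hat y_i^{(t - 1)} + f_t(x_i)$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a "tree" is modeled as any function from a feature vector to a score, the base prediction $\hat y^{(0)}$ is assumed to be zero, and the names `predict_additive`, `stump_1`, and `stump_2` are hypothetical.

```python
from typing import Callable, List, Sequence

# A "tree" here is any function mapping one feature vector to a real score.
Tree = Callable[[Sequence[float]], float]


def predict_additive(trees: List[Tree], x: Sequence[float]) -> float:
    """Return y_hat^(t)(x) as the sum of f_k(x) over the t trees."""
    y_hat = 0.0  # y_hat^(0): assumed zero base score
    for f in trees:
        y_hat = y_hat + f(x)  # y_hat^(k) = y_hat^(k-1) + f_k(x)
    return y_hat


# Two hypothetical depth-1 "stumps", used only for illustration.
def stump_1(x: Sequence[float]) -> float:
    return 0.5 if x[0] < 2.0 else -0.5


def stump_2(x: Sequence[float]) -> float:
    return 0.25 if x[1] < 1.0 else -0.25


x = [1.0, 3.0]
y_hat_1 = predict_additive([stump_1], x)           # prediction after 1 tree
y_hat_2 = predict_additive([stump_1, stump_2], x)  # prediction after 2 trees
```

Adding the second tree changes the prediction by exactly `stump_2(x)`, which is the additive structure the objective relies on.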
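When the $t$-th tree is added, the first $t - 1$ trees are already fixed, so their regularization terms are constant and the objective function becomes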
\begin{align}
obj^{(t)} &= \sum_{i=1}^n l(y_i, \hat y_i^{(t)}) + \sum_{k=1}^t w(f_k) \\
&= \sum_{i=1}^n l(y_i, \hat y_i^{(t - 1)} + f_t(x_i)) + w(f_t) + \mathrm{constant}
\end{align}
Expanding $l(y_i, \hat y_i^{(t - 1)} + f_t(x_i))$ as a Taylor series around $\hat y_i^{(t - 1)}$ and keeping terms up to second order gives
$$l(y_i, \hat y_i^{(t - 1)} + f_t(x_i)) \approx l(y_i, \hat y_i^{(t - 1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)$$
where
$$g_i = \partial_{\hat y_i^{(t - 1)}} l(y_i, \hat y_i^{(t - 1)}), \quad h_i = \partial_{\hat y_i^{(t - 1)}}^2 l(y_i, \hat y_i^{(t - 1)})$$
The term $l(y_i, \hat y_i^{(t - 1)})$ does not depend on $f_t$, so after substituting the expansion and dropping the constant terms we have
$$obj^{(t)} \approx \sum_{i=1}^n \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + w(f_t)$$
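To make $g_i$ and $h_i$ concrete, the sketch below computes them for two common losses and compares the second-order expansion with the exact loss. The choice of losses (squared error, and logistic loss on a raw score with labels in {0, 1}) and all function names are assumptions made for illustration; the derivation above does not fix a particular loss.

```python
import math


def squared_error(y: float, y_hat: float) -> float:
    return (y - y_hat) ** 2


def squared_error_grad_hess(y: float, y_hat: float):
    # First and second derivatives of (y - y_hat)^2 with respect to y_hat.
    return 2.0 * (y_hat - y), 2.0


def logistic_loss(y: float, y_hat: float) -> float:
    # Negative log-likelihood for y in {0, 1}; y_hat is the raw score (logit).
    return math.log1p(math.exp(y_hat)) - y * y_hat


def logistic_grad_hess(y: float, y_hat: float):
    p = 1.0 / (1.0 + math.exp(-y_hat))  # sigmoid of the raw score
    return p - y, p * (1.0 - p)


def taylor_approx(loss, grad_hess, y, y_hat_prev, step):
    """l(y, y_hat_prev) + g * step + 0.5 * h * step^2, with step = f_t(x_i)."""
    g, h = grad_hess(y, y_hat_prev)
    return loss(y, y_hat_prev) + g * step + 0.5 * h * step ** 2


y, y_hat_prev, step = 1.0, 0.3, 0.1

sq_exact = squared_error(y, y_hat_prev + step)
sq_taylor = taylor_approx(squared_error, squared_error_grad_hess,
                          y, y_hat_prev, step)

log_exact = logistic_loss(y, y_hat_prev + step)
log_taylor = taylor_approx(logistic_loss, logistic_grad_hess,
                           y, y_hat_prev, step)

print("squared error gap:", abs(sq_exact - sq_taylor))
print("logistic loss gap:", abs(log_exact - log_taylor))
```

For squared error the expansion is exact up to floating-point rounding, because the loss is itself quadratic in the prediction. For the logistic loss the gap is small when the step $f_t(x_i)$ is small and grows with the cube of the step.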