Main components
Boosting
void GBDT::Init(const Config* gbdt_config, const Dataset* train_data, const ObjectiveFunction* objective_function, const std::vector<const Metric*>& training_metrics) override
Initialization. This mainly creates the sample sampling strategy data_sample_strategy_
, sets the objective function objective_function_
, creates tree_learner_
, creates train_score_updater_
, and configures training_metrics_
void GBDT::Train(int snapshot_freq, const std::string& model_output_path) override
Runs the overall training loop
bool GBDT::TrainOneIter(const score_t* gradients, const score_t* hessians) override
Runs a single boosting iteration
void GBDT::Boosting()
Computes the gradients and Hessians (per-sample second derivatives)
void UpdateScore(const Tree* tree, const int cur_tree_id)
Updates the scores after a tree has been trained
TreeLearner
Objective function: ObjectiveFunction
Binary log loss
It is commonly defined as
L(y, f(x)) = \log(1 + \exp(-y \cdot f(x)))
where y is the label, taking values in \{-1, 1\}, and f(x) is the score output by the model. Letting z = y \cdot f(x), the loss becomes L = \log(1 + \exp(-z)).
Differentiating with respect to z gives \frac{\partial L}{\partial z} = \frac{-\exp(-z)}{1+\exp(-z)} = -\frac{1}{1+\exp(z)}, so by the chain rule the derivative with respect to f(x) is
\frac{\partial L}{\partial f(x)} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial f(x)} = -\frac{y}{1+\exp(y \cdot f(x))}
In BinaryLogloss
a scaling factor sigmoid_ (written \sigma below) is added to the loss
, i.e.
L(y, f(x)) = \log(1 + \exp(-y \cdot \sigma \cdot f(x)))
Differentiating with respect to f(x) gives
\frac{\partial L}{\partial f(x)} = -\frac{y \cdot \sigma}{1+\exp(y \cdot \sigma \cdot f(x))}
When computing gradients, BinaryLogloss
additionally multiplies in the per-sample weight weights_[i]
and the label weight label_weight
const double response = -label * sigmoid_ / (1.0f + std::exp(label * sigmoid_ * score[i]));
gradients[i] = static_cast<score_t>(response * label_weight * weights_[i]);