MLlib -- Logistic Regression Notes

For logistic regression with batch gradient descent, this article is a useful reference: http://blog.csdn.net/pakko/article/details/37878837

After picking up some Scala syntax, I wanted to see how MLlib parallelizes its machine learning algorithms, starting with logistic regression. Locate the LogisticRegressionWithSGD class under the package org.apache.spark.mllib.classification and search directly for its train() function.

  def train(
      input: RDD[LabeledPoint],
      numIterations: Int,
      stepSize: Double,
      miniBatchFraction: Double,
      initialWeights: Vector): LogisticRegressionModel = {
    new LogisticRegressionWithSGD(stepSize, numIterations, 0.0, miniBatchFraction)
      .run(input, initialWeights)
  }
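
As an aside (not from the original post), here is a minimal sketch of how this train() entry point would typically be called; the Spark setup and the toy data values below are my own assumptions, added only for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object TrainSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lr-sgd-sketch").setMaster("local[*]"))

    // Toy binary-labeled data (hypothetical values, only to make the call runnable).
    // cache() avoids the "input data is not directly cached" warning issued later in run().
    val data = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.1, 1.2)),
      LabeledPoint(1.0, Vectors.dense(2.3, 0.4)),
      LabeledPoint(1.0, Vectors.dense(1.9, 0.7))
    )).cache()

    // Same overload as shown above: input, numIterations, stepSize, miniBatchFraction, initialWeights.
    val model = LogisticRegressionWithSGD.train(data, 100, 1.0, 1.0, Vectors.zeros(2))

    println(s"weights = ${model.weights}, intercept = ${model.intercept}")
    sc.stop()
  }
}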

train() calls a run function of GeneralizedLinearAlgorithm. GeneralizedLinearAlgorithm is an abstract class defined in GeneralizedLinearAlgorithm.scala, and LogisticRegressionWithSGD extends it.

  def run(input: RDD[LabeledPoint], initialWeights: Vector): M = {

    if (numFeatures < 0) {
      numFeatures = input.map(_.features.size).first()
    }

    if (input.getStorageLevel == StorageLevel.NONE) {
      logWarning("The input data is not directly cached, which may hurt performance if its"
        + " parent RDDs are also uncached.")
    }

    // Check the data properties before running the optimizer
    if (validateData && !validators.forall(func => func(input))) {
      throw new SparkException("Input validation failed.")
    }

    /**
     * Scaling columns to unit variance as a heuristic to reduce the condition number:
     *
     * During the optimization process, the convergence (rate) depends on the condition number of
     * the training dataset. Scaling the variables often reduces this condition number
     * heuristically, thus improving the convergence rate. Without reducing the condition number,
     * some training datasets mixing the columns with different scales may not be able to converge.
     *
     * GLMNET and LIBSVM packages perform the scaling to reduce the condition number, and return
     * the weights in the original scale.
     * See page 9 in http://cran.r-project.org/web/packages/glmnet/glmnet.pdf
     *
     * Here, if useFeatureScaling is enabled, we will standardize the training features by dividing
     * the variance of each column (without subtracting the mean), and train the model in the
     * scaled space. Then we transform the coefficients from the scaled space to the original scale
     * as GLMNET and LIBSVM do.
     *
     * Currently, it's only enabled in LogisticRegressionWithLBFGS
     */
    val scaler = if (useFeatureScaling) {
      new StandardScaler(withStd = true, withMean = false).fit(input.map(_.features))
    } else {
      null
    }

    // Prepend an extra variable consisting of all 1.0's for the intercept.
    // TODO: Apply feature scaling to the weight vector instead of input data.
    val data =
      if (addIntercept) {
        if (useFeatureScaling) {
          input.map(lp => (lp.label, appendBias(scaler.transform(lp.features)))).cache()
        } else {
          input.map(lp => (lp.label, appendBias(lp.features))).cache()
        }
      } else {
        if (useFeatureScaling) {
          input.map(lp => (lp.label, scaler.transform(lp.features))).cache()
        } else {
          input.map(lp => (lp.label, lp.features))
        }
      }

    /**
     * TODO: For better convergence, in logistic regression, the intercepts should be computed
     * from the prior probability distribution of the outcomes; for linear regression,
     * the intercept should be set as the average of response.
     */
    val initialWeightsWithIntercept = if (addIntercept && numOfLinearPredictor == 1) {
      appendBias(initialWeights)
    } else {
      /** If `numOfLinearPredictor > 1`, initialWeights already contains intercepts. */
      initialWeights
    }

    val weightsWithIntercept = optimizer.optimize(data, initialWeightsWithIntercept) // this is where the optimizer is entered

    val intercept = if (addIntercept && numOfLinearPredictor == 1) {
      weightsWithIntercept(weightsWithIntercept.size - 1)
    } else {
      0.0
    }

    var weights = if (addIntercept && numOfLinearPredictor == 1) {
      Vectors.dense(weightsWithIntercept.toArray.slice(0, weightsWithIntercept.size - 1))
    } else {
      weightsWithIntercept
    }

    /**
     * The weights and intercept are trained in the scaled space; we're converting them back to
     * the original scale.
     *
     * Math shows that if we only perform standardization without subtracting means, the intercept
     * will not be changed. w_i = w_i' / v_i where w_i' is the coefficient in the scaled space, w_i
     * is the coefficient in the original space, and v_i is the variance of the column i.
     */
    if (useFeatureScaling) {
      if (numOfLinearPredictor == 1) {
        weights = scaler.transform(weights)
      } else {
        /**
         * For `numOfLinearPredictor > 1`, we have to transform the weights back to the original
         * scale for each set of linear predictor. Note that the intercepts have to be explicitly
         * excluded when `addIntercept == true` since the intercepts are part of weights now.
         */
        var i = 0
        val n = weights.size / numOfLinearPredictor
        val weightsArray = weights.toArray
        while (i < numOfLinearPredictor) {
          val start = i * n
          val end = (i + 1) * n - { if (addIntercept) 1 else 0 }
          val partialWeightsArray = scaler.transform(
            Vectors.dense(weightsArray.slice(start, end))).toArray
          System.arraycopy(partialWeightsArray, 0, weightsArray, start, partialWeightsArray.size)
          i += 1
        }
        weights = Vectors.dense(weightsArray)
      }
    }

    // Warn at the end of the run as well, for increased visibility.
    if (input.getStorageLevel == StorageLevel.NONE) {
      logWarning("The input data was not directly cached, which may hurt performance if its"
        + " parent RDDs are also uncached.")
    }

    // Unpersist cached data
    if (data.getStorageLevel != StorageLevel.NONE) {
      data.unpersist(false)
    }

    createModel(weights, intercept)
  }
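
As a side note, the back-transformation described in the comment at the end of run() can be written out as follows (my own rendering, not from the original post; v_i denotes the per-column scaling factor applied by StandardScaler):

    \hat{x}_i = x_i / v_i
    \;\Longrightarrow\;
    w'^{\top}\hat{x} + b' = \sum_i \frac{w_i'}{v_i}\, x_i + b'
    \;\Longrightarrow\;
    w_i = \frac{w_i'}{v_i}, \quad b = b'

Because the mean is never subtracted, the intercept is unchanged by the scaling, which is exactly what the code relies on.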

In the code above, optimizer.optimize is passed the data and the initial theta. In LogisticRegressionWithSGD, optimizer is initialized as:

class LogisticRegressionWithSGD private[mllib] (
    private var stepSize: Double,
    private var numIterations: Int,
    private var regParam: Double,
    private var miniBatchFraction: Double)
  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {

  private val gradient = new LogisticGradient()
  private val updater = new SquaredL2Updater()
  @Since("0.8.0")
  override val optimizer = new GradientDescent(gradient, updater)
    .setStepSize(stepSize)
    .setNumIterations(numIterations)
    .setRegParam(regParam)
    .setMiniBatchFraction(miniBatchFraction)
  override protected val validators = List(DataValidators.binaryLabelValidator)

  /**
   * Construct a LogisticRegression object with default parameters: {stepSize: 1.0,
   * numIterations: 100, regParm: 0.01, miniBatchFraction: 1.0}.
   */
  @Since("0.8.0")
  def this() = this(1.0, 100, 0.01, 1.0)

  override protected[mllib] def createModel(weights: Vector, intercept: Double) = {
    new LogisticRegressionModel(weights, intercept)
  }
}

optimizer is assigned a GradientDescent(gradient, updater), so next look at the GradientDescent class:

class GradientDescent private[spark] (private var gradient: Gradient, private var updater: Updater)
  extends Optimizer with Logging {

  private var stepSize: Double = 1.0
  private var numIterations: Int = 100
  private var regParam: Double = 0.0
  private var miniBatchFraction: Double = 1.0
  private var convergenceTol: Double = 0.001

  ...

  @DeveloperApi
  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Vector = {
    val (weights, _) = GradientDescent.runMiniBatchSGD(
      data,
      gradient,
      updater,
      stepSize,
      numIterations,
      regParam,
      miniBatchFraction,
      initialWeights,
      convergenceTol)
    weights
  }
}

It calls the mini-batch variant of stochastic gradient descent, runMiniBatchSGD:

  def runMiniBatchSGD(
      data: RDD[(Double, Vector)],
      gradient: Gradient,
      updater: Updater,
      stepSize: Double,
      numIterations: Int,
      regParam: Double,
      miniBatchFraction: Double,
      initialWeights: Vector,
      convergenceTol: Double): (Vector, Array[Double]) = {

    // convergenceTol should be set with non minibatch settings
    if (miniBatchFraction < 1.0 && convergenceTol > 0.0) {
      logWarning("Testing against a convergenceTol when using miniBatchFraction " +
        "< 1.0 can be unstable because of the stochasticity in sampling.")
    }

    val stochasticLossHistory = new ArrayBuffer[Double](numIterations)
    // Record previous weight and current one to calculate solution vector difference
    var previousWeights: Option[Vector] = None
    var currentWeights: Option[Vector] = None

    val numExamples = data.count()

    // if no data, return initial weights to avoid NaNs
    if (numExamples == 0) {
      logWarning("GradientDescent.runMiniBatchSGD returning initial weights, no data found")
      return (initialWeights, stochasticLossHistory.toArray)
    }

    if (numExamples * miniBatchFraction < 1) {
      logWarning("The miniBatchFraction is too small")
    }

    // Initialize weights as a column vector
    var weights = Vectors.dense(initialWeights.toArray)
    val n = weights.size

    /**
     * For the first iteration, the regVal will be initialized as sum of weight squares
     * if it's L2 updater; for L1 updater, the same logic is followed.
     */
    var regVal = updater.compute(
      weights, Vectors.zeros(weights.size), 0, 1, regParam)._2 // compute the regularization value

    var converged = false // indicates whether converged based on convergenceTol
    var i = 1
    while (!converged && i <= numIterations) { // iterations start; keep running while below the maximum iteration count
      val bcWeights = data.context.broadcast(weights)
      // Sample a subset (fraction miniBatchFraction) of the total data
      // compute and sum up the subgradients on this subset (this is one map-reduce)
      val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
        .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
          seqOp = (c, v) => {
            // c: (grad, loss, count), v: (label, features)
            val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1)) // compute the gradient of each example in the batch
            (c._1, c._2 + l, c._3 + 1)
          },
          combOp = (c1, c2) => {
            // c: (grad, loss, count)
            (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3) // sum the gradients of all examples in the batch, add up their losses, and count the batch size
          })

      if (miniBatchSize > 0) {
        /**
         * lossSum is computed using the weights from the previous iteration
         * and regVal is the regularization value computed in the previous iteration as well.
         */
        stochasticLossHistory.append(lossSum / miniBatchSize + regVal) // so the recorded loss is the batch's total loss divided by the batch size, plus the regularization value
        val update = updater.compute(
          weights, Vectors.fromBreeze(gradientSum / miniBatchSize.toDouble), // update the weights and the regularization value for the next iteration
          stepSize, i, regParam)
        weights = update._1
        regVal = update._2

        previousWeights = currentWeights
        currentWeights = Some(weights)
        if (previousWeights != None && currentWeights != None) {
          converged = isConverged(previousWeights.get,
            currentWeights.get, convergenceTol)
        }
      } else {
        logWarning(s"Iteration ($i/$numIterations). The size of sampled batch is zero")
      }
      i += 1
    }

    logInfo("GradientDescent.runMiniBatchSGD finished. Last 10 stochastic losses %s".format(
      stochasticLossHistory.takeRight(10).mkString(", ")))

    (weights, stochasticLossHistory.toArray)
  }
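
To summarize one iteration of the loop above in symbols (my own notation, not from the original post; B_t is the mini-batch sampled at iteration t and \lambda is regParam):

    g_t = \frac{1}{|B_t|} \sum_{(y,\,x) \in B_t} \nabla_w\, \ell(w_t;\, x,\, y), \qquad
    \mathrm{loss}_t = \frac{1}{|B_t|} \sum_{(y,\,x) \in B_t} \ell(w_t;\, x,\, y) + \mathrm{regVal}_{t-1}

    (w_{t+1},\ \mathrm{regVal}_t) = \mathrm{updater.compute}(w_t,\ g_t,\ \mathrm{stepSize},\ t,\ \lambda)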

A gradient has to be computed for every example in the batch, which is done by the gradient.compute function. For binary classification:

  override def compute(
      data: Vector,
      label: Double,
      weights: Vector,
      cumGradient: Vector): Double = {
    val dataSize = data.size

    // (weights.size / dataSize + 1) is number of classes
    require(weights.size % dataSize == 0 && numClasses == weights.size / dataSize + 1)
    numClasses match {
      case 2 =>
        /**
         * For Binary Logistic Regression.
         *
         * Although the loss and gradient calculation for multinomial one is more generalized,
         * and multinomial one can also be used in binary case, we still implement a specialized
         * binary version for performance reason.
         */
        val margin = -1.0 * dot(data, weights)
        val multiplier = (1.0 / (1.0 + math.exp(margin))) - label
        axpy(multiplier, data, cumGradient) // the gradient is multiplier * data, i.e. (h(x) - y) * x
        if (label > 0) {
          // The following is equivalent to log(1 + exp(margin)) but more numerically stable.
          MLUtils.log1pExp(margin) // return the loss value
        } else {
          MLUtils.log1pExp(margin) - margin
        }
      ... // the multiclass case follows; I have not gone through it yet
  }
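
Writing the binary case out (my own rendering of the code above, with h(x) = \sigma(w^{\top}x)):

    \mathrm{margin} = -w^{\top}x, \qquad
    h(x) = \frac{1}{1 + e^{\mathrm{margin}}}, \qquad
    \nabla_w \ell = (h(x) - y)\, x

    \ell(w;\, x,\, y) =
    \begin{cases}
      \log\left(1 + e^{\mathrm{margin}}\right) & y = 1 \\
      \log\left(1 + e^{\mathrm{margin}}\right) - \mathrm{margin} & y = 0
    \end{cases}

which matches the multiplier and the two log1pExp branches in the code.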

After treeAggregate has processed all examples of the batch in parallel, the resulting gradientSum is divided by miniBatchSize and passed to updater.compute, which updates the weights theta and the regularization value for the next iteration:

@DeveloperApi
class SquaredL2Updater extends Updater {
  override def compute(
      weightsOld: Vector,
      gradient: Vector,
      stepSize: Double,
      iter: Int,
      regParam: Double): (Vector, Double) = {
    // add up both updates from the gradient of the loss (= step) as well as
    // the gradient of the regularizer (= regParam * weightsOld)
    // w' = w - thisIterStepSize * (gradient + regParam * w)
    // w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient
    // This is the weight update after L2 regularization; note the (1 - thisIterStepSize * regParam)
    // factor in front of w. The plain update would be w' = w - alpha * gradient, where alpha,
    // the learning rate, is thisIterStepSize.
    val thisIterStepSize = stepSize / math.sqrt(iter) // alpha = stepSize / sqrt(iter): the learning rate decays as the iteration count grows, so the steps get smaller
    val brzWeights: BV[Double] = weightsOld.toBreeze.toDenseVector
    brzWeights :*= (1.0 - thisIterStepSize * regParam)
    brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
    val norm = brzNorm(brzWeights, 2.0)

    (Vectors.fromBreeze(brzWeights), 0.5 * regParam * norm * norm) // the regularization value is 0.5 * regParam * the squared L2 norm of w'
  }
}
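
In equation form (my own summary of the updater above, with \lambda = regParam and g_t the averaged gradient passed in from runMiniBatchSGD):

    \alpha_t = \frac{\mathrm{stepSize}}{\sqrt{t}}, \qquad
    w_{t+1} = (1 - \alpha_t \lambda)\, w_t - \alpha_t\, g_t, \qquad
    \mathrm{regVal}_t = \frac{\lambda}{2}\, \lVert w_{t+1} \rVert_2^2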


Reposted from: https://www.cnblogs.com/Key-Ky/p/5246093.html

