接上篇

4.梯度下降算法

《斯坦福大學公開課：機器學習課程》吳恩達講解第二課時，是直接從梯度下降開始講解，最后采用向量和矩陣的方式推導了解析解，國內很多培訓視頻是先講解析解后講梯度下降，個人認為梯度下降算法更為重要，它是很多算法（邏輯回歸、神經網絡）都可以共用的，線性回歸僅僅是湊巧有全局最小值的解法，有解析解，其他大部分算法都需要遍歷數據尋找全局（其實本質上是局部最優）最優解的過程。
解析解直接向量運算存在的問題：（1）矩陣必須是滿秩的，如果不是，python也能模糊處理。（2）運算性能，矩陣擴大化后，量級增大對性能影響明顯。
梯度下降不僅限于線性回歸，非線性和神經網絡同樣適用。

在線性回歸中，θ通過α學習速率在每個J( θ) 減小的維度上進行一個不定式 θ參數的遞減，不斷更新 θ{i} 的值直到 J(θ) 收斂，達到最小值。類比于下山，每次嘗試找比這個位置低的位置，一步一步到達山底（即我們求的最小值，詳細請看參考資料8）

4.1批量梯度下降算法

老師常常教導我們梯度下降算法這么經典的算法，一定要自己動手推導公式，自己用程序來實現以下，我一直嗤之以鼻，線性回歸這么簡單的算法還需要自己敲代碼實現一下嗎？最近工作閑暇時，我自己用Python實現了下該算法，不敲不知道，一敲嚇一大跳，太多坑在里面了，整整坑在里面2天時間，檢查多遍代碼最后發現是數據問題，學習率α需要取非常非常小的值才能收斂，否則程序一直處于震蕩狀態無法找到近似最小值。說了那么多廢話，下面直接貼代碼，代碼里備注說明很完善，很明了，很清晰了，大家應該能看懂吧o(￣︶￣)o
在這里插入圖片描述

def linearRegBgd2(x,y,alpha=0.005,lamba=0.005,loop_max=1000):#alpha = 0.0000001      #學習率步長，learning rate#lamba=0.005     #懲罰系數 '''參考鏈接：https://blog.csdn.net/mango_badnot/article/details/52328740 已將原作者計算過程進行了大量修改，就相當于是我原創了吧，將原計算的三重for循環重寫為一重for循環，將運算修改為了矩陣運算原作者構造的X特征是按行存儲的，與常見數據不太一致，X中第一行為特征1，第二行為特征2此函數將使用正常數據，X特征按列存儲，每行是一條記錄批量梯度下降算法公式：theta=theta + alpha * sum( (y-y_hat) * x)'''np.set_printoptions(precision=4) # 設置numpy輸出4位小數n=len(x[0]) # 取第一行數據，查看數據列數theta=np.zeros(n)      # 初始化theta數組，默認全為0for times in range(loop_max):y_hat=np.dot(x,theta.T).reshape((-1,1))loss= (y_hat-y)  #此處是y_hat-y，對應的theta求解處增加了一個負號loss_n=np.array([]) # 為方便直接用矩陣實現sum( (y-y_hat) * x)，產生多列loss乘以loss后求和if n==1: # 判斷x矩陣有幾列，列為1的時候單獨處理loss_n=losselse:for i in range(1,n):# 判斷x矩陣有幾列，列為大于2的時候，產生n列loss#print(i)if i==1:loss_n = np.column_stack((loss, loss))elif i>1:loss_n = np.column_stack((loss_n, loss))theta_old=theta # 記錄迭代前的theta#tmp=alpha*(loss_n*x).sum(axis=0)theta=theta - (alpha*(x*loss_n)).sum(axis=0) #+lamba*theta   #使用矩陣求解theta，注意此處的負號，loss*x后按特征列求和作為theta，注意上方本來想實現懲罰項的，但沒有想明白怎么實現if (theta-theta_old).all()<0.001: # 判斷前后兩次theta變換情況，當變化率很小時跳出循環break#print('x:',x,'y:',y,'loss:',loss)#print('y_hat:',y_hat.T)#print('times:',times,'theta:',theta)#print('')return theta

4.2隨機梯度下降算法

隨機梯度下降就是每一個樣本更新一次權值，超過樣本范圍就循環從第一個樣本再循環，同等數據量情況下（數據量不是特別巨大時）訓練速度與批量梯度下降要慢一些，但是適合在線學習，來一條數據迭代訓練一次。
在這里插入圖片描述

def linearRegSgd(x,y,alpha=0.005,lamba=0.005,loop_max=10000):#alpha = 0.0000001      #學習率步長，learning rate#lamba=0.005     #懲罰系數 '''隨機梯度下降算法公式：theta=theta + alpha * (y-y_hat) * xalpha=0.01lamba=0.005'''np.set_printoptions(precision=4) # 設置numpy輸出4位小數n=len(x[0]) # 取第一行數據，查看數據列數theta=np.zeros(n)      # 初始化theta數組，默認全為0for times in range(loop_max):for i  in range(0,len(x)): # 取其中一條數據進行梯度下降# i=0y_hat=np.dot(x[i],theta.T).reshape((-1,1))loss= (y_hat-y[i])  #此處是y_hat-y，對應的theta求解處增加了一個負號theta_old=theta # 記錄迭代前的thetatheta=theta - alpha*x[i]*loss[0] #+lamba*theta#求解theta，注意此處的負號，注意上方本來想實現懲罰項的，但沒有想明白怎么實現if (theta-theta_old).all()<0.001: # 判斷前后兩次theta變換情況，當變化率很小時跳出循環break
#            print('x:',x,'y:',y,'loss:',loss)
#            print('y_hat:',y_hat.T)
#            print('times:',times,'theta:',theta)
#            print('')return theta

4.3 mini-batch梯度下降算法

如果不是每拿到一個樣本即更改梯度，而是用若干個樣本的平均梯度作為更新方向，這就是Mini-batch梯度下降算法。

def linearRegMgd(x,y,alpha=0.005,lamba=0.005,loop_max=1000,batch_size=9):#alpha = 0.0000001      #學習率步長，learning rate#lamba=0.005     #懲罰系數 '''mini-batch梯度下降算法公式：每次使用batch_size個數據進行計算for i=1 to m: theta=theta + alpha * (y-y_hat) * x'''np.set_printoptions(precision=4) # 設置numpy輸出4位小數n=len(x[0]) # 取第一行數據，查看數據列數theta=np.zeros(n)      # 初始化theta數組，默認全為0for times in range(loop_max):for i in range(0,int(len(x)/batch_size)+1):#print('i:',i,x[i*batch_size:(i+1)*batch_size])x_mini=x[i*batch_size:(i+1)*batch_size]y_mini=y[i*batch_size:(i+1)*batch_size]y_hat=np.dot(x_mini,theta.T).reshape((-1,1))loss= (y_hat-y_mini)  #此處是y_hat-y，對應的theta求解處增加了一個負號loss_n=np.array([]) # 為方便直接用矩陣實現sum( (y-y_hat) * x)，產生多列loss乘以loss后求和if n==1: # 判斷x矩陣有幾列，列為1的時候單獨處理loss_n=losselse:for i in range(1,n):# 判斷x矩陣有幾列，列為大于2的時候，產生n列loss#print(i)if i==1:loss_n = np.column_stack((loss, loss))elif i>1:loss_n = np.column_stack((loss_n, loss))theta_old=theta # 記錄迭代前的theta#tmp=alpha*(loss_n*x).sum(axis=0)theta=theta - (alpha*(x_mini*loss_n)).sum(axis=0) #+lamba*theta   #使用矩陣求解theta，注意此處的負號，loss*x后按特征列求和作為theta，注意上方本來想實現懲罰項的，但沒有想明白怎么實現if (theta-theta_old).all()<0.001: # 判斷前后兩次theta變換情況，當變化率很小時跳出循環break#print('x:',x,'y:',y,'loss:',loss)#print('y_hat:',y_hat.T)#print('times:',times,'theta:',theta)#print('')return theta

以上梯度下降算法完整代碼如下

# -*- coding: utf-8 -*-
"""@Time    : 2018/10/22 17:23@Author  : hanzi5@Email   : **@163.com@File    : linear_regression_bgd.py@Software: PyCharm
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegressiondef load_dataset(n): # 原作者產生數據的函數noise = np.random.rand(n)X = [[x, 1.] for x in range(n)]y = [(0.5 * X[i][0]  + 1. + noise[i]) for i in range(n)]return np.array(X).T, np.array(y).T # 注意X，W，y的維數def linearRegLsq(x,y):# 最小二乘法直接求解thetaxtx = np.dot(x.T, x)if np.linalg.det(xtx) == 0.0: # 判斷xtx行列式是否等于0，奇異矩陣不能求逆print('Can not resolve the problem')returntheta_lsq = np.dot(np.dot(np.linalg.inv(np.dot(x.T, x)), x.T), y)return theta_lsqdef linearRegBgd1(x,y,alpha=0.005,lamba=0.005,loop_max=1000):#alpha = 0.0000001      #學習率步長，learning rate#lamba=0.005     #懲罰系數 '''已將原作者計算過程進行了大量修改，已將原作者計算過程進行了大量修改，就相當于是我原創了吧，將原計算的三重for循環重寫為一重for循環參考鏈接：https://blog.csdn.net/mango_badnot/article/details/52328740 原作者構造的X特征是按行存儲的，與常見數據不太一致，X中第一行為特征1，第二行為特征2批量梯度下降算法公式：theta=theta + alpha * sum( (y-y_hat) * x)2018-10-22 17:18 測試通過，此函數不做多的備注了，詳細備注放在了linearRegBgd2里'''np.set_printoptions(precision=4) # 設置numpy輸出4位小數n=len(x) # 取第一行數據，查看數據列數theta=np.zeros(n)      # 初始化theta數組，默認全為0for times in range(loop_max):y_hat=np.dot(x.T,theta)loss= y-y_hatfor i in range(1,n):if i==1:loss_n = np.row_stack((loss, loss))elif i>1:loss_n = np.row_stack((loss_n, loss))theta_old=thetatheta=theta+(alpha*x.T*loss_n.T).sum(axis=0) #+lamba*thetaif (theta-theta_old).all()<0.001:break#print('x:',x,'y:',y,'loss:',loss)#print('times:',times,'theta:',theta)#print('')return thetadef linearRegBgd2(x,y,alpha=0.005,lamba=0.005,loop_max=1000):#alpha = 0.0000001      #學習率步長，learning rate#lamba=0.005     #懲罰系數 '''參考鏈接：https://blog.csdn.net/mango_badnot/article/details/52328740 已將原作者計算過程進行了大量修改，就相當于是我原創了吧，將原計算的三重for循環重寫為一重for循環，將運算修改為了矩陣運算原作者構造的X特征是按行存儲的，與常見數據不太一致，X中第一行為特征1，第二行為特征2此函數將使用正常數據，X特征按列存儲，每行是一條記錄批量梯度下降算法公式：theta=theta + alpha * sum( (y-y_hat) * x)'''np.set_printoptions(precision=4) # 設置numpy輸出4位小數n=len(x[0]) # 取第一行數據，查看數據列數theta=np.zeros(n)      # 初始化theta數組，默認全為0for times in range(loop_max):y_hat=np.dot(x,theta.T).reshape((-1,1))loss= (y_hat-y)  #此處是y_hat-y，對應的theta求解處增加了一個負號loss_n=np.array([]) # 為方便直接用矩陣實現sum( (y-y_hat) * x)，產生多列loss乘以loss后求和if n==1: # 判斷x矩陣有幾列，列為1的時候單獨處理loss_n=losselse:for i in range(1,n):# 判斷x矩陣有幾列，列為大于2的時候，產生n列loss#print(i)if i==1:loss_n = np.column_stack((loss, loss))elif i>1:loss_n = np.column_stack((loss_n, loss))theta_old=theta # 記錄迭代前的theta#tmp=alpha*(loss_n*x).sum(axis=0)theta=theta - (alpha*(x*loss_n)).sum(axis=0) #+lamba*theta   #使用矩陣求解theta，注意此處的負號，loss*x后按特征列求和作為theta，注意上方本來想實現懲罰項的，但沒有想明白怎么實現if (theta-theta_old).all()<0.001: # 判斷前后兩次theta變換情況，當變化率很小時跳出循環break#print('x:',x,'y:',y,'loss:',loss)#print('y_hat:',y_hat.T)#print('times:',times,'theta:',theta)#print('')return thetadef linearRegSgd(x,y,alpha=0.005,lamba=0.005,loop_max=10000):#alpha = 0.0000001      #學習率步長，learning rate#lamba=0.005     #懲罰系數 '''隨機梯度下降算法公式：for i=1 to m: theta=theta + alpha * (y-y_hat) * xalpha=0.01lamba=0.005'''np.set_printoptions(precision=4) # 設置numpy輸出4位小數n=len(x[0]) # 取第一行數據，查看數據列數theta=np.zeros(n)      # 初始化theta數組，默認全為0for times in range(loop_max):for i  in range(0,len(x)): # 取其中一條數據進行梯度下降# i=0y_hat=np.dot(x[i],theta.T).reshape((-1,1))loss= (y_hat-y[i])  #此處是y_hat-y，對應的theta求解處增加了一個負號theta_old=theta # 記錄迭代前的thetatheta=theta - alpha*x[i]*loss[0] #+lamba*theta#求解theta，注意此處的負號，注意上方本來想實現懲罰項的，但沒有想明白怎么實現if (theta-theta_old).all()<0.001: # 判斷前后兩次theta變換情況，當變化率很小時跳出循環break
#            print('x:',x,'y:',y,'loss:',loss)
#            print('y_hat:',y_hat.T)
#            print('times:',times,'theta:',theta)
#            print('')return thetadef linearRegMgd(x,y,alpha=0.005,lamba=0.005,loop_max=1000,batch_size=9):#alpha = 0.0000001      #學習率步長，learning rate#lamba=0.005     #懲罰系數 '''mini-batch梯度下降算法公式：每次使用batch_size個數據進行計算for i=1 to m: theta=theta + alpha * (y-y_hat) * x'''np.set_printoptions(precision=4) # 設置numpy輸出4位小數n=len(x[0]) # 取第一行數據，查看數據列數theta=np.zeros(n)      # 初始化theta數組，默認全為0for times in range(loop_max):for i in range(0,int(len(x)/batch_size)+1):#print('i:',i,x[i*batch_size:(i+1)*batch_size])x_mini=x[i*batch_size:(i+1)*batch_size]y_mini=y[i*batch_size:(i+1)*batch_size]y_hat=np.dot(x_mini,theta.T).reshape((-1,1))loss= (y_hat-y_mini)  #此處是y_hat-y，對應的theta求解處增加了一個負號loss_n=np.array([]) # 為方便直接用矩陣實現sum( (y-y_hat) * x)，產生多列loss乘以loss后求和if n==1: # 判斷x矩陣有幾列，列為1的時候單獨處理loss_n=losselse:for i in range(1,n):# 判斷x矩陣有幾列，列為大于2的時候，產生n列loss#print(i)if i==1:loss_n = np.column_stack((loss, loss))elif i>1:loss_n = np.column_stack((loss_n, loss))theta_old=theta # 記錄迭代前的theta#tmp=alpha*(loss_n*x).sum(axis=0)theta=theta - (alpha*(x_mini*loss_n)).sum(axis=0) #+lamba*theta   #使用矩陣求解theta，注意此處的負號，loss*x后按特征列求和作為theta，注意上方本來想實現懲罰項的，但沒有想明白怎么實現if (theta-theta_old).all()<0.001: # 判斷前后兩次theta變換情況，當變化率很小時跳出循環break#print('x:',x,'y:',y,'loss:',loss)#print('y_hat:',y_hat.T)#print('times:',times,'theta:',theta)#print('')return thetaif __name__ == "__main__":path = 'D:/python_data/8.Advertising.csv'data = pd.read_csv(path)  # TV、Radio、Newspaper、Sales#data_matrix = data.as_matrix(columns=['TV', 'Radio', 'Newspaper', 'Sales'])  # 轉換數據類型data_array = data.values[:,1:]  # 轉換數據類型，去除第一列序號列x = data_array[:, :-1] #去除y對應列的數據，其他列作為xy = data_array[:, -1].reshape((-1,1))# 0、繪制圖像，查看數據分布情況plt.plot(data['TV'], y, 'ro', label='TV')plt.plot(data['Radio'], y, 'g^', label='Radio')plt.plot(data['Newspaper'], y, 'mv', label='Newspaer')plt.legend(loc='lower right')plt.grid()plt.show()# 1、使用sklearn LinearRegression包求解θlinreg = LinearRegression()model = linreg.fit(x, y)print('')print('1、sklearn LinearRegression包求解θ:','coef:', linreg.coef_[0],',intercept:', linreg.intercept_)# 2、最小二乘法，直接求解析解θtheta_lsq = linearRegLsq(x,y)print('')print('2、最小二乘法，theta解析解：',theta_lsq.reshape(1,3)[0])# 3、批量梯度下降求解theta# 注意下面兩個函數alpha都是非常小的值，取過大的值時，不收斂#x1, y1 = load_dataset(10)#theta1=linearRegBgd1(x1, y1)theta1=linearRegBgd1(x.T, y.T,alpha = 0.0000001)print('')print('3.1、批量梯度下降，linearRegBgd1函數，theta：',theta1)theta2=linearRegBgd2(x, y,alpha = 0.0000001)print('')print('3.2、批量梯度下降，linearRegBgd2函數，theta：',theta2)theta3=linearRegSgd(x, y,alpha = 0.000001,loop_max=10000)print('')print('3.3、隨機梯度下降，linearRegSgd函數，theta：',theta3)theta4=linearRegMgd(x, y,alpha = 0.000001,loop_max=10000,batch_size=11)print('')print('3.4、mini-batch梯度下降，linearRegMgd函數，theta：',theta4)

程序運行計算結果：
在這里插入圖片描述

數據見以下鏈接，復制到txt文檔中，另存為CSV格式即可。
數據

參考資料：
1、python實現線性回歸
2、機器學習：線性回歸與Python代碼實現
3、python實現簡單線性回歸
4、Machine-Learning-With-Python
5、《機器學習》西瓜書，周志華
6、機器學習視頻，鄒博
7、斯坦福大學公開課：機器學習課程
8、馬同學高等數學如何直觀形象的理解方向導數與梯度以及它們之間的關系？
9、【機器學習】線性回歸的梯度下降法

轉載于:https://www.cnblogs.com/hanzi5/p/10105622.html