Tianchi Machine Learning Algorithms (1): Classification Prediction with Logistic Regression

PyTorch in Practice

Lesson 7: Neural Networks

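Drawback of MSE: when the output probability is close to 0 or 1, the partial derivative of the loss is very small. At the start of training this can make the gradient nearly vanish, so the model learns very slowly.
Cross-entropy loss: squared loss is too strict here; a function better suited to measuring the difference between two probability distributions is needed.
When the logistic function is used to produce probabilities and cross-entropy is used as the loss, learning is fast while the model performs poorly and slows down as the model improves.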
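To make the contrast concrete, the following sketch compares the gradient of each loss with respect to the logit z for a single sample whose prediction is badly wrong. It assumes a sigmoid output p = sigmoid(z) and a target y = 1; the function names and the value z = -5 are illustrative choices, not taken from the course.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_grad(z, y):
    # d/dz (p - y)^2 = 2 (p - y) * p (1 - p); the p (1 - p) factor shrinks toward 0 near p = 0 or p = 1
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)

def cross_entropy_grad(z, y):
    # d/dz [-(y log p + (1 - y) log(1 - p))] = p - y
    p = sigmoid(z)
    return p - y

# A badly wrong prediction: the target is 1 but the predicted probability is close to 0
z, y = -5.0, 1.0
g_mse = mse_grad(z, y)
g_ce = cross_entropy_grad(z, y)
print(f"p = {sigmoid(z):.4f}, MSE grad = {g_mse:.4f}, CE grad = {g_ce:.4f}")
```

With cross-entropy the gradient is proportional to the error p - y, so a poor prediction receives a large update and a good prediction a small one, which matches the behavior described above.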
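torch.randint(0,2,(10,))
Error: torch.randint(0,2,(10)) fails. The trailing comma is required, because (10) is just the integer 10 rather than a tuple.
x.view() works like reshape. In x.view((-1, 4)), the first argument -1 tells PyTorch to infer the number of rows, giving a tensor with n rows and 4 columns.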
Things to watch when writing a model:
- super(LinearNet, self).__init__()
- forward(self, X):
View the model parameters: net.state_dict()
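The notes above can be tied together in one short sketch. The single-layer LinearNet below is a hypothetical stand-in (the notes do not show the course's actual model body), and the names n_features, labels and param_names are chosen here for illustration only.

```python
import torch
from torch import nn

class LinearNet(nn.Module):
    """Hypothetical single-layer model, used only to illustrate the notes."""

    def __init__(self, n_features):
        # Must be called before any layer is assigned, so the module can register it
        super(LinearNet, self).__init__()
        self.linear = nn.Linear(n_features, 1)

    def forward(self, X):
        return self.linear(X)

labels = torch.randint(0, 2, (10,))    # size is a tuple, hence the trailing comma
x = torch.arange(12.0).view((-1, 4))   # -1 lets PyTorch infer the row count (3 here)
net = LinearNet(n_features=4)
out = net(x)                           # calling net(...) dispatches to forward
param_names = list(net.state_dict().keys())
print(x.shape, out.shape, param_names)
```

Calling net(x) rather than net.forward(x) is the usual convention, since it also runs any hooks registered on the module.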

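Machine Learning Algorithms (1): Classification Prediction with Logistic Regression

Tianchi course link

Logistic regression uses cross-entropy as its loss function. The steps, as I understand them:

Initialize w and b, then compute the y value (the linear output w·x + b) for every point.
Use the sigmoid function to convert each y value into the probability of belonging to a class.
Minimize the cross-entropy loss by repeatedly updating w and b.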
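The three steps above can be sketched with plain NumPy and batch gradient descent. This is a minimal illustration under stated assumptions, not the solver that scikit-learn uses: the toy data, the learning rate, the step count and all function names are chosen here for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(p, y, eps=1e-12):
    # Mean binary cross-entropy; clipping avoids log(0)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_logistic_regression(X, y, lr=0.1, n_steps=500):
    # Step 1: initialize w and b
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        # Step 2: linear output -> probability via sigmoid
        p = sigmoid(X @ w + b)
        # Step 3: gradient of the mean cross-entropy, then update w and b
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Toy, linearly separable data (illustrative values)
X = np.array([[-1., -2.], [-2., -1.], [-3., -2.], [1., 3.], [2., 1.], [3., 2.]])
y = np.array([0., 0., 0., 1., 1., 1.])

w, b = train_logistic_regression(X, y)
probs = sigmoid(X @ w + b)
preds = (probs >= 0.5).astype(int)
loss = cross_entropy(probs, y)
print(preds, loss)
```

The update uses the gradient p - y, which is what the cross-entropy loss yields when combined with a sigmoid output.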

The details from the Tianchi course follow:

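# Visualize the decision boundary
# Assumes x_fearures, y_label and the fitted lr_clf from the Tianchi demo are already defined
import numpy as np
import matplotlib.pyplot as plt

plt.figure()
plt.scatter(x_fearures[:,0],x_fearures[:,1], c=y_label, s=50, cmap='viridis') # draw the scatter plot
plt.title('Dataset')
# Split the x range into 200 points and the y range into 100 points, then build a grid matrix holding 20,000 points.
# The plot does not have to use the raw x and y coordinates; grid coordinates work as well.
nx, ny = 200, 100
x_min, x_max = plt.xlim()
y_min, y_max = plt.ylim() # extent of the axes
x_grid, y_grid = np.meshgrid(np.linspace(x_min, x_max, nx),np.linspace(y_min, y_max, ny))
# x_grid and y_grid both have shape 100*200, ordered from bottom-left to top-right
# For each of the 20,000 grid points, compute the probability of each of the two classes. z_proba looks like:
''' array([[0.98401648, 0.01598352],[0.98362875, 0.01637125],[0.98323179, 0.01676821],...,[0.01094403, 0.98905597],[0.01068344, 0.98931656],[0.01042899, 0.98957101]]) '''
z_proba = lr_clf.predict_proba(np.c_[x_grid.ravel(), y_grid.ravel()]) # ravel() flattens a 2-D array to 1-D
z_proba = z_proba[:, 1].reshape(x_grid.shape) # z_proba now holds the predicted probability of class 1
plt.contour(x_grid, y_grid, z_proba, [0.5], linewidths=2., colors='blue') # contour lines, like a map of a mountain: x/y coordinates plus the height
plt.show()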
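The grid bookkeeping in the plotting code is easier to follow with the shapes written out. The sketch below reproduces only the meshgrid, ravel and reshape steps. The axis ranges are arbitrary, and toy_predict_proba is a hypothetical stand-in for lr_clf.predict_proba so that the example runs without a fitted model.

```python
import numpy as np

nx, ny = 200, 100
x_grid, y_grid = np.meshgrid(np.linspace(-3, 3, nx), np.linspace(-2, 3, ny))

# One row per grid point: column 0 is x, column 1 is y
points = np.c_[x_grid.ravel(), y_grid.ravel()]

def toy_predict_proba(pts):
    """Hypothetical stand-in for lr_clf.predict_proba: returns [P(class 0), P(class 1)]."""
    p1 = 1.0 / (1.0 + np.exp(-(pts[:, 0] + pts[:, 1])))
    return np.column_stack([1.0 - p1, p1])

# Keep the class-1 column and fold it back into the grid layout for plt.contour
z_proba = toy_predict_proba(points)[:, 1].reshape(x_grid.shape)

print(x_grid.shape, points.shape, z_proba.shape)
```

Because z_proba has the same shape as x_grid and y_grid, plt.contour can draw the 0.5 level directly, which is the decision boundary.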

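The general steps for analyzing the iris dataset are as follows:

Load the dataset and convert it to pandas objects

from sklearn.datasets import load_iris
data = load_iris() # load the iris dataset (pandas is assumed to be imported as pd)
iris_target = data.target # labels for each sample
iris_features = pd.DataFrame(data=data.data, columns=data.feature_names) # convert the features to a pandas DataFrame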

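Inspect the basic information of the dataset:

# These are pandas methods, so the data must be a Series or a DataFrame
## Use .info() to see an overall summary of the data
iris_features.info()
## For a quick look at the data, use .head() for the first rows and .tail() for the last rows
iris_features.head()
iris_features.tail()
## The corresponding class labels; 0, 1 and 2 stand for the three species 'setosa', 'versicolor' and 'virginica'.
iris_target
## Use value_counts to count the samples in each class
pd.Series(iris_target).value_counts()
## Summary statistics for the features
iris_features.describe()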

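Visual description: scatter plots and box plots

iris_all = iris_features.copy() # iris_all is not defined in these notes; assumed to be the features plus a 'target' column
iris_all['target'] = iris_target
## Scatter plots of the features, colored by label (seaborn is assumed to be imported as sns)
sns.pairplot(data=iris_all,diag_kind='hist', hue= 'target')
plt.show()
## Box plots
for col in iris_features.columns:
    sns.boxplot(x='target', y=col, saturation=0.5,palette='pastel', data=iris_all)
    plt.title(col)
    plt.show()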

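Train the model: split the dataset, define the model, fit it, and print the parameters

## Split the dataset
## iris_features_part and iris_target_part are not defined in these notes; they are assumed to be a subset of the features and labels
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(iris_features_part, iris_target_part, test_size = 0.2, random_state = 2020)
## Train the model
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(random_state=0, solver='lbfgs')
clf.fit(x_train, y_train)
clf.coef_ # learned weights w
clf.intercept_ # learned bias b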

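Test the model and visualize the results: predicted labels and probabilities, the confusion matrix, and a heatmap of the matrix

## The predictions are arrays: class labels and class probabilities, respectively
test_predict = clf.predict(x_test)
test_predict_proba = clf.predict_proba(x_test)
## Compute the accuracy
from sklearn import metrics
print('The accuracy of the Logistic Regression is:',metrics.accuracy_score(y_test,test_predict))
## Inspect the confusion matrix (arguments are y_true, y_pred, so rows are true labels and columns are predicted labels)
confusion_matrix_result = metrics.confusion_matrix(y_test, test_predict)
# Visualize the result as a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(confusion_matrix_result, annot=True, cmap='Blues')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.show()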
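The axis labels on the heatmap only make sense if the matrix rows are true labels and the columns are predicted labels, which is the layout scikit-learn produces when the true labels are passed first. The sketch below builds such a matrix by hand to show the convention. The function name and the label lists are made up for illustration.

```python
import numpy as np

def confusion_matrix_counts(y_true, y_pred, n_classes):
    # cm[i, j] counts samples whose true class is i and predicted class is j
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Illustrative labels, not results from the iris model
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

cm = confusion_matrix_counts(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()   # correct predictions sit on the diagonal
print(cm)
print(accuracy)
```

Swapping the two arguments transposes the matrix, so the accuracy is unchanged but the per-class errors would be read off the wrong axis.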
