Machine Learning: Logistic Regression ROC Exercises

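1.

Task:

Given the following predictions from a binary classifier, plot the ROC curve by hand and compute the AUC:

y_true = [0, 1, 0, 1, 0, 1] # true labels (0 = negative class, 1 = positive class)

y_score = [0.2, 0.7, 0.3, 0.6, 0.1, 0.8] # scores predicted by the model

Code: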

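import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

y_true = [0, 1, 0, 1, 0, 1]
y_score = [0.2, 0.7, 0.3, 0.6, 0.1, 0.8]

# roc_curve returns one (FPR, TPR) pair per decision threshold
fpr, tpr, _ = roc_curve(y_true, y_score)
print("AUC:", auc(fpr, tpr))  # the task also asks for the AUC

plt.figure(figsize=(8, 9))
plt.plot(fpr, tpr, color="b")
plt.plot([0, 1], [0, 1], color="r", linestyle="--")  # chance-level diagonal
plt.xlabel("fpr")
plt.ylabel("tpr")
plt.grid()
plt.show()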
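
Since the task asks for a manual computation, the threshold sweep that roc_curve performs can also be written out directly. The following is a minimal sketch in plain Python; the helper names manual_roc and trapezoid_auc are made up for this illustration and are not part of scikit-learn.

```python
def manual_roc(y_true, y_score):
    """Return (fpr, tpr) lists obtained by sweeping a threshold over the scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Start above every score: nothing is predicted positive yet.
    fpr, tpr = [0.0], [0.0]
    # Visit each distinct score from high to low; predict positive when score >= threshold.
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 0)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area


y_true = [0, 1, 0, 1, 0, 1]
y_score = [0.2, 0.7, 0.3, 0.6, 0.1, 0.8]

fpr, tpr = manual_roc(y_true, y_score)
for x, y in zip(fpr, tpr):
    print(f"FPR={x:.3f}  TPR={y:.3f}")
print("AUC =", trapezoid_auc(fpr, tpr))
```

In this data every positive sample scores higher than every negative one, so the curve climbs straight up the left edge before moving right, and the area comes out as 1.0 (up to floating-point rounding).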

Results:

2.

Task:

Handle the case where several samples share the same predicted score:

y_true = [1, 0, 0, 1, 1, 0, 1, 0]

y_score = [0.8, 0.5, 0.5, 0.7, 0.6, 0.5, 0.9, 0.3]

Code:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_true = [1, 0, 0, 1, 1, 0, 1, 0]
y_score = [0.8, 0.5, 0.5, 0.7, 0.6, 0.5, 0.9, 0.3]

fpr, tpr, _ = roc_curve(y_true, y_score)

plt.figure(figsize=(8, 9))
plt.plot(fpr, tpr, color="b")
plt.plot([0, 1], [0, 1], color="r", linestyle="--")
plt.xlabel("fpr")
plt.ylabel("tpr")
plt.grid()
plt.show()
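
One way to see how tied scores affect the result is to compute the AUC from (positive, negative) pairs, counting a tie as half a correct ranking. The sketch below assumes that convention, which agrees with the trapezoidal area under the ROC curve; pairwise_auc is a hypothetical helper, and the second data set is an invented variant of the exercise data, added only to show a tie between classes.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair whose two samples share the same score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Exercise data: the three tied 0.5 scores all belong to negatives.
y_true = [1, 0, 0, 1, 1, 0, 1, 0]
y_score = [0.8, 0.5, 0.5, 0.7, 0.6, 0.5, 0.9, 0.3]
print("AUC (exercise data):", pairwise_auc(y_true, y_score))

# Hypothetical variant (not from the exercise): the positive at 0.6 now scores 0.5 too.
y_score_tied = [0.8, 0.5, 0.5, 0.7, 0.5, 0.5, 0.9, 0.3]
print("AUC (positive tied with negatives):", pairwise_auc(y_true, y_score_tied))
```

In the exercise data the ties sit entirely inside the negative class, so they only produce one longer horizontal step on the curve and leave the AUC at 1.0. In the variant, the tied positive earns half credit against each tied negative, which lowers the AUC to 14.5 / 16.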

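Results:

3.

Background

In credit card fraud detection, normal transactions (the negative class) far outnumber fraudulent ones (the positive class). Consider the following simulated data:

y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0] # 2 fraudulent (positive), 8 normal (negative)

y_score = [0.1, 0.2, 0.15, 0.05, 0.3, 0.25, 0.9, 0.4, 0.6, 0.1] # fraud probability output by the model

Task

Compute all (FPR, TPR) points by hand
Plot the ROC curve
Observe how class imbalance affects the shape of the curve

Code: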

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]  # 2 fraudulent (positive), 8 normal (negative)
y_score = [0.1, 0.2, 0.15, 0.05, 0.3, 0.25, 0.9, 0.4, 0.6, 0.1]  # fraud probability output by the model

fpr, tpr, _ = roc_curve(y_true, y_score)

plt.figure(figsize=(8, 9))
plt.plot(fpr, tpr, color="b")
plt.plot([0, 1], [0, 1], color="r", linestyle="--")
plt.xlabel("fpr")
plt.ylabel("tpr")
plt.grid()
plt.show()
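
To list every (FPR, TPR) point by hand and see where the imbalance shows up, the counts behind each threshold can be tabulated. This is a rough sketch in plain Python, assuming a sample is predicted positive when its score is at least the threshold; the table layout is chosen here purely for illustration.

```python
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
y_score = [0.1, 0.2, 0.15, 0.05, 0.3, 0.25, 0.9, 0.4, 0.6, 0.1]

n_pos = sum(y_true)            # 2 fraudulent transactions
n_neg = len(y_true) - n_pos    # 8 normal transactions

rows = []
# One row per distinct score, highest first.
for threshold in sorted(set(y_score), reverse=True):
    tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 0)
    rows.append((threshold, tp, fp, fp / n_neg, tp / n_pos))

print("threshold  TP  FP   FPR    TPR")
for threshold, tp, fp, fpr, tpr in rows:
    print(f"{threshold:9.2f}  {tp:2d}  {fp:2d}  {fpr:.3f}  {tpr:.3f}")

# With few positives the curve can only rise in large jumps.
print("TPR step per positive:", 1 / n_pos)
print("FPR step per negative:", 1 / n_neg)
```

With only two positives, TPR can take just three values (0, 0.5, 1), while FPR moves in steps of 0.125. The vertical part of the curve is therefore much coarser than the horizontal part, and a single misranked fraud case would shift the curve by half its height.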

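Results:

4.

This exercise uses the "Pima Indians Diabetes Database" dataset from Kaggle to practice logistic regression. Dataset link: Pima Indians Diabetes Database

Exercise scenario:

You are a data analyst at a healthcare organization. Your job is to analyze the Pima Indians Diabetes Database dataset and predict whether a patient has diabetes. The dataset contains a set of health-related indicators such as number of pregnancies, glucose concentration, and blood pressure. You need to train a logistic regression classifier that predicts diabetes from these indicators.

Steps:

Load the dataset with pandas and explore it to understand its fields and distributions.
Preprocess the data, including handling missing values and standardizing or normalizing the features.
Use numpy for feature selection or feature engineering to improve model performance.
Split the data into training and test sets, train a logistic regression model on the training set, and evaluate it on the test set.
Use matplotlib to plot the ROC curve or the confusion matrix to visualize classification quality.
Adjust the model's parameters based on the evaluation results to improve performance.

Tasks:

Based on the scenario above, complete the following:

Load the Pima Indians Diabetes Database dataset and explore it.
Preprocess the data, including handling missing values and standardizing the features.
Train a logistic regression model and evaluate its performance.
Plot the ROC curve to show the model's classification quality.
Propose at least one way to improve model performance and try to implement it.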
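
For the missing-value step, note that this dataset is commonly described as recording missing measurements as 0 rather than as NaN in columns such as Glucose, BloodPressure, and BMI. If that holds for your copy, dropna alone would find nothing to drop. The sketch below shows one possible treatment under that assumption; the rows are made-up illustration values, not taken from diabetes.csv, and median imputation is just one choice among several.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only; they are not taken from diabetes.csv.
df = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "BloodPressure": [72, 66, 0, 66],
    "BMI": [33.6, 26.6, 23.3, 0.0],
    "Outcome": [1, 0, 1, 0],
})

# Assumption: a zero in these columns means "not recorded", not a real measurement.
zero_as_missing = ["Glucose", "BloodPressure", "BMI"]
df[zero_as_missing] = df[zero_as_missing].replace(0, np.nan)
print(df.isna().sum())

# Fill each gap with the column median, computed from the non-missing values.
df[zero_as_missing] = df[zero_as_missing].fillna(df[zero_as_missing].median())
print(df)
```

On real data it is safer to compute the medians on the training split only and reuse them for the test split, for the same reason the scaler should be fitted on the training data alone.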

Code:

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler

# Load the data and take a first look
df = pd.read_csv("./data/diabetes.csv", encoding="utf-8")
print(df.head())
print(df.shape)

# Drop rows with missing values, then separate features and label
df = df.dropna(axis=0)
x = df.drop("Outcome", axis=1)
y = df["Outcome"]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=22)

# Fit the scaler on the training set only, then reuse it on the test set
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

estimator = LogisticRegression()
estimator.fit(x_train, y_train)
y_predict = estimator.predict(x_test)
print("Predictions vs. true labels:", y_predict == y_test)
ret = estimator.score(x_test, y_test)
print("Accuracy:", ret)

# ROC curve from the predicted probability of the positive class
y_pred_proba = estimator.predict_proba(x_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
plt.figure(figsize=(8, 9))
plt.plot(fpr, tpr, color="b")
plt.plot([0, 1], [0, 1], color="r", linestyle="--")
plt.xlabel("fpr")
plt.ylabel("tpr")
plt.grid()
plt.show()

# Improvement attempt: tune C, penalty, and solver with 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10, 100], "penalty": ["l1", "l2"], "solver": ["liblinear", "saga"]}
grid_search = GridSearchCV(LogisticRegression(), param_grid, cv=5)
grid_search.fit(x_train, y_train)
best_estimator = grid_search.best_estimator_
best_ret = best_estimator.score(x_test, y_test)
print("Accuracy after parameter tuning:", best_ret)
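
Because this exercise is about ROC curves, another option is to tune for AUC instead of accuracy and to keep the scaler inside a Pipeline so that each cross-validation fold fits its own scaler. The following is a sketch of that idea, not the approach used in the code above. It runs on synthetic data from make_classification as a stand-in for diabetes.csv, so its numbers say nothing about the real dataset, and the step names "scale" and "clf" are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for diabetes.csv (8 features, about 35% positives); an assumption.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           weights=[0.65, 0.35], random_state=22)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=22)

# Scaling lives inside the pipeline, so each CV fold fits its own scaler.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(solver="liblinear")),
])
param_grid = {"clf__C": [0.1, 1, 10, 100], "clf__penalty": ["l1", "l2"]}

# scoring="roc_auc" ranks parameter settings by cross-validated AUC.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print("best params:", search.best_params_)
print("cross-validated AUC:", search.best_score_)
print("test AUC:", test_auc)
```

Accuracy depends on a single 0.5 cutoff, so two models can score identically on it while ranking patients differently. AUC summarizes the whole curve and can reveal differences that accuracy hides.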

Results:

Predictions vs. true labels: 645    False
767     True
31      True
148    False
59      True
       ...
30      True
158     True
167     True
582    False
681     True
Name: Outcome, Length: 154, dtype: bool
Accuracy: 0.7402597402597403
Accuracy after parameter tuning: 0.7402597402597403
