Python Data Analysis and Visualization Day 14 - Modeling Recap + Multi-Model Evaluation and Comparison (Logistic Regression vs Decision Tree)

Today's goals

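Review this week's entire data analysis & modeling workflow
Learn to train a second kind of model: the Decision Tree
Learn how to compare and evaluate multiple models, in theory and in practice
Produce a combined comparison report covering accuracy, precision, recall, F1, and other metrics
Lay the groundwork for later model tuning and extension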

🪜 1. Quick recap of this week's workflow

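| Day (this week) | Topic |
| --- | --- |
| Day 1 | Advanced data operations (indexing, pivoting, reshaping) |
| Day 2 | Handling missing values and outliers |
| Day 3 | Merging and joining multiple tables |
| Day 4 | Feature engineering (encoding, normalization, time features) |
| Day 5 | Splitting the dataset (training set / test set) |
| Day 6 | Building and evaluating a logistic regression model |
| Day 7 | 🤖 Multi-model comparison and evaluation (today) |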

🌲 2. Training a decision tree classifier

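```python
from sklearn.tree import DecisionTreeClassifier

# X_train, X_test, y_train, y_test are the splits prepared earlier this week
tree = DecisionTreeClassifier(random_state=42)  # fixed seed for reproducible results
tree.fit(X_train, y_train)
y_pred_tree = tree.predict(X_test)
```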
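By default, `DecisionTreeClassifier` chooses each split by Gini impurity (`criterion="gini"`). The sketch below is a simplified illustration of one node, not scikit-learn's actual implementation. It assumes a single numeric feature and tries every midpoint between sorted distinct values. It then keeps the threshold with the lowest weighted impurity across the two children. `gini` and `best_threshold` are hypothetical helper names, and the toy scores are made up.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(xs, ys):
    """Try midpoints between sorted distinct feature values (x <= t goes left) and
    return (threshold, weighted_child_gini) for the lowest-impurity split."""
    values = sorted(set(xs))
    best = (None, float("inf"))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best


# Toy data: exam scores with pass (1) / fail (0) labels
scores = [45, 52, 58, 61, 70, 88]
passed = [0, 0, 0, 1, 1, 1]
print(gini(passed))                    # 0.5
print(best_threshold(scores, passed))  # (59.5, 0.0)
```

A real tree repeats this search over every feature at every node and recurses into the children. Because it recurses, a tree can carve out nonlinear decision regions.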

3. Model comparison and evaluation

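```python
from sklearn.metrics import classification_report

# y_pred_log: predictions of the logistic regression model trained earlier
print("📋 Logistic Regression:")
print(classification_report(y_test, y_pred_log))
print("📋 Decision Tree:")
print(classification_report(y_test, y_pred_tree))
```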
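Reading two printed reports side by side gets tedious. `classification_report` also accepts `output_dict=True`, which returns the same numbers as a nested dict that is easy to tabulate. The sketch below assumes binary labels. `report_table` is a hypothetical helper. The short stand-in label lists exist only so the snippet runs on its own; in the tutorial you would pass `y_test`, `y_pred_log` and `y_pred_tree`.

```python
import pandas as pd
from sklearn.metrics import classification_report


def report_table(y_true, predictions):
    """Collect accuracy and weighted-average precision/recall/F1 for each model."""
    rows = {}
    for name, y_pred in predictions.items():
        report = classification_report(y_true, y_pred, output_dict=True)
        weighted = report["weighted avg"]
        rows[name] = {
            "accuracy": report["accuracy"],
            "precision": weighted["precision"],
            "recall": weighted["recall"],
            "f1-score": weighted["f1-score"],
        }
    return pd.DataFrame(rows).T  # transpose so there is one row per model


# Stand-in labels (hypothetical); replace with y_test, y_pred_log, y_pred_tree
demo_y_test = [0, 0, 1, 1, 1, 1]
demo_pred_log = [0, 1, 1, 1, 1, 1]
demo_pred_tree = [0, 0, 1, 1, 0, 1]
table = report_table(demo_y_test, {"Logistic": demo_pred_log, "DecisionTree": demo_pred_tree})
print(table)
```

Weighted-average recall always equals accuracy, so the `precision` and `f1-score` columns are where the models can actually differ in this view.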

📊 Visual comparison (optional)

```python
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score

models = ["Logistic", "DecisionTree"]
accuracies = [
    accuracy_score(y_test, y_pred_log),
    accuracy_score(y_test, y_pred_tree),
]
plt.bar(models, accuracies, color=["skyblue", "lightgreen"])
plt.title("Model Accuracy Comparison")
plt.ylabel("Accuracy")
plt.show()
```
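An accuracy bar is easier to judge next to a trivial baseline: always predict the most frequent class seen in training and measure accuracy on the test set. The sketch below is pure Python. `majority_baseline` is a hypothetical helper. scikit-learn offers the same idea as a proper estimator, `DummyClassifier(strategy="most_frequent")` from `sklearn.dummy`. The training labels are made up. The 7 / 13 test split mirrors the class supports in the output reported later in this post.

```python
from collections import Counter


def majority_baseline(y_train, y_test):
    """Accuracy on y_test of always predicting the most frequent class in y_train."""
    majority_class, _ = Counter(y_train).most_common(1)[0]
    hits = sum(1 for y in y_test if y == majority_class)
    return hits / len(y_test)


# Hypothetical labels: 30 zeros / 50 ones for training, 7 zeros / 13 ones for testing
demo_train_labels = [0] * 30 + [1] * 50
demo_test_labels = [0] * 7 + [1] * 13
print(majority_baseline(demo_train_labels, demo_test_labels))  # 0.65
```

A model whose accuracy is not clearly above this number has learned little beyond the class balance.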

🧪 Today's exercise (script name: `compare_models.py`)

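Load the training / test data generated this week
Train a logistic regression model and a decision tree model side by side
Output each model's evaluation metrics (Accuracy, Precision, Recall, F1)
(Optional) Write the results to a CSV file or visualize them in a chart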
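The last two items above can be sketched as follows. The sketch assumes binary 0/1 labels, with 1 as the positive class for precision, recall and F1. `metrics_table` and the output path `data/result/model_metrics.csv` are hypothetical names chosen for illustration. The stand-in label lists should be replaced with `y_test`, `y_pred_log` and `y_pred_tree`.

```python
import os

import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def metrics_table(y_true, predictions):
    """One row per model with Accuracy, Precision, Recall and F1 (positive class = 1)."""
    rows = []
    for name, y_pred in predictions.items():
        rows.append({
            "Model": name,
            "Accuracy": accuracy_score(y_true, y_pred),
            # zero_division=0 avoids warnings if a model never predicts class 1
            "Precision": precision_score(y_true, y_pred, zero_division=0),
            "Recall": recall_score(y_true, y_pred, zero_division=0),
            "F1": f1_score(y_true, y_pred, zero_division=0),
        })
    return pd.DataFrame(rows)


# Stand-in labels (hypothetical)
demo_y_test = [0, 0, 1, 1, 1, 1]
results = metrics_table(demo_y_test, {
    "Logistic": [0, 1, 1, 1, 1, 1],
    "DecisionTree": [0, 0, 1, 1, 0, 1],
})
os.makedirs("data/result", exist_ok=True)
results.to_csv("data/result/model_metrics.csv", index=False)
print(results)
```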

Think about the strengths and weaknesses of each model, and how to choose the right one.

```python
# compare_models.py
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
import os

# Font that can render CJK text (available on macOS); adjust or remove on other systems
plt.rcParams['font.family'] = 'Arial Unicode MS'
plt.rcParams['axes.unicode_minus'] = False  # render minus signs correctly with this font

# 1. Load the training and test data
data_dir = "data/model"
X_train = pd.read_csv(os.path.join(data_dir, "X_train.csv"))
X_test = pd.read_csv(os.path.join(data_dir, "X_test.csv"))
# .values.ravel() flattens the single-column label table into the 1-D array sklearn expects
y_train = pd.read_csv(os.path.join(data_dir, "y_train.csv")).values.ravel()
y_test = pd.read_csv(os.path.join(data_dir, "y_test.csv")).values.ravel()

# 2. Initialize the models
log_model = LogisticRegression()
tree_model = DecisionTreeClassifier(random_state=42)

# 3. Train the models
log_model.fit(X_train, y_train)
tree_model.fit(X_train, y_train)

# 4. Predict on the test set
y_pred_log = log_model.predict(X_test)
y_pred_tree = tree_model.predict(X_test)

# 5. Evaluation reports (per-class precision, recall, F1 and support)
print("📋 Logistic Regression evaluation report:")
print(classification_report(y_test, y_pred_log))
print("\n🌳 Decision Tree evaluation report:")
print(classification_report(y_test, y_pred_tree))

# 6. Accuracy of each model
acc_log = accuracy_score(y_test, y_pred_log)
acc_tree = accuracy_score(y_test, y_pred_tree)

# 7. Confusion matrices side by side (rows = true class, columns = predicted class)
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
sns.heatmap(confusion_matrix(y_test, y_pred_log, labels=[0, 1]), annot=True, fmt="d", cmap="Blues",
            xticklabels=["0", "1"], yticklabels=["0", "1"])
plt.title("Logistic Regression - Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("True")

plt.subplot(1, 2, 2)
sns.heatmap(confusion_matrix(y_test, y_pred_tree, labels=[0, 1]), annot=True, fmt="d", cmap="Greens",
            xticklabels=["0", "1"], yticklabels=["0", "1"])
plt.title("Decision Tree - Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("True")

plt.tight_layout()
plt.show()

# 8. Accuracy bar chart
plt.figure(figsize=(5, 4))
plt.bar(["Logistic", "Decision Tree"], [acc_log, acc_tree], color=["skyblue", "lightgreen"])
plt.title("Model Accuracy Comparison")
plt.ylabel("Accuracy")
plt.ylim(0, 1)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

# 9. Summarize the results (saving is optional)
results_df = pd.DataFrame({
    "Model": ["Logistic", "Decision Tree"],
    "Accuracy": [acc_log, acc_tree],
})
os.makedirs("data/result", exist_ok=True)
results_df.to_csv("data/result/model_comparison.csv", index=False)
print("\nComparison results saved: data/result/model_comparison.csv")
```

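Output:

```text
📋 Logistic Regression evaluation report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         7
           1       1.00      1.00      1.00        13

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20


🌳 Decision Tree evaluation report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         7
           1       1.00      1.00      1.00        13

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20


Comparison results saved: data/result/model_comparison.csv
```

Both models score a perfect 1.00 on every metric. With the sample data from the script in the PS below, this is expected rather than impressive, for two reasons. First, the label 是否及格_數值 is itself one of the feature columns in X. Second, pass/fail is a simple threshold on 成績_標準化, so the task is trivially separable. Use a more realistic dataset, or at least drop the label column from X, before drawing conclusions about which model is better.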

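Confusion matrix visualization:
[Figure: confusion matrices of the two models]

Accuracy bar chart:
[Figure: model accuracy comparison bar chart]

data/result/model_comparison.csv:
[Figure: contents of the saved CSV file]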
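The script plots each model's 2×2 confusion matrix. With `labels=[0, 1]`, `confusion_matrix` returns `[[TN, FP], [FN, TP]]`, where rows are the true class and columns are the predicted class. Precision, recall and F1 for class 1 follow directly from those four counts. The pure-Python sketch below uses `binary_metrics`, a hypothetical helper, and the counts in the usage line are made up.

```python
def binary_metrics(cm):
    """Metrics for the positive class (1) from a 2x2 confusion matrix
    laid out as [[TN, FP], [FN, TP]] (rows = true class, columns = predicted)."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted 1s, share truly 1
    recall = tp / (tp + fn) if tp + fn else 0.0     # of true 1s, share found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: 6 TN, 1 FP, 3 FN, 10 TP (20 samples)
print(binary_metrics([[6, 1], [3, 10]]))
```

For example, take `[[7, 0], [0, 13]]`, the matrix implied by perfect scores on the reported 7 / 13 test split. It gives 1.0 for every metric.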
PS: You can generate the training / test sets with the code below:

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import os

# Build sample data: 100 students with a score (成績) and a gender (性別: 男 = male, 女 = female)
np.random.seed(42)
size = 100
df = pd.DataFrame({
    "成績": np.random.randint(40, 100, size=size),
    "性別": np.random.choice(["男", "女"], size=size),
})

# Derived features
df["成績_標準化"] = (df["成績"] - df["成績"].mean()) / df["成績"].std()  # z-score of the score
df["是否及格_數值"] = (df["成績"] >= 60).astype(int)  # pass flag: 1 if score >= 60
df["性別_男"] = (df["性別"] == "男").astype(int)  # one-hot: male
df["性別_女"] = (df["性別"] == "女").astype(int)  # one-hot: female

# Features and label
# Caution: 是否及格_數值 is also the label y, so keeping it in X leaks the target
# and makes a perfect score trivial; drop it from X for a meaningful comparison.
X = df[["成績_標準化", "性別_男", "性別_女", "是否及格_數值"]]
y = df["是否及格_數值"]

# Split the data (80% training / 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Save the splits
os.makedirs("data/model", exist_ok=True)
X_train.to_csv("data/model/X_train.csv", index=False)
X_test.to_csv("data/model/X_test.csv", index=False)
y_train.to_csv("data/model/y_train.csv", index=False)
y_test.to_csv("data/model/y_test.csv", index=False)
```
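In the script above, 是否及格_數值 serves as both the label `y` and one of the columns of `X`, so any model can read the answer straight off that feature. A cheap guard against this kind of target leakage is to flag feature columns that reproduce the label exactly. In the sketch below, `find_leaky_columns` is a hypothetical helper, and the four demo rows are made up. The check only catches exact copies of the label, not subtler leaks such as features derived from it.

```python
import numpy as np
import pandas as pd


def find_leaky_columns(X, y):
    """Return names of feature columns whose values equal the label row for row."""
    y_values = np.asarray(y)
    return [col for col in X.columns if np.array_equal(X[col].to_numpy(), y_values)]


# A few illustrative rows shaped like the sample data above
demo = pd.DataFrame({"成績": [45, 72, 58, 90]})
demo["成績_標準化"] = (demo["成績"] - demo["成績"].mean()) / demo["成績"].std()
demo["是否及格_數值"] = (demo["成績"] >= 60).astype(int)

X_demo = demo[["成績_標準化", "是否及格_數值"]]
y_demo = demo["是否及格_數值"]
leaky = find_leaky_columns(X_demo, y_demo)
print(leaky)  # ['是否及格_數值']
X_clean = X_demo.drop(columns=leaky)
```

Dropping flagged columns before `train_test_split` gives a more honest comparison. On this toy data, however, pass/fail remains a simple threshold on 成績_標準化, so both models are likely to stay near-perfect.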

🧾 Today's summary

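Model evaluation is about more than accuracy; precision and recall matter too.
Decision trees can capture nonlinear relationships but are prone to overfitting.
Model selection should weigh the business context, sample size, interpretability, and other factors.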
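You can check the overfitting claim directly by comparing training and test accuracy for an unrestricted tree and a depth-limited one. The sketch below uses synthetic data from `make_classification` and assumes scikit-learn is installed. `max_depth=3` is an arbitrary illustrative value, not a tuned setting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data with ~10% randomly flipped labels, so memorizing noise hurts
X_demo, y_demo = make_classification(n_samples=400, n_features=10, n_informative=4,
                                     flip_y=0.1, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42)

results = {}
for depth in (None, 3):  # None = keep splitting until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(Xd_train, yd_train)
    results[depth] = {
        "depth": tree.get_depth(),
        "train_acc": tree.score(Xd_train, yd_train),
        "test_acc": tree.score(Xd_test, yd_test),
    }
    print(f"max_depth={depth}: depth={results[depth]['depth']}, "
          f"train={results[depth]['train_acc']:.3f}, test={results[depth]['test_acc']:.3f}")
```

The unrestricted tree fits the training set perfectly, noise included. Its test accuracy is typically noticeably lower. Capping the depth trades some training accuracy for a smaller train/test gap. The exact numbers depend on the data, so treat `max_depth` as a hyperparameter to tune rather than a fixed rule.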

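This article was submitted by an Internet user. The views expressed are solely the author's and do not represent the position of this site. This site only provides information storage services, does not own the rights to the content, and assumes no related legal liability.
If reposting, please cite the source: http://www.pswp.cn/web/87245.shtml
Traditional Chinese version; if reposting, please cite the source: http://hk.pswp.cn/web/87245.shtml
English version; if reposting, please cite the source: http://en.pswp.cn/web/87245.shtml

If this content is infringing, illegal, or factually inaccurate, please contact 多彩編程網 to file a complaint by email: 809451989@qq.com. Once verified, it will be removed immediately.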