Machine Learning: KNN Hyperparameters

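sklearn.model_selection.GridSearchCV is scikit-learn's core tool for hyperparameter tuning. It combines grid search with cross-validation to automate the search for good model parameters. Details follow.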

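I. Overview

GridSearchCV exhaustively tries every hyperparameter combination in a specified parameter grid, evaluates each combination with cross-validation, and selects the best-scoring one. Its main benefits are:

Automated tuning: it replaces manual trial-and-error and saves time.
Cross-validation support: scoring each combination with K-fold cross-validation makes the estimate less dependent on one particular split, so the selected parameters are less likely to overfit a single validation set.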

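II. Core Parameters

Parameter	Type	Purpose
`estimator`	estimator object	The model to tune (e.g. `SVC()`, `RandomForestClassifier()`)
`param_grid`	dict or list of dicts	Parameter names (strings) as keys, lists of candidate values as values (e.g. `{'C': [1,10], 'kernel': ['linear','rbf']}`)
`scoring`	str / callable	Evaluation metric (e.g. `'accuracy'`, `'roc_auc'`); by default the estimator's own `score()` method is used
`cv`	int / cross-validation generator	Number of cross-validation folds (default: 5-fold, stratified for classifiers), or a custom splitting strategy
`n_jobs`	int	Number of parallel jobs (`-1` uses all CPU cores)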
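To make the table concrete, here is a minimal sketch that passes each of these parameters explicitly. The SVC grid reuses the example values from the table; the dataset (iris), the `StratifiedKFold` settings, and the variable names are assumptions chosen for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values taken from the table above; illustrative only.
param_grid = {"C": [1, 10], "kernel": ["linear", "rbf"]}

search = GridSearchCV(
    estimator=SVC(),          # model to tune
    param_grid=param_grid,    # 2 x 2 = 4 candidate combinations
    scoring="accuracy",       # metric used to rank the candidates
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),  # custom splitter
    n_jobs=1,                 # set to -1 to use all CPU cores
)
search.fit(X, y)

print(search.best_params_)
print(len(search.cv_results_["params"]))  # number of candidates evaluated
```

Passing a splitter object instead of an integer for `cv` is optional; it is used here only to show that shuffling and the random seed can be controlled.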

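III. Main Attributes

After calling fit(), the results are available through these attributes:

best_score_: the mean cross-validated score of the best parameter combination.
best_params_: the best parameter combination (e.g. {'C': 10, 'kernel': 'rbf'}).
cv_results_: a dictionary of detailed results, including the mean score and standard deviation for every parameter combination.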
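As a rough illustration of how these attributes relate, the sketch below fits a small search and prints every candidate ranked by its mean score. The one-parameter grid and the iris dataset are assumptions made to keep the example short.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical small grid, chosen only to keep the output readable.
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7]}, cv=5)
search.fit(X, y)

results = search.cv_results_

# Print candidates from best to worst rank with mean and std of the fold scores.
for i in np.argsort(results["rank_test_score"]):
    print(
        results["params"][i],
        round(results["mean_test_score"][i], 4),
        round(results["std_test_score"][i], 4),
    )

# best_index_ points at the winning row of cv_results_.
best_row = search.best_index_
print(search.best_params_, search.best_score_)
```

Here best_params_ and best_score_ are simply the `params` and `mean_test_score` entries of the row that best_index_ points to.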

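IV. Workflow

Data split: the data is first split into a training set and a test set (this is done by the user, e.g. with train_test_split); the training set is then split into K folds for cross-validation.
Candidate generation: every possible hyperparameter combination is generated from param_grid (e.g. a 2×2 grid yields 4 candidates).
Cross-validated evaluation: each candidate is trained and validated on the K folds, and its mean score is computed.
Best-model selection: the candidate with the highest mean score is selected and, with the default refit=True, the final model is retrained on the full training set.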
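The four steps above can be sketched by hand with `ParameterGrid` and `cross_val_score`. This is a simplified approximation of the procedure, not GridSearchCV's actual implementation; the 2×2 grid and the variable names are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Step 1: hold out a test set; cross-validation only ever sees the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=233, stratify=y
)

# Step 2: expand the grid into explicit candidates (2 x 2 = 4).
param_grid = {"n_neighbors": [3, 5], "weights": ["uniform", "distance"]}
candidates = list(ParameterGrid(param_grid))

# Step 3: score every candidate by its mean 5-fold cross-validation accuracy.
best_score, best_params = -1.0, None
for params in candidates:
    fold_scores = cross_val_score(KNeighborsClassifier(**params), X_train, y_train, cv=5)
    mean_score = fold_scores.mean()
    if mean_score > best_score:
        best_score, best_params = mean_score, params

# Step 4: refit the winning candidate on the full training set.
final_model = KNeighborsClassifier(**best_params).fit(X_train, y_train)

print(best_params, best_score)
print(final_model.score(X_test, y_test))  # test set is used once, at the end
```

The test set plays no part in choosing the parameters, so the final score is an unbiased check of the selected model.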

V. Code Demonstration

1. Manual tuning (nested loops)

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the dataset
iris = load_iris()
x = iris.data
y = iris.target

# Split the dataset (7:3)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.7, random_state=233, stratify=y
)

# Classifier with hand-picked parameters
neigh = KNeighborsClassifier(
    n_neighbors=3,
    weights='distance',  # or 'uniform'
    p=2
)

# Fit the model
neigh.fit(x_train, y_train)

# Evaluate the model
neigh.score(x_test, y_test)  # Result: 0.9777777777777777

# Automatic search: loop over every parameter combination and keep the best.
# Note: candidates are scored on the test set here, so the best score is optimistic.
best_score = -1
best_n = -1
best_weight = ''
best_p = -1

for n in range(1, 20):
    for weight in ['uniform', 'distance']:
        for p in range(1, 7):
            neigh = KNeighborsClassifier(n_neighbors=n, weights=weight, p=p)
            neigh.fit(x_train, y_train)
            score = neigh.score(x_test, y_test)
            if score > best_score:
                best_score = score
                best_n = n
                best_weight = weight
                best_p = p

print("n_neighbors:", best_n)
print("weights:", best_weight)
print("p:", best_p)
print("score:", best_score)

# Result:
# n_neighbors: 5
# weights: uniform
# p: 2
# score: 1.0

2. Tuning KNN with sklearn.model_selection.GridSearchCV

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the dataset
iris = load_iris()
x = iris.data
y = iris.target

# Split the dataset (7:3)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.7, random_state=233, stratify=y
)

# Define the parameter ranges
params = {
    'n_neighbors': list(range(1, 20)),
    'weights': ['uniform', 'distance'],
    'p': list(range(1, 7))
}

# Define the search object
grid = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid=params,
    n_jobs=-1
)

# Fit: runs cross-validation for every combination on the training set
grid.fit(x_train, y_train)

# Print the best parameters
print(grid.best_params_)

# Print the predictions
print(grid.best_estimator_.predict(x_test))

# Evaluate the model
print(grid.best_estimator_.score(x_test, y_test))
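With the default refit=True, the fitted GridSearchCV object can itself be used for prediction and scoring, since it delegates to best_estimator_. The sketch below checks this on a reduced grid; the smaller grid is an assumption made only to keep the run short.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.7, random_state=233, stratify=y
)

# Reduced grid (assumption): fewer candidates than in the demo above.
params = {
    "n_neighbors": list(range(1, 10)),
    "weights": ["uniform", "distance"],
    "p": [1, 2],
}

grid = GridSearchCV(estimator=KNeighborsClassifier(), param_grid=params)
grid.fit(x_train, y_train)

# Calling predict/score on the search object uses the refitted best estimator.
pred_via_grid = grid.predict(x_test)
pred_via_best = grid.best_estimator_.predict(x_test)
same_predictions = bool((pred_via_grid == pred_via_best).all())

print(same_predictions)
print(grid.best_score_)             # mean cross-validation score on the training folds
print(grid.score(x_test, y_test))   # accuracy on the held-out test set
```

The cross-validation score and the test score usually differ slightly, because they are computed on different data.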
