How to Run CatBoost in an Sklearn Pipeline

Introduction

One of CatBoost's main strengths is how well it handles categorical features. When we put it inside an Sklearn Pipeline, however, fitting fails with the following error:

_catboost.CatBoostError: 'data' is numpy array of floating point numerical type, it means no categorical features, but 'cat_features' parameter specifies nonzero number of categorical features

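CatBoost identifies the categorical features from the training data passed to .fit(), using the columns listed in cat_features, which normally refer to a pandas.DataFrame. Inside a Pipeline, however, the classifier's .fit() receives the data as modified by the preceding steps. That data is usually a NumPy array, so the original column names are gone; wrapped back into a DataFrame, its columns are simply numbered 0, 1, 2, .... When that array has a floating-point dtype, CatBoost concludes that it contains no categorical features and rejects a nonzero cat_features.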
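To see what the classifier step actually receives, here is a minimal reproduction that does not need CatBoost itself. It assumes a hypothetical toy DataFrame and a ColumnTransformer that scales one numeric column and passes an integer-coded categorical column through unchanged; none of these names come from the original setup.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: one numeric column and one integer-coded categorical column
X = pd.DataFrame({
    "age": [23.0, 35.0, 47.0, 52.0],
    "city_code": [0, 1, 2, 1],
})

# Hypothetical preprocessor: scale "age", pass "city_code" through untouched
preprocessor = ColumnTransformer(
    transformers=[("num", StandardScaler(), ["age"])],
    remainder="passthrough",
)

Xt = preprocessor.fit_transform(X)

print(type(Xt).__name__)               # ndarray
print(Xt.dtype)                        # float64
print(list(pd.DataFrame(Xt).columns))  # [0, 1]
```

Because the scaled column is float, NumPy upcasts the whole output to float64, so the category codes arrive as 0.0, 1.0, 2.0 in a float array without column names. This is the situation the CatBoost error message describes.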

Solution

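The workaround is to run the preprocessing part of the Pipeline on the data in advance. Fit the preprocessor, transform the raw training data into what the Pipeline will actually produce, and find the categorical columns in that output. Then pass their positions to CatBoost through the pipeline's fit().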

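import pandas as pd
from sklearn.pipeline import Pipeline

# define your pipeline (preprocessor, model, X_train and y_train come from your own code;
# model is the CatBoost classifier)
pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('classifier', model)])

# fit the preprocessor in advance and inspect its output with inferred dtypes
preprocessor.fit(X_train)
transformed_X_train = pd.DataFrame(preprocessor.transform(X_train)).convert_dtypes()

# integer and boolean columns are treated as categorical; get_loc gives their positions
new_cat_feature_idx = [transformed_X_train.columns.get_loc(col)
                       for col in transformed_X_train.select_dtypes(include=['int64', 'bool']).columns]

# the "<step>__<param>" syntax routes cat_features to the classifier's fit()
pipeline.fit(X_train, y_train, classifier__cat_features=new_cat_feature_idx)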
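The same steps can be wrapped into a reusable function. The sketch below is hypothetical. The function name categorical_indices_after_transform, the toy data and the ColumnTransformer are illustrative assumptions. It applies the same rule as above: columns inferred as integer or boolean after convert_dtypes() count as categorical. The check uses pandas' is_integer_dtype / is_bool_dtype instead of select_dtypes.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler


def categorical_indices_after_transform(preprocessor, X):
    """Hypothetical helper: fit the preprocessor on X, transform X, and return the
    positions of the columns inferred as integer or boolean (treated as categorical)."""
    transformed = pd.DataFrame(preprocessor.fit(X).transform(X)).convert_dtypes()
    return [
        i
        for i, dtype in enumerate(transformed.dtypes)
        if pd.api.types.is_integer_dtype(dtype) or pd.api.types.is_bool_dtype(dtype)
    ]


# Hypothetical toy data: a numeric column plus int and bool categorical columns
X = pd.DataFrame({
    "age": [23.0, 35.0, 47.0, 52.0],
    "city_code": [0, 1, 2, 1],
    "is_member": [True, False, True, True],
})

# Scale "age"; the int and bool passthrough columns make the output an object array
preprocessor = ColumnTransformer(
    transformers=[("num", StandardScaler(), ["age"])],
    remainder="passthrough",
)

cat_idx = categorical_indices_after_transform(preprocessor, X)
print(cat_idx)
```

Two caveats about this rule. First, it only picks up integer and boolean columns. Under convert_dtypes(), string categories become pandas' string dtype and would need to be added to the check. Second, the indices can only help if the array that reaches CatBoost is not a floating-point array. A purely float output triggers the same CatBoost error regardless of cat_features.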
