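An Introduction to Machine Learning with Scikit-Learn

R and Python are currently the most widely used programming languages for data analysis. Each has its own strengths, and the Scikit-Learn library is one of Python's main advantages. Scikit-Learn is thoroughly documented and implements many machine learning algorithms behind a nearly identical interface, so trying out a different algorithm usually takes only a line or two of changes.

Pandas is commonly used alongside Scikit-Learn. It is a data analysis library built on NumPy that provides higher-level data structures and tools; you can think of it as a programmable Excel.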
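To make the spreadsheet comparison concrete, here is a minimal sketch that wraps a small NumPy array in a pandas DataFrame. The column names and values are made up for illustration and have nothing to do with the dataset used later in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three rows, two numeric columns.
values = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

# A DataFrame adds row and column labels on top of the NumPy array.
df = pd.DataFrame(values, columns=["feature_a", "feature_b"])

# Column-wise summary statistics, similar to spreadsheet formulas.
column_means = df.mean()
print(df.describe())

# The underlying NumPy array can be handed straight to Scikit-Learn.
X_toy = df.to_numpy()
```

In practice the DataFrame is where cleaning and exploration happen, and the NumPy array is what the estimators receive.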

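Loading the Data

First, load the data into memory. Download a dataset from the UCI repository:

import numpy as np
from urllib.request import urlopen  # Python 3 location of urlopen

# URL of the dataset
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"

# Download the file
raw_data = urlopen(url)

# Load the data into a NumPy array
dataset = np.loadtxt(raw_data, delimiter=",")

# Split the dataset: columns 0-7 hold the attributes, column 8 holds the label
X = dataset[:, 0:8]  # attributes
y = dataset[:, 8]    # labels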
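The same loading and slicing steps can be tried without a network connection. The sketch below assumes a few illustrative rows in the same nine-column layout (eight attributes followed by one label); the variable names `sample_csv`, `X_sample`, and `y_sample` are hypothetical.

```python
import io
import numpy as np

# Illustrative rows only: 8 attribute columns followed by 1 label column.
sample_csv = io.StringIO(
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
    "8,183,64,0,0,23.3,0.672,32,1\n"
)

# np.loadtxt accepts any file-like object, not just a downloaded response.
sample = np.loadtxt(sample_csv, delimiter=",")

# The slice 0:8 selects columns 0 through 7; the end index is exclusive.
X_sample = sample[:, 0:8]
y_sample = sample[:, 8]

print(X_sample.shape, y_sample.shape)
```

Because the end of a slice is exclusive, `0:8` is needed to keep all eight attribute columns.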

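Normalizing and Standardizing the Data

Before applying a learning algorithm, you should usually rescale the data, because many algorithms are sensitive to the scale of the features. Two common options are shown here. preprocessing.normalize rescales each sample (row) to unit norm, so for non-negative data every value ends up between 0 and 1. preprocessing.scale standardizes each feature (column) to zero mean and unit variance.

from sklearn import preprocessing

# Normalize: rescale each sample (row) to unit norm
normalized_X = preprocessing.normalize(X)

# Standardize: rescale each feature (column) to zero mean and unit variance
standardized_X = preprocessing.scale(X)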
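The difference between the two calls is easier to see on a tiny array. This is a small sketch on made-up numbers; `MinMaxScaler` is added here as an assumption about what you may want if the goal is to bring every feature into the 0-1 range, which neither `normalize` nor `scale` does per column.

```python
import numpy as np
from sklearn import preprocessing

# Made-up data: three samples, two features.
X_demo = np.array([[3.0, 4.0],
                   [6.0, 8.0],
                   [0.0, 5.0]])

# normalize works per row: each sample is divided by its L2 norm.
normalized = preprocessing.normalize(X_demo)
row_norms = np.linalg.norm(normalized, axis=1)

# scale works per column: subtract the mean, divide by the standard deviation.
standardized = preprocessing.scale(X_demo)
col_means = standardized.mean(axis=0)
col_stds = standardized.std(axis=0)

# MinMaxScaler also works per column and maps each feature onto [0, 1].
min_max = preprocessing.MinMaxScaler().fit_transform(X_demo)

print(row_norms, col_means, col_stds)
```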

Feature Selection

ExtraTreesClassifier (tree-based):

from sklearn import metrics
from sklearn.ensemble import ExtraTreesClassifier

model = ExtraTreesClassifier()
model.fit(X, y)

# Show the relative importance of each attribute
print(model.feature_importances_)
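The importances are returned in column order, so it usually helps to sort them. A minimal sketch, assuming a synthetic dataset from `make_classification` as a stand-in for X and y; `n_estimators=100` and `random_state=0` are arbitrary choices made for reproducibility.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in data: 8 features, of which 3 carry signal.
X_syn, y_syn = make_classification(n_samples=200, n_features=8,
                                   n_informative=3, n_redundant=0,
                                   random_state=0)

forest = ExtraTreesClassifier(n_estimators=100, random_state=0)
forest.fit(X_syn, y_syn)

# Importances are non-negative and sum to 1 across all features.
importances = forest.feature_importances_

# Column indices ordered from most to least important.
ranking = np.argsort(importances)[::-1]
for idx in ranking:
    print(f"feature {idx}: {importances[idx]:.3f}")
```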

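LogisticRegression with recursive feature elimination (RFE):

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()

# Keep the 3 best attributes (n_features_to_select must be passed by keyword in current versions)
rfe = RFE(model, n_features_to_select=3)
rfe = rfe.fit(X, y)

# Boolean mask of the selected attributes
print(rfe.support_)

# Rank of each attribute; selected attributes have rank 1
print(rfe.ranking_)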
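Once RFE is fitted, the same object can reduce the data to the selected columns. A minimal sketch, again assuming synthetic stand-in data; `max_iter=1000` is an assumption added only to give the solver room to converge.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for X and y.
X_syn, y_syn = make_classification(n_samples=200, n_features=8,
                                   n_informative=3, n_redundant=0,
                                   random_state=0)

selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X_syn, y_syn)

# transform keeps only the columns where support_ is True.
X_reduced = selector.transform(X_syn)

print(selector.support_)
print(X_reduced.shape)
```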

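Machine Learning Algorithms

Logistic Regression

Logistic regression is most often used for binary classification, but it also supports multiple classes. It outputs the probability that a sample belongs to each class:

from sklearn import metrics
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, y)
print(model)

# Make predictions (here on the training data itself)
expected = y
predicted = model.predict(X)

# Summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))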
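The class probabilities mentioned above are available through `predict_proba`. The sketch below also holds out part of the data, since scores computed on the training set tend to be optimistic. It assumes synthetic stand-in data; the 25% test split and `max_iter=1000` are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y.
X_syn, y_syn = make_classification(n_samples=300, n_features=8,
                                   random_state=0)

# Keep 25% of the samples aside for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# One row per sample, one column per class; each row sums to 1.
probabilities = clf.predict_proba(X_test)

# Mean accuracy on data the model has not seen.
test_accuracy = clf.score(X_test, y_test)
print(probabilities[:3])
print(test_accuracy)
```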


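Naive Bayes

Naive Bayes is another well-known machine learning algorithm. It estimates the density of the data distribution for each class, and it often gives high-quality predictions on multiclass problems.

from sklearn import metrics
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X, y)
print(model)

# Make predictions (here on the training data itself)
expected = y
predicted = model.predict(X)

# Summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))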
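Since the text mentions multiclass problems, here is a small sketch of GaussianNB on a three-class task. It assumes the Iris dataset bundled with Scikit-Learn rather than the dataset loaded earlier, and the 30% stratified test split is an arbitrary choice.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris ships with Scikit-Learn: 150 samples, 4 features, 3 classes.
iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.3,
    random_state=0, stratify=iris.target)

nb = GaussianNB().fit(X_tr, y_tr)
nb_pred = nb.predict(X_te)

# Per-class precision and recall on the held-out samples.
print(metrics.classification_report(y_te, nb_pred))
nb_accuracy = metrics.accuracy_score(y_te, nb_pred)

# theta_ holds the estimated mean of each feature for each class.
class_means = nb.theta_
```

The per-class means in `theta_` (together with the per-class variances) are the density estimate the model relies on.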


K-Nearest Neighbours (KNN)


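KNN is often used as one component of a more complex classification algorithm, and it can also give good results on regression problems.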
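A minimal sketch of a KNN classifier, following the same fit, predict, and report pattern as the other algorithms in this article. It assumes synthetic data from `make_classification` as a stand-in for X and y, and `n_neighbors=5` is simply Scikit-Learn's default rather than a tuned value.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for X and y.
X_syn, y_syn = make_classification(n_samples=200, n_features=8,
                                   random_state=0)

# Each prediction is a majority vote among the 5 closest training samples.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_syn, y_syn)
print(knn)

# Make predictions (here on the training data itself)
knn_expected = y_syn
knn_pred = knn.predict(X_syn)

# Summarize the fit of the model
print(metrics.classification_report(knn_expected, knn_pred))
print(metrics.confusion_matrix(knn_expected, knn_pred))
```

For regression, `KNeighborsRegressor` from the same module exposes the same interface and averages the neighbours' target values instead of voting.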


Decision Trees

Decision trees (CART, classification and regression trees) handle both regression and classification problems well.

from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier

# Fit a CART model to the data
model = DecisionTreeClassifier()
model.fit(X, y)
print(model)

# Make predictions (here on the training data itself)
expected = y
predicted = model.predict(X)

# Summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
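The listing above covers classification; the regression side uses `DecisionTreeRegressor` with the same interface. A minimal sketch, assuming a made-up one-dimensional sine curve as the target; `max_depth=4` is an arbitrary limit chosen to keep the tree small.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up regression data: 80 points on a sine curve.
rng = np.random.RandomState(0)
X_reg = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y_reg = np.sin(X_reg).ravel()

# A depth-4 tree approximates the curve with at most 16 constant pieces.
tree_reg = DecisionTreeRegressor(max_depth=4, random_state=0)
tree_reg.fit(X_reg, y_reg)

# For regressors, score returns R^2 (computed here on the training data).
r2 = tree_reg.score(X_reg, y_reg)
print(r2)
```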


Support Vector Machines


from sklearn import metrics
from sklearn.svm import SVC

# Fit an SVM model to the data
model = SVC()
model.fit(X, y)
print(model)

# Make predictions (here on the training data itself)
expected = y
predicted = model.predict(X)

# Summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
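SVMs are sensitive to feature scale, so the standardization step from earlier is commonly combined with the classifier. The sketch below is one way to do that with a pipeline and 5-fold cross-validation; it assumes synthetic stand-in data, and the kernel and `C` values shown are just Scikit-Learn's defaults written out.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for X and y.
X_syn, y_syn = make_classification(n_samples=200, n_features=8,
                                   random_state=0)

# The scaler is fitted on each training fold only, then applied to the
# matching validation fold, so no information leaks across the split.
svm_pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

# One accuracy value per fold.
cv_scores = cross_val_score(svm_pipeline, X_syn, y_syn, cv=5)
print(cv_scores)
print(cv_scores.mean())
```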


Scikit-Learn also provides a range of more advanced techniques, including clustering, bagging, and boosting.
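As a rough illustration of how these fit the same interface, the sketch below runs one estimator from each family. The specific classes (KMeans, BaggingClassifier, GradientBoostingClassifier), the synthetic data, and all parameter values are assumptions picked for the example, not recommendations.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier

# Synthetic stand-in for X and y.
X_syn, y_syn = make_classification(n_samples=200, n_features=8,
                                   random_state=0)

# Clustering is unsupervised: it groups samples without looking at y.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_syn)

# Bagging trains several models on bootstrap samples and combines their votes.
bagging = BaggingClassifier(n_estimators=10, random_state=0)
bagging.fit(X_syn, y_syn)

# Boosting adds models one at a time, each correcting the previous errors.
boosting = GradientBoostingClassifier(n_estimators=50, random_state=0)
boosting.fit(X_syn, y_syn)

# Training-set accuracy, shown only to confirm the models were fitted.
bagging_score = bagging.score(X_syn, y_syn)
boosting_score = boosting.score(X_syn, y_syn)
print(bagging_score, boosting_score)
```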

Reposted from: https://www.cnblogs.com/gejuncheng/p/8127446.html
