1. Predicting Facebook check-in locations
Code:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score
# 1. Load the dataset
df = pd.read_csv("./data/train.csv", encoding="utf-8")
print(df.shape)
# 2. Basic data processing
# 2.1 Narrow the data to a small area (2 <= x <= 3, 2 <= y <= 3)
df_data = df.query('(x >= 2) & (x <= 3) & (y >= 2) & (y <= 3)').copy()
print(df_data.shape)
# 2.2 Select the time feature
# Extract the hour of day from the timestamp as a new feature
df_data['hour'] = pd.to_datetime(df_data['time'], unit='s').dt.hour
print(df_data.hour)
# 2.3 Drop places with few check-ins
# Here we drop places with fewer than 10 check-ins
place_counts = df_data['place_id'].value_counts()
less_visited_places = place_counts[place_counts < 10].index
df_data = df_data[~df_data['place_id'].isin(less_visited_places)]
print(df_data.shape)
# 2.4 Define the features and the target
# Use x, y and hour as features and place_id as the target
X = df_data[['x', 'y', 'hour']]
y = df_data['place_id']
# 2.5 Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 3. Feature engineering -- feature preprocessing (standardization)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# 4. Machine learning -- KNN + cross-validation
# Use the KNN algorithm with K = 5
knn = KNeighborsClassifier(n_neighbors=5)
# Evaluate the model with 5-fold cross-validation on the training set
cv_scores = cross_val_score(knn, X_train, y_train, cv=5)
print("Cross-validation scores:", cv_scores)
print("Mean cross-validation score:", cv_scores.mean())
# Train the model on the full training set
knn.fit(X_train, y_train)
# 5. Model evaluation
y_pred = knn.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Test-set accuracy:", test_accuracy)
Results:
Cross-validation scores: [0.38221957 0.38024076 0.38277611 0.37953993 0.38213712]
Mean cross-validation score: 0.3813826936554397
Test-set accuracy: 0.38908035552330855
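Steps 2.1 and 2.2 can be tried on a few hand-made rows before touching the real train.csv. The sketch below uses a hypothetical four-row DataFrame with the same column names (x, y, time, place_id). Like the script above, it treats time as seconds since the Unix epoch (unit="s"); that unit is an assumption about the data, not something the sketch can verify.

```python
import pandas as pd

# Hypothetical check-in rows with the same columns as train.csv
df = pd.DataFrame({
    "x": [1.5, 2.2, 2.8, 3.4],
    "y": [2.5, 2.1, 2.9, 2.5],
    "time": [0, 3600, 9000, 104400],  # assumed: seconds since 1970-01-01
    "place_id": [10, 20, 20, 30],
})

# 2.1 Keep only points inside the square 2 <= x <= 3, 2 <= y <= 3.
# .copy() makes an independent frame, so adding a column below changes
# only this frame and does not raise a SettingWithCopyWarning.
subset = df.query("(x >= 2) & (x <= 3) & (y >= 2) & (y <= 3)").copy()

# 2.2 Convert the timestamp to datetime and keep the hour of day (0-23)
subset["hour"] = pd.to_datetime(subset["time"], unit="s").dt.hour
print(subset[["x", "y", "hour"]])
```

Only the rows at index 1 and 2 fall inside the square; 3600 s and 9000 s after midnight land in hours 1 and 2.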
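Step 2.3 builds the set of rare place_id values with value_counts and removes those rows with isin. A minimal sketch on hypothetical labels, using a toy threshold of 2 instead of the article's 10, shows that groupby(...).filter keeps the same rows. Which form to use is mostly a matter of taste.

```python
import pandas as pd

# Hypothetical labels: place 1 has 3 check-ins, place 2 has 2, place 3 has 1
df = pd.DataFrame({
    "place_id": [1, 1, 1, 2, 2, 3],
    "x": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})
min_checkins = 2  # toy threshold; the article uses 10

# Approach from the article: count check-ins per place, then drop the rare ones
counts = df["place_id"].value_counts()
rare = counts[counts < min_checkins].index
kept = df[~df["place_id"].isin(rare)]

# Alternative: keep only the groups that have at least min_checkins rows
kept_alt = df.groupby("place_id").filter(lambda g: len(g) >= min_checkins)

print(kept["place_id"].tolist())
print(kept_alt["place_id"].tolist())
```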
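Step 3 calls fit_transform on the training set but only transform on the test set, so the test data is scaled with the training mean and standard deviation. The sketch below uses a hypothetical three-row, two-column array and reproduces StandardScaler by hand as z = (x - mean) / std. It uses the population standard deviation (ddof=0), which is what StandardScaler uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training and test features
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from X_train only
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

# The same transformation written out: z = (x - mean) / std with ddof=0
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
manual_test = (X_test - mean) / std

print(scaler.mean_, scaler.scale_)
print(X_test_scaled, manual_test)
```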
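The step 4 comment says KNN + CV, but the script fixes K at 5 and uses cross_val_score only to report a score. If the goal is to let cross-validation choose K, one common scikit-learn pattern is GridSearchCV over n_neighbors. The sketch below runs it on synthetic data from make_classification, which stands in for the check-in features and is not the real dataset. The candidate K values are arbitrary. StandardScaler sits inside a Pipeline so that each fold's scaler is fitted only on that fold's training part. The script above instead scales all of X_train before cross-validating, which lets the validation folds influence the scaling slightly.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (x, y, hour) -> place_id: 3 features, 3 classes
X, y = make_classification(
    n_samples=300, n_features=3, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# make_pipeline names each step after its lowercased class name,
# so the KNN parameter is addressed as "kneighborsclassifier__n_neighbors"
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9]}  # arbitrary candidates

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)  # by default refits the best pipeline on all of X_train

print("Best K:", search.best_params_)
print("Best mean CV score:", search.best_score_)
print("Test-set accuracy:", search.score(X_test, y_test))
```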
2. Wine quality prediction
from sklearn.datasets import load_wine
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
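import joblib
# 1. Load the dataset
# load_wine returns scikit-learn's built-in wine recognition data: 178 samples,
# 13 chemical features, and a target that is the cultivar class (0, 1 or 2)
wine = load_wine()
# 2. Basic data processing
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.2, random_state=20)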
# 3. Feature engineering (standardization)
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
# 4. Machine learning - normal equation
lr = LinearRegression()
lr.fit(x_train, y_train)
# Save the trained model to disk, then load it back
joblib.dump(lr, "./test.pkl")
lr = joblib.load("./test.pkl")
# 5. Prediction and evaluation - normal equation
lr_predict = lr.predict(x_test)
lr_mse = mean_squared_error(y_test, lr_predict)
print("Normal equation MSE:", lr_mse)
# 4. Machine learning - gradient descent
sgd = SGDRegressor()
sgd.fit(x_train, y_train)
# 5. Prediction and evaluation - gradient descent
sgd_predict = sgd.predict(x_test)
sgd_mse = mean_squared_error(y_test, sgd_predict)
print("Gradient descent MSE:", sgd_mse)
Results:
Normal equation MSE: 0.06709703764885735
Gradient descent MSE: 0.06637844373293354
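The normal equation gives the least-squares weights in closed form. If the design matrix X has a leading column of ones for the intercept, then w = (X^T X)^(-1) X^T y. The sketch below repeats the article's split and scaling on load_wine and solves (X^T X) w = X^T y with np.linalg.solve, which avoids an explicit inverse. It then checks that the weights match LinearRegression and that mean_squared_error is just the mean of the squared residuals. Note that scikit-learn's LinearRegression uses a least-squares solver rather than forming X^T X. For a well-conditioned, full-rank problem like this one, both approaches give the same weights.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same split and scaling as the article
wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=20
)
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# Prepend a column of ones so w[0] plays the role of the intercept
X_b = np.c_[np.ones(len(x_train)), x_train]

# Normal equation: solve (X^T X) w = X^T y
w = np.linalg.solve(X_b.T @ X_b, X_b.T @ y_train)

lr = LinearRegression().fit(x_train, y_train)

# Predictions from the closed-form weights and the MSE written out by hand
pred = np.c_[np.ones(len(x_test)), x_test] @ w
mse_manual = np.mean((y_test - pred) ** 2)

print("intercept:", w[0], lr.intercept_)
print("MSE:", mse_manual, mean_squared_error(y_test, lr.predict(x_test)))
```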
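The joblib.dump and joblib.load lines in step 4 save the fitted model to test.pkl and read it back, so a trained model can be reused later without retraining. The self-contained check below fits a hypothetical y = 2x + 1 dataset and writes to a temporary directory instead of ./test.pkl. It confirms that the restored model predicts exactly like the original. The scikit-learn documentation adds two cautions: load pickled models only from trusted sources, and load them with the same scikit-learn version that saved them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data following y = 2x + 1 exactly
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    joblib.dump(model, path)       # serialize the fitted estimator to disk
    restored = joblib.load(path)   # read it back into a new object

# The restored copy carries the learned parameters and predicts identically
print(restored.coef_, restored.intercept_)
print(np.array_equal(model.predict(X), restored.predict(X)))
```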
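SGDRegressor fits the same linear model by iterative gradient updates instead of a closed-form solve. To show the idea, here is plain batch gradient descent on the mean squared error, written in NumPy. It runs on hypothetical noiseless data with true weights 1.5 and -2.0 and intercept 0.5. The learning rate of 0.1 and the 500 iterations are arbitrary choices that work for this toy data. This is a simplification of what SGDRegressor does. SGDRegressor updates from one sample at a time and, by default, adds a small L2 penalty (alpha=0.0001), so its weights are close to the least-squares solution but not identical to it.

```python
import numpy as np

# Hypothetical noiseless data: y = 1.5*x1 - 2.0*x2 + 0.5
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -2.0])
true_b = 0.5
y = X @ true_w + true_b

w = np.zeros(2)
b = 0.0
learning_rate = 0.1
n = len(y)

for _ in range(500):
    residual = X @ w + b - y              # prediction error on every sample
    grad_w = 2.0 / n * (X.T @ residual)   # gradient of MSE with respect to w
    grad_b = 2.0 / n * residual.sum()     # gradient of MSE with respect to b
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

mse = np.mean((X @ w + b - y) ** 2)
print(w, b, mse)
```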