基于Kmeans,對鳶尾花數據集前兩個特征進行聚類分析
通過迭代優化,將150個樣本劃分到K個簇中。
目標函數:最小化所有樣本到其所屬簇中心的距離平方和。
算法步驟:
隨機初始化K個簇中心。
將每個樣本分配到最近的中心。
計算均值確定每個簇的中心(均值)。
重復第2和3步直到穩定收斂。
程序代碼:
import mathimport numpy as np
from matplotlib import pyplot as plt
from sklearn import datasetsplt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = Falsedata = datasets.load_iris().data
labels = datasets.load_iris().target
print('數據維度',data.shape)
features = data[:,: 2]
print('特征',features)num_clusters = 6
epoch = 150
J_sum = []def J_calculate(features,divide_re,center):J = 0for s1 in range(150):distances = ((features[s1][0]-center[divide_re[s1]][0]) ** 2) + ((features[s1][1]-center[divide_re[s1]][1]) ** 2)#print(distances)J = J + distancesreturn Jdef decision(features,divide_re,center,epoch):J_best = []for _ in range(epoch):J_b = math.inffor s1 in range(150):best = Nonemin_J_now = math.inffor s2 in range(len(center)):divide_re[s1] = s2J_now = J_calculate(features,divide_re,center)if J_now < min_J_now:min_J_now = J_nowbest = s2divide_re[s1] = bestfor i in range(len(center)):xc = []yc = []for j in range(150):if (divide_re[j] == i):xc.append(features[j][0])yc.append(features[j][1])center[i] = [np.mean(xc), np.mean(yc)]if(min_J_now<J_b):J_b = min_J_nowJ_best.append(J_b)return features,divide_re,center,J_bestfor i in range(2,num_clusters+1):print(f'\n分{i}類:\n')center = features[np.random.choice(features.shape[0], i, replace=False)]print("初始中心點", center)distances = np.linalg.norm(features[:, np.newaxis, :] - center, axis=2)divide = np.argmin(distances,axis=1)divide_re = []for x in range(150):divide_re.append(divide[x])print("初始樣本分類", divide_re)features,divide_re,center,J_best = decision(features,divide_re,center,epoch)print(f'{i}類最佳J值為:',J_best[epoch-1])J_sum.append(J_best[epoch-1])plt.scatter(features[:, 0], features[:, 1], c=divide_re, cmap='viridis', edgecolors='k')plt.scatter(center[:, 0], center[:, 1], marker='x', s=30, linewidths=3, color='red')plt.title(f'{i}類C均值分類法結果')plt.xlabel('第一特征')plt.ylabel('第二特征')plt.show()
plt.figure()
plt.plot(range(2, num_clusters + 1), J_sum, marker='o')
plt.title('J與類別數量關系曲線')
plt.xlabel('類別數量')
plt.ylabel('J_sum 值')
plt.show()
運行結果:
數據維度 (150, 4)
特征 [[5.1 3.5]
[4.9 3. ]
[4.7 3.2]
[4.6 3.1]
[5. ?3.6]
[5.4 3.9]
[4.6 3.4]
[5. ?3.4]
[4.4 2.9]
[4.9 3.1]
[5.4 3.7]
[4.8 3.4]
[4.8 3. ]
[4.3 3. ]
[5.8 4. ]
[5.7 4.4]
[5.4 3.9]
[5.1 3.5]
[5.7 3.8]
[5.1 3.8]
[5.4 3.4]
[5.1 3.7]
[4.6 3.6]
[5.1 3.3]
[4.8 3.4]
[5. ?3. ]
[5. ?3.4]
[5.2 3.5]
[5.2 3.4]
[4.7 3.2]
[4.8 3.1]
[5.4 3.4]
[5.2 4.1]
[5.5 4.2]
[4.9 3.1]
[5. ?3.2]
[5.5 3.5]
[4.9 3.6]
[4.4 3. ]
[5.1 3.4]
[5. ?3.5]
[4.5 2.3]
[4.4 3.2]
[5. ?3.5]
[5.1 3.8]
[4.8 3. ]
[5.1 3.8]
[4.6 3.2]
[5.3 3.7]
[5. ?3.3]
[7. ?3.2]
[6.4 3.2]
[6.9 3.1]
[5.5 2.3]
[6.5 2.8]
[5.7 2.8]
[6.3 3.3]
[4.9 2.4]
[6.6 2.9]
[5.2 2.7]
[5. ?2. ]
[5.9 3. ]
[6. ?2.2]
[6.1 2.9]
[5.6 2.9]
[6.7 3.1]
[5.6 3. ]
[5.8 2.7]
[6.2 2.2]
[5.6 2.5]
[5.9 3.2]
[6.1 2.8]
[6.3 2.5]
[6.1 2.8]
[6.4 2.9]
[6.6 3. ]
[6.8 2.8]
[6.7 3. ]
[6. ?2.9]
[5.7 2.6]
[5.5 2.4]
[5.5 2.4]
[5.8 2.7]
[6. ?2.7]
[5.4 3. ]
[6. ?3.4]
[6.7 3.1]
[6.3 2.3]
[5.6 3. ]
[5.5 2.5]
[5.5 2.6]
[6.1 3. ]
[5.8 2.6]
[5. ?2.3]
[5.6 2.7]
[5.7 3. ]
[5.7 2.9]
[6.2 2.9]
[5.1 2.5]
[5.7 2.8]
[6.3 3.3]
[5.8 2.7]
[7.1 3. ]
[6.3 2.9]
[6.5 3. ]
[7.6 3. ]
[4.9 2.5]
[7.3 2.9]
[6.7 2.5]
[7.2 3.6]
[6.5 3.2]
[6.4 2.7]
[6.8 3. ]
[5.7 2.5]
[5.8 2.8]
[6.4 3.2]
[6.5 3. ]
[7.7 3.8]
[7.7 2.6]
[6. ?2.2]
[6.9 3.2]
[5.6 2.8]
[7.7 2.8]
[6.3 2.7]
[6.7 3.3]
[7.2 3.2]
[6.2 2.8]
[6.1 3. ]
[6.4 2.8]
[7.2 3. ]
[7.4 2.8]
[7.9 3.8]
[6.4 2.8]
[6.3 2.8]
[6.1 2.6]
[7.7 3. ]
[6.3 3.4]
[6.4 3.1]
[6. ?3. ]
[6.9 3.1]
[6.7 3.1]
[6.9 3.1]
[5.8 2.7]
[6.8 3.2]
[6.7 3.3]
[6.7 3. ]
[6.3 2.5]
[6.5 3. ]
[6.2 3.4]
[5.9 3. ]]分2類:
初始中心點 [[6.4 3.1]
[7.2 3.6]]
初始樣本分類 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
2類最佳J值為: 58.20409278906674分3類:
初始中心點 [[5.4 3.4]
[5.4 3.4]
[7.7 2.8]]
初始樣本分類 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 2, 0, 2, 2, 2, 0, 0, 2, 0, 0, 0, 0, 2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 0, 0, 2, 2, 2, 0, 0, 0, 2, 0, 0, 0, 2, 2, 2, 0, 2, 2, 2, 0, 0, 0, 0]
3類最佳J值為: 58.20409278906674分4類:
初始中心點 [[6.7 3.1]
[6.4 2.7]
[6.5 3.2]
[5.5 2.4]]
初始樣本分類 [3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 2, 2, 2, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 0, 2, 0, 3, 1, 3, 2, 3, 0, 3, 3, 1, 3, 1, 3, 0, 3, 3, 1, 3, 2, 1, 1, 1, 1, 0, 0, 0, 1, 3, 3, 3, 3, 1, 3, 2, 0, 1, 3, 3, 3, 1, 3, 3, 3, 3, 3, 1, 3, 3, 2, 3, 0, 1, 2, 0, 3, 0, 1, 0, 2, 1, 0, 3, 3, 2, 2, 0, 0, 3, 0, 3, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 2, 2, 1, 0, 0, 0, 3, 0, 0, 0, 1, 2, 2, 1]
4類最佳J值為: 28.23339146670904分5類:
初始中心點 [[6.3 2.5]
[5.1 3.5]
[6.4 3.2]
[7.1 3. ]
[5.5 3.5]]
初始樣本分類 [1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 4, 1, 1, 1, 4, 4, 4, 1, 4, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 4, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 3, 0, 0, 0, 2, 1, 2, 1, 0, 2, 0, 2, 4, 2, 4, 0, 0, 0, 2, 0, 0, 0, 2, 2, 3, 2, 0, 0, 0, 0, 0, 0, 4, 2, 2, 0, 4, 0, 0, 2, 0, 1, 0, 4, 4, 2, 1, 0, 2, 0, 3, 2, 2, 3, 1, 3, 0, 3, 2, 0, 3, 0, 0, 2, 2, 3, 3, 0, 3, 4, 3, 0, 2, 3, 0, 2, 0, 3, 3, 3, 0, 0, 0, 3, 2, 2, 2, 3, 2, 3, 0, 3, 2, 2, 0, 2, 2, 2]
5類最佳J值為: 21.200013093214928分6類:
初始中心點 [[6.8 2.8]
[5.8 2.6]
[4.4 3. ]
[6.2 3.4]
[6.4 3.2]
[6. ?3. ]]
初始樣本分類 [2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 3, 2, 2, 2, 3, 3, 3, 2, 3, 2, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 3, 3, 2, 2, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 0, 4, 0, 1, 0, 1, 3, 2, 0, 1, 1, 5, 1, 5, 1, 4, 5, 1, 1, 1, 5, 5, 1, 5, 4, 4, 0, 0, 5, 1, 1, 1, 1, 1, 1, 3, 4, 1, 5, 1, 1, 5, 1, 1, 1, 5, 1, 5, 1, 1, 3, 1, 0, 5, 4, 0, 2, 0, 0, 4, 4, 0, 0, 1, 1, 4, 4, 0, 0, 1, 0, 1, 0, 5, 4, 0, 5, 5, 0, 0, 0, 0, 0, 5, 1, 0, 3, 4, 5, 0, 4, 0, 1, 4, 4, 0, 1, 4, 3, 5]
6類最佳J值為: 18.150987445152886進程已結束,退出代碼0