Table of Contents
Google Colab (optional)
Preparing the data: parsing data from a text file
Writing the algorithm: implementing kNN
Analyzing the data: creating scatter plots with Matplotlib
Preparing the data: normalizing numeric values
Testing the algorithm: validating the classifier as a complete program
Using the algorithm: building a complete usable system
Google Colab (optional)
from google.colab import drive
drive.mount("/content/drive")
Mounted at /content/drive
Preparing the data: parsing data from a text file
from numpy import zeros

def file2matrix(filename):
    """Parse a tab-separated text file into a feature matrix and a label vector."""
    fr = open(filename)
    arrayOfLines = fr.readlines()
    numberOfLines = len(arrayOfLines)
    returnMat = zeros((numberOfLines, 3))   # one row per sample, three features
    classLabelVector = []
    index = 0
    for line in arrayOfLines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index, :] = listFromLine[0:3]          # first three columns are features
        classLabelVector.append(int(listFromLine[-1]))   # last column is the class label
        index += 1
    return returnMat, classLabelVector
datingDataMat, datingLabels = file2matrix('/content/drive/MyDrive/MachineLearning/機器學習/k-近鄰算法/使用k-近鄰算法改進約會網站的配對效果/datingTestSet2.txt')
datingDataMat
array([[4.0920000e+04, 8.3269760e+00, 9.5395200e-01],
       [1.4488000e+04, 7.1534690e+00, 1.6739040e+00],
       [2.6052000e+04, 1.4418710e+00, 8.0512400e-01],
       ...,
       [2.6575000e+04, 1.0650102e+01, 8.6662700e-01],
       [4.8111000e+04, 9.1345280e+00, 7.2804500e-01],
       [4.3757000e+04, 7.8826010e+00, 1.3324460e+00]])
datingLabels[:10]
[3, 2, 1, 1, 1, 1, 3, 3, 1, 3]
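For reference, the same tab-separated file (three numeric features followed by an integer class label on each line) could also be loaded with NumPy's loadtxt. This is a minimal sketch under that format assumption, not the original notebook's code; the helper name file2matrix_np is hypothetical.

import numpy as np

def file2matrix_np(filename):
    # Assumption: each line is "feature1<TAB>feature2<TAB>feature3<TAB>label"
    data = np.loadtxt(filename, delimiter='\t')
    returnMat = data[:, 0:3]                              # first three columns: features
    classLabelVector = data[:, -1].astype(int).tolist()   # last column: class label
    return returnMat, classLabelVector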
Writing the algorithm: implementing kNN
from numpy import *
import operator

def classify0(inX, dataSet, labels, k):
    """Classify inX by majority vote among its k nearest neighbors in dataSet."""
    dataSetSize = dataSet.shape[0]
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet   # broadcast the input across the training set
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5                    # Euclidean distance to every training sample
    sortedDistIndicies = distances.argsort()          # indices sorted by increasing distance
    classCount = {}
    for i in range(k):                                # vote among the k nearest neighbors
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]                     # label with the most votes
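As a quick sanity check, classify0 can be exercised on a tiny hand-made data set; the four points and labels below are only illustrative and are not part of the dating data.

group = array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])   # toy training points
labels = ['A', 'A', 'B', 'B']                                     # toy labels
classify0([0.0, 0.0], group, labels, 3)                           # expected to return 'B'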
Analyzing the data: creating scatter plots with Matplotlib
import matplotlib
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(datingDataMat[:, 1], datingDataMat[:, 2])
plt.show()
import matplotlib
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(datingDataMat[:, 1], datingDataMat[:, 2], 15.0*array(datingLabels), 15.0*array(datingLabels))
plt.show()
import matplotlib
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(datingDataMat[:, 0], datingDataMat[:, 1], 15.0*array(datingLabels), 15.0*array(datingLabels))
plt.show()
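The colored plots above encode the class label through marker size and color. A labeled version along the lines of the sketch below can make the figure easier to read; the axis names follow the feature order used later in classifyPerson (miles in column 0, gaming percentage in column 1) and are added here for illustration.

import matplotlib.pyplot as plt
from numpy import array

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(datingDataMat[:, 0], datingDataMat[:, 1],
           15.0*array(datingLabels), 15.0*array(datingLabels))
ax.set_xlabel('frequent flier miles earned per year')
ax.set_ylabel('percentage of time spent playing video games')
plt.show()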
Preparing the data: normalizing numeric values
def autoNorm(dataSet):
    """Scale each feature column to the range [0, 1] using min-max normalization."""
    minVals = dataSet.min(0)                 # column-wise minimum
    maxVals = dataSet.max(0)                 # column-wise maximum
    ranges = maxVals - minVals
    normDataSet = zeros(shape(dataSet))
    m = dataSet.shape[0]
    normDataSet = dataSet - tile(minVals, (m, 1))
    normDataSet = normDataSet / tile(ranges, (m, 1))   # newValue = (oldValue - min) / (max - min)
    return normDataSet, ranges, minVals
normMat, ranges, minVals = autoNorm(datingDataMat)
normMat
array([[0.44832535, 0.39805139, 0.56233353],
       [0.15873259, 0.34195467, 0.98724416],
       [0.28542943, 0.06892523, 0.47449629],
       ...,
       [0.29115949, 0.50910294, 0.51079493],
       [0.52711097, 0.43665451, 0.4290048 ],
       [0.47940793, 0.3768091 , 0.78571804]])
ranges
array([9.1273000e+04, 2.0919349e+01, 1.6943610e+00])
minVals
array([0. , 0. , 0.001156])
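As a quick check of the min-max formula newValue = (oldValue - min) / (max - min), the first sample can be normalized by hand and compared with the first row of normMat; this snippet is illustrative only and assumes the variables defined above.

(datingDataMat[0] - minVals) / ranges   # should match normMat[0]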
Testing the algorithm: validating the classifier as a complete program
def datingClassTest():
    hoRatio = 0.10   # hold out 10% of the data for testing
    datingDataMat, datingLabels = file2matrix('/content/drive/MyDrive/MachineLearning/機器學習/k-近鄰算法/使用k-近鄰算法改進約會網站的配對效果/datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m * hoRatio)
    errorCount = 0
    for i in range(numTestVecs):
        classifierResult = classify0(normMat[i, :], normMat[numTestVecs:m, :],
                                     datingLabels[numTestVecs:m], 3)
        print("the classifierResult came back with: %d, the real answer is: %d"
              % (classifierResult, datingLabels[i]))
        if classifierResult != datingLabels[i]:
            errorCount += 1
    print("the total error rate is: %f" % (errorCount / float(numTestVecs)))
datingClassTest()
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 3, the real answer is: 2
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 3, the real answer is: 1
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 3, the real answer is: 1
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 2, the real answer is: 3
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 3, the real answer is: 3
the classifierResult came back with: 2, the real answer is: 2
the classifierResult came back with: 1, the real answer is: 1
the classifierResult came back with: 3, the real answer is: 1
the total error rate is: 0.050000
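The 5% error rate above is for k = 3 and a 10% hold-out. A small variation of datingClassTest, sketched below with the hypothetical helper name datingClassTestK, can show how the error rate responds to other values of k; it reuses file2matrix, autoNorm, and classify0 exactly as defined earlier.

def datingClassTestK(k, hoRatio=0.10):
    # Hypothetical variant of datingClassTest that takes k as a parameter.
    datingDataMat, datingLabels = file2matrix('/content/drive/MyDrive/MachineLearning/機器學習/k-近鄰算法/使用k-近鄰算法改進約會網站的配對效果/datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m * hoRatio)
    errorCount = 0
    for i in range(numTestVecs):
        result = classify0(normMat[i, :], normMat[numTestVecs:m, :],
                           datingLabels[numTestVecs:m], k)
        if result != datingLabels[i]:
            errorCount += 1
    return errorCount / float(numTestVecs)

for k in (1, 3, 5, 7):
    print(k, datingClassTestK(k))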
Using the algorithm: building a complete usable system
def classifyPerson():
    resultList = ['not at all', 'in small doses', 'in large doses']
    percentTats = float(input("percentage of time spent playing video games?"))
    ffMiles = float(input("frequent flier miles earned per year?"))
    iceCream = float(input("liters of ice cream consumed per year?"))
    datingDataMat, datingLabels = file2matrix('/content/drive/MyDrive/MachineLearning/機器學習/k-近鄰算法/使用k-近鄰算法改進約會網站的配對效果/datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    inArr = array([ffMiles, percentTats, iceCream])
    # Normalize the new sample with the same ranges and minimums before classifying it.
    classifierResult = classify0((inArr - minVals) / ranges, normMat, datingLabels, 3)
    print("You will probably like this person:", resultList[classifierResult - 1])
classifyPerson()
percentage of time spent playing video games?10
frequent flier miles earned per year?10000
liters of ice cream consumed per year?0.5
You will probably like this person: in small doses