pdist, squareform
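- 1. pdist, squareform usage example
- 2. Implementing pdist, squareform with basic matrix arithmetic
The distance library scipy.spatial.distance provides two functions, pdist and squareform. pdist computes the distance between every pair of samples (Euclidean by default), and squareform arranges those pairwise distances into a square matrix.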
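(Aside)
SciPy: built on NumPy; it provides function libraries that compute results directly and wraps a number of higher-level abstractions and physical models.
NumPy: stores and processes large matrices far more efficiently than Python's own nested list structure; its core is written in C.
Pandas: a tool built on NumPy, created for data analysis tasks.
Reference: https://www.jianshu.com/p/32cb09d84487
(Back to the topic)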
1. pdist, squareform usage example
pdist and squareform operate on NumPy arrays:
>>> import numpy as np
>>> from scipy.spatial.distance import pdist, squareform
>>> x = np.array([[1, 1, 1], [2, 2, 2], [4, 4, 4]])  # three 3-dimensional vectors: x1=[1,1,1], x2=[2,2,2], x3=[4,4,4]
>>> Dis = pdist(x)
>>> Dis  # d(x1,x2)=sqrt(3)≈1.73, d(x1,x3)=sqrt(27)≈5.20, d(x2,x3)=sqrt(12)≈3.46
array([1.73205081, 5.19615242, 3.46410162])
>>> D = squareform(Dis)
>>> D
array([[0.        , 1.73205081, 5.19615242],   # d(x1,x1), d(x1,x2), d(x1,x3)
       [1.73205081, 0.        , 3.46410162],   # d(x2,x1), d(x2,x2), d(x2,x3)
       [5.19615242, 3.46410162, 0.        ]])  # d(x3,x1), d(x3,x2), d(x3,x3)
Because a distance metric is symmetric, i.e. $d(x_1,x_2)=d(x_2,x_1)$, the matrix above is symmetric.
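As a small check of the symmetry claim, the sketch below reuses the sample array from the example. It also relies on the fact that squareform converts in both directions (condensed vector to square matrix and back). The variable names `condensed`, `square` and `roundtrip` are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

x = np.array([[1, 1, 1], [2, 2, 2], [4, 4, 4]])

condensed = pdist(x)            # length n*(n-1)/2 = 3 for n = 3 samples
square = squareform(condensed)  # 3x3 symmetric matrix with a zero diagonal

# Symmetry: d(x_i, x_j) == d(x_j, x_i)
print(np.allclose(square, square.T))

# squareform also converts in the other direction, square -> condensed
roundtrip = squareform(square)
print(np.allclose(roundtrip, condensed))
```

The condensed form stores each pair only once, which is why pdist returns 3 values here rather than 9.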
2. Implementing pdist, squareform with basic matrix arithmetic
Given three 3-dimensional samples x1=[1,1,1], x2=[2,2,2], x3=[4,4,4], the square matrix of pairwise distances is:
$$D=\begin{bmatrix} d(x_1,x_1) & d(x_1,x_2) & d(x_1,x_3)\\ d(x_2,x_1) & d(x_2,x_2) & d(x_2,x_3)\\ d(x_3,x_1) & d(x_3,x_2) & d(x_3,x_3)\end{bmatrix}$$
For row vectors $x$ and $y$, the squared Euclidean distance expands as:
$$d(x,y)^2=xx^T+yy^T-2xy^T$$
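The identity for the squared distance follows from expanding $(x-y)(x-y)^T = xx^T - xy^T - yx^T + yy^T$ and noting that $xy^T = yx^T$ is a scalar for row vectors. A quick numeric check on x1 and x2, as a sketch (the names `sq_from_identity` and `sq_direct` are illustrative):

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0])
y = np.array([2.0, 2.0, 2.0])

# Right-hand side of the identity: x x^T + y y^T - 2 x y^T
sq_from_identity = x @ x + y @ y - 2 * (x @ y)   # 3 + 12 - 12 = 3

# Direct definition of the squared Euclidean distance
sq_direct = np.sum((x - y) ** 2)                 # 1 + 1 + 1 = 3

print(sq_from_identity, sq_direct)
print(np.sqrt(sq_from_identity))  # Euclidean distance, sqrt(3)
```

Both routes give a squared distance of 3, so the distance itself is sqrt(3), matching the first entry returned by pdist.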
Therefore the matrix of squared distances, written $D^2$ here with entries $d(x_i,x_j)^2$ (an element-wise square, not a matrix product), is:
$$D^2=\begin{bmatrix} x_1x_1^T+x_1x_1^T-2x_1x_1^T & x_1x_1^T+x_2x_2^T-2x_1x_2^T & x_1x_1^T+x_3x_3^T-2x_1x_3^T\\ x_2x_2^T+x_1x_1^T-2x_2x_1^T & x_2x_2^T+x_2x_2^T-2x_2x_2^T & x_2x_2^T+x_3x_3^T-2x_2x_3^T\\ x_3x_3^T+x_1x_1^T-2x_3x_1^T & x_3x_3^T+x_2x_2^T-2x_3x_2^T & x_3x_3^T+x_3x_3^T-2x_3x_3^T\end{bmatrix}$$
$$=\begin{bmatrix} x_1x_1^T & x_1x_1^T & x_1x_1^T\\ x_2x_2^T & x_2x_2^T & x_2x_2^T\\ x_3x_3^T & x_3x_3^T & x_3x_3^T\end{bmatrix}+\begin{bmatrix} x_1x_1^T & x_1x_1^T & x_1x_1^T\\ x_2x_2^T & x_2x_2^T & x_2x_2^T\\ x_3x_3^T & x_3x_3^T & x_3x_3^T\end{bmatrix}^T-2\begin{bmatrix} x_1x_1^T & x_1x_2^T & x_1x_3^T\\ x_2x_1^T & x_2x_2^T & x_2x_3^T\\ x_3x_1^T & x_3x_2^T & x_3x_3^T\end{bmatrix}$$
The first matrix,
$$\begin{bmatrix} x_1x_1^T & x_1x_1^T & x_1x_1^T\\ x_2x_2^T & x_2x_2^T & x_2x_2^T\\ x_3x_3^T & x_3x_3^T & x_3x_3^T\end{bmatrix}$$
is obtained by multiplying the sample matrix with itself element-wise, summing along each row to get the squared norms $x_ix_i^T$, and replicating that result across the columns.
The last matrix is the product of the stacked sample matrix with its own transpose:
$$\begin{bmatrix} x_1x_1^T & x_1x_2^T & x_1x_3^T\\ x_2x_1^T & x_2x_2^T & x_2x_3^T\\ x_3x_1^T & x_3x_2^T & x_3x_3^T\end{bmatrix}=\begin{bmatrix} x_1\\ x_2\\ x_3\end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3\end{bmatrix}^T$$
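Program implementation:
import numpy as np
X = np.array([[1, 1, 1], [2, 2, 2], [4, 4, 4]])
X2 = (X * X).sum(1) * np.ones([3, 3])  # X2[i, j] = x_j x_j^T (squared norms, replicated down the rows)
XXT = np.matmul(X, X.T)  # XXT[i, j] = x_i x_j^T
D2 = X2 + X2.T - 2 * XXT  # squared distances
D = np.sqrt(D2)  # distances
print(D)
# Output
[[0.         1.73205081 5.19615242]
 [1.73205081 0.         3.46410162]
 [5.19615242 3.46410162 0.        ]]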
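The program above hard-codes the 3x3 shape. A possible generalization to any number of samples is sketched below. The helper name `pairwise_euclidean` is hypothetical (it is not part of SciPy or NumPy), and the clipping and diagonal reset are assumptions added here to guard against floating-point rounding; they are not in the original program.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def pairwise_euclidean(X):
    """Hypothetical helper: Euclidean distance matrix of the rows of X."""
    X = np.asarray(X, dtype=float)
    sq_norms = (X * X).sum(axis=1)          # x_i x_i^T for each row, shape (n,)
    # Broadcasting: column of norms + row of norms - 2 * (X X^T)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Rounding can leave tiny negative values; clip before the square root
    np.maximum(D2, 0.0, out=D2)
    # d(x_i, x_i) is exactly zero by definition
    np.fill_diagonal(D2, 0.0)
    return np.sqrt(D2)


rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 3))               # 5 random 3-dimensional samples
D = pairwise_euclidean(pts)
print(np.allclose(D, squareform(pdist(pts))))
```

Broadcasting `sq_norms[:, None] + sq_norms[None, :]` plays the role of `X2 + X2.T` without building the replicated matrix explicitly.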
**Tip:** The matrix above is a distance matrix. In practice, check whether your application needs squared distances or plain distances.
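To make the distinction concrete, pdist accepts a metric argument: the default is 'euclidean', and 'sqeuclidean' returns squared Euclidean distances directly. A short sketch on the same samples (the names `dist` and `sq_dist` are illustrative):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

x = np.array([[1, 1, 1], [2, 2, 2], [4, 4, 4]])

dist = squareform(pdist(x))                    # default metric is 'euclidean'
sq_dist = squareform(pdist(x, "sqeuclidean"))  # squared Euclidean distances

print(sq_dist)                          # entries 3, 27 and 12 off the diagonal
print(np.allclose(dist ** 2, sq_dist))
```

When only squared distances are needed, asking for them directly skips the square root and the re-squaring.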