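I've recently been working on a project where most of my time is spent multiplying a dense matrix A by a sparse vector v (see
here). While trying to reduce computation, I noticed that the runtime of A.dot(v) is not affected by the number of zero entries in v.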
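To explain why I would expect the runtime to improve in this case, let result = A.dot(v), so that result[i] = sum_j(A[i,j] * v[j]) for i = 0, …, A.shape[0] - 1. If v[j] = 0, then the term A[i,j] * v[j] is clearly 0 for every i, no matter what the values in A[:,j] are. In this case I would expect numpy to simply skip column j, but it seems to go ahead and compute A[i,j] * v[j] anyway.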
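To make the column argument concrete, here is a minimal sketch on a tiny example. The sizes and the names nz, full and reduced are only mine for illustration. It checks that dropping the columns of A where v is zero gives the same product:

```python
import numpy as np

rng = np.random.RandomState(0)
A = rng.rand(6, 4)
v = np.array([0.0, 2.0, 0.0, 0.5])

nz = np.flatnonzero(v)           # indices j with v[j] != 0
full = A.dot(v)                  # uses all 4 columns of A
reduced = A[:, nz].dot(v[nz])    # uses only the columns where v is non-zero

print(np.allclose(full, reduced))
```

The reduced product touches len(nz) columns instead of all of them, which is where I would expect the savings to come from.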
I went ahead and wrote a short sample script to confirm this behavior below.
import time
import numpy as np

np.__config__.show()  # make sure BLAS/LAPACK is being used
np.random.seed(seed=0)
n_rows, n_cols = int(1e5), int(1e3)  # array dimensions must be integers

# initialize matrix and vector
A = np.random.rand(n_rows, n_cols)
u = np.random.rand(n_cols)
u = np.require(u, dtype=A.dtype, requirements=['C'])

# time the product with the fully dense vector u
start_time = time.time()
A.dot(u)
elapsed = time.time() - start_time
print("time with %d zero entries: %1.5f seconds" % (np.sum(u == 0.0), elapsed))

# set all but one entry to zero (work on a copy so that u stays dense)
v = u.copy()
set_to_zero = np.random.choice(u.shape[0], size=u.shape[0] - 1, replace=False)
v[set_to_zero] = 0.0
start_time = time.time()
A.dot(v)
elapsed = time.time() - start_time
print("time with %d zero entries: %1.5f seconds" % (np.sum(v == 0.0), elapsed))

# what I would really expect it to take
non_zero_index = v != 0.0
A_effective = A[:, non_zero_index]
v_effective = v[non_zero_index]
start_time = time.time()
A_effective.dot(v_effective)
elapsed = time.time() - start_time
print("expected time with %d zero entries: %1.5f seconds" % (np.sum(v == 0.0), elapsed))
Running this, I find that the runtime of the matrix-vector multiplication is the same whether I use the dense vector u or the sparse vector v:
time with 0 zero entries: 0.04279 seconds
time with 999 zero entries: 0.04050 seconds
expected time with 999 zero entries: 0.00466 seconds
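A single time.time() reading is noisy, and my "expected" timing leaves out the cost of selecting the columns of A. A fairer comparison might look like the sketch below. It uses smaller sizes so it runs quickly, the helper names dense_product and reduced_product are my own, and I am not quoting any numbers from it:

```python
import timeit
import numpy as np

rng = np.random.RandomState(0)
A = rng.rand(2000, 500)          # smaller than in the script, to keep it quick
v = np.zeros(500)
v[rng.randint(500)] = 1.0        # exactly one non-zero entry

def dense_product():
    # plain matrix-vector product over all columns
    return A.dot(v)

def reduced_product():
    # the column selection is deliberately part of the timed work
    nz = np.flatnonzero(v)
    return A[:, nz].dot(v[nz])

# best of 5 runs, each run making 20 calls
t_dense = min(timeit.repeat(dense_product, number=20, repeat=5))
t_reduced = min(timeit.repeat(reduced_product, number=20, repeat=5))
print("dense:   %1.5f seconds per 20 calls" % t_dense)
print("reduced: %1.5f seconds per 20 calls" % t_reduced)
```

Taking the minimum over several repeats is the usual way to reduce the effect of other activity on the machine.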
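I'm wondering whether this is by design, or whether I'm missing something in the way I'm running the matrix-vector multiplication. As sanity checks: I've made sure that numpy is linked to a BLAS library on my machine, and that both arrays are C_CONTIGUOUS (since this is apparently required for numpy to call BLAS).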
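For reference, this is roughly how I checked the memory layout. It is a sketch with small stand-in arrays (A_strided and A_packed are names I made up here). It only inspects the flags and does not prove which code path numpy actually takes:

```python
import numpy as np

A = np.random.rand(1000, 100)
v = np.random.rand(100)

# report dtype and contiguity of both operands
for name, arr in (("A", A), ("v", v)):
    print("%s: dtype=%s, C_CONTIGUOUS=%s" % (name, arr.dtype, arr.flags['C_CONTIGUOUS']))

# a column slice with a step is a non-contiguous view of A ...
A_strided = A[:, ::2]
print(A_strided.flags['C_CONTIGUOUS'])

# ... and np.ascontiguousarray makes a contiguous copy of it
A_packed = np.ascontiguousarray(A_strided)
print(A_packed.flags['C_CONTIGUOUS'])
```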