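The ATI Stream Computing Programming Guide lists the hardware parameters of AMD's 5-series graphics cards.
I am mainly interested in how the peak bandwidths are calculated, so that I can measure bandwidth utilization in OpenCL programs.
Below, taking the 5870 as an example, I work through how these figures are derived: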
L1 cache peak bandwidth (L1 <=> ALU) = compute units * wavefront size per compute unit * engine clock
= 20 * 64 * 0.85 GHz = 1088 GB/s
Note: on AMD GPUs, each wavefront contains 64 threads.
L2 cache peak bandwidth (L1 <=> L2) = number of memory channels * wavefront size * engine clock
= 8 * 64 * 0.85 GHz = 435.2 GB/s
Note: on AMD 58XX cards, each memory controller (MC) channel has its own 64 KB L2 cache.
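The two cache formulas above share one shape, so they can be checked with a single helper. This is only a minimal sketch: the function and constant names are made up for illustration, and it assumes the factor of 64 can be read as 64 bytes per clock per unit, which is what the GB/s result implies.

```python
# Peak cache bandwidth for the 5870, using the figures quoted in the text.
ENGINE_CLOCK_MHZ = 850   # 0.85 GHz engine clock
COMPUTE_UNITS = 20       # units feeding the L1 formula
MEMORY_CHANNELS = 8      # units feeding the L2 formula
BYTES_PER_CLOCK = 64     # assumption: the "wavefront size" factor read as bytes per clock


def cache_peak_gb_per_s(units: int, bytes_per_clock: int, clock_mhz: int) -> float:
    """Return units * bytes_per_clock * clock, converted from MB/s to GB/s."""
    return units * bytes_per_clock * clock_mhz / 1000


l1_peak = cache_peak_gb_per_s(COMPUTE_UNITS, BYTES_PER_CLOCK, ENGINE_CLOCK_MHZ)
l2_peak = cache_peak_gb_per_s(MEMORY_CHANNELS, BYTES_PER_CLOCK, ENGINE_CLOCK_MHZ)
print(f"L1 peak: {l1_peak} GB/s")
print(f"L2 peak: {l2_peak} GB/s")
```

Keeping the clock in MHz as an integer avoids rounding noise from multiplying by 0.85 directly.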
Global memory peak rate (L2 <=> memory) = number of memory channels * memory pin rate * bits per channel / 8
= 8 * 4.8 Gb/s * 32 / 8 = 153.6 GB/s
Note: Cypress uses GDDR5 with a memory clock (mclk) of 1200 MHz. The GDDR5 data rate is 4 transfers per clock, so the memory pin rate = 1200 * 4 = 4800 Mb/s per pin.
The division by 8 converts bits to bytes.
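The global memory figure can be reproduced step by step, starting from the memory clock. This is a rough sketch with illustrative function names, using only the numbers given in the text.

```python
def gddr5_pin_rate_mbps(mclk_mhz: int, data_rate: int = 4) -> int:
    """Per-pin rate in Mb/s: memory clock (MHz) times the GDDR5 data rate."""
    return mclk_mhz * data_rate


def global_memory_peak_gb_per_s(channels: int, pin_rate_mbps: int, bits_per_channel: int) -> float:
    """Return channels * pin rate * channel width, converted from Mb/s to GB/s."""
    megabits_per_s = channels * pin_rate_mbps * bits_per_channel
    megabytes_per_s = megabits_per_s / 8      # bits -> bytes
    return megabytes_per_s / 1000             # MB/s -> GB/s


pin_rate = gddr5_pin_rate_mbps(1200)          # Cypress: mclk = 1200 MHz
global_peak = global_memory_peak_gb_per_s(8, pin_rate, 32)
print(f"pin rate: {pin_rate} Mb/s, global memory peak: {global_peak} GB/s")
```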
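Constant cache read peak rate = peak read bandwidth per stream core * number of stream cores (PEs) * engine clock
= 16 * 320 * 0.85 GHz = 4352 GB/s
Note: these values are hardware parameters of the 5870.
Also note that a constant buffer reaches the 4352 GB/s peak only with direct addressing. For indexed access, use 4 or 0.6 in place of 16, as listed in the parameter table.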
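To see how much the addressing mode matters, the same formula can be evaluated with each of the three per-core rates mentioned above (16, 4, and 0.6 bytes per clock). This is a small sketch; the dictionary keys are labels chosen here for illustration, not names taken from the guide.

```python
ENGINE_CLOCK_MHZ = 850   # 0.85 GHz engine clock
STREAM_CORES = 320       # the "PE number" used in the text

# Bytes per clock per stream core; key names are illustrative labels only.
CONST_BYTES_PER_CLOCK = {
    "direct": 16,
    "indexed_fast": 4,
    "indexed_slow": 0.6,
}


def const_cache_peak_gb_per_s(bytes_per_clock: float) -> float:
    """Return bytes_per_clock * stream cores * engine clock in GB/s."""
    return bytes_per_clock * STREAM_CORES * ENGINE_CLOCK_MHZ / 1000


const_peaks = {mode: const_cache_peak_gb_per_s(rate)
               for mode, rate in CONST_BYTES_PER_CLOCK.items()}
for mode, peak in const_peaks.items():
    print(f"{mode}: {peak:.1f} GB/s")
```

With these inputs, the slowest indexed case works out to roughly 163 GB/s, about the same as the global memory peak, so constant buffers only pay off when the access pattern allows direct addressing.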
LDS read peak rate = peak read bandwidth per stream core * number of stream cores (PEs) * engine clock
= 8 * 320 * 0.85 GHz = 2176 GB/s
Note: LDS (local memory in OpenCL) bandwidth is calculated the same way as for the constant buffer.
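GPR read peak rate = peak read bandwidth per stream core * number of stream cores (PEs) * engine clock
= 48 * 320 * 0.85 GHz = 13056 GB/s
Note: GPRs are the general-purpose registers, which hold the private variables used by a work-item in OpenCL; the shader compiler generally allocates GPRs for a kernel's local variables. GPR bandwidth is calculated the same way as for the constant buffer.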
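Since the stated goal is to measure bandwidth utilization in OpenCL programs, the peaks can serve as the denominator of a simple ratio. The sketch below recomputes the LDS and GPR peaks and then divides an achieved rate by a peak. The measurement (256 MB moved in 2 ms) is a made-up number for illustration, not a real benchmark result, and all names are hypothetical.

```python
ENGINE_CLOCK_MHZ = 850   # 0.85 GHz engine clock
STREAM_CORES = 320       # the "PE number" used in the text


def per_core_peak_gb_per_s(bytes_per_clock: float) -> float:
    """Return bytes_per_clock * stream cores * engine clock in GB/s."""
    return bytes_per_clock * STREAM_CORES * ENGINE_CLOCK_MHZ / 1000


PEAKS_GB_PER_S = {
    "lds": per_core_peak_gb_per_s(8),
    "gpr": per_core_peak_gb_per_s(48),
    "global": 153.6,     # global memory peak from the text
}


def utilization(bytes_moved: float, seconds: float, peak_gb_per_s: float) -> float:
    """Return achieved bandwidth as a fraction of the given peak."""
    achieved_gb_per_s = bytes_moved / seconds / 1e9
    return achieved_gb_per_s / peak_gb_per_s


# Hypothetical measurement: a kernel reads and writes 256 MB in total in 2 ms.
example = utilization(256e6, 2e-3, PEAKS_GB_PER_S["global"])
print(f"global memory utilization: {example:.1%}")
```

In a real test, the byte count would come from the kernel's actual reads and writes, and the time from OpenCL event profiling or a host-side timer.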
The figure below shows the performance parameters of the 58xx series: