Preface: While profiling model performance for an edge-computing project today, I found that most of the NPU compute time was spent in the LSTM, a Bi-LSTM accounting for about 98% of the total, while the CNN part took very little time. This made me wonder why the LSTM costs so much.
First, the experimental conditions: the input is a vibration signal of length 1024 with a single channel, and the batch size is 1.
1. CNN computational complexity formula:
Let the kernel size be K x K, the number of input channels C_in, the number of output channels C_out, and the input size W x H.
The complexity of the convolution is then O(K * K * C_in * C_out * W * H).
For example, my first convolutional layer has input: 1 channel, output: 32 channels, and a kernel of size 1*3; to keep the output length equal to the input length, padding = (k-1)/2 = 1.
Input shape: 1*1*1024 (batch size, channels, length)
Output shape: 1*32*1024
Computational complexity: 1 * 32 * 3 * 1024 = 98,304
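To double-check that number, here is a minimal sketch in plain Python; the helper name conv1d_macs is my own, not part of the original code, and it assumes stride 1 with "same" padding.

```python
def conv1d_macs(kernel_size: int, c_in: int, c_out: int, length: int) -> int:
    """Multiply-accumulates of a 1-D convolution with stride 1 and 'same' padding,
    i.e. K * C_in * C_out * L from the formula above (batch size 1)."""
    return kernel_size * c_in * c_out * length

# First convolutional layer: 1 -> 32 channels, kernel size 3, sequence length 1024
print(conv1d_macs(kernel_size=3, c_in=1, c_out=32, length=1024))  # 98304
```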
2. LSTM computational complexity formula:
Let the LSTM hidden size be H, the input size I, and the number of time steps T.
Each time step costs O(I * H + H^2) (covering the matrix multiplications and activation functions), so the total LSTM complexity is O(T * (I * H + H^2)).
For example: the input size is the number of channels produced by the last CNN layer, 128; the hidden size is set to 128; and the number of time steps equals the sequence length at that point, 128.
Complexity: 128 * (128*128 + 128*128) = 4,194,304
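The same kind of sanity check for the LSTM term, again a rough sketch with a helper name of my own; constant factors such as the four LSTM gates are dropped, matching the big-O estimate above.

```python
def lstm_macs(input_size: int, hidden_size: int, time_steps: int) -> int:
    """Simplified per-sequence cost T * (I*H + H*H) for one LSTM direction;
    constant factors (e.g. the 4 gates) are dropped, as in the estimate above."""
    per_step = input_size * hidden_size + hidden_size * hidden_size
    return time_steps * per_step

print(lstm_macs(input_size=128, hidden_size=128, time_steps=128))  # 4194304
```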
Ratio: 4,194,304 / (1 * 32 * 3 * 1024) ≈ 43, i.e. a single LSTM pass costs roughly 43 times as much as the first convolutional layer.
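Putting the two numbers side by side (plain arithmetic, one LSTM direction only):

```python
conv1_macs = 3 * 1 * 32 * 1024                 # first Conv1d layer, from section 1
lstm_dir_macs = 128 * (128 * 128 + 128 * 128)  # one LSTM direction, from section 2

print(lstm_dir_macs / conv1_macs)  # ~42.7, i.e. roughly 43x the first conv layer
```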
Since this is a bidirectional LSTM, the cost doubles: 43 * 2 = 86, which matches my expectation. In practice the LSTM takes even longer than this ratio suggests; my guess is that the NPU simply optimizes CNN-style computation better. The complete network structure is shown below:
Layer: CNN_LSTM_Model    Input shapes: [torch.Size([32, 1, 1024])]    Output shape: torch.Size([32, 10])
Layer: Conv1d    Input shapes: [torch.Size([32, 1, 1024])]    Output shape: torch.Size([32, 32, 1024])
Layer: ReLU    Input shapes: [torch.Size([32, 32, 1024])]    Output shape: torch.Size([32, 32, 1024])
Layer: Conv1d    Input shapes: [torch.Size([32, 32, 1024])]    Output shape: torch.Size([32, 32, 1024])
Layer: ReLU    Input shapes: [torch.Size([32, 32, 1024])]    Output shape: torch.Size([32, 32, 1024])
Layer: MaxPool1d    Input shapes: [torch.Size([32, 32, 1024])]    Output shape: torch.Size([32, 32, 512])
Layer: Conv1d    Input shapes: [torch.Size([32, 32, 512])]    Output shape: torch.Size([32, 64, 512])
Layer: ReLU    Input shapes: [torch.Size([32, 64, 512])]    Output shape: torch.Size([32, 64, 512])
Layer: MaxPool1d    Input shapes: [torch.Size([32, 64, 512])]    Output shape: torch.Size([32, 64, 256])
Layer: Conv1d    Input shapes: [torch.Size([32, 64, 256])]    Output shape: torch.Size([32, 128, 256])
Layer: ReLU    Input shapes: [torch.Size([32, 128, 256])]    Output shape: torch.Size([32, 128, 256])
Layer: MaxPool1d    Input shapes: [torch.Size([32, 128, 256])]    Output shape: torch.Size([32, 128, 128])
Layer: Sequential    Input shapes: [torch.Size([32, 1, 1024])]    Output shape: torch.Size([32, 128, 128])
Layer: LSTM    Input shapes: [torch.Size([32, 128, 128]), <class 'tuple'>]    Output shapes: [torch.Size([32, 128, 256]), <class 'tuple'>]
Layer: Linear    Input shapes: [torch.Size([32, 128, 256])]    Output shape: torch.Size([32, 128, 256])
Layer: Attention    Input shapes: [torch.Size([32, 128]), torch.Size([32, 128, 256])]    Output shape: torch.Size([32, 1, 128])
Layer: LayerNorm    Input shapes: [torch.Size([32, 256])]    Output shape: torch.Size([32, 256])
Layer: ResidualConnection    Input shapes: [torch.Size([32, 256]), <class 'function'>]    Output shape: torch.Size([32, 256])
Layer: Linear    Input shapes: [torch.Size([32, 256])]    Output shape: torch.Size([32, 500])
Layer: ReLU    Input shapes: [torch.Size([32, 500])]    Output shape: torch.Size([32, 500])
Layer: Dropout    Input shapes: [torch.Size([32, 500])]    Output shape: torch.Size([32, 500])
Layer: Linear    Input shapes: [torch.Size([32, 500])]    Output shape: torch.Size([32, 10])
Layer: Sequential    Input shapes: [torch.Size([32, 256])]    Output shape: torch.Size([32, 10])
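For reference, a per-layer trace like the one above can be produced with PyTorch forward hooks; the snippet below is a minimal sketch of such a logger that I reconstructed myself, not the original profiling script.

```python
import torch
import torch.nn as nn

def register_shape_hooks(model: nn.Module):
    """Attach a forward hook to every module so each forward call prints the
    module class name together with its input/output tensor shapes."""
    def hook(module, inputs, output):
        in_shapes = [tuple(x.shape) if isinstance(x, torch.Tensor) else type(x)
                     for x in inputs]
        out_shape = (tuple(output.shape) if isinstance(output, torch.Tensor)
                     else type(output))
        print(f"Layer: {module.__class__.__name__}  "
              f"Input shapes: {in_shapes}  Output shape: {out_shape}")
    return [m.register_forward_hook(hook) for m in model.modules()]

# Usage (assuming `model` is an instance of CNN_LSTM_Model):
# handles = register_shape_hooks(model)
# model(torch.randn(32, 1, 1024))
# for h in handles:
#     h.remove()
```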