Some notes on my understanding of the receptive field:
In a neural network, the receptive field is defined as:
the size of the region on the original input image that a pixel on a layer's output feature map (Feature Map) corresponds to.
1. The receptive field of the first convolutional layer is simply the size of its filter (see the example after this list).
2. The receptive field of a deeper convolutional layer depends on the filter sizes and strides of all the layers before it.
3. When computing the receptive field, the effect of the image border is ignored, i.e. padding is not taken into account.
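For example, AlexNet's first layer conv1 uses an 11×11 filter, so by rule 1 each pixel of the conv1 feature map sees an 11×11 region of the input image.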
First, the cumulative stride: the stride of layer i's output feature map with respect to the input image is the product of the strides of that layer and all layers before it, i.e. strides(i) = stride(1) * stride(2) * ... * stride(i).
The receptive field is computed by iterating backwards from the layer of interest down to the first layer: start with RF = 1 at that layer's output, then for each layer i from the deepest down to layer 1 apply
RF{i} = (RF{i+1} - 1) * stride{i} + ConvSize{i}
where stride{i} and ConvSize{i} are the stride and filter size of layer i. The value obtained after processing layer 1 is the receptive field on the input image.
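For example, take the first two layers of AlexNet: conv1 (11×11 filter, stride 4) and pool1 (3×3, stride 2). Starting from RF = 1 at pool1's output, going back through pool1 gives RF = (1 - 1) * 2 + 3 = 3, and going back through conv1 gives RF = (3 - 1) * 4 + 11 = 19, so each pool1 pixel sees a 19×19 region of the input image, with a cumulative stride of 4 * 2 = 8.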
There is a passage in the R-CNN paper stating that pixels on the feature map output by AlexNet's pool5 have a very large receptive field on the input image ("have very large receptive fields (195 × 195 pixels)") and a large stride ("strides (32 × 32 pixels)"). How are these two values obtained?
Expressed in Python code:
#!/usr/bin/env python

# [filter size, stride, padding] for each layer, together with the layer names.
net_struct = {
    'alexnet': {
        'net': [[11, 4, 0], [3, 2, 0], [5, 1, 2], [3, 2, 0], [3, 1, 1], [3, 1, 1], [3, 1, 1], [3, 2, 0]],
        'name': ['conv1', 'pool1', 'conv2', 'pool2', 'conv3', 'conv4', 'conv5', 'pool5']},
    'vgg16': {
        'net': [[3, 1, 1], [3, 1, 1], [2, 2, 0], [3, 1, 1], [3, 1, 1], [2, 2, 0], [3, 1, 1], [3, 1, 1], [3, 1, 1], [2, 2, 0],
                [3, 1, 1], [3, 1, 1], [3, 1, 1], [2, 2, 0], [3, 1, 1], [3, 1, 1], [3, 1, 1], [2, 2, 0]],
        'name': ['conv1_1', 'conv1_2', 'pool1', 'conv2_1', 'conv2_2', 'pool2', 'conv3_1', 'conv3_2', 'conv3_3',
                 'pool3', 'conv4_1', 'conv4_2', 'conv4_3', 'pool4', 'conv5_1', 'conv5_2', 'conv5_3', 'pool5']},
    'zf-5': {
        'net': [[7, 2, 3], [3, 2, 1], [5, 2, 2], [3, 2, 1], [3, 1, 1], [3, 1, 1], [3, 1, 1]],
        'name': ['conv1', 'pool1', 'conv2', 'pool2', 'conv3', 'conv4', 'conv5']}}

imsize = 224

def outFromIn(isz, net, layernum):
    # Forward pass: feature-map size and cumulative stride after the first `layernum` layers.
    totstride = 1
    insize = isz
    for layer in range(layernum):
        fsize, stride, pad = net[layer]
        outsize = (insize - fsize + 2 * pad) // stride + 1
        insize = outsize
        totstride = totstride * stride
    return outsize, totstride

def inFromOut(net, layernum):
    # Backward pass: start from RF = 1 on layer `layernum`'s output and iterate
    # RF = (RF - 1) * stride + fsize down to the first layer.
    RF = 1
    for layer in reversed(range(layernum)):
        fsize, stride, pad = net[layer]
        RF = ((RF - 1) * stride) + fsize
    return RF

if __name__ == '__main__':
    print("layer output sizes given image = %dx%d" % (imsize, imsize))
    for net in net_struct.keys():
        print('************ net structure name is %s **************' % net)
        for i in range(len(net_struct[net]['net'])):
            p = outFromIn(imsize, net_struct[net]['net'], i + 1)
            rf = inFromOut(net_struct[net]['net'], i + 1)
            print("Layer Name = %s, Output size = %3d, Stride = %3d, RF size = %3d"
                  % (net_struct[net]['name'][i], p[0], p[1], rf))
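Running this script, the AlexNet pool5 line should report Stride = 32 and RF size = 195, which matches the 195 × 195 receptive field and 32-pixel stride quoted from the R-CNN paper above.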