Starting with this post, we document the whole process of using fully convolutional networks (FCN) for semantic segmentation.
Code: https://github.com/shelhamer/fcn.berkeleyvision.org

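This post covers three topics:
1. The official public datasets
2. How to prepare your own dataset, mainly how to annotate the labels
3. How to colorize the results once training has finished.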

Public datasets

Here we look at the SIFT Flow dataset and the PASCAL VOC dataset in turn.
1. PASCAL VOC
According to the PASCAL readme in the data folder of the FCN code:

# PASCAL VOC and SBD

PASCAL VOC is a standard recognition dataset and benchmark with detection and semantic segmentation challenges.
The semantic segmentation challenge annotates 20 object classes and background.
The Semantic Boundary Dataset (SBD) is a further annotation of the PASCAL VOC data that provides more semantic segmentation and instance segmentation masks.

PASCAL VOC has a private test set and [leaderboard for semantic segmentation](http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=6).

The train/val/test splits of PASCAL VOC segmentation challenge and SBD diverge.
Most notably VOC 2011 segval intersects with SBD train.
Care must be taken for proper evaluation by excluding images from the train or val splits.

We train on the 8,498 images of SBD train.
We validate on the non-intersecting set defined in the included `seg11valid.txt`.

Refer to `classes.txt` for the listing of classes in model output order.
Refer to `../voc_layers.py` for the Python data layer for this dataset.

See the dataset sites for download:

- PASCAL VOC 2012: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
- SBD: see [homepage](http://home.bharathh.info/home/sbd) or [direct download](http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz)

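We can download the training set (SBD) and the test set (PASCAL VOC 2012).
Then go into fcn/data, create an sbdd folder if it does not exist, extract the dataset from the benchmark archive into sbdd, and extract VOC2012 into the pascal folder under data. These two folders already come with train.txt for training and seg11valid.txt for testing.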
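2. SIFT-Flow
Download the dataset (download link).
Extract it under /fcn.berkeleyvision.org/data/, overwriting the folder named sift-flow.
The FCN source already ships train.txt and the related files, so there is no need to regenerate them.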
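The PASCAL readme quoted above warns that the VOC and SBD splits intersect. A minimal sketch of how one might check two split files for shared image ids is shown below. The helper names `read_ids` and `split_overlap` and the toy ids are our own assumptions for illustration, not part of the FCN code.

```python
def read_ids(path):
    """Read one image id per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def split_overlap(train_ids, val_ids):
    """Return the sorted ids that appear in both splits."""
    return sorted(set(train_ids) & set(val_ids))


# Toy ids for illustration only; real ids would come from read_ids(...)
train_ids = ["2008_000002", "2008_000003", "2008_000007"]
val_ids = ["2008_000003", "2008_000009"]
overlap = split_overlap(train_ids, val_ids)
print(overlap)
```

An empty result means the two lists are disjoint, which is what we want between the training list and the list used for evaluation.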

Preparing your own dataset

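Training your own deep-learning image segmentation (FCN) model takes roughly three steps:

1. Create labels for your data;

2. Split your data into train, val and test sets;

3. Write your own input data layer, modeled on voc_layers.py.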
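For step 3, the core of such a data layer is the per-sample preprocessing. The sketch below mirrors the preprocessing comments in the inference script later in this post (switch to BGR, subtract the mean, reorder to C x H x W). The function names are hypothetical, and treating the label as a 1 x H x W uint8 array is our assumption about what the loss layer expects; check voc_layers.py before relying on it.

```python
import numpy as np

# BGR channel means, copied from the inference script in this post
MEAN_BGR = np.array((104.00698793, 116.66876762, 122.67891434), dtype=np.float32)


def preprocess_image(rgb):
    """H x W x 3 RGB array -> 3 x H x W mean-subtracted BGR float32 array."""
    in_ = np.array(rgb, dtype=np.float32)
    in_ = in_[:, :, ::-1]            # RGB -> BGR
    in_ = in_ - MEAN_BGR             # subtract the per-channel mean
    return in_.transpose((2, 0, 1))  # H x W x C -> C x H x W


def preprocess_label(label):
    """H x W class-index array -> 1 x H x W uint8 array (assumed layout)."""
    return np.array(label, dtype=np.uint8)[np.newaxis, ...]


rgb = np.zeros((4, 6, 3), dtype=np.uint8)   # dummy 4 x 6 image
label = np.zeros((4, 6), dtype=np.uint8)    # dummy label map
print(preprocess_image(rgb).shape, preprocess_label(label).shape)
```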

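FCN places no restriction on image size. If the images in a dataset differ in size, only one image can be trained at a time, which is the default setting of the FCN code (batch_size=1). For batch training, all images in the dataset must have the same size, so we resize them. Typically the originals are scaled to 256*256 or 500*500.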

1. Resizing images

Below are a few resize functions found online: http://blog.csdn.net/u010402786/article/details/72883421
(1) Resizing a single image

from PIL import Image

def convert(width, height):
    im = Image.open("C:\\xxx\\test.jpg")
    # Image.LANCZOS is the filter formerly exposed as Image.ANTIALIAS
    out = im.resize((width, height), Image.LANCZOS)
    out.save("C:\\xxx\\test.jpg")

if __name__ == '__main__':
    convert(256, 256)

(2) Resizing every image in a folder

import os
from PIL import Image

def convert(dir_path, width, height):
    file_list = os.listdir(dir_path)
    print(file_list)
    for filename in file_list:
        path = os.path.join(dir_path, filename)
        im = Image.open(path)
        # use the requested size (the original hard-coded 256x256 here)
        out = im.resize((width, height), Image.LANCZOS)
        print("%s has been resized!" % filename)
        out.save(path)

if __name__ == '__main__':
    dir_path = input('please input the operate dir:')  # raw_input on Python 2
    convert(dir_path, 256, 256)

(3) Resizing while keeping the aspect ratio

from PIL import Image

def convert(width, height):
    # height is unused: the output height follows from the aspect ratio
    im = Image.open("C:\\workspace\\PythonLearn1\\test_1.jpg")
    (x, y) = im.size
    x_s = width
    y_s = y * x_s // x  # integer division so resize() receives an int
    out = im.resize((x_s, y_s), Image.LANCZOS)
    out.save("C:\\workspace\\PythonLearn1\\test_1_out.jpg")

if __name__ == '__main__':
    convert(256, 256)
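The functions above use a smoothing filter, which is fine for photos. Label images are different: interpolation blends neighboring class ids into values that belong to no class, so labels should be resized with nearest-neighbor sampling. The sketch below illustrates the idea in plain NumPy; `resize_label_nearest` is a hypothetical helper of ours, not a function from the FCN code.

```python
import numpy as np


def resize_label_nearest(label, out_h, out_w):
    """Resize an H x W label map by nearest-neighbor sampling."""
    in_h, in_w = label.shape
    # For every output row/column, pick the source row/column it maps to
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return label[rows[:, None], cols[None, :]]


label = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [2, 2, 15, 15],
                  [2, 2, 15, 15]], dtype=np.uint8)
small = resize_label_nearest(label, 2, 2)
print(small)
```

Every value in the output already occurs in the input, so no new class ids can appear. With Pillow, passing `Image.NEAREST` to `resize` gives the same kind of sampling.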

Creating image labels

Step 1: annotate with an open-source tool from GitHub

URL: https://github.com/wkentaro/labelme

Usage

Annotation

Run labelme --help for detail.

labelme  # Open GUI
labelme static/apc2016_obj3.jpg  # Specify file
labelme static/apc2016_obj3.jpg -O static/apc2016_obj3.json  # Close window after the save

The annotations are saved as a JSON file. The file includes the image itself.

Visualization

To view the json file quickly, you can use utility script:

labelme_draw_json static/apc2016_obj3.json

Convert to Dataset

To convert the json to set of image and label, you can run following:

labelme_json_to_dataset static/apc2016_obj3.json

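Step 2: colorize the annotated label.png
After the JSON file produced by the annotation tool is converted to a dataset, a label.png file is generated. It is a 16-bit grayscale image.
We therefore need to color it following the VOC segmentation colors, and the colors must be exact. MATLAB code: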

function cmap = labelcolormap(N)
if nargin==0
    N=256;
end
cmap = zeros(N,3);
for i=1:N
    id = i-1; r=0; g=0; b=0;
    for j=0:7
        r = bitor(r, bitshift(bitget(id,1),7 - j));
        g = bitor(g, bitshift(bitget(id,2),7 - j));
        b = bitor(b, bitshift(bitget(id,3),7 - j));
        id = bitshift(id,-3);
    end
    cmap(i,1)=r; cmap(i,2)=g; cmap(i,3)=b;
end
cmap = cmap / 255;

Or in Python:

import numpy as np

# Get the specified bit value
def bitget(byteval, idx):
    return ((byteval & (1 << idx)) != 0)

# Create label-color map, label --- [R G B]
#  0 --- [  0   0   0],  1 --- [128   0   0],  2 --- [  0 128   0]
#  3 --- [128 128   0],  4 --- [  0   0 128],  5 --- [128   0 128]
#  6 --- [  0 128 128],  7 --- [128 128 128],  8 --- [ 64   0   0]
#  9 --- [192   0   0], 10 --- [ 64 128   0], 11 --- [192 128   0]
# 12 --- [ 64   0 128], 13 --- [192   0 128], 14 --- [ 64 128 128]
# 15 --- [192 128 128], 16 --- [  0  64   0], 17 --- [128  64   0]
# 18 --- [  0 192   0], 19 --- [128 192   0], 20 --- [  0  64 128]
def labelcolormap(N=256):
    color_map = np.zeros((N, 3))
    for n in range(N):  # xrange on Python 2
        id_num = n
        r, g, b = 0, 0, 0
        for pos in range(8):
            r = np.bitwise_or(r, (bitget(id_num, 0) << (7-pos)))
            g = np.bitwise_or(g, (bitget(id_num, 1) << (7-pos)))
            b = np.bitwise_or(b, (bitget(id_num, 2) << (7-pos)))
            id_num = (id_num >> 3)
        color_map[n, 0] = r
        color_map[n, 1] = g
        color_map[n, 2] = b
    return color_map/255

if __name__=="__main__":
    color_map=labelcolormap(21)
    print(color_map)

This produces the following matrix (the Python output is shown):

[[ 0.          0.          0.        ]
 [ 0.50196078  0.          0.        ]
 [ 0.          0.50196078  0.        ]
 [ 0.50196078  0.50196078  0.        ]
 [ 0.          0.          0.50196078]
 [ 0.50196078  0.          0.50196078]
 [ 0.          0.50196078  0.50196078]
 [ 0.50196078  0.50196078  0.50196078]
 [ 0.25098039  0.          0.        ]
 [ 0.75294118  0.          0.        ]
 [ 0.25098039  0.50196078  0.        ]
 [ 0.75294118  0.50196078  0.        ]
 [ 0.25098039  0.          0.50196078]
 [ 0.75294118  0.          0.50196078]
 [ 0.25098039  0.50196078  0.50196078]
 [ 0.75294118  0.50196078  0.50196078]
 [ 0.          0.25098039  0.        ]
 [ 0.50196078  0.25098039  0.        ]
 [ 0.          0.75294118  0.        ]
 [ 0.50196078  0.75294118  0.        ]
 [ 0.          0.25098039  0.50196078]]

These rows correspond to the Pascal VOC colormap:

class            R    G    B
background       0    0    0
aeroplane      128    0    0
bicycle          0  128    0
bird           128  128    0
boat             0    0  128
bottle         128    0  128
bus              0  128  128
car            128  128  128
cat             64    0    0
chair          192    0    0
cow             64  128    0
diningtable    192  128    0
dog             64    0  128
horse          192    0  128
motorbike       64  128  128
person         192  128  128
pottedplant      0   64    0
sheep          128   64    0
sofa             0  192    0
train          128  192    0
tvmonitor        0   64  128

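The function above generates the color for each label. The labels are 0, 1, 2, ..., 20, since Pascal VOC has 21 classes including background.
The pixel values in the label.png produced in step 1 are these same label numbers, with at most 256 distinct values, so it is usually stored as a grayscale image.
We therefore use this colormap to convert the grayscale image above into an RGB image.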
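Before turning to skimage, it may help to see that the conversion itself is just a table lookup: the label value at each pixel indexes a row of the colormap. The sketch below is a compact NumPy illustration under that assumption. `voc_colormap` re-implements the bit-shuffling function above with uint8 output, and `label_to_rgb` is a hypothetical helper name.

```python
import numpy as np


def voc_colormap(n=256):
    """Build the VOC-style colormap as an n x 3 uint8 array."""
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        idx, r, g, b = i, 0, 0, 0
        for pos in range(8):
            # spread bits 0, 1, 2 of idx over the high bits of r, g, b
            r |= ((idx >> 0) & 1) << (7 - pos)
            g |= ((idx >> 1) & 1) << (7 - pos)
            b |= ((idx >> 2) & 1) << (7 - pos)
            idx >>= 3
        cmap[i] = (r, g, b)
    return cmap


def label_to_rgb(label, cmap):
    """Map an H x W array of class ids to an H x W x 3 RGB array."""
    return cmap[label]


label = np.array([[0, 1], [2, 20]], dtype=np.uint8)
rgb = label_to_rgb(label, voc_colormap(21))
print(rgb[1, 1])
```

The last pixel has label 20 (tvmonitor), so its color should match the last row of the table above.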

Method 1: modify skimage's colormap
skimage already ships some colors, but not in the Pascal VOC layout, so we have to specify ours separately.
Locate the following path:

/*/anaconda2/lib/python2.7/site-packages/skimage/color/

Edit colorlabel.py and add

DEFAULT_COLORS1 = ('maroon', 'lime', 'olive', 'navy', 'purple', 'teal',
                   'gray', 'fcncat', 'fcnchair', 'fcncow', 'fcndining',
                   'fcndog', 'fcnhorse', 'fcnmotor', 'fcnperson', 'fcnpotte',
                   'fcnsheep', 'fcnsofa', 'fcntrain', 'fcntv')

Then change the _label2rgb_overlay function:

    if colors is None:
        colors = DEFAULT_COLORS1

Finally, add the following variables to rgb_colors.py:

fcnchair = (0.753, 0, 0)
fcncat = (0.251, 0, 0)
fcncow = (0.251, 0.502, 0)
fcndining = (0.753, 0.502, 0)
fcndog = (0.251, 0, 0.502)
fcnhorse = (0.753, 0, 0.502)
fcnmotor = (0.251, 0.502, 0.502)
fcnperson = (0.753, 0.502, 0.502)
fcnpotte = (0, 0.251, 0)
fcnsheep = (0.502, 0.251, 0)
fcnsofa = (0, 0.753, 0)
fcntrain = (0.502, 0.753, 0)
fcntv = (0, 0.251, 0.502)

If that is too much trouble, just download https://github.com/315386775/FCN_train
and replace skimage's color folder with the skimge-color folder found under Add_colortoimg.
Finally, run the conversion:

#!/usr/bin/python
# -*- coding:utf-8 -*-
import PIL.Image
import numpy as np
from skimage import io,data,color
import matplotlib.pyplot as plt

img = PIL.Image.open('xxx.png')
img = np.array(img)
dst = color.label2rgb(img, bg_label=0, bg_color=(0, 0, 0))
io.imsave('xxx.png', dst)

Method 2: without modifying the library source

#!/usr/bin/python
# -*- coding:utf-8 -*-
import PIL.Image
import numpy as np
from skimage import io,data,color

# Get the specified bit value
def bitget(byteval, idx):
    return ((byteval & (1 << idx)) != 0)

# Create label-color map, label --- [R G B]
#  0 --- [  0   0   0],  1 --- [128   0   0],  2 --- [  0 128   0]
#  3 --- [128 128   0],  4 --- [  0   0 128],  5 --- [128   0 128]
#  6 --- [  0 128 128],  7 --- [128 128 128],  8 --- [ 64   0   0]
#  9 --- [192   0   0], 10 --- [ 64 128   0], 11 --- [192 128   0]
# 12 --- [ 64   0 128], 13 --- [192   0 128], 14 --- [ 64 128 128]
# 15 --- [192 128 128], 16 --- [  0  64   0], 17 --- [128  64   0]
# 18 --- [  0 192   0], 19 --- [128 192   0], 20 --- [  0  64 128]
def labelcolormap(N=256):
    color_map = np.zeros((N, 3))
    for n in range(N):  # xrange on Python 2
        id_num = n
        r, g, b = 0, 0, 0
        for pos in range(8):
            r = np.bitwise_or(r, (bitget(id_num, 0) << (7-pos)))
            g = np.bitwise_or(g, (bitget(id_num, 1) << (7-pos)))
            b = np.bitwise_or(b, (bitget(id_num, 2) << (7-pos)))
            id_num = (id_num >> 3)
        color_map[n, 0] = r
        color_map[n, 1] = g
        color_map[n, 2] = b
    return color_map/255

color_map = labelcolormap(21)
img = PIL.Image.open('label.png')
img = np.array(img)
# colors start at label 1 because label 0 is drawn with bg_color
dst = color.label2rgb(img,colors=color_map[1:],bg_label=0, bg_color=(0, 0, 0))
io.imsave('xxx.png', dst)

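This method loads the colormap directly, which is simpler and clearer.

Note that method 1 alters part of the colormap. For example, the second color in DEFAULT_COLORS1 should be (0 128 0), i.e. (0, 0.502, 0), which skimage calls green, but lime = (0, 1, 0) is used here. The difference is small, though.

Step 3: the most critical step
Convert the 24-bit PNG into an 8-bit PNG. Here is the MATLAB code: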

dirs=dir('F:/xxx/*.png');
map =labelcolormap(256);
for n=1:numel(dirs)
    strname=strcat('F:/xxx/',dirs(n).name);
    img=imread(strname);
    x=rgb2ind(img,map);
    newname=strcat('F:/xxx/',dirs(n).name);
    imwrite(x,map,newname,'png');
end

At this point we have generated the 8-bit color image.
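For readers without MATLAB, the same idea can be sketched in Python: look up each RGB color in the palette and store its index. Note that this sketch only accepts exact color matches and raises an error otherwise, so it is an illustration of the mapping rather than a drop-in replacement for rgb2ind. `rgb_to_index` is a hypothetical helper, and the four-entry palette is taken from the first rows of the VOC table above.

```python
import numpy as np

# First four entries of the VOC colormap listed above (index -> RGB)
PALETTE = np.array([[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0]],
                   dtype=np.uint8)


def rgb_to_index(rgb, palette):
    """Convert an H x W x 3 RGB image to an H x W uint8 index image.

    Raises ValueError if the image contains a color missing from the palette.
    """
    def pack(a):
        # pack each color into one integer so it can serve as a dict key
        a = a.astype(np.int64)
        return (a[..., 0] << 16) | (a[..., 1] << 8) | a[..., 2]

    lookup = {int(code): i for i, code in enumerate(pack(palette))}
    codes = pack(rgb)
    unknown = set(np.unique(codes).tolist()) - set(lookup)
    if unknown:
        raise ValueError("colors not in palette: %d" % len(unknown))
    index = np.zeros(codes.shape, dtype=np.uint8)
    for code, i in lookup.items():
        index[codes == code] = i
    return index


rgb = PALETTE[np.array([[0, 1], [2, 1]])]   # a tiny colored label image
index = rgb_to_index(rgb, PALETTE)
print(index)
```

The resulting index array can then be written as a palette PNG with Pillow's `Image.fromarray`, `putpalette` and `save`, the same calls used by the inference script at the end of this post.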

Note that we can read the generated image back and check whether the output below matches what the VOC labels give.

In [23]: img = PIL.Image.open('F:/DL/000001_json/test/dstfcn.png')
In [24]: np.unique(img)
Out[24]: array([0, 1, 2], dtype=uint8)

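The main thing to check is whether the output looks like [0, 1, 2], that is, a short list of class indices. If it does, the label was generated successfully.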
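This check is easy to automate over a whole folder of labels. The sketch below reports any pixel value that is not a valid class index. The helper name is hypothetical, and treating 255 as an allowed "ignore" value is our assumption based on the VOC convention for unlabeled pixels; adjust or remove it for your own data.

```python
import numpy as np


def check_label_values(label, num_classes, ignore_value=255):
    """Return the sorted label values that are not valid class ids."""
    values = np.unique(label)
    return [int(v) for v in values
            if not (0 <= v < num_classes or v == ignore_value)]


label = np.array([[0, 1, 2], [2, 255, 38]], dtype=np.uint8)
print(check_label_values(label, num_classes=21))
```

An empty list means every pixel is either a class index or the ignore value; anything else points at a conversion step that went wrong.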

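So far we have gone through: generate the grayscale label image -> generate the colormap -> convert to RGB -> convert to an 8-bit indexed color image.

Next, we need to prepare the following files for training:
test.txt is the test set, train.txt is the training set, val.txt is the validation set, and trainval.txt is the training and validation sets combined.
For the proportions we can follow what is done for Faster R-CNN: in VOC2007, trainval is roughly 50% of the whole dataset and test is roughly 50%; train is roughly 50% of trainval and val is roughly 50% of trainval. The following code can serve as a reference:

Reference: http://blog.csdn.net/sinat_30071459/article/details/50723212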

%%
% This script uses the generated xml files to create trainval.txt, train.txt, test.txt and val.txt for a VOC2007-style dataset
% trainval is 50% of the whole dataset and test is 50%; train is 50% of trainval and val is 50% of trainval
% Adjust these percentages to your own dataset; if the dataset is small, test and val can be smaller
%%
% Remember to change the four values below
xmlfilepath='E:\Annotations';
txtsavepath='E:\ImageSets\Main\';
trainval_percent=0.5; % fraction of the whole dataset used for trainval; the rest is test
train_percent=0.5; % fraction of trainval used for train; the rest is val

%%
xmlfile=dir(xmlfilepath);
numOfxml=length(xmlfile)-2; % subtract . and ..; total dataset size

trainval=sort(randperm(numOfxml,floor(numOfxml*trainval_percent)));
test=sort(setdiff(1:numOfxml,trainval));

trainvalsize=length(trainval); % size of trainval
train=sort(trainval(randperm(trainvalsize,floor(trainvalsize*train_percent))));
val=sort(setdiff(trainval,train));

ftrainval=fopen([txtsavepath 'trainval.txt'],'w');
ftest=fopen([txtsavepath 'test.txt'],'w');
ftrain=fopen([txtsavepath 'train.txt'],'w');
fval=fopen([txtsavepath 'val.txt'],'w');

for i=1:numOfxml
    if ismember(i,trainval)
        fprintf(ftrainval,'%s\n',xmlfile(i+2).name(1:end-4));
        if ismember(i,train)
            fprintf(ftrain,'%s\n',xmlfile(i+2).name(1:end-4));
        else
            fprintf(fval,'%s\n',xmlfile(i+2).name(1:end-4));
        end
    else
        fprintf(ftest,'%s\n',xmlfile(i+2).name(1:end-4));
    end
end
fclose(ftrainval);
fclose(ftrain);
fclose(fval);
fclose(ftest);

This script relies on the xml files, but we can just as well work directly from the image folder.
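A Python sketch of the same split, driven by image file names instead of xml files, could look like the following. The function names, the accepted extensions and the fixed random seed are all our own choices for illustration; the percentages mirror the MATLAB script above.

```python
import os
import random


def split_dataset(names, trainval_percent=0.5, train_percent=0.5, seed=0):
    """Split names into trainval/test, then trainval into train/val."""
    names = sorted(names)
    rng = random.Random(seed)   # fixed seed for a reproducible split
    n_trainval = int(len(names) * trainval_percent)
    trainval = sorted(rng.sample(names, n_trainval))
    test = sorted(set(names) - set(trainval))
    n_train = int(len(trainval) * train_percent)
    train = sorted(rng.sample(trainval, n_train))
    val = sorted(set(trainval) - set(train))
    return {"trainval": trainval, "test": test, "train": train, "val": val}


def list_image_names(img_dir, exts=(".jpg", ".png")):
    """Return file names without extension for the images in img_dir."""
    return [os.path.splitext(f)[0] for f in os.listdir(img_dir)
            if f.lower().endswith(exts)]


def write_splits(splits, out_dir):
    """Write one <split>.txt file per split, one name per line."""
    for key, items in splits.items():
        with open(os.path.join(out_dir, key + ".txt"), "w") as f:
            f.write("\n".join(items) + "\n")


# Toy names for illustration; real names would come from list_image_names(...)
splits = split_dataset(["%06d" % i for i in range(1, 21)])
print({k: len(v) for k, v in splits.items()})
```

Calling `write_splits(splits, out_dir)` afterwards would produce the four text files in a folder of your choice.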

Colorizing the test results

This step mainly comes down to modifying infer.py.
Method 1:

import numpy as np
from PIL import Image
import caffe

# load image, switch to BGR, subtract mean, and make dims C x H x W for Caffe
im = Image.open('pascal/VOC2010/JPEGImages/2007_000129.jpg')
in_ = np.array(im, dtype=np.float32)
in_ = in_[:,:,::-1]
in_ -= np.array((104.00698793,116.66876762,122.67891434))
in_ = in_.transpose((2,0,1))

# load net
net = caffe.Net('voc-fcn8s/deploy.prototxt', 'voc-fcn8s/fcn8s-heavy-pascal.caffemodel', caffe.TEST)
# shape for input (data blob is N x C x H x W), set data
net.blobs['data'].reshape(1, *in_.shape)
net.blobs['data'].data[...] = in_
# run net and take argmax for prediction
net.forward()
out = net.blobs['score'].data[0].argmax(axis=0)

arr=out.astype(np.uint8)
im=Image.fromarray(arr)

palette=[]
for i in range(256):
    palette.extend((i,i,i))
palette[:3*21]=np.array([[0, 0, 0],
                         [128, 0, 0],
                         [0, 128, 0],
                         [128, 128, 0],
                         [0, 0, 128],
                         [128, 0, 128],
                         [0, 128, 128],
                         [128, 128, 128],
                         [64, 0, 0],
                         [192, 0, 0],
                         [64, 128, 0],
                         [192, 128, 0],
                         [64, 0, 128],
                         [192, 0, 128],
                         [64, 128, 128],
                         [192, 128, 128],
                         [0, 64, 0],
                         [128, 64, 0],
                         [0, 192, 0],
                         [128, 192, 0],
                         [0, 64, 128]], dtype='uint8').flatten()
im.putpalette(palette)
im.show()
im.save('test.png')
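The line that turns network output into a label map is the argmax over the class axis. A tiny standalone illustration, with made-up scores for 3 classes on a 2 x 2 image, shows what that step does without needing Caffe; the numbers are hypothetical and only the NumPy calls matter.

```python
import numpy as np

# Hypothetical scores for 3 classes on a 2 x 2 image (C x H x W)
score = np.array([
    [[0.9, 0.1], [0.2, 0.3]],    # class 0
    [[0.05, 0.8], [0.1, 0.3]],   # class 1
    [[0.05, 0.1], [0.7, 0.4]],   # class 2
], dtype=np.float32)

# For each pixel, keep the index of the highest-scoring class
pred = score.argmax(axis=0).astype(np.uint8)   # H x W map of class ids
print(pred)

# Count how many pixels were assigned to each class
counts = np.bincount(pred.ravel(), minlength=score.shape[0])
print(counts)
```

The per-class pixel counts are a quick sanity check on a prediction before colorizing it.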

Alternatively, use the same approach as in data preparation:

import numpy as np
from PIL import Image
import caffe
# scipy.misc.imsave was removed in SciPy 1.2, so skimage.io.imsave is used instead
from skimage.io import imsave
from skimage.color import label2rgb

# Get the specified bit value
def bitget(byteval, idx):
    return ((byteval & (1 << idx)) != 0)

# Create label-color map, label --- [R G B]
#  0 --- [  0   0   0],  1 --- [128   0   0],  2 --- [  0 128   0]
#  3 --- [128 128   0],  4 --- [  0   0 128],  5 --- [128   0 128]
#  6 --- [  0 128 128],  7 --- [128 128 128],  8 --- [ 64   0   0]
#  9 --- [192   0   0], 10 --- [ 64 128   0], 11 --- [192 128   0]
# 12 --- [ 64   0 128], 13 --- [192   0 128], 14 --- [ 64 128 128]
# 15 --- [192 128 128], 16 --- [  0  64   0], 17 --- [128  64   0]
# 18 --- [  0 192   0], 19 --- [128 192   0], 20 --- [  0  64 128]
def labelcolormap(N=256):
    color_map = np.zeros((N, 3))
    for n in range(N):  # xrange on Python 2
        id_num = n
        r, g, b = 0, 0, 0
        for pos in range(8):
            r = np.bitwise_or(r, (bitget(id_num, 0) << (7-pos)))
            g = np.bitwise_or(g, (bitget(id_num, 1) << (7-pos)))
            b = np.bitwise_or(b, (bitget(id_num, 2) << (7-pos)))
            id_num = (id_num >> 3)
        color_map[n, 0] = r
        color_map[n, 1] = g
        color_map[n, 2] = b
    # values stay in 0..255 here because the result is saved as uint8 below
    return color_map

def main():
    # load image, switch to BGR, subtract mean, and make dims C x H x W for Caffe
    im = Image.open('data/pascal/VOCdevkit/VOC2012/JPEGImages/2007_000346.jpg')
    in_ = np.array(im, dtype=np.float32)
    in_ = in_[:,:,::-1]
    in_ -= np.array((104.00698793,116.66876762,122.67891434))
    in_ = in_.transpose((2,0,1))

    # load net
    net = caffe.Net('voc-fcn8s/deploy.prototxt', 'ilsvrc-nets/fcn8s-heavy-pascal.caffemodel', caffe.TEST)
    # shape for input (data blob is N x C x H x W), set data
    net.blobs['data'].reshape(1, *in_.shape)
    net.blobs['data'].data[...] = in_
    # run net and take argmax for prediction
    net.forward()
    out = net.blobs['score'].data[0].argmax(0).astype(np.uint8)

    color_map = labelcolormap(21)
    label_mask = label2rgb(out, colors=color_map[1:], bg_label=0)
    label_mask[out == 0] = [0, 0, 0]
    imsave('data/pascal/VOCdevkit/VOC2012/JPEGImages/test_prediction.png', label_mask.astype(np.uint8))

if __name__ == '__main__':
    main()