Preparing the initial data

mean_shape

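mean_shape is the average of the ground-truth points over all training images. How is it computed in practice? Can we simply add up the landmark coordinates and divide by the number of images?
Clearly that would be hasty and inaccurate. Faces differ widely from image to image and are affected by illumination, pose and other factors, so the average should be taken in a reasonably unified frame. The MATLAB code is given first: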

function mean_shape = calc_meanshape(shapepathlistfile)
% Average all ground-truth shapes after normalizing each one to its own bounding box.
fid = fopen(shapepathlistfile);
shapepathlist = textscan(fid, '%s', 'delimiter', '\n');
fclose(fid);
% textscan always returns a 1x1 cell, so test the list inside it
if isempty(shapepathlist{1})
    error('no shape file found');
end

shape_header = loadshape(shapepathlist{1}{1});
if isempty(shape_header)
    error('invalid shape file');
end

mean_shape = zeros(size(shape_header));
num_shapes = 0;
for i = 1:length(shapepathlist{1})
    shape_i = double(loadshape(shapepathlist{1}{i}));
    if isempty(shape_i)
        continue;
    end
    shape_min = min(shape_i, [], 1);
    shape_max = max(shape_i, [], 1);
    % translate to origin point
    shape_i = bsxfun(@minus, shape_i, shape_min);
    % resize shape
    shape_i = bsxfun(@rdivide, shape_i, shape_max - shape_min);
    mean_shape = mean_shape + shape_i;
    num_shapes = num_shapes + 1;
end
mean_shape = mean_shape ./ num_shapes;

img = 255 * ones(500, 500, 3);
drawshapes(img, 50 + 400 * mean_shape);
end

function shape = loadshape(path)
% function: load shape from pts file
file = fopen(path);
if file == -1
    % fopen failed, so there is no handle to close
    shape = [];
    return;
end
shape = textscan(file, '%d16 %d16', 'HeaderLines', 3, 'CollectOutput', 2);
fclose(file);
shape = shape{1};
end

Explanation:

In formula form, with Region = [x, y, width, height] the bounding box of the shape (shape_min and shape_max - shape_min in the code):

$\{shape_{gt}-[Region(1),Region(2)]\}/[Region(3),Region(4)] \Rightarrow [0,1]\times[0,1]$
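As a small worked instance of this normalization, the sketch below applies it to a single shape. The three landmark coordinates are made up for illustration, and `Region` is built from the shape's own min and max, as calc_meanshape does.

```matlab
% Hypothetical 3-landmark shape in pixel coordinates (one row per point).
shape_gt = [120 80; 180 90; 150 160];

% Bounding box of the shape: Region = [x, y, width, height].
shape_min = min(shape_gt, [], 1);
shape_max = max(shape_gt, [], 1);
Region = [shape_min, shape_max - shape_min];

% Translate to the origin, then divide by the box size.
shape_norm = bsxfun(@rdivide, bsxfun(@minus, shape_gt, Region(1:2)), Region(3:4));
disp(shape_norm);
```

Every coordinate of `shape_norm` lies in [0, 1], so shapes from differently sized faces can be averaged directly.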

Preparing $\Delta S^t$

We know that the core idea of 3000FPS is:

$\Delta S^t=W^t\Phi^t(I,S^{t-1})$

where $\Delta S^t=S^{gt}-S^{t}$ is the residual at stage t, $\Phi^t(I,S^{t-1})$ is the feature extraction function, and W is the linear regression matrix. From the article 《人臉配準坐標變換解析》 we can see that $\Delta S^t$ has to go through a similarity transform, whereas $\Phi^t(I,S^{t-1})$ does not.
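To make the role of W concrete, here is a minimal sketch of fitting a linear map from features to residuals. It assumes plain ridge-regularized least squares as a stand-in, which is not necessarily the solver used by the code discussed here. `Phi`, `dS`, `W_true` and `lambda` are hypothetical names and toy values. The sketch stores one sample per row, so it fits `dS = Phi * W`, the transposed form of the equation above.

```matlab
% Hypothetical toy data: N = 3 samples, D = 2 features, 2 output coordinates.
Phi    = [1 0; 0 1; 1 1];     % one feature row per sample (assumed layout)
W_true = [1 2; 3 4];          % made-up matrix, used only to generate targets
dS     = Phi * W_true;        % one residual row per sample

% Ridge-regularized least squares: minimize ||dS - Phi*W||^2 + lambda*||W||^2.
lambda = 1e-8;
W = (Phi' * Phi + lambda * eye(size(Phi, 2))) \ (Phi' * dS);
disp(W);
```

With noise-free targets and a tiny `lambda`, the recovered `W` matches `W_true` up to rounding.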
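The main steps of the similarity transform are as follows. First center $S^t$ and $S^0$, then solve for the transform in

$S^0=cRS^t$

Once $cR$ has been found, apply the same transform to $\Delta S^t$:

$\widetilde{S^t}=cR\Delta S^t$

The transformed $\widetilde{S^t}$ is what we use to solve for the linear regression matrix W.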
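The steps above can be sketched without any toolbox by treating each 2-D point as a complex number, in which case a scale plus rotation is a single complex factor `a`. This is a swapped-in closed-form least-squares fit for illustration, not the fitgeotrans call used in train_model.m, and all shapes below are made-up values.

```matlab
% Hypothetical shapes, one landmark per row.
S0   = [0 0; 1 0; 1 1; 0 1];   % mean shape
St   = [5 3; 5 5; 3 5; 3 3];   % current shape: S0 scaled by 2, rotated 90 degrees, shifted
S_gt = [5 3; 7 5; 3 5; 3 3];   % made-up ground truth

% Center both shapes and treat each point as a complex number x + iy.
S0c = bsxfun(@minus, S0, mean(S0, 1));
Stc = bsxfun(@minus, St, mean(St, 1));
z0 = S0c(:, 1) + 1i * S0c(:, 2);
zt = Stc(:, 1) + 1i * Stc(:, 2);

% Least-squares similarity z0 ~ a * zt, where a = c * exp(1i * theta).
a = (zt' * z0) / (zt' * zt);   % zt' is the conjugate transpose
c = abs(a);
theta = angle(a);

% Apply the same scale and rotation (no translation) to the residual.
dS = S_gt - St;
dz = a * (dS(:, 1) + 1i * dS(:, 2));
dS_tilde = [real(dz), imag(dz)];
```

Only the scale and rotation are applied to the residual. A residual is a difference of two shapes, so any translation cancels out.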
Code first: train_model.m, starting at line 103

Param.meanshape = S0(Param.ind_usedpts, :); % select the landmarks that are used
dbsize = length(Data);
% load('Ts_bbox.mat');
augnumber = Param.augnumber; % number of init_shapes chosen for each face

for i = 1:dbsize
    % initialize the shape of current face image by randomly selecting multiple shapes from other face images
    % indice = ceil(dbsize*rand(1, augnumber));
    indice_rotate = ceil(dbsize*rand(1, augnumber));
    indice_shift  = ceil(dbsize*rand(1, augnumber));
    scales        = 1 + 0.2*(rand([1 augnumber]) - 0.5);

    Data{i}.intermediate_shapes = cell(1, Param.max_numstage); % intermediate shapes
    Data{i}.intermediate_bboxes = cell(1, Param.max_numstage);
    Data{i}.intermediate_shapes{1} = zeros([size(Param.meanshape), augnumber]); % 68*2*augnumber (augnumber = number of initial shapes for image i)
    Data{i}.intermediate_bboxes{1} = zeros([augnumber, size(Data{i}.bbox_gt, 2)]); % augnumber*4
    Data{i}.shapes_residual = zeros([size(Param.meanshape), augnumber]); % shape residuals, size 68*2*augnumber
    Data{i}.tf2meanshape = cell(augnumber, 1);
    Data{i}.meanshape2tf = cell(augnumber, 1);

    % if Data{i}.isdet == 1
    %    Data{i}.bbox_facedet = Data{i}.bbox_facedet*ts_bbox;
    % end

    % If augnumber = 1, each image has a single init_shape, so it is simply set
    % to mean_shape. In that case Data{i}.tf2meanshape{1} is just the identity,
    % because it maps mean_shape to mean_shape. Later stages are different.
    % For augnumber > 1, the other init_shapes are generated by translation,
    % rotation and scaling, or can be picked from the shapes of other images.
    for sr = 1:params.augnumber
        if sr == 1
            % estimate the similarity transformation from initial shape to mean shape
            % Data{i}.intermediate_shapes{1}(:,:, sr) = resetshape(Data{i}.bbox_gt, Param.meanshape);
            % Data{i}.intermediate_bboxes{1}(sr, :) = Data{i}.bbox_gt;
            Data{i}.intermediate_shapes{1}(:,:, sr) = resetshape(Data{i}.bbox_facedet, Param.meanshape);
            Data{i}.intermediate_bboxes{1}(sr, :) = Data{i}.bbox_facedet;

            % reproject the mean shape onto the face-detection bbox
            % (meanshape_resize is identical to Data{i}.intermediate_shapes{1}(:,:, sr))
            meanshape_resize = resetshape(Data{i}.intermediate_bboxes{1}(sr, :), Param.meanshape);

            % similarity transform between the current shape and the mean shape
            Data{i}.tf2meanshape{1} = fitgeotrans( ...
                bsxfun(@minus, Data{i}.intermediate_shapes{1}(1:end,:, 1), mean(Data{i}.intermediate_shapes{1}(1:end,:, 1))), ...
                (bsxfun(@minus, meanshape_resize(1:end, :), mean(meanshape_resize(1:end, :)))), ...
                'NonreflectiveSimilarity');
            Data{i}.meanshape2tf{1} = fitgeotrans( ...
                (bsxfun(@minus, meanshape_resize(1:end, :), mean(meanshape_resize(1:end, :)))), ...
                bsxfun(@minus, Data{i}.intermediate_shapes{1}(1:end,:, 1), mean(Data{i}.intermediate_shapes{1}(1:end,:, 1))), ...
                'NonreflectiveSimilarity');

            % calculate the residual shape from initial shape to groundtruth shape under normalization scale
            shape_residual = bsxfun(@rdivide, Data{i}.shape_gt - Data{i}.intermediate_shapes{1}(:,:, 1), ...
                [Data{i}.intermediate_bboxes{1}(1, 3) Data{i}.intermediate_bboxes{1}(1, 4)]);
            % transform the shape residual in the image coordinate to the mean shape coordinate
            [u, v] = transformPointsForward(Data{i}.tf2meanshape{1}, shape_residual(:, 1)', shape_residual(:, 2)');
            Data{i}.shapes_residual(:, 1, 1) = u';
            Data{i}.shapes_residual(:, 2, 1) = v';
        else
            % randomly rotate the shape
            % shape = resetshape(Data{i}.bbox_gt, Param.meanshape);       % Data{indice_rotate(sr)}.shape_gt
            shape = resetshape(Data{i}.bbox_facedet, Param.meanshape);       % Data{indice_rotate(sr)}.shape_gt

            % build a new initial shape from the randomly chosen scale, rotation
            % and translation, then project it onto the bbox
            if params.augnumber_scale ~= 0
                shape = scaleshape(shape, scales(sr));
            end
            if params.augnumber_rotate ~= 0
                shape = rotateshape(shape);
            end
            if params.augnumber_shift ~= 0
                shape = translateshape(shape, Data{indice_shift(sr)}.shape_gt);
            end

            Data{i}.intermediate_shapes{1}(:, :, sr) = shape;
            Data{i}.intermediate_bboxes{1}(sr, :) = getbbox(shape);

            meanshape_resize = resetshape(Data{i}.intermediate_bboxes{1}(sr, :), Param.meanshape);

            Data{i}.tf2meanshape{sr} = fitgeotrans( ...
                bsxfun(@minus, Data{i}.intermediate_shapes{1}(1:end,:, sr), mean(Data{i}.intermediate_shapes{1}(1:end,:, sr))), ...
                bsxfun(@minus, meanshape_resize(1:end, :), mean(meanshape_resize(1:end, :))), ...
                'NonreflectiveSimilarity');
            Data{i}.meanshape2tf{sr} = fitgeotrans( ...
                bsxfun(@minus, meanshape_resize(1:end, :), mean(meanshape_resize(1:end, :))), ...
                bsxfun(@minus, Data{i}.intermediate_shapes{1}(1:end,:, sr), mean(Data{i}.intermediate_shapes{1}(1:end,:, sr))), ...
                'NonreflectiveSimilarity');

            shape_residual = bsxfun(@rdivide, Data{i}.shape_gt - Data{i}.intermediate_shapes{1}(:,:, sr), ...
                [Data{i}.intermediate_bboxes{1}(sr, 3) Data{i}.intermediate_bboxes{1}(sr, 4)]);
            % NOTE: this call uses tf2meanshape{1}, while the commented-out
            % tformfwd line below uses tf2meanshape{sr}
            [u, v] = transformPointsForward(Data{i}.tf2meanshape{1}, shape_residual(:, 1)', shape_residual(:, 2)');
            Data{i}.shapes_residual(:, 1, sr) = u';
            Data{i}.shapes_residual(:, 2, sr) = v';
            % Data{i}.shapes_residual(:, :, sr) = tformfwd(Data{i}.tf2meanshape{sr}, shape_residual(:, 1), shape_residual(:, 2));
        end
    end
end

Understanding this code requires the article 《人臉配準坐標變換解析》 referred to above.

According to that article,

$\left.\begin{matrix}\overline{S_0}&=S_0-mean(S_0)\\ \overline{S_1}&=S_1-mean(S_1) \end{matrix}\right\}\Rightarrow \overline{S_0}=c_1R_1\overline{S_1}$

Therefore, from

$\Delta S=S_g-S_1$

we obtain

$\widetilde{\Delta S}=c_1R_1\Delta S$
Our situation is slightly special, however, and needs one extra step.
Start from:

 % reproject the mean shape onto the face-detection bbox
 meanshape_resize = resetshape(Data{i}.intermediate_bboxes{1}(sr, :), Param.meanshape);

The definition of resetshape shows that meanshape is mapped into intermediate_bboxes, so that $S_0$ and $S_1$ share the same scale and sit at roughly the same position. In mathematical terms:

$S_0\_resize=S_0*Ratio+[Region(1),Region(2)]$

Here Ratio is in fact the size of intermediate_bboxes.
Proceeding in the same way as above, and noting that the translation $[Region(1),Region(2)]$ cancels when the mean is subtracted:

$\widetilde{S_0}=S_0\_resize-mean(S_0\_resize)=S_0*Ratio-mean(S_0)*Ratio=(S_0-mean(S_0))*Ratio= \overline{S_0}*Ratio$

The fit then gives

$\widetilde{S_0}=Ratio*\overline{S_0}=\widetilde{c_1}\widetilde{R_1} \overline{S_1}$. ($\bigstar$)
This is exactly the code above:

 Data{i}.tf2meanshape{1} = fitgeotrans( ...
     bsxfun(@minus, Data{i}.intermediate_shapes{1}(1:end,:, 1), mean(Data{i}.intermediate_shapes{1}(1:end,:, 1))), ...
     (bsxfun(@minus, meanshape_resize(1:end, :), mean(meanshape_resize(1:end, :)))), ...
     'NonreflectiveSimilarity');

Data{i}.tf2meanshape{1} is the $\widetilde{c_1}\widetilde{R_1}$ computed here.
What we actually want is $\overline{S_0}=c_1R_1\overline{S_1}$. No need to worry: ($\bigstar$) points the way.
$c_1R_1=\widetilde{c_1}\widetilde{R_1}/Ratio=\widetilde{c_1}\widetilde{R_1}/{intermediate\_{bboxes}}$. Therefore:

$\widetilde{\Delta S}=\widetilde{c_1}\widetilde{R_1}/{intermediate\_{bboxes}}*\Delta S$
which is what the code does:

 % similarity transform between the current shape and the mean shape
 Data{i}.tf2meanshape{1} = fitgeotrans( ...
     bsxfun(@minus, Data{i}.intermediate_shapes{1}(1:end,:, 1), mean(Data{i}.intermediate_shapes{1}(1:end,:, 1))), ...
     (bsxfun(@minus, meanshape_resize(1:end, :), mean(meanshape_resize(1:end, :)))), ...
     'NonreflectiveSimilarity');
 Data{i}.meanshape2tf{1} = fitgeotrans( ...
     (bsxfun(@minus, meanshape_resize(1:end, :), mean(meanshape_resize(1:end, :)))), ...
     bsxfun(@minus, Data{i}.intermediate_shapes{1}(1:end,:, 1), mean(Data{i}.intermediate_shapes{1}(1:end,:, 1))), ...
     'NonreflectiveSimilarity');
 % calculate the residual shape from initial shape to groundtruth shape under normalization scale
 shape_residual = bsxfun(@rdivide, Data{i}.shape_gt - Data{i}.intermediate_shapes{1}(:,:, 1), ...
     [Data{i}.intermediate_bboxes{1}(1, 3) Data{i}.intermediate_bboxes{1}(1, 4)]);
 % transform the shape residual in the image coordinate to the mean shape coordinate
 [u, v] = transformPointsForward(Data{i}.tf2meanshape{1}, shape_residual(:, 1)', shape_residual(:, 2)');
 Data{i}.shapes_residual(:, 1, 1) = u';
 Data{i}.shapes_residual(:, 2, 1) = v';
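The relation $c_1R_1=\widetilde{c_1}\widetilde{R_1}/Ratio$ can be checked numerically. The sketch below assumes a scalar `Ratio`, i.e. a square bbox. The code above divides by width and height separately, which matches the derivation exactly only in that case. All coordinates are made up, and the similarity fit uses a complex-number least-squares formula as a stand-in for fitgeotrans.

```matlab
% Hypothetical data: a mean shape in [0,1]x[0,1] and a square bbox.
S0     = [0 0; 1 0; 1 1; 0 1];
Ratio  = 100;                                % assumed scalar (square bbox)
Region = [30 40];                            % assumed top-left corner of the bbox
S1     = [35 50; 120 45; 125 135; 40 140];   % current shape in the image
S_gt   = [32 44; 128 42; 131 138; 33 141];   % made-up ground-truth shape

S0_resize = bsxfun(@plus, S0 * Ratio, Region);

% Center each shape and switch to complex coordinates x + iy.
tocplx = @(S) S(:, 1) + 1i * S(:, 2);
center = @(S) bsxfun(@minus, S, mean(S, 1));
z1  = tocplx(center(S1));
z0  = tocplx(center(S0));
z0r = tocplx(center(S0_resize));

a       = (z1' * z0)  / (z1' * z1);   % plays the role of c1 * R1
a_tilde = (z1' * z0r) / (z1' * z1);   % plays the role of the tilde version

% Transform the residual both ways.
dz  = tocplx(S_gt - S1);
lhs = a * dz;                         % c1 * R1 * dS
rhs = a_tilde * (dz / Ratio);         % tilde transform applied to dS / Ratio
```

Here `lhs` and `rhs` agree up to rounding, which is why the code first divides the residual by the bbox size and then applies tf2meanshape.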