Table of Contents
- Deep Dream
- Style Transfer
- References
Deep Dream
DeepDream is an experiment that visualizes the patterns a neural network has learned. Much like a child watching clouds and trying to read shapes into them, DeepDream over-interprets and amplifies the patterns it sees in an image.
To reveal what the features learned by a CNN actually mean, DeepDream amplifies them. Concretely, the features of each layer are visualized with gradient ascent: an image (for example noise) is fed into the network, and during the backward pass the network weights are left untouched; instead, the pixel values of the input image are updated. Visualizing the network by "training the image" in this way is exactly the idea DeepDream builds on.
How does DeepDream amplify image features? Consider a simple example first. Suppose a network has learned to classify cats and dogs, and we feed it an image of a cloud that happens to look somewhat like a dog; the features the network extracts will then also look dog-like. Suppose the corresponding output probabilities are [0.6, 0.4], where 0.6 is the probability of "dog" and 0.4 the probability of "cat". Maximizing the squared L2 norm of this output, $L_2 = x_1^2 + x_2^2$, amplifies the feature nicely: since the two probabilities sum to 1, $L_2$ grows as $x_1$ gets larger and $x_2$ gets smaller, so as long as $x_1 > x_2$, more iterations make $x_1$ larger and $x_2$ smaller, and the image looks more and more like a dog. Each iteration computes this L2 norm and then adjusts the image by gradient ascent. What is optimized is no longer the weights but the feature values, i.e. the pixels; accordingly, the loss is not the usual cross-entropy but the L2 norm of the feature values, which is maximized so that the features extracted from the image become ever closer to the features hidden inside the network.
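As a toy illustration of this idea (my own sketch, not from the original article), the snippet below builds a small, hypothetical frozen classifier with a softmax output and runs gradient ascent on its input to maximize the squared L2 norm of that output; the dominant class probability is pushed towards 1 while the weights stay fixed:

```python
import tensorflow as tf

tf.random.set_seed(0)

# a hypothetical frozen "classifier": one dense layer with a softmax over [dog, cat]
classifier = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation="softmax", input_shape=(4,)),
])
classifier.trainable = False

x = tf.Variable(tf.random.normal([1, 4]))  # the "image" we optimize instead of the weights

for step in range(200):
    with tf.GradientTape() as tape:
        probs = classifier(x)                 # e.g. [[0.6, 0.4]]
        l2 = tf.reduce_sum(tf.square(probs))  # L2 objective to maximize
    grads = tape.gradient(l2, x)
    x.assign_add(0.1 * grads)                 # gradient ascent on the input

print(classifier(x).numpy())  # the larger probability has been pushed towards 1
```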
To run DeepDream, start from a base image and feed it into a pretrained CNN, then forward-propagate it to a chosen layer. To better understand what that layer has learned, we want to maximize its activations: take the gradient of the layer's activations with respect to the input image and perform gradient ascent on the image. Doing only this, however, does not produce good images. To improve the result, a few extra techniques are used: Gaussian blurring can be applied to keep the image smooth, and the computation is done at multiple scales (called octaves): the input image is repeatedly shrunk first and then progressively enlarged, with the results merged into a single output image.
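The Gaussian smoothing mentioned above is not used in the implementation that follows; if you want to experiment with it, a minimal sketch (function name and parameters are my own) using a depthwise convolution could look like this:

```python
import numpy as np
import tensorflow as tf

def gaussian_blur(img, kernel_size=5, sigma=1.0):
    """Smooth a float image tensor of shape [1, height, width, channels]."""
    ax = np.arange(kernel_size) - (kernel_size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    channels = int(img.shape[-1])
    # apply the same 2-D kernel to every channel independently
    kernel = np.tile(kernel[:, :, None, None], (1, 1, channels, 1)).astype("float32")
    return tf.nn.depthwise_conv2d(img, kernel, strides=[1, 1, 1, 1], padding="SAME")
```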
We first use the pretrained InceptionV3 model to extract image features; the keys of layer_coeff (mixed4 through mixed7) name the InceptionV3 layers whose feature values are used, and their values weight each layer's contribution:
```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# weight of each layer's contribution to the loss
layer_coeff = {
    "mixed4": 1.0,
    "mixed5": 1.5,
    "mixed6": 2.0,
    "mixed7": 2.5,
}

model = tf.keras.applications.inception_v3.InceptionV3(weights="imagenet", include_top=False)
outputs_dict = dict(
    [(layer.name, layer.output)
     for layer in [model.get_layer(name) for name in layer_coeff.keys()]]
)
feature_extractor = tf.keras.Model(inputs=model.inputs, outputs=outputs_dict)
```
Compute the loss:
```python
def compute_loss(input_image):
    features = feature_extractor(input_image)
    loss_list = []
    for name in features.keys():
        coeff = layer_coeff[name]
        activation = features[name]
        # avoid border artifacts by only including non-border pixels in the loss
        scaling = tf.reduce_prod(tf.cast(tf.shape(activation), "float32"))
        loss_list.append(coeff * tf.reduce_sum(tf.square(activation[:, 2:-2, 2:-2, :])) / scaling)
    return tf.reduce_sum(loss_list)
```
Define the training functions:
```python
@tf.function
def train_step(img, learning_rate=1e-1):
    with tf.GradientTape() as tape:
        tape.watch(img)
        loss = compute_loss(img)
    grads = tape.gradient(loss, img)
    # normalize the gradients so the update size does not depend on their scale
    grads /= tf.math.reduce_std(grads)
    img += learning_rate * grads
    img = tf.clip_by_value(img, -1, 1)
    return loss, img


def train_loop(img, iterations, learning_rate=1e-1, max_loss=None):
    for i in range(iterations):
        loss, img = train_step(img, learning_rate)
        if max_loss is not None and loss > max_loss:
            break
    return img
```
Define the hyperparameters:
```python
# number of octaves (multi-scale passes); each octave rescales the image by octave_scale
num_octave = 1
# scale factor between octaves
octave_scale = 1.4
# number of train_loop iterations per octave
iterations = 80
# maximum loss: stop early once it is exceeded
max_loss = 15
# learning rate
learning_rate = 1e-2
```
The multi-scale (octave) training process is shown below.
Load the data:
```python
img = preprocess_image('./dog.jpg')
plt.imshow(deprocess(img[0]))
```
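The helpers preprocess_image and deprocess used here are not defined in the original snippet; a minimal sketch, assuming InceptionV3-style preprocessing (pixels scaled to [-1, 1]), could look like this:

```python
def preprocess_image(image_path):
    # load the image and scale pixel values to [-1, 1], as InceptionV3 expects
    img = tf.keras.preprocessing.image.load_img(image_path)
    img = tf.keras.preprocessing.image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    return tf.convert_to_tensor(img)


def deprocess(x):
    # undo the [-1, 1] scaling and convert back to uint8 for display
    x = np.array(x)
    x = (x + 1.0) * 127.5
    return np.clip(x, 0, 255).astype("uint8")
```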
Start training:
```python
original_img = preprocess_image('./dog.jpg')
original_shape = original_img.shape[1:3]

# compute the image shape for each octave, ordered from smallest to largest
successive_shapes = [original_shape]
for i in range(1, num_octave):
    shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])
    successive_shapes.append(shape)
successive_shapes = successive_shapes[::-1]

shrunk_original_img = tf.image.resize(original_img, successive_shapes[0])
img = tf.identity(original_img)  # Make a copy

for i, shape in enumerate(successive_shapes):
    print("Processing octave %d with shape %s" % (i, shape))
    img = tf.image.resize(img, shape)
    img = train_loop(img, iterations=iterations, learning_rate=learning_rate, max_loss=max_loss)
    # re-inject the detail that was lost when the original image was shrunk
    upscaled_shrunk_original_img = tf.image.resize(shrunk_original_img, shape)
    same_size_original = tf.image.resize(original_img, shape)
    lost_detail = same_size_original - upscaled_shrunk_original_img
    img += lost_detail
    shrunk_original_img = tf.image.resize(original_img, shape)

tf.keras.preprocessing.image.save_img('./dream-' + "dog.jpg", deprocess(img[0]))
```
Overall, Deep Dream amounts to visualizing what the network has learned: instead of applying gradient updates to the parameters, it applies gradient ascent to the image so that the image maximally activates the outputs of the target layers. The model has little practical value in itself, but it does provide a degree of interpretability.
Style Transfer
Style transfer is essentially the same as Deep Dream. Because style transfer involves very few samples, basically a conversion between two images, updating the network parameters by gradient descent is not realistic. Instead, we rely on a pretrained model to extract image features and define losses between those features. The core idea of style transfer is therefore the definition of the loss function.
The style-transfer loss is composed of a content loss and a style loss. Let $O_{image}$ denote the original (content) image, $R_{image}$ the style image, and $G_{image}$ the generated image. The loss is then: $\mathcal{L} = distance(style(R_{image}) - style(G_{image})) + distance(content(O_{image}) - content(G_{image}))$
Different layers of a convolutional neural network learn different kinds of image features. Layers near the input learn concrete, local features such as position, shape, color, and texture, while layers near the output learn more holistic, abstract features of the image, at the price of losing some of its detail.
Style loss
The style loss is computed with the Gram matrix. The Gram matrix takes the image's channels as one dimension and flattens its width and height into the other, giving a matrix $X$ of shape $[channel, w \cdot h]$; the product $X \cdot X^T$ is then used as a measure of style.
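As a tiny illustration (my own example, not from the original article): for a feature map with 3 channels on a 2x2 grid, $X$ has shape $[3, 4]$ and $X \cdot X^T$ is a $3 \times 3$ matrix whose entry $(i, j)$ measures how strongly channels $i$ and $j$ co-activate:

```python
import tensorflow as tf

feature_map = tf.random.normal([2, 2, 3])   # [height, width, channels]
x = tf.transpose(feature_map, (2, 0, 1))    # [channels, height, width]
x = tf.reshape(x, [3, -1])                  # [channel, w*h] = [3, 4]
gram = tf.matmul(x, x, transpose_b=True)    # [3, 3] channel-to-channel products
print(gram.shape)                           # (3, 3)
```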
```python
@tf.function
def gram_matrix(image):
    # channels become rows; the spatial dimensions are flattened into columns
    image = tf.transpose(image, (2, 0, 1))
    image = tf.reshape(image, [tf.shape(image)[0], -1])
    gram = tf.matmul(image, image, transpose_b=True)
    return gram


@tf.function
def compute_style_loss(r_image, g_image):
    r_w, r_h, r_c = tf.unstack(tf.cast(tf.shape(r_image), tf.float32))
    g_w, g_h, g_c = tf.unstack(tf.cast(tf.shape(g_image), tf.float32))
    r_gram = gram_matrix(r_image)
    g_gram = gram_matrix(g_image)
    style_loss = tf.reduce_sum(tf.square(r_gram - g_gram)) / (4 * (r_c * g_c) * (r_w * r_h * g_w * g_h))
    return style_loss
```
Content loss
The content loss is simple: it is just the (squared) difference between the generated image and the original image, computed on their feature maps.
```python
@tf.function
def compute_content_loss(o_image, g_image):
    return tf.reduce_sum(tf.square(o_image - g_image))
```
No rescaling is needed here because, unlike the style loss, the content loss does not go through a Gram-matrix computation, so its magnitude has not been blown up; in any case, the content and style losses are each given their own weight later.
Total variation loss
The total variation loss keeps the generated image spatially coherent, so that it does not turn out patchy.
```python
def compute_variation_loss(x):
    # squared differences between vertically and horizontally adjacent pixels
    a = tf.square(x[:, :tf.shape(x)[1]-1, :tf.shape(x)[2]-1, :] - x[:, 1:, :tf.shape(x)[2]-1, :])
    b = tf.square(x[:, :tf.shape(x)[1]-1, :tf.shape(x)[2]-1, :] - x[:, :tf.shape(x)[1]-1, 1:, :])
    return tf.reduce_sum(tf.pow(a + b, 1.25))
```
We again use the dog picture above as the content image, and Van Gogh's The Starry Night as the style image.
First, import the pretrained VGG19 model together with the image-processing helpers preprocess_image and deprocess_image:
```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt


def preprocess_image(image_path):
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(400, 600))
    img = tf.keras.preprocessing.image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = tf.keras.applications.vgg19.preprocess_input(img)
    return tf.convert_to_tensor(img)


def deprocess_image(x):
    x = x.reshape((400, 600, 3))
    # undo the VGG19 mean subtraction and convert BGR back to RGB
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype("uint8")
    return x


# layers used for the style loss
style_layer_names = [
    "block1_conv1",
    "block2_conv1",
    "block3_conv1",
    "block4_conv1",
    "block5_conv1",
]
# layer used for the content loss
content_layer_names = [
    "block5_conv2",
]

model = tf.keras.applications.vgg19.VGG19(weights="imagenet", include_top=False)
outputs_dict = dict(
    [(layer.name, layer.output) for layer in model.layers
     if layer.name in style_layer_names + content_layer_names]
)
feature_extractor = tf.keras.Model(inputs=model.inputs, outputs=outputs_dict)
```
Define the three losses: compute_style_loss, compute_content_loss, and compute_variation_loss:
```python
def gram_matrix(image):
    image = tf.transpose(image, (2, 0, 1))
    image = tf.reshape(image, [tf.shape(image)[0], -1])
    gram = tf.matmul(image, image, transpose_b=True)
    return gram


def compute_style_loss(r_image, g_image):
    r_w, r_h, r_c = (tf.cast(tf.shape(r_image)[0], tf.float32),
                     tf.cast(tf.shape(r_image)[1], tf.float32),
                     tf.cast(tf.shape(r_image)[2], tf.float32))
    g_w, g_h, g_c = (tf.cast(tf.shape(g_image)[0], tf.float32),
                     tf.cast(tf.shape(g_image)[1], tf.float32),
                     tf.cast(tf.shape(g_image)[2], tf.float32))
    r_gram = gram_matrix(r_image)
    g_gram = gram_matrix(g_image)
    style_loss = tf.reduce_sum(tf.square(r_gram - g_gram)) / (4 * (r_c * g_c) * (r_w * r_h * g_w * g_h))
    return style_loss


def compute_content_loss(o_image, g_image):
    return tf.reduce_sum(tf.square(o_image - g_image))


def compute_variation_loss(x):
    a = tf.square(x[:, :tf.shape(x)[1]-1, :tf.shape(x)[2]-1, :] - x[:, 1:, :tf.shape(x)[2]-1, :])
    b = tf.square(x[:, :tf.shape(x)[1]-1, :tf.shape(x)[2]-1, :] - x[:, :tf.shape(x)[1]-1, 1:, :])
    return tf.reduce_sum(tf.pow(a + b, 1.25))
```
Define the loss weights and the total-loss function compute_loss:
```python
total_weight = 1e-6
style_weight = 1e-6
content_weight = 2.5e-8


def compute_loss(o_image, r_image, g_image):
    # stack the three images into one batch and extract features in a single pass
    X = tf.concat([o_image, r_image, g_image], axis=0)
    features = feature_extractor(X)
    loss_list = []
    for content_layer_name in content_layer_names:
        temp = features[content_layer_name]
        o_image_ = temp[0, :, :, :]
        g_image_ = temp[2, :, :, :]
        loss = compute_content_loss(o_image_, g_image_)
        loss_list.append(loss * content_weight / len(content_layer_names))
    for style_layer_name in style_layer_names:
        temp = features[style_layer_name]
        r_image_ = temp[1, :, :, :]
        g_image_ = temp[2, :, :, :]
        loss = compute_style_loss(r_image_, g_image_)
        loss_list.append(loss * style_weight / len(style_layer_names))
    loss = compute_variation_loss(g_image)
    loss_list.append(loss * total_weight)
    return tf.reduce_sum(loss_list)
```
Define the optimizer and the images, and start training:
```python
o_image = preprocess_image('./dog.jpg')
r_image = preprocess_image('./start-night.png')
g_image = tf.Variable(o_image)

optimizer = tf.keras.optimizers.Adam(learning_rate=1)


def train_step():
    with tf.GradientTape() as tape:
        loss = compute_loss(o_image, r_image, g_image)
    grads = tape.gradient(loss, g_image)
    optimizer.apply_gradients([(grads, g_image)])
    return loss


for epoch in range(100):
    plt.imshow(deprocess_image(g_image.numpy()))
    plt.axis('off')
    plt.savefig('image_at_epoch_{:04d}.png'.format(epoch))
    plt.show()
    tf.print(train_step())
```
Finally, convert the generated images into a GIF:
```python
import imageio
from PIL import Image
import os
import numpy as np

# the file names match the image_at_epoch_{:04d}.png files saved in the training loop above;
# sort them so the GIF frames appear in epoch order
converted_images = [np.array(Image.open(item))
                    for item in sorted(file for file in os.listdir('./') if file.startswith('image'))]
imageio.mimsave("animation.gif", converted_images, fps=15)
```
This produces the following result:
References
DeepDream | TensorFlow Core (google.cn)
[Math-20] A detailed explanation of the Gram matrix (格拉姆矩陣) - Zhihu (zhihu.com)