Table of Contents
- Deep Dream
- Style Transfer
- References
Deep Dream
DeepDream is an experiment that visualizes the patterns a neural network has learned. Much like a child watching clouds and trying to read shapes into them, DeepDream over-interprets and amplifies the patterns it sees in an image.
To reveal what the features learned by a CNN actually mean, DeepDream amplifies them. Concretely, the features of each layer are visualized with gradient ascent: an image (for example noise) is fed into the network, and during the backward pass the network weights are left untouched; instead, the pixel values of the input image are updated. Visualizing the network by "training the image" in this way is exactly the idea DeepDream builds on.
How does DeepDream amplify image features? Consider a simple example first. Suppose a network has learned to classify cats and dogs, and we feed it an image of a cloud that happens to look somewhat like a dog; the features the network extracts will then also look dog-like. Suppose the corresponding output probabilities are [0.6, 0.4], where 0.6 is the probability of "dog" and 0.4 the probability of "cat". Maximizing the squared L2 norm of this output, $L_2 = x_1^2 + x_2^2$, amplifies the feature nicely: since the two probabilities sum to 1, $L_2$ grows as $x_1$ gets larger and $x_2$ gets smaller, so as long as $x_1 > x_2$, more iterations make $x_1$ larger and $x_2$ smaller, and the image looks more and more like a dog. Each iteration computes this L2 norm and then adjusts the image by gradient ascent. What is optimized is no longer the weights but the feature values, i.e. the pixels; accordingly, the loss is not the usual cross-entropy but the L2 norm of the feature values, which is maximized so that the features extracted from the image become ever closer to the features hidden inside the network.
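As a toy illustration of this idea (my own sketch, not from the original article), the snippet below builds a small, hypothetical frozen classifier with a softmax output and runs gradient ascent on its input to maximize the squared L2 norm of that output; the dominant class probability is pushed towards 1 while the weights stay fixed:

```python
import tensorflow as tf

tf.random.set_seed(0)

# a hypothetical frozen "classifier": one dense layer with a softmax over [dog, cat]
classifier = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation="softmax", input_shape=(4,)),
])
classifier.trainable = False

x = tf.Variable(tf.random.normal([1, 4]))  # the "image" we optimize instead of the weights

for step in range(200):
    with tf.GradientTape() as tape:
        probs = classifier(x)                 # e.g. [[0.6, 0.4]]
        l2 = tf.reduce_sum(tf.square(probs))  # L2 objective to maximize
    grads = tape.gradient(l2, x)
    x.assign_add(0.1 * grads)                 # gradient ascent on the input

print(classifier(x).numpy())  # the larger probability has been pushed towards 1
```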
To run DeepDream, start from a base image and feed it into a pretrained CNN, then forward-propagate it to a chosen layer. To better understand what that layer has learned, we want to maximize its activations: take the gradient of the layer's activations with respect to the input image and perform gradient ascent on the image. Doing only this, however, does not produce good images. To improve the result, a few extra techniques are used: Gaussian blurring can be applied to keep the image smooth, and the computation is done at multiple scales (called octaves): the input image is repeatedly shrunk first and then progressively enlarged, with the results merged into a single output image.
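The Gaussian smoothing mentioned above is not used in the implementation that follows; if you want to experiment with it, a minimal sketch (function name and parameters are my own) using a depthwise convolution could look like this:

```python
import numpy as np
import tensorflow as tf

def gaussian_blur(img, kernel_size=5, sigma=1.0):
    """Smooth a float image tensor of shape [1, height, width, channels]."""
    ax = np.arange(kernel_size) - (kernel_size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    channels = int(img.shape[-1])
    # apply the same 2-D kernel to every channel independently
    kernel = np.tile(kernel[:, :, None, None], (1, 1, channels, 1)).astype("float32")
    return tf.nn.depthwise_conv2d(img, kernel, strides=[1, 1, 1, 1], padding="SAME")
```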
We first use the pretrained InceptionV3 model to extract image features; the keys of layer_coeff (mixed4 through mixed7) name the InceptionV3 layers whose feature values are used, and their values weight each layer's contribution:
```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# weight of each layer's contribution to the loss
layer_coeff = {
    "mixed4": 1.0,
    "mixed5": 1.5,
    "mixed6": 2.0,
    "mixed7": 2.5,
}

model = tf.keras.applications.inception_v3.InceptionV3(weights="imagenet", include_top=False)
outputs_dict = dict(
    [(layer.name, layer.output)
     for layer in [model.get_layer(name) for name in layer_coeff.keys()]]
)
feature_extractor = tf.keras.Model(inputs=model.inputs, outputs=outputs_dict)
```
Compute the loss:
```python
def compute_loss(input_image):
    features = feature_extractor(input_image)
    loss_list = []
    for name in features.keys():
        coeff = layer_coeff[name]
        activation = features[name]
        # avoid border artifacts by only including non-border pixels in the loss
        scaling = tf.reduce_prod(tf.cast(tf.shape(activation), "float32"))
        loss_list.append(coeff * tf.reduce_sum(tf.square(activation[:, 2:-2, 2:-2, :])) / scaling)
    return tf.reduce_sum(loss_list)
```
Define the training functions:
```python
@tf.function
def train_step(img, learning_rate=1e-1):
    with tf.GradientTape() as tape:
        tape.watch(img)
        loss = compute_loss(img)
    grads = tape.gradient(loss, img)
    # normalize the gradients so the update size does not depend on their scale
    grads /= tf.math.reduce_std(grads)
    img += learning_rate * grads
    img = tf.clip_by_value(img, -1, 1)
    return loss, img


def train_loop(img, iterations, learning_rate=1e-1, max_loss=None):
    for i in range(iterations):
        loss, img = train_step(img, learning_rate)
        if max_loss is not None and loss > max_loss:
            break
    return img
```
Define the hyperparameters:
```python
# number of octaves (multi-scale passes); each octave rescales the image by octave_scale
num_octave = 1
# scale factor between octaves
octave_scale = 1.4
# number of train_loop iterations per octave
iterations = 80
# maximum loss: stop early once it is exceeded
max_loss = 15
# learning rate
learning_rate = 1e-2
```
The multi-scale (octave) training process is shown below.
Load the data:
```python
img = preprocess_image('./dog.jpg')
plt.imshow(deprocess(img[0]))
```
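The helpers preprocess_image and deprocess used here are not defined in the original snippet; a minimal sketch, assuming InceptionV3-style preprocessing (pixels scaled to [-1, 1]), could look like this:

```python
def preprocess_image(image_path):
    # load the image and scale pixel values to [-1, 1], as InceptionV3 expects
    img = tf.keras.preprocessing.image.load_img(image_path)
    img = tf.keras.preprocessing.image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    return tf.convert_to_tensor(img)


def deprocess(x):
    # undo the [-1, 1] scaling and convert back to uint8 for display
    x = np.array(x)
    x = (x + 1.0) * 127.5
    return np.clip(x, 0, 255).astype("uint8")
```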
Start training:
```python
original_img = preprocess_image('./dog.jpg')
original_shape = original_img.shape[1:3]

# compute the image shape for each octave, ordered from smallest to largest
successive_shapes = [original_shape]
for i in range(1, num_octave):
    shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])
    successive_shapes.append(shape)
successive_shapes = successive_shapes[::-1]

shrunk_original_img = tf.image.resize(original_img, successive_shapes[0])
img = tf.identity(original_img)  # Make a copy

for i, shape in enumerate(successive_shapes):
    print("Processing octave %d with shape %s" % (i, shape))
    img = tf.image.resize(img, shape)
    img = train_loop(img, iterations=iterations, learning_rate=learning_rate, max_loss=max_loss)
    # re-inject the detail that was lost when the original image was shrunk
    upscaled_shrunk_original_img = tf.image.resize(shrunk_original_img, shape)
    same_size_original = tf.image.resize(original_img, shape)
    lost_detail = same_size_original - upscaled_shrunk_original_img
    img += lost_detail
    shrunk_original_img = tf.image.resize(original_img, shape)

tf.keras.preprocessing.image.save_img('./dream-' + "dog.jpg", deprocess(img[0]))
```
Overall, Deep Dream amounts to visualizing what the network has learned: instead of applying gradient updates to the parameters, it applies gradient ascent to the image so that the image maximally activates the outputs of the target layers. The model has little practical value in itself, but it does provide a degree of interpretability.
Style Transfer
Style transfer is essentially the same as Deep Dream. Because style transfer involves very few samples, basically a conversion between two images, updating the network parameters by gradient descent is not realistic. Instead, we rely on a pretrained model to extract image features and define losses between those features. The core idea of style transfer is therefore the definition of the loss function.
The style-transfer loss is composed of a content loss and a style loss. Let $O_{image}$ denote the original (content) image, $R_{image}$ the style image, and $G_{image}$ the generated image. The loss is then: $\mathcal{L} = distance(style(R_{image}) - style(G_{image})) + distance(content(O_{image}) - content(G_{image}))$
Different layers of a convolutional neural network learn different kinds of image features. Layers near the input learn concrete, local features such as position, shape, color, and texture, while layers near the output learn more holistic, abstract features of the image, at the price of losing some of its detail.
Style loss
The style loss is computed with the Gram matrix. The Gram matrix takes the image's channels as one dimension and flattens its width and height into the other, giving a matrix $X$ of shape $[channel, w \cdot h]$; the product $X \cdot X^T$ is then used as a measure of style.
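As a tiny illustration (my own example, not from the original article): for a feature map with 3 channels on a 2x2 grid, $X$ has shape $[3, 4]$ and $X \cdot X^T$ is a $3 \times 3$ matrix whose entry $(i, j)$ measures how strongly channels $i$ and $j$ co-activate:

```python
import tensorflow as tf

feature_map = tf.random.normal([2, 2, 3])   # [height, width, channels]
x = tf.transpose(feature_map, (2, 0, 1))    # [channels, height, width]
x = tf.reshape(x, [3, -1])                  # [channel, w*h] = [3, 4]
gram = tf.matmul(x, x, transpose_b=True)    # [3, 3] channel-to-channel products
print(gram.shape)                           # (3, 3)
```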
```python
@tf.function
def gram_matrix(image):
    # channels become rows; the spatial dimensions are flattened into columns
    image = tf.transpose(image, (2, 0, 1))
    image = tf.reshape(image, [tf.shape(image)[0], -1])
    gram = tf.matmul(image, image, transpose_b=True)
    return gram


@tf.function
def compute_style_loss(r_image, g_image):
    r_w, r_h, r_c = tf.unstack(tf.cast(tf.shape(r_image), tf.float32))
    g_w, g_h, g_c = tf.unstack(tf.cast(tf.shape(g_image), tf.float32))
    r_gram = gram_matrix(r_image)
    g_gram = gram_matrix(g_image)
    style_loss = tf.reduce_sum(tf.square(r_gram - g_gram)) / (4 * (r_c * g_c) * (r_w * r_h * g_w * g_h))
    return style_loss
```
Content loss
The content loss is simple: it is just the (squared) difference between the generated image and the original image, computed on their feature maps.
```python
@tf.function
def compute_content_loss(o_image, g_image):
    return tf.reduce_sum(tf.square(o_image - g_image))
```
No rescaling is needed here because, unlike the style loss, the content loss does not go through a Gram-matrix computation, so its magnitude has not been blown up; in any case, the content and style losses are each given their own weight later.
Total variation loss
The total variation loss keeps the generated image spatially coherent, so that it does not turn out patchy.
```python
def compute_variation_loss(x):
    # squared differences between vertically and horizontally adjacent pixels
    a = tf.square(x[:, :tf.shape(x)[1]-1, :tf.shape(x)[2]-1, :] - x[:, 1:, :tf.shape(x)[2]-1, :])
    b = tf.square(x[:, :tf.shape(x)[1]-1, :tf.shape(x)[2]-1, :] - x[:, :tf.shape(x)[1]-1, 1:, :])
    return tf.reduce_sum(tf.pow(a + b, 1.25))
```
We again use the dog picture above as the content image, and Van Gogh's The Starry Night as the style image.
First, import the pretrained VGG19 model together with the image-processing helpers preprocess_image and deprocess_image:
```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt


def preprocess_image(image_path):
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(400, 600))
    img = tf.keras.preprocessing.image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = tf.keras.applications.vgg19.preprocess_input(img)
    return tf.convert_to_tensor(img)


def deprocess_image(x):
    x = x.reshape((400, 600, 3))
    # undo the VGG19 mean subtraction and convert BGR back to RGB
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype("uint8")
    return x


# layers used for the style loss
style_layer_names = [
    "block1_conv1",
    "block2_conv1",
    "block3_conv1",
    "block4_conv1",
    "block5_conv1",
]
# layer used for the content loss
content_layer_names = [
    "block5_conv2",
]

model = tf.keras.applications.vgg19.VGG19(weights="imagenet", include_top=False)
outputs_dict = dict(
    [(layer.name, layer.output) for layer in model.layers
     if layer.name in style_layer_names + content_layer_names]
)
feature_extractor = tf.keras.Model(inputs=model.inputs, outputs=outputs_dict)
```
Define the three losses: compute_style_loss, compute_content_loss, and compute_variation_loss:
```python
def gram_matrix(image):
    image = tf.transpose(image, (2, 0, 1))
    image = tf.reshape(image, [tf.shape(image)[0], -1])
    gram = tf.matmul(image, image, transpose_b=True)
    return gram


def compute_style_loss(r_image, g_image):
    r_w, r_h, r_c = (tf.cast(tf.shape(r_image)[0], tf.float32),
                     tf.cast(tf.shape(r_image)[1], tf.float32),
                     tf.cast(tf.shape(r_image)[2], tf.float32))
    g_w, g_h, g_c = (tf.cast(tf.shape(g_image)[0], tf.float32),
                     tf.cast(tf.shape(g_image)[1], tf.float32),
                     tf.cast(tf.shape(g_image)[2], tf.float32))
    r_gram = gram_matrix(r_image)
    g_gram = gram_matrix(g_image)
    style_loss = tf.reduce_sum(tf.square(r_gram - g_gram)) / (4 * (r_c * g_c) * (r_w * r_h * g_w * g_h))
    return style_loss


def compute_content_loss(o_image, g_image):
    return tf.reduce_sum(tf.square(o_image - g_image))


def compute_variation_loss(x):
    a = tf.square(x[:, :tf.shape(x)[1]-1, :tf.shape(x)[2]-1, :] - x[:, 1:, :tf.shape(x)[2]-1, :])
    b = tf.square(x[:, :tf.shape(x)[1]-1, :tf.shape(x)[2]-1, :] - x[:, :tf.shape(x)[1]-1, 1:, :])
    return tf.reduce_sum(tf.pow(a + b, 1.25))
```
Define the loss weights and the total-loss function compute_loss:
```python
total_weight = 1e-6
style_weight = 1e-6
content_weight = 2.5e-8


def compute_loss(o_image, r_image, g_image):
    # stack the three images into one batch and extract features in a single pass
    X = tf.concat([o_image, r_image, g_image], axis=0)
    features = feature_extractor(X)
    loss_list = []
    for content_layer_name in content_layer_names:
        temp = features[content_layer_name]
        o_image_ = temp[0, :, :, :]
        g_image_ = temp[2, :, :, :]
        loss = compute_content_loss(o_image_, g_image_)
        loss_list.append(loss * content_weight / len(content_layer_names))
    for style_layer_name in style_layer_names:
        temp = features[style_layer_name]
        r_image_ = temp[1, :, :, :]
        g_image_ = temp[2, :, :, :]
        loss = compute_style_loss(r_image_, g_image_)
        loss_list.append(loss * style_weight / len(style_layer_names))
    loss = compute_variation_loss(g_image)
    loss_list.append(loss * total_weight)
    return tf.reduce_sum(loss_list)
```
Define the optimizer and the images, and start training:
```python
o_image = preprocess_image('./dog.jpg')
r_image = preprocess_image('./start-night.png')
g_image = tf.Variable(o_image)

optimizer = tf.keras.optimizers.Adam(learning_rate=1)


def train_step():
    with tf.GradientTape() as tape:
        loss = compute_loss(o_image, r_image, g_image)
    grads = tape.gradient(loss, g_image)
    optimizer.apply_gradients([(grads, g_image)])
    return loss


for epoch in range(100):
    plt.imshow(deprocess_image(g_image.numpy()))
    plt.axis('off')
    plt.savefig('image_at_epoch_{:04d}.png'.format(epoch))
    plt.show()
    tf.print(train_step())
```
Finally, convert the generated images into a GIF:
```python
import imageio
from PIL import Image
import os
import numpy as np

# the file names match the image_at_epoch_{:04d}.png files saved in the training loop above;
# sort them so the GIF frames appear in epoch order
converted_images = [np.array(Image.open(item))
                    for item in sorted(file for file in os.listdir('./') if file.startswith('image'))]
imageio.mimsave("animation.gif", converted_images, fps=15)
```
This produces the following result:
References
DeepDream | TensorFlow Core (google.cn)
[Math-20] A detailed explanation of the Gram matrix (格拉姆矩陣) - Zhihu (zhihu.com)