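GAN research moves fast, and new GAN variants keep appearing. A few days ago I read a very good article on Wasserstein GAN, and I am sharing it here so we can study it together: https://zhuanlan.zhihu.com/p/25071913. Next to Wasserstein GAN, our DCGAN may look a notch less sophisticated, but as our great educator Mr. Lu Xun said: "A tree that fills a man's embrace grows from a tiny sprout; a nine-story terrace rises from a heap of earth; a journey of a thousand miles begins with a single step." (I vaguely remember I was about 7 or 8 years old when Mr. Lu Xun, nestled at my side, said this to me in a kindly tone. He added one more thing: "Young man, remember: if you don't know who said a famous quote, then Lu Xun said it.") So we still need a solid foundation, and DCGAN is that foundation. With experience writing DCGAN code, writing Wasserstein GAN should come much more easily, so next we continue with our unconditional DCGAN.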
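In the previous article we trained a GAN on MNIST handwritten digits, and the generator network G produced reasonably good digits. This time we switch datasets and train our GAN on the CelebA face dataset. Compared with handwritten digits, the distribution of face images is far more complex and varied: long hair and short hair, different ethnicities, with and without glasses, men and women, and so on. Let's see whether our generator G can capture the distribution of the face dataset.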
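First, prepare the data. From the Baidu cloud-drive link shared on the official site, https://pan.baidu.com/s/1eSNpdRG#list/path=%2FCelebA%2FImg, download img_align_celeba.zip and unzip it under /home/your_name/TensorFlow/DCGAN/data. This gives an img_align_celeba folder containing 202,599 face images. In the same /home/your_name/TensorFlow/DCGAN/data folder, create an img_align_celeba_tfrecords folder to hold the tfrecords files. Then create convert_data.py under /home/your_name/TensorFlow/DCGAN/ and write the following code, which converts the face images to tfrecords format: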
import os
import time

from PIL import Image
import tensorflow as tf

# Center-crop each image to 128 x 128 (it is resized to 64 x 64 afterwards)
OUTPUT_SIZE = 128
# Number of image channels; 3 means color
DEPTH = 3


def _int64_feature(value):
    return tf.train.Feature(int64_list = tf.train.Int64List(value = [value]))


def _bytes_feature(value):
    return tf.train.Feature(bytes_list = tf.train.BytesList(value = [value]))


def convert_to(data_path, name):
    """Converts a dataset to tfrecords."""
    rows = 64
    cols = 64
    depth = DEPTH
    # List the folder once, sorted, so every shard slices the same ordering
    img_names = sorted(os.listdir(data_path))
    # Loop 12 times to produce 12 .tfrecords files
    for ii in range(12):
        writer = tf.python_io.TFRecordWriter(name + str(ii) + '.tfrecords')
        # Each tfrecords file holds 16384 images
        for img_name in img_names[ii*16384 : (ii+1)*16384]:
            # Open the image
            img_path = data_path + img_name
            img = Image.open(img_path)
            # Set the crop box; note that PIL's img.size is (width, height)
            w, h = img.size[:2]
            j, k = (w - OUTPUT_SIZE) // 2, (h - OUTPUT_SIZE) // 2
            box = (j, k, j + OUTPUT_SIZE, k + OUTPUT_SIZE)
            # Crop the image
            img = img.crop(box = box)
            # Resize the image
            img = img.resize((rows, cols))
            # Convert to bytes
            img_raw = img.tobytes()
            # Write to an Example
            example = tf.train.Example(features = tf.train.Features(feature = {
                'height': _int64_feature(rows),
                'width': _int64_feature(cols),
                'depth': _int64_feature(depth),
                'image_raw': _bytes_feature(img_raw)}))
            writer.write(example.SerializeToString())
        writer.close()


if __name__ == '__main__':
    current_dir = os.getcwd()
    data_path = current_dir + '/data/img_align_celeba/'
    name = current_dir + '/data/img_align_celeba_tfrecords/train'
    start_time = time.time()

    print('Convert start')
    print('\n' * 2)

    convert_to(data_path, name)

    print('\n' * 2)
    print('Convert done, take %.2f seconds' % (time.time() - start_time))
After running it, 12 .tfrecords files appear under /home/your_name/TensorFlow/DCGAN/data/img_align_celeba_tfrecords/. This is the data format we need.
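The crop and shard arithmetic in convert_data.py can be checked without TensorFlow. The sketch below is only an illustration: the helper names center_crop_box and shard_bounds are hypothetical, and the 178 x 218 image size is an assumption about the aligned CelebA files rather than something stated in this article.

```python
def center_crop_box(width, height, crop_size):
    """Return a PIL-style (left, upper, right, lower) box for a centered square crop."""
    left = (width - crop_size) // 2
    upper = (height - crop_size) // 2
    return (left, upper, left + crop_size, upper + crop_size)


def shard_bounds(num_shards, shard_size):
    """Return the [start, stop) slice of the file list covered by each shard."""
    return [(ii * shard_size, (ii + 1) * shard_size) for ii in range(num_shards)]


# Assumed size of an aligned CelebA image (width x height).
box = center_crop_box(178, 218, 128)
print(box)

# 12 shards of 16384 images each, as in convert_data.py.
bounds = shard_bounds(12, 16384)
print(bounds[0], bounds[-1])

# Total number of images written across all shards.
total_written = bounds[-1][1]
print(total_written)
```

The 12 shards cover 196,608 images, which matches the 3072 batches of 64 used per epoch in the training loop; any files in the folder beyond that count are simply not written.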
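With the data ready, and drawing on our earlier experience, we now write the unconditional DCGAN code. Create none_cond_DCGAN.py under /home/your_name/TensorFlow/DCGAN/ and type in the code. For brevity, the code is largely uncommented and everything is collected in a single file. As the code shows, we wrote our own batch_norm layer, which solves the is_train = False problem in the evaluation function, and training can be resumed from a checkpoint (just set LOAD_MODEL at the top to True). The program also defines many constants at the top, which can easily be turned into command-line arguments defined with tf.app.flags so that training can be launched from a terminal. It can also be extended into a class, for example: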
class DCGAN(object):
    def __init__(self):
        self.BATCH_SIZE = 64
        ...

    def bias(self):
        ...

    ...
We will not go into the class extension any further here.
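As one way to move the constants at the top of the script onto the command line, here is a minimal sketch. It uses the standard-library argparse module in place of tf.app.flags so that it runs without TensorFlow; the flag names and the build_parser helper are illustrative assumptions, and only the default values come from the script.

```python
import argparse


def build_parser():
    """Expose a few of the script's top-level constants as command-line flags."""
    parser = argparse.ArgumentParser(description='Unconditional DCGAN (sketch)')
    parser.add_argument('--batch_size', type=int, default=64)
    parser.add_argument('--z_dim', type=int, default=100)
    parser.add_argument('--learning_rate', type=float, default=0.0002)
    parser.add_argument('--epoch', type=int, default=5)
    # A store_true flag defaults to False, mirroring LOAD_MODEL = False
    parser.add_argument('--load_model', action='store_true')
    return parser


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments so that the values come from the terminal.
args = build_parser().parse_args(['--batch_size', '32', '--load_model'])
print(args.batch_size, args.learning_rate, args.load_model)
```

With this in place, resuming training would be a matter of passing --load_model on the command line instead of editing the file.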
Type the following code into none_cond_DCGAN.py:


# Note: this script is written against the pre-1.0 TensorFlow API
# (e.g. tf.concat(3, ...), tf.histogram_summary, tf.train.SummaryWriter)
# and uses scipy.misc.imsave, which was removed in SciPy 1.2.
import os

import numpy as np
import scipy.misc
import tensorflow as tf

BATCH_SIZE = 64
OUTPUT_SIZE = 64
GF = 64  # Dimension of G filters in first conv layer. default [64]
DF = 64  # Dimension of D filters in first conv layer. default [64]
Z_DIM = 100
IMAGE_CHANNEL = 3
LR = 0.0002  # Learning rate
EPOCH = 5
LOAD_MODEL = False  # Whether or not to continue training from a saved model
TRAIN = True
CURRENT_DIR = os.getcwd()


def bias(name, shape, bias_start = 0.0, trainable = True):
    dtype = tf.float32
    var = tf.get_variable(name, shape, tf.float32, trainable = trainable,
                          initializer = tf.constant_initializer(
                              bias_start, dtype = dtype))
    return var


def weight(name, shape, stddev = 0.02, trainable = True):
    dtype = tf.float32
    var = tf.get_variable(name, shape, tf.float32, trainable = trainable,
                          initializer = tf.random_normal_initializer(
                              stddev = stddev, dtype = dtype))
    return var


def fully_connected(value, output_shape, name = 'fully_connected', with_w = False):
    shape = value.get_shape().as_list()
    with tf.variable_scope(name):
        weights = weight('weights', [shape[1], output_shape], 0.02)
        biases = bias('biases', [output_shape], 0.0)
    if with_w:
        return tf.matmul(value, weights) + biases, weights, biases
    else:
        return tf.matmul(value, weights) + biases


def lrelu(x, leak = 0.2, name = 'lrelu'):
    with tf.variable_scope(name):
        return tf.maximum(x, leak*x, name = name)


def relu(value, name = 'relu'):
    with tf.variable_scope(name):
        return tf.nn.relu(value)


def deconv2d(value, output_shape, k_h = 5, k_w = 5, strides = [1, 2, 2, 1],
             name = 'deconv2d', with_w = False):
    with tf.variable_scope(name):
        weights = weight('weights',
                         [k_h, k_w, output_shape[-1], value.get_shape()[-1]])
        deconv = tf.nn.conv2d_transpose(value, weights,
                                        output_shape, strides = strides)
        biases = bias('biases', [output_shape[-1]])
        deconv = tf.reshape(tf.nn.bias_add(deconv, biases), deconv.get_shape())
        if with_w:
            return deconv, weights, biases
        else:
            return deconv


def conv2d(value, output_dim, k_h = 5, k_w = 5, strides = [1, 2, 2, 1],
           name = 'conv2d'):
    with tf.variable_scope(name):
        weights = weight('weights',
                         [k_h, k_w, value.get_shape()[-1], output_dim])
        conv = tf.nn.conv2d(value, weights, strides = strides, padding = 'SAME')
        biases = bias('biases', [output_dim])
        conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
        return conv


def conv_cond_concat(value, cond, name = 'concat'):
    """Concatenate conditioning vector on feature map axis."""
    value_shapes = value.get_shape().as_list()
    cond_shapes = cond.get_shape().as_list()
    with tf.variable_scope(name):
        return tf.concat(3,
            [value, cond * tf.ones(value_shapes[0:3] + cond_shapes[3:])])


def batch_norm(value, is_train = True, name = 'batch_norm',
               epsilon = 1e-5, momentum = 0.9):
    with tf.variable_scope(name):
        ema = tf.train.ExponentialMovingAverage(decay = momentum)
        shape = value.get_shape().as_list()[-1]
        beta = bias('beta', [shape], bias_start = 0.0)
        gamma = bias('gamma', [shape], bias_start = 1.0)

        if is_train:
            # Training: normalize with the batch statistics and keep the
            # moving averages up to date for later evaluation
            batch_mean, batch_variance = tf.nn.moments(
                value, [0, 1, 2], name = 'moments')
            moving_mean = bias('moving_mean', [shape], 0.0, False)
            moving_variance = bias('moving_variance', [shape], 1.0, False)
            ema_apply_op = ema.apply([batch_mean, batch_variance])
            assign_mean = moving_mean.assign(ema.average(batch_mean))
            assign_variance = \
                moving_variance.assign(ema.average(batch_variance))
            with tf.control_dependencies([ema_apply_op]):
                mean, variance = \
                    tf.identity(batch_mean), tf.identity(batch_variance)
            with tf.control_dependencies([assign_mean, assign_variance]):
                return tf.nn.batch_normalization(
                    value, mean, variance, beta, gamma, epsilon)
        else:
            # Evaluation: normalize with the stored moving statistics
            mean = bias('moving_mean', [shape], 0.0, False)
            variance = bias('moving_variance', [shape], 1.0, False)
            return tf.nn.batch_normalization(
                value, mean, variance, beta, gamma, epsilon)


def generator(z, is_train = True, name = 'generator'):
    with tf.name_scope(name):
        # Integer division so the sizes stay ints under Python 2 and 3
        s2, s4, s8, s16 = \
            OUTPUT_SIZE//2, OUTPUT_SIZE//4, OUTPUT_SIZE//8, OUTPUT_SIZE//16

        h1 = tf.reshape(fully_connected(z, GF*8*s16*s16, 'g_fc1'),
                        [-1, s16, s16, GF*8], name = 'reshap')
        h1 = relu(batch_norm(h1, name = 'g_bn1', is_train = is_train))

        h2 = deconv2d(h1, [BATCH_SIZE, s8, s8, GF*4], name = 'g_deconv2d1')
        h2 = relu(batch_norm(h2, name = 'g_bn2', is_train = is_train))

        h3 = deconv2d(h2, [BATCH_SIZE, s4, s4, GF*2], name = 'g_deconv2d2')
        h3 = relu(batch_norm(h3, name = 'g_bn3', is_train = is_train))

        h4 = deconv2d(h3, [BATCH_SIZE, s2, s2, GF*1], name = 'g_deconv2d3')
        h4 = relu(batch_norm(h4, name = 'g_bn4', is_train = is_train))

        h5 = deconv2d(h4, [BATCH_SIZE, OUTPUT_SIZE, OUTPUT_SIZE, 3],
                      name = 'g_deconv2d4')

        return tf.nn.tanh(h5)


def discriminator(image, reuse = False, name = 'discriminator'):
    with tf.name_scope(name):
        if reuse:
            tf.get_variable_scope().reuse_variables()

        h0 = lrelu(conv2d(image, DF, name = 'd_h0_conv'), name = 'd_h0_lrelu')
        h1 = lrelu(batch_norm(conv2d(h0, DF*2, name = 'd_h1_conv'),
                              name = 'd_h1_bn'), name = 'd_h1_lrelu')
        h2 = lrelu(batch_norm(conv2d(h1, DF*4, name = 'd_h2_conv'),
                              name = 'd_h2_bn'), name = 'd_h2_lrelu')
        h3 = lrelu(batch_norm(conv2d(h2, DF*8, name = 'd_h3_conv'),
                              name = 'd_h3_bn'), name = 'd_h3_lrelu')
        h4 = fully_connected(tf.reshape(h3, [BATCH_SIZE, -1]), 1, 'd_h4_fc')

        return tf.nn.sigmoid(h4), h4


def sampler(z, is_train = False, name = 'sampler'):
    with tf.name_scope(name):
        tf.get_variable_scope().reuse_variables()
        return generator(z, is_train = is_train)


def read_and_decode(filename_queue):
    """Read and decode tfrecords."""
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features = {'image_raw': tf.FixedLenFeature([], tf.string)})
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    image = tf.reshape(image, [OUTPUT_SIZE, OUTPUT_SIZE, 3])
    image = tf.cast(image, tf.float32)
    image = image / 255.0
    return image


def inputs(data_dir, batch_size, name = 'input'):
    """Reads input data num_epochs times."""
    with tf.name_scope(name):
        filenames = [os.path.join(data_dir, 'train%d.tfrecords' % ii)
                     for ii in range(12)]
        filename_queue = tf.train.string_input_producer(filenames)
        image = read_and_decode(filename_queue)
        images = tf.train.shuffle_batch([image], batch_size = batch_size,
                                        num_threads = 4,
                                        capacity = 20000 + 3 * batch_size,
                                        min_after_dequeue = 20000)
        return images


def save_images(images, size, path):
    """Save the sample images.

    The best size number is
        int(max(sqrt(image.shape[1]),sqrt(image.shape[1]))) + 1
    """
    # Map the generator output from [-1, 1] to [0, 1]
    img = (images + 1.0) / 2.0
    h, w = img.shape[1], img.shape[2]
    merge_img = np.zeros((h * size[0], w * size[1], 3))
    # Tile the rescaled images row by row into one large image
    for idx, image in enumerate(img):
        i = idx % size[1]
        j = idx // size[1]
        merge_img[j*h:j*h+h, i*w:i*w+w, :] = image
    # Create the output folder if it does not exist yet
    out_dir = os.path.dirname(path)
    if out_dir and not os.path.exists(out_dir):
        os.makedirs(out_dir)
    return scipy.misc.imsave(path, merge_img)


def train():
    global_step = tf.Variable(0, name = 'global_step', trainable = False)

    train_dir = CURRENT_DIR + '/logs_without_condition/'
    data_dir = CURRENT_DIR + '/data/img_align_celeba_tfrecords/'

    images = inputs(data_dir, BATCH_SIZE)
    z = tf.placeholder(tf.float32, [None, Z_DIM], name = 'z')

    G = generator(z)
    D, D_logits = discriminator(images)
    samples = sampler(z)
    D_, D_logits_ = discriminator(G, reuse = True)

    d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        D_logits, tf.ones_like(D)))
    d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        D_logits_, tf.zeros_like(D_)))
    d_loss = d_loss_real + d_loss_fake
    g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        D_logits_, tf.ones_like(D_)))

    z_sum = tf.histogram_summary('z', z)
    d_sum = tf.histogram_summary('d', D)
    d__sum = tf.histogram_summary('d_', D_)
    G_sum = tf.image_summary('G', G)

    d_loss_real_sum = tf.scalar_summary('d_loss_real', d_loss_real)
    d_loss_fake_sum = tf.scalar_summary('d_loss_fake', d_loss_fake)
    d_loss_sum = tf.scalar_summary('d_loss', d_loss)
    g_loss_sum = tf.scalar_summary('g_loss', g_loss)

    g_sum = tf.merge_summary([z_sum, d__sum, G_sum, d_loss_fake_sum, g_loss_sum])
    d_sum = tf.merge_summary([z_sum, d_sum, d_loss_real_sum, d_loss_sum])

    # Split the trainable variables by their name prefix
    t_vars = tf.trainable_variables()
    d_vars = [var for var in t_vars if 'd_' in var.name]
    g_vars = [var for var in t_vars if 'g_' in var.name]

    saver = tf.train.Saver()

    d_optim = tf.train.AdamOptimizer(LR, beta1 = 0.5) \
        .minimize(d_loss, var_list = d_vars, global_step = global_step)
    g_optim = tf.train.AdamOptimizer(LR, beta1 = 0.5) \
        .minimize(g_loss, var_list = g_vars, global_step = global_step)

    os.environ['CUDA_VISIBLE_DEVICES'] = str(0)
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.2
    sess = tf.InteractiveSession(config = config)

    writer = tf.train.SummaryWriter(train_dir, sess.graph)

    # Fixed noise, reused for every sample grid so the grids are comparable
    sample_z = np.random.uniform(-1, 1, size = (BATCH_SIZE, Z_DIM))

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess = sess, coord = coord)
    init = tf.initialize_all_variables()
    sess.run(init)

    start = 0
    if LOAD_MODEL:
        print(" [*] Reading checkpoints...")
        ckpt = tf.train.get_checkpoint_state(train_dir)

        if ckpt and ckpt.model_checkpoint_path:
            ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
            saver.restore(sess, os.path.join(train_dir, ckpt_name))
            global_step = ckpt.model_checkpoint_path.split('/')[-1]\
                                                    .split('-')[-1]
            print('Loading success, global_step is %s' % global_step)

        start = int(global_step)

    for epoch in range(EPOCH):

        batch_idxs = 3072

        if epoch:
            start = 0

        for idx in range(start, batch_idxs):

            batch_z = np.random.uniform(-1, 1, size = (BATCH_SIZE, Z_DIM))

            # Update D network
            _, summary_str = sess.run([d_optim, d_sum], feed_dict = {z: batch_z})
            writer.add_summary(summary_str, idx+1)

            # Update G network
            _, summary_str = sess.run([g_optim, g_sum], feed_dict = {z: batch_z})
            writer.add_summary(summary_str, idx+1)

            # Run g_optim twice to make sure that d_loss does not go to zero
            _, summary_str = sess.run([g_optim, g_sum], feed_dict = {z: batch_z})
            writer.add_summary(summary_str, idx+1)

            errD_fake = d_loss_fake.eval({z: batch_z})
            errD_real = d_loss_real.eval()
            errG = g_loss.eval({z: batch_z})

            if idx % 20 == 0:
                print("[%4d/%4d] d_loss: %.8f, g_loss: %.8f" \
                      % (idx, batch_idxs, errD_fake+errD_real, errG))

            if idx % 100 == 0:
                sample = sess.run(samples, feed_dict = {z: sample_z})
                samples_path = CURRENT_DIR + '/samples_without_condition/'
                save_images(sample, [8, 8], samples_path + \
                            'sample_%d_epoch_%d.png' % (epoch, idx))

                print('\n'*2)
                print('=========== %d_epoch_%d.png save down ===========' \
                      % (epoch, idx))
                print('\n'*2)

            if (idx % 512 == 0) or (idx + 1 == batch_idxs):
                checkpoint_path = os.path.join(train_dir,
                                               'my_dcgan_tfrecords.ckpt')
                saver.save(sess, checkpoint_path, global_step = idx+1)
                print('********* model saved *********')

        print('******* start with %d *******' % start)

    coord.request_stop()
    coord.join(threads)
    sess.close()


def evaluate():
    eval_dir = CURRENT_DIR + '/eval/'

    checkpoint_dir = CURRENT_DIR + '/logs_without_condition/'

    z = tf.placeholder(tf.float32, [None, Z_DIM], name = 'z')

    G = generator(z, is_train = False)

    # Two random batches of noise and three points between them
    sample_z1 = np.random.uniform(-1, 1, size = (BATCH_SIZE, Z_DIM))
    sample_z2 = np.random.uniform(-1, 1, size = (BATCH_SIZE, Z_DIM))
    sample_z3 = (sample_z1 + sample_z2) / 2
    sample_z4 = (sample_z1 + sample_z3) / 2
    sample_z5 = (sample_z2 + sample_z3) / 2

    print("Reading checkpoints...")
    ckpt = tf.train.get_checkpoint_state(checkpoint_dir)

    saver = tf.train.Saver(tf.all_variables())

    os.environ['CUDA_VISIBLE_DEVICES'] = str(0)
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.2
    sess = tf.InteractiveSession(config = config)

    if ckpt and ckpt.model_checkpoint_path:
        ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
        global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
        saver.restore(sess, os.path.join(checkpoint_dir, ckpt_name))
        print('Loading success, global_step is %s' % global_step)

    # Generate in order from sample_z1 to sample_z2
    eval_sess1 = sess.run(G, feed_dict = {z: sample_z1})
    eval_sess2 = sess.run(G, feed_dict = {z: sample_z4})
    eval_sess3 = sess.run(G, feed_dict = {z: sample_z3})
    eval_sess4 = sess.run(G, feed_dict = {z: sample_z5})
    eval_sess5 = sess.run(G, feed_dict = {z: sample_z2})

    print(eval_sess3.shape)

    save_images(eval_sess1, [8, 8], eval_dir + 'eval_%d.png' % 1)
    save_images(eval_sess2, [8, 8], eval_dir + 'eval_%d.png' % 2)
    save_images(eval_sess3, [8, 8], eval_dir + 'eval_%d.png' % 3)
    save_images(eval_sess4, [8, 8], eval_dir + 'eval_%d.png' % 4)
    save_images(eval_sess5, [8, 8], eval_dir + 'eval_%d.png' % 5)

    sess.close()


if __name__ == '__main__':
    if TRAIN:
        train()
    else:
        evaluate()
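The is_train switch in batch_norm is easier to see outside the graph. The following is a simplified pure-Python sketch of the idea for a single channel, not a port of the TensorFlow code above: the class name ScalarBatchNorm is hypothetical, and the moving statistics are updated with a plain exponential moving average.

```python
import math


class ScalarBatchNorm(object):
    """Batch normalization over a list of scalars (one channel)."""

    def __init__(self, momentum=0.9, epsilon=1e-5):
        self.momentum = momentum
        self.epsilon = epsilon
        self.gamma = 1.0            # learned scale (kept fixed in this sketch)
        self.beta = 0.0             # learned shift (kept fixed in this sketch)
        self.moving_mean = 0.0      # same start values as in the script
        self.moving_variance = 1.0

    def __call__(self, values, is_train=True):
        if is_train:
            # Training: use the statistics of the current batch ...
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            # ... and fold them into the moving averages for later use
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_variance = m * self.moving_variance + (1 - m) * variance
        else:
            # Evaluation: use the stored statistics; nothing is updated
            mean, variance = self.moving_mean, self.moving_variance
        scale = self.gamma / math.sqrt(variance + self.epsilon)
        return [(v - mean) * scale + self.beta for v in values]


bn = ScalarBatchNorm()
train_out = bn([1.0, 2.0, 3.0], is_train=True)
eval_out = bn([5.0], is_train=False)
print(train_out, eval_out, bn.moving_mean, bn.moving_variance)
```

In evaluation mode even a batch of one sample can be normalized, because the statistics no longer come from the batch itself. That is the reason the sampler and evaluate() call the generator with is_train = False.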
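Once the code is written, run it and the network starts training. Training takes roughly 1 to 2 hours. During training you can see the results generated by the sampler getting better and better, and in the end we get the result shown in the figure below. Because the distribution of face data is more complex and varied than that of handwritten digits, the generator cannot fully capture the features of faces; the image in row 6, column 7 of the figure below is a very poor generated sample.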
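To locate a particular tile in the 8 x 8 sample grid, the index arithmetic used by save_images can be reproduced on its own. This is a small sketch; the helper names are hypothetical, and it assumes that "row 6, column 7" above is counted from 1.

```python
def grid_position(idx, cols):
    """Map a sample index to its zero-based (row, col) tile, as save_images does."""
    return idx // cols, idx % cols


def sample_index(row, col, cols):
    """Inverse mapping: zero-based (row, col) tile back to the sample index."""
    return row * cols + col


# Row 6, column 7 counted from 1 is tile (5, 6) counted from 0.
idx = sample_index(5, 6, cols=8)
print(idx, grid_position(idx, cols=8))
```

Under that assumption the tile corresponds to index 46 of the batch, that is, the image generated from sample_z[46].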
After training, we open the network's graph in TensorBoard to see what the structure looks like after our careful design:
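As you can see, this graph is much easier on the eyes than the previous one. It is a real blessing for Virgos, the perfectionists among us.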
That completes our DCGAN code. In the next article, we will talk about Caffe.
References:
1. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/how_tos/reading_data/convert_to_records.py
2. https://github.com/carpedm20/DCGAN-tensorflow