GAN Paper Reading Notes
This is the classic 2014 paper; these notes record the main takeaways. Paper link:
Generative Adversarial Nets (neurips.cc)
Table of Contents
- GAN Paper Reading Notes
- Motivation
- Innovations
- Design
- Training Code
- Network Architecture Code
- Test Code
Motivation
Deep generative models have had less of an impact, due to the difficulty of approximating many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies, and due to the difficulty of leveraging the benefits of piecewise linear units in the generative context.
Generative models at the time underperformed because approximating the many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies is difficult, and because it is hard to bring the benefits of piecewise linear units into the generative setting. Hence the authors proposed a new generative model: GAN.
My understanding: generative models of that era tried to learn the distribution that generates the data, for example by estimating parameters such as the mean and variance. Such distributions are hard to learn, and the computation is heavy and complex. The authors instead adopt an end-to-end learning strategy: rather than learning the data distribution explicitly, they learn the model directly. As long as the model's outputs can approximate the ground truth, the model itself can stand in for the distribution when generating data. This is a classic black-box idea.
Innovations
a discriminative model that learns to determine whether a sample is from the model distribution or the data distribution. The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency. Competition in this game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles.
Innovation 1: the adversarial learning strategy, in which two models compete with and constrain each other. One model is the generator (Generator, G); the other is the discriminator (Discriminator, D). The generator tries to produce data as close to real as possible, while the discriminator tries to recognize the generator's outputs as fake.
In this article, we explore the special case when the generative model generates samples by passing random noise through a multilayer perceptron, and the discriminative model is also a multilayer perceptron.
Innovation 2: when both models are neural networks, they can be trained with backpropagation and dropout, which avoids the need for Markov chains.
Design
To learn the generator’s distribution p_g over data x, we define a prior on input noise variables p_z(z), then represent a mapping to data space as G(z; θ_g), where G is a differentiable function represented by a multilayer perceptron with parameters θ_g. We also define a second multilayer perceptron D(x; θ_d) that outputs a single scalar. D(x) represents the probability that x came from the data rather than p_g.
1. Input: to bring the generator G's distribution p_g close to the real data distribution, the strategy is to feed G a noise variable z and learn the parameters θ_g, which are the weights of the G network. G can therefore be written as G(z; θ_g).
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log(1 - D(G(z)))\right]$$
2. Adversarial loss: as the code shows, the adversarial loss is the sum of two BCE losses. V tries to make D(x) as large as possible and, on that basis, to make D(G(z)) as small as possible. There is an ordering between the two, explained below.
As the code shows, two labels are created by hand. The first is an all-ones matrix produced by torch.ones with shape (batch, 1), where batch is the batch size of the input noise and the second dimension is just the single number 1. This label goes into the discriminator D's BCELoss on real images; substituting it in yields the left expectation of the adversarial loss above. The second is an all-zeros matrix produced by torch.zeros, likewise of shape (batch, 1); it goes into D's BCELoss on the generator G's outputs, and substituting it in yields the right expectation of the adversarial loss.
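To make this correspondence concrete, here is a minimal sketch (the tensor names are mine, not from the training code below) checking that BCELoss with an all-ones label reduces to -mean(log D(x)), the negative of the left expectation, and with an all-zeros label reduces to -mean(log(1 - D(G(z)))), the negative of the right expectation:

import torch
import torch.nn as nn

criterion = nn.BCELoss()  # BCE(p, y) = -mean(y * log p + (1 - y) * log(1 - p))

d_real = torch.rand(4, 1).clamp(0.01, 0.99)  # stand-in for D(x): scores on real images
d_fake = torch.rand(4, 1).clamp(0.01, 0.99)  # stand-in for D(G(z)): scores on generated images

# all-ones label: BCE reduces to -mean(log D(x))
assert torch.allclose(criterion(d_real, torch.ones(4, 1)), -torch.log(d_real).mean())
# all-zeros label: BCE reduces to -mean(log(1 - D(G(z))))
assert torch.allclose(criterion(d_fake, torch.zeros(4, 1)), -torch.log(1 - d_fake).mean())

Minimizing the sum of these two BCE losses with respect to D is therefore the same as maximizing V(D, G), which is why the training code computes d_loss = d_loss_real + d_loss_fake.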
we alternate between k steps of optimizing D and one step of optimizing G.
This results in D being maintained near its optimal solution, so long as G changes slowly enough.
3. D and G are trained in a fixed order: the discriminator D is trained before the generator G, with k steps of training for D followed by 1 step for G, which keeps G changing slowly enough relative to D (see the sketch after this list).
If the generator G becomes too strong, the discriminator can no longer detect it and there is nothing left to compete against. Conversely, if the discriminator D is too strong, the generator trains very slowly.
4. The full training procedure is given in Algorithm 1 of the paper.
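The training code below uses k = 1 (one D update followed by one G update per batch), but the paper's alternating schedule generalizes to any k. Here is a minimal sketch, reusing num_epoch and dataloader from the training code; train_D_step and train_G_step are hypothetical helpers standing in for the discriminator and generator updates shown there:

k = 1  # discriminator steps per generator step (the paper's experiments also used k = 1)

for epoch in range(num_epoch):
    for i, (img, _) in enumerate(dataloader):
        train_D_step(img)      # hypothetical helper: one discriminator update (raises V)
        if (i + 1) % k == 0:
            train_G_step()     # hypothetical helper: one generator update (lowers V)

One practical note: for the generator step, the training code minimizes BCE against the all-ones label, i.e., it maximizes log D(G(z)) rather than minimizing log(1 - D(G(z))). The paper suggests this variant because it gives much stronger gradients early in training, when D confidently rejects the generator's samples.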
Training Code
import torch
import torch.nn as nn
from torchvision import datasets
from torchvision import transforms
from torchvision.utils import save_image
from torch.utils.data import DataLoader
from Model import generator
from Model import discriminator

import os

# directories for intermediate results and per-epoch test outputs
os.makedirs('./img_gan', exist_ok=True)
os.makedirs('./test_result', exist_ok=True)

def to_img(x):  # map generator outputs from [-1, 1] back to [0, 1] for saving as images
    out = 0.5 * (x + 1)
    out = out.clamp(0, 1)
    out = out.view(-1, 1, 28, 28)
    return out

batch_size = 96
num_epoch = 200
z_dimension = 100

# Data preprocessing
img_transform = transforms.Compose([
    transforms.ToTensor(),              # convert the image to a tensor normalized to [0, 1]
    transforms.Normalize([0.5], [0.5])  # map [0, 1] to [-1, 1]; the first 0.5 is the mean, the second the std
])

# MNIST dataset
mnist = datasets.MNIST(root='./data', train=True, transform=img_transform, download=True)
# Dataset loader
dataloader = torch.utils.data.DataLoader(dataset=mnist, batch_size=batch_size, shuffle=True)

D = discriminator()  # create the discriminator
G = generator()      # create the generator
if torch.cuda.is_available():  # move to GPU
    D = D.cuda()
    G = G.cuda()

criterion = nn.BCELoss()  # BCELoss, since this is a binary classification task; use BCEWithLogitsLoss if there is no final Sigmoid
d_optimizer = torch.optim.Adam(D.parameters(), lr=0.0003)  # optimizer for D
g_optimizer = torch.optim.Adam(G.parameters(), lr=0.0003)  # optimizer for G

# Start training
for epoch in range(num_epoch):
    for i, (img, _) in enumerate(dataloader):  # img: [96, 1, 28, 28]
        G.train()
        num_img = img.size(0)  # num_img = batch size
        # ================= train the discriminator =================
        img = img.view(num_img, -1)  # flatten the images for the discriminator: [96, 784]
        real_img = img.cuda()        # real images on GPU
        real_label = torch.ones(num_img).reshape(num_img, 1).cuda()   # D should output 1 for real_img: [96, 1]
        fake_label = torch.zeros(num_img).reshape(num_img, 1).cuda()  # D should output 0 for fake_img: [96, 1]

        # train the discriminator first
        # loss on real images
        real_out = D(real_img)                         # feed real images to the discriminator: [96, 1]
        d_loss_real = criterion(real_out, real_label)  # push real_out toward 1
        real_scores = real_out                         # kept for printing below

        # loss on generated images
        z = torch.randn(num_img, z_dimension).cuda()   # 100-dim random noise as generator input: [96, 100]
        # z's width matches the first argument of the generator's first Linear layer
        fake_img = G(z).detach()                       # generate fakes; detach so no gradients flow into G: [96, 784]
        fake_out = D(fake_img)                         # let the discriminator judge the fakes: [96, 1]
        d_loss_fake = criterion(fake_out, fake_label)  # push fake_out toward 0
        fake_scores = fake_out                         # kept for printing below

        d_loss = d_loss_real + d_loss_fake
        d_optimizer.zero_grad()
        d_loss.backward()
        d_optimizer.step()

        # ================= train the generator =================
        z = torch.randn(num_img, z_dimension).cuda()   # fresh random noise: [96, 100]
        fake_img = G(z)                                # generator forges images: [96, 784]
        output = D(fake_img)                           # the discriminator judges the forgeries: [96, 1]
        g_loss = criterion(output, real_label)         # the generator wants D's output close to 1

        # update the generator
        g_optimizer.zero_grad()
        g_loss.backward()
        g_optimizer.step()

        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch}/{num_epoch}], d_loss: {d_loss.cpu().detach():.6f}, g_loss: {g_loss.cpu().detach():.6f}',
                  f'D real: {real_scores.cpu().detach().mean():.6f}, D fake: {fake_scores.cpu().detach().mean():.6f}')

    if epoch == 0:  # save real images once for reference
        real_images = to_img(real_img.detach().cpu())
        save_image(real_images, './img_gan/real_images.png')
    fake_images = to_img(fake_img.detach().cpu())
    save_image(fake_images, f'./img_gan/fake_images-{epoch + 1}.png')

    G.eval()
    with torch.no_grad():
        new_z = torch.randn(batch_size, 100).cuda()
        test_img = G(new_z)
        print(test_img.shape)
        test_img = to_img(test_img.detach().cpu())
        test_path = f'./test_result/the_{epoch}.png'
        save_image(test_img, test_path)

# Save the models
torch.save(G.state_dict(), './generator.pth')
torch.save(D.state_dict(), './discriminator.pth')
Network Architecture Code
import torch
from torch import nn

# Discriminator: judges whether an image comes from the MNIST dataset
class discriminator(nn.Module):
    def __init__(self):
        super(discriminator, self).__init__()
        self.dis = nn.Sequential(
            nn.Linear(784, 256),  # 784 = 28 * 28
            nn.LeakyReLU(0.2),
            nn.Linear(256, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()  # Sigmoid outputs whether the input is a real image or not; this is binary classification
        )

    def forward(self, x):
        x = self.dis(x)
        return x

# Generator: produces forged MNIST-like images
class generator(nn.Module):
    def __init__(self):
        super(generator, self).__init__()
        self.gen = nn.Sequential(
            nn.Linear(100, 256),  # input is 100-dim random noise
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, 784),  # the output feature dimension matches a real image, a point worth noting
            nn.Tanh()
        )

    def forward(self, x):
        x = self.gen(x)
        return x

class FinetuneModel(nn.Module):
    def __init__(self, weights):
        super(FinetuneModel, self).__init__()
        self.G = generator()
        base_weights = torch.load(weights)
        model_parameters = dict(self.G.named_parameters())
        # Call named_parameters on the inner network (self.G), not on this wrapper model;
        # otherwise the extracted keys carry the wrapper's redundant prefix and the lookup fails at test time.
        pretrained_weights = {k: v for k, v in base_weights.items() if k in model_parameters}
        new_state_dict = {k: pretrained_weights[k] for k in model_parameters.keys()}
        self.G.load_state_dict(new_state_dict)

    def forward(self, input):
        output = self.G(input)
        return output
Test Code
import os
import sys
import numpy as np
import torch
import argparse
import torch.utils.data
from PIL import Image
from Model import FinetuneModel
from Model import generator
from torchvision.utils import save_image

parser = argparse.ArgumentParser("GAN")
parser.add_argument('--save_path', type=str, default='./test_result')
parser.add_argument('--gpu', type=int, default=0)
parser.add_argument('--seed', type=int, default=2)
parser.add_argument('--model', type=str, default='generator.pth')

args = parser.parse_args()
save_path = args.save_path
os.makedirs(save_path, exist_ok=True)

def to_img(x):  # map generator outputs from [-1, 1] back to [0, 1] for saving as images
    out = 0.5 * (x + 1)
    out = out.clamp(0, 1)
    out = out.view(-1, 1, 28, 28)
    return out

def main():
    if not torch.cuda.is_available():
        print("no gpu device available")
        sys.exit(1)

    model = FinetuneModel(args.model)
    model = model.to(device=args.gpu)
    model.eval()

    z_dimension = 100
    with torch.no_grad():
        for i in range(100):
            z = torch.randn(96, z_dimension).cuda()  # 100-dim random noise as generator input: [96, 100]
            output = model(z)
            print(output.shape)
            u_name = f'the_{i}.png'
            print(f'processing {u_name}')
            u_path = save_path + '/' + u_name
            output = to_img(output.cpu().detach())
            save_image(output, u_path)

if __name__ == '__main__':
    main()
End of notes.