NCSN
$$p_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x}) := \mathcal{N}(\tilde{\mathbf{x}};\ \mathbf{x},\ \sigma^2\mathbf{I})$$

$$p_\sigma(\tilde{\mathbf{x}}) := \int p_{\mathrm{data}}(\mathbf{x})\, p_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x})\, d\mathbf{x}$$
Here $p_{\mathrm{data}}(\mathbf{x})$ denotes the target data distribution, and $\sigma_{\mathrm{min}} = \sigma_1 < \sigma_2 < \cdots < \sigma_N = \sigma_{\mathrm{max}}$ is an increasing sequence of noise levels.
$\sigma_{\mathrm{min}}$ is small enough that $p_{\sigma_{\mathrm{min}}}(\mathbf{x}) \approx p_{\mathrm{data}}(\mathbf{x})$, and $\sigma_{\mathrm{max}}$ is large enough that $p_{\sigma_{\mathrm{max}}}(\mathbf{x}) \approx \mathcal{N}(\mathbf{x};\ \mathbf{0},\ \sigma^2_{\mathrm{max}}\mathbf{I})$.
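Concretely, drawing $\tilde{\mathbf{x}} \sim p_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x})$ just means adding Gaussian noise of scale $\sigma$ to a clean data point. A minimal PyTorch sketch (the `perturb` name and shapes are illustrative, not from the original):

```python
import torch

def perturb(x, sigma):
    # x~ = x + sigma * z with z ~ N(0, I), i.e. x~ ~ N(x, sigma^2 I)
    return x + sigma * torch.randn_like(x)

x = torch.zeros(4, 3, 32, 32)       # a batch of "clean" data points
x_tilde = perturb(x, sigma=25.0)    # heavily perturbed samples
```

At large $\sigma$ the perturbed samples are dominated by the noise, which is exactly why $p_{\sigma_{\mathrm{max}}}$ is close to a pure Gaussian.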
The score network is trained with the weighted denoising score-matching objective:

$$\theta^{*} = \arg\min_\theta \sum_{i=1}^N \sigma_i^2\, \mathbb{E}_{\mathbf{x}\sim p_{\mathrm{data}}(\mathbf{x})}\, \mathbb{E}_{\tilde{\mathbf{x}}\sim p_{\sigma_i}(\tilde{\mathbf{x}} \mid \mathbf{x})} \Big[ \big\| s_\theta(\tilde{\mathbf{x}}, \sigma_i) - \nabla_{\tilde{\mathbf{x}}} \log p_{\sigma_i}(\tilde{\mathbf{x}} \mid \mathbf{x}) \big\|^2_2 \Big]$$
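Because the perturbation is Gaussian, the target score has the closed form $\nabla_{\tilde{\mathbf{x}}} \log p_{\sigma_i}(\tilde{\mathbf{x}} \mid \mathbf{x}) = -(\tilde{\mathbf{x}} - \mathbf{x})/\sigma_i^2$, so the objective can be computed without evaluating $p_{\sigma_i}$ itself. A minimal PyTorch sketch of one loss evaluation (the `score_network(x, sigma)` signature mirrors the sample code below; `dsm_loss` and the batch shapes are assumptions for illustration):

```python
import torch

def dsm_loss(score_network, x, sigmas):
    # draw one noise level sigma_i per sample in the batch
    idx = torch.randint(0, len(sigmas), (x.shape[0],))
    sigma = sigmas[idx].view(-1, 1, 1, 1)
    z = torch.randn_like(x)
    x_tilde = x + sigma * z              # x~ ~ p_{sigma_i}(x~ | x)
    target = -z / sigma                  # closed-form score: -(x~ - x) / sigma_i^2
    s = score_network(x_tilde, sigma)
    # per-sample squared error, weighted by sigma_i^2 as in the objective
    err = ((s - target) ** 2).flatten(1).sum(dim=1)
    return (sigma.view(-1) ** 2 * err).mean()

sigmas = torch.tensor([1.0, 5.0, 10.0, 25.0, 50.0])
x = torch.randn(8, 3, 32, 32)
# any callable s(x, sigma) works in place of a trained network
loss = dsm_loss(lambda x_t, s: -x_t / (s ** 2), x, sigmas)
```

The $\sigma_i^2$ weighting keeps the per-level losses on a comparable scale, since the raw score magnitude grows like $1/\sigma_i$.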
Once the model is trained, sampling is performed with M steps of Langevin MCMC per noise level:
$$x_i^m = x_i^{m-1} + \epsilon_i\, s_{\theta^{*}}(x_i^{m-1}, \sigma_i) + \sqrt{2\epsilon_i}\, z_i^m, \quad m = 1, 2, \dots, M$$
where $\epsilon_i > 0$ is the step size and $z_i^m$ is drawn from a standard normal distribution. This sampling procedure is repeated for $i = N, N-1, \dots, 1$; that is, M Langevin steps are executed at each noise level, until the samples converge to the high-density region of the current noise level.
$$x_N^0 \sim \mathcal{N}(\mathbf{x};\ \mathbf{0},\ \sigma^2_{\mathrm{max}}\mathbf{I}), \qquad x_i^0 = x_{i+1}^M \ \text{for}\ i < N$$
Sample code:
```python
import torch
import torch.nn as nn

def langevin_sampling(score_network, noise_levels, num_steps, step_size, batch_size, device):
    # Initialize samples from a standard normal distribution
    x = torch.randn(batch_size, 3, 32, 32).to(device)
    # Anneal from high noise to low noise; the samples gradually approach the target distribution
    for sigma in noise_levels:
        print(f"Sampling at noise level: {sigma}")
        # Langevin dynamics iterations
        for _ in range(num_steps):
            # Score (gradient of the log density), predicted by the score network
            with torch.no_grad():
                grad = score_network(x, sigma)  # current samples and noise level
            # Gradient ascent step
            x = x + step_size * grad
            # Random noise step
            noise = torch.randn_like(x) * (2 * step_size) ** 0.5
            x = x + noise
    return x

class DummyScoreNet(nn.Module):
    def forward(self, x, sigma):
        return -x / (sigma ** 2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
score_network = DummyScoreNet().to(device)
noise_levels = [50.0, 25.0, 10.0, 5.0, 1.0]
num_steps = 50
step_size = 0.1
batch_size = 64
samples = langevin_sampling(score_network, noise_levels, num_steps, step_size, batch_size, device)
```