Paper Reading Notes: Denoising Diffusion Implicit Models (5)

0. Quick Links

Paper Reading Notes: Denoising Diffusion Implicit Models (1)
Paper Reading Notes: Denoising Diffusion Implicit Models (2)
Paper Reading Notes: Denoising Diffusion Implicit Models (3)
Paper Reading Notes: Denoising Diffusion Implicit Models (4)
Paper Reading Notes: Denoising Diffusion Implicit Models (5)

5. Continuing from Paper Reading Notes: Denoising Diffusion Implicit Models (4)

Here $\sigma_t$ is a quantity we are free to choose. There are two special cases:
1. $\sigma_t^2=0$. In this case, $x_{t-1}$ satisfies Eq. (3):

$$
\begin{split}
x_{t-1}&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\,z_t}{\sqrt{\alpha_t}}+\sqrt{1-\alpha_{t-1}-\sigma_t^2}\,z_t+\sigma_t\,\epsilon_t \\
&=\sqrt{\alpha_{t-1}}\,x_0+\sqrt{1-\alpha_{t-1}}\,z_t
\end{split}
$$
and $x_{t-n}$ satisfies

$$
\begin{split}
x_{t-n}&=\sqrt{\alpha_{t-n}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\,z_t}{\sqrt{\alpha_t}}+\sqrt{1-\alpha_{t-n}-\sigma_t^2}\,z_t+\sigma_t\,\epsilon_t \\
&=\sqrt{\alpha_{t-n}}\,x_0+\sqrt{1-\alpha_{t-n}}\,z_t
\end{split}
$$
As can be seen, $x_{t-1}$ and $x_{t-n}$ then reduce to the form of Lemma 1 in Paper Reading Notes: Denoising Diffusion Implicit Models (2).
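
To make the $\sigma_t=0$ case concrete, here is a minimal sketch (not the paper's reference implementation) of the resulting deterministic sampler with step skipping; `model` is assumed to be a noise-prediction network $\epsilon_\theta(x_t, t)$, and `alphas_cumprod` is assumed to hold the cumulative products (the $\alpha_t$ of this note, i.e. DDPM's $\bar\alpha_t$).

import torch

@torch.no_grad()
def ddim_step_deterministic(model, x_t, t, t_prev, alphas_cumprod):
    # One sigma_t = 0 DDIM update from timestep t to t_prev (t_prev may be t - n).
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t_prev] if t_prev >= 0 else torch.tensor(1.0)
    z_t = model(x_t, t)                                    # predicted noise epsilon_theta(x_t, t)
    x0_pred = (x_t - (1 - a_t).sqrt() * z_t) / a_t.sqrt()  # predicted x_0
    # sigma_t = 0: no random-noise term, so the update is fully deterministic.
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * z_t

@torch.no_grad()
def ddim_sample(model, x_T, alphas_cumprod, num_train_timesteps=1000, num_inference_steps=50):
    # E.g. 1000 training steps sampled in 50 jumps of 20: t = 980, 960, ..., 20, 0.
    stride = num_train_timesteps // num_inference_steps
    x = x_T
    for t in range(num_train_timesteps - stride, -1, -stride):
        x = ddim_step_deterministic(model, x, t, t - stride, alphas_cumprod)
    return x

Because no random noise is injected, the same $x_T$ always maps to the same sample, which is what lets the sampler take large jumps from $t$ to $t-n$ without retraining the model.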

2. $\sigma_t^2=\frac{1-\alpha_{t-1}}{1-\alpha_t}\cdot\left(1-\frac{\alpha_t}{\alpha_{t-1}}\right)$. In this case, $x_{t-1}$ satisfies Eq. (4):

$$
\begin{split}
x_{t-1}&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\,z_t}{\sqrt{\alpha_t}}+\sqrt{1-\alpha_{t-1}-\sigma_t^2}\,z_t+\sigma_t\,\epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\,z_t}{\sqrt{\alpha_t}}+\sqrt{1-\alpha_{t-1}-\frac{1-\alpha_{t-1}}{1-\alpha_t}\cdot\left(1-\frac{\alpha_t}{\alpha_{t-1}}\right)}\,z_t+\sigma_t\,\epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\,z_t}{\sqrt{\alpha_t}}+\sqrt{(1-\alpha_{t-1})\left(1-\frac{1}{1-\alpha_t}\cdot\frac{\alpha_{t-1}-\alpha_t}{\alpha_{t-1}}\right)}\,z_t+\sigma_t\,\epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\,z_t}{\sqrt{\alpha_t}}+\sqrt{(1-\alpha_{t-1})\cdot\frac{\alpha_{t-1}(1-\alpha_t)-(\alpha_{t-1}-\alpha_t)}{\alpha_{t-1}(1-\alpha_t)}}\,z_t+\sigma_t\,\epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\,z_t}{\sqrt{\alpha_t}}+(1-\alpha_{t-1})\sqrt{\frac{\alpha_t}{\alpha_{t-1}(1-\alpha_t)}}\,z_t+\sigma_t\,\epsilon_t \\
&=\frac{\sqrt{\alpha_{t-1}}}{\sqrt{\alpha_t}}\,x_t-\left(\frac{\sqrt{\alpha_{t-1}}\sqrt{1-\alpha_t}}{\sqrt{\alpha_t}}-(1-\alpha_{t-1})\sqrt{\frac{\alpha_t}{\alpha_{t-1}(1-\alpha_t)}}\right)z_t+\sigma_t\,\epsilon_t \\
&=\frac{\sqrt{\alpha_{t-1}}}{\sqrt{\alpha_t}}\,x_t-\frac{\alpha_{t-1}(1-\alpha_t)-(1-\alpha_{t-1})\,\alpha_t}{\sqrt{\alpha_t}\sqrt{\alpha_{t-1}(1-\alpha_t)}}\,z_t+\sigma_t\,\epsilon_t \\
&=\frac{\sqrt{\alpha_{t-1}}}{\sqrt{\alpha_t}}\,x_t-\frac{\alpha_{t-1}-\alpha_t}{\sqrt{\alpha_t}\sqrt{\alpha_{t-1}(1-\alpha_t)}}\,z_t+\sigma_t\,\epsilon_t \\
&=\frac{\sqrt{\alpha_{t-1}}}{\sqrt{\alpha_t}}\left(x_t-\frac{\alpha_{t-1}-\alpha_t}{\alpha_{t-1}\sqrt{1-\alpha_t}}\,z_t\right)+\sigma_t\,\epsilon_t \\
&=\frac{\sqrt{\alpha_{t-1}}}{\sqrt{\alpha_t}}\left(x_t-\frac{1}{\sqrt{1-\alpha_t}}\cdot\left(1-\frac{\alpha_t}{\alpha_{t-1}}\right)z_t\right)+\sigma_t\,\epsilon_t \\
&=\frac{1}{\sqrt{\alpha_t}}\left(x_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,z_t\right)+\sigma_t\,\epsilon_t \quad\text{(rewritten in DDPM notation)}
\end{split}
$$

where in the last line $\alpha_t$, $\beta_t=1-\alpha_t$ and $\bar\alpha_t$ are DDPM's per-step and cumulative quantities, so this note's $\alpha_t$ corresponds to DDPM's $\bar\alpha_t$ and $1-\frac{\alpha_t}{\alpha_{t-1}}$ becomes $\beta_t$.
As can be seen, in this case DDIM reduces to DDPM.
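
The equivalence can also be checked numerically: with this choice of $\sigma_t^2$, the coefficients the DDIM update places on $x_t$ and on $z_t$ coincide with those of the DDPM mean $\frac{1}{\sqrt{\alpha_t}}\left(x_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}z_t\right)$. A small sketch, assuming a DDPM-style linear $\beta$ schedule:

import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)   # assumed linear schedule (DDPM notation)
alphas = 1.0 - betas                    # per-step alpha_t of DDPM
abar = np.cumprod(alphas)               # cumulative product: the alpha_t used in this note

t = 500
a_t, a_prev = abar[t], abar[t - 1]
sigma2 = (1 - a_prev) / (1 - a_t) * (1 - a_t / a_prev)

# DDIM update: coefficient of x_t and coefficient of the predicted noise z_t
coef_x_ddim = np.sqrt(a_prev / a_t)
coef_z_ddim = np.sqrt(1 - a_prev - sigma2) - np.sqrt(a_prev) * np.sqrt(1 - a_t) / np.sqrt(a_t)

# DDPM mean: (1/sqrt(alpha_t)) * (x_t - beta_t / sqrt(1 - abar_t) * z_t)
coef_x_ddpm = 1.0 / np.sqrt(alphas[t])
coef_z_ddpm = -betas[t] / (np.sqrt(alphas[t]) * np.sqrt(1 - a_t))

print(np.isclose(coef_x_ddim, coef_x_ddpm), np.isclose(coef_z_ddim, coef_z_ddpm))  # True True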
The paper also studies choosing $\sigma_t^2=\eta\cdot\frac{1-\alpha_{t-1}}{1-\alpha_t}\cdot\left(1-\frac{\alpha_t}{\alpha_{t-1}}\right)$ with $\eta\in[0,1]$, i.e. interpolating between the deterministic case ($\eta=0$) and DDPM ($\eta=1$). The performance for different values of $\eta$ and for different numbers of skipped steps is shown in the figure below.
[Figure: sample quality for different values of η and different numbers of sampling steps]
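
In code, this interpolation is just a rescaling of the per-step standard deviation (the diffusers scheduler in the next section does exactly this via `std_dev_t = eta * variance ** 0.5`). A minimal sketch, again assuming `alphas_cumprod` holds the cumulative $\bar\alpha$ values:

import numpy as np

def ddim_sigma(alphas_cumprod, t, t_prev, eta):
    # sigma_t(eta): eta = 0 -> deterministic DDIM, eta = 1 -> the DDPM noise level.
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t_prev] if t_prev >= 0 else 1.0
    variance = (1 - a_prev) / (1 - a_t) * (1 - a_t / a_prev)
    return eta * np.sqrt(variance)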

6. Code

# Excerpted from Hugging Face diffusers (pipelines/ddim/pipeline_ddim.py and schedulers/scheduling_ddim.py).
# The import block below is a reconstruction so the excerpt can be read as a single file;
# exact module paths may differ slightly between diffusers versions.
from typing import List, Optional, Tuple, Union

import numpy as np
import torch

from diffusers import DiffusionPipeline, ImagePipelineOutput
from diffusers.configuration_utils import ConfigMixin, register_to_config
from diffusers.schedulers.scheduling_ddim import DDIMSchedulerOutput, betas_for_alpha_bar, rescale_zero_terminal_snr
from diffusers.schedulers.scheduling_utils import KarrasDiffusionSchedulers, SchedulerMixin
from diffusers.utils.torch_utils import randn_tensor


class DDIMPipeline(DiffusionPipeline):
    model_cpu_offload_seq = "unet"

    def __init__(self, unet, scheduler):
        super().__init__()

        # make sure scheduler can always be converted to DDIM
        scheduler = DDIMScheduler.from_config(scheduler.config)

        self.register_modules(unet=unet, scheduler=scheduler)

    @torch.no_grad()
    def __call__(
        self,
        batch_size: int = 1,
        generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
        eta: float = 0.0,
        num_inference_steps: int = 50,
        use_clipped_model_output: Optional[bool] = None,
        output_type: Optional[str] = "pil",
        return_dict: bool = True,
    ) -> Union[ImagePipelineOutput, Tuple]:
        # Sample gaussian noise to begin loop
        if isinstance(self.unet.config.sample_size, int):
            image_shape = (
                batch_size,
                self.unet.config.in_channels,
                self.unet.config.sample_size,
                self.unet.config.sample_size,
            )
        else:
            image_shape = (batch_size, self.unet.config.in_channels, *self.unet.config.sample_size)

        if isinstance(generator, list) and len(generator) != batch_size:
            raise ValueError(
                f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
                f" size of {batch_size}. Make sure the batch size matches the length of the generators."
            )

        # Randomly sample the initial noise x_T
        image = randn_tensor(image_shape, generator=generator, device=self._execution_device, dtype=self.unet.dtype)

        # Set the timestep spacing. E.g. with num_inference_steps = 50 and 1000 training steps in total,
        # the sampler jumps 20 steps at a time, e.g. timestep=980, prev_timestep=960.
        self.scheduler.set_timesteps(num_inference_steps)

        for t in self.progress_bar(self.scheduler.timesteps):
            # 1. predict the noise at the current timestep (e.g. timestep=980)
            model_output = self.unet(image, t).sample

            # 2. call the scheduler's step method, which applies the update formula to obtain the image at
            # prev_timestep=960
            image = self.scheduler.step(
                model_output, t, image, eta=eta, use_clipped_model_output=use_clipped_model_output, generator=generator
            ).prev_sample

        image = (image / 2 + 0.5).clamp(0, 1)
        image = image.cpu().permute(0, 2, 3, 1).numpy()
        if output_type == "pil":
            image = self.numpy_to_pil(image)

        if not return_dict:
            return (image,)

        return ImagePipelineOutput(images=image)


class DDIMScheduler(SchedulerMixin, ConfigMixin):
    _compatibles = [e.name for e in KarrasDiffusionSchedulers]
    order = 1

    @register_to_config
    def __init__(
        self,
        num_train_timesteps: int = 1000,
        beta_start: float = 0.0001,
        beta_end: float = 0.02,
        beta_schedule: str = "linear",
        trained_betas: Optional[Union[np.ndarray, List[float]]] = None,
        clip_sample: bool = True,
        set_alpha_to_one: bool = True,
        steps_offset: int = 0,
        prediction_type: str = "epsilon",
        thresholding: bool = False,
        dynamic_thresholding_ratio: float = 0.995,
        clip_sample_range: float = 1.0,
        sample_max_value: float = 1.0,
        timestep_spacing: str = "leading",
        rescale_betas_zero_snr: bool = False,
    ):
        if trained_betas is not None:
            self.betas = torch.tensor(trained_betas, dtype=torch.float32)
        elif beta_schedule == "linear":
            self.betas = torch.linspace(beta_start, beta_end, num_train_timesteps, dtype=torch.float32)
        elif beta_schedule == "scaled_linear":
            # this schedule is very specific to the latent diffusion model.
            self.betas = torch.linspace(beta_start**0.5, beta_end**0.5, num_train_timesteps, dtype=torch.float32) ** 2
        elif beta_schedule == "squaredcos_cap_v2":
            # Glide cosine schedule
            self.betas = betas_for_alpha_bar(num_train_timesteps)
        else:
            raise NotImplementedError(f"{beta_schedule} is not implemented for {self.__class__}")

        # Rescale for zero SNR
        if rescale_betas_zero_snr:
            self.betas = rescale_zero_terminal_snr(self.betas)

        self.alphas = 1.0 - self.betas
        self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)

        # At every step in ddim, we are looking into the previous alphas_cumprod
        # For the final step, there is no previous alphas_cumprod because we are already at 0
        # `set_alpha_to_one` decides whether we set this parameter simply to one or
        # whether we use the final alpha of the "non-previous" one.
        self.final_alpha_cumprod = torch.tensor(1.0) if set_alpha_to_one else self.alphas_cumprod[0]

        # standard deviation of the initial noise distribution
        self.init_noise_sigma = 1.0

        # setable values
        self.num_inference_steps = None
        self.timesteps = torch.from_numpy(np.arange(0, num_train_timesteps)[::-1].copy().astype(np.int64))

    def scale_model_input(self, sample: torch.Tensor, timestep: Optional[int] = None) -> torch.Tensor:
        """
        Ensures interchangeability with schedulers that need to scale the denoising model input depending on the
        current timestep.

        Args:
            sample (`torch.Tensor`):
                The input sample.
            timestep (`int`, *optional*):
                The current timestep in the diffusion chain.

        Returns:
            `torch.Tensor`:
                A scaled input sample.
        """
        return sample

    def _get_variance(self, timestep, prev_timestep):
        alpha_prod_t = self.alphas_cumprod[timestep]
        alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod
        beta_prod_t = 1 - alpha_prod_t
        beta_prod_t_prev = 1 - alpha_prod_t_prev

        variance = (beta_prod_t_prev / beta_prod_t) * (1 - alpha_prod_t / alpha_prod_t_prev)

        return variance

    # Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler._threshold_sample
    def _threshold_sample(self, sample: torch.Tensor) -> torch.Tensor:
        """
        "Dynamic thresholding: At each sampling step we set s to a certain percentile absolute pixel value in xt0 (the
        prediction of x_0 at timestep t), and if s > 1, then we threshold xt0 to the range [-s, s] and then divide by
        s. Dynamic thresholding pushes saturated pixels (those near -1 and 1) inwards, thereby actively preventing
        pixels from saturation at each step. We find that dynamic thresholding results in significantly better
        photorealism as well as better image-text alignment, especially when using very large guidance weights."

        https://arxiv.org/abs/2205.11487
        """
        dtype = sample.dtype
        batch_size, channels, *remaining_dims = sample.shape

        if dtype not in (torch.float32, torch.float64):
            sample = sample.float()  # upcast for quantile calculation, and clamp not implemented for cpu half

        # Flatten sample for doing quantile calculation along each image
        sample = sample.reshape(batch_size, channels * np.prod(remaining_dims))

        abs_sample = sample.abs()  # "a certain percentile absolute pixel value"

        s = torch.quantile(abs_sample, self.config.dynamic_thresholding_ratio, dim=1)
        s = torch.clamp(
            s, min=1, max=self.config.sample_max_value
        )  # When clamped to min=1, equivalent to standard clipping to [-1, 1]
        s = s.unsqueeze(1)  # (batch_size, 1) because clamp will broadcast along dim=0
        sample = torch.clamp(sample, -s, s) / s  # "we threshold xt0 to the range [-s, s] and then divide by s"

        sample = sample.reshape(batch_size, channels, *remaining_dims)
        sample = sample.to(dtype)

        return sample

    def set_timesteps(self, num_inference_steps: int, device: Union[str, torch.device] = None):
        """
        Sets the discrete timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model.
        """

        if num_inference_steps > self.config.num_train_timesteps:
            raise ValueError(
                f"`num_inference_steps`: {num_inference_steps} cannot be larger than `self.config.train_timesteps`:"
                f" {self.config.num_train_timesteps} as the unet model trained with this scheduler can only handle"
                f" maximal {self.config.num_train_timesteps} timesteps."
            )

        self.num_inference_steps = num_inference_steps

        # "linspace", "leading", "trailing" corresponds to annotation of Table 2. of https://arxiv.org/abs/2305.08891
        if self.config.timestep_spacing == "linspace":
            timesteps = (
                np.linspace(0, self.config.num_train_timesteps - 1, num_inference_steps)
                .round()[::-1]
                .copy()
                .astype(np.int64)
            )
        elif self.config.timestep_spacing == "leading":
            step_ratio = self.config.num_train_timesteps // self.num_inference_steps
            # creates integer timesteps by multiplying by ratio
            # casting to int to avoid issues when num_inference_step is power of 3
            timesteps = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].copy().astype(np.int64)
            timesteps += self.config.steps_offset
        elif self.config.timestep_spacing == "trailing":
            step_ratio = self.config.num_train_timesteps / self.num_inference_steps
            # creates integer timesteps by multiplying by ratio
            # casting to int to avoid issues when num_inference_step is power of 3
            timesteps = np.round(np.arange(self.config.num_train_timesteps, 0, -step_ratio)).astype(np.int64)
            timesteps -= 1
        else:
            raise ValueError(
                f"{self.config.timestep_spacing} is not supported. Please make sure to choose one of 'leading' or 'trailing'."
            )

        self.timesteps = torch.from_numpy(timesteps).to(device)

    def step(
        self,
        model_output: torch.Tensor,
        timestep: int,
        sample: torch.Tensor,
        eta: float = 0.0,
        use_clipped_model_output: bool = False,
        generator=None,
        variance_noise: Optional[torch.Tensor] = None,
        return_dict: bool = True,
    ) -> Union[DDIMSchedulerOutput, Tuple]:
        if self.num_inference_steps is None:
            raise ValueError(
                "Number of inference steps is 'None', you need to run 'set_timesteps' after creating the scheduler"
            )

        # 1. get previous step value (=t-1);
        # timestep=980, self.config.num_train_timesteps=1000, self.num_inference_steps=50
        # prev_timestep = 960, i.e. the timestep stride is 20
        prev_timestep = timestep - self.config.num_train_timesteps // self.num_inference_steps

        # 2. compute alphas, betas
        alpha_prod_t = self.alphas_cumprod[timestep]
        alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod

        beta_prod_t = 1 - alpha_prod_t

        # 3. compute predicted original sample from predicted noise also called
        # "predicted x_0" of formula (12) from https://arxiv.org/pdf/2010.02502.pdf
        if self.config.prediction_type == "epsilon":
            pred_original_sample = (sample - beta_prod_t ** (0.5) * model_output) / alpha_prod_t ** (0.5)
            pred_epsilon = model_output
        elif self.config.prediction_type == "sample":
            pred_original_sample = model_output
            pred_epsilon = (sample - alpha_prod_t ** (0.5) * pred_original_sample) / beta_prod_t ** (0.5)
        elif self.config.prediction_type == "v_prediction":
            pred_original_sample = (alpha_prod_t**0.5) * sample - (beta_prod_t**0.5) * model_output
            pred_epsilon = (alpha_prod_t**0.5) * model_output + (beta_prod_t**0.5) * sample
        else:
            raise ValueError(
                f"prediction_type given as {self.config.prediction_type} must be one of `epsilon`, `sample`, or"
                " `v_prediction`"
            )

        # 4. Clip or threshold "predicted x_0"
        if self.config.thresholding:
            pred_original_sample = self._threshold_sample(pred_original_sample)
        elif self.config.clip_sample:
            pred_original_sample = pred_original_sample.clamp(
                -self.config.clip_sample_range, self.config.clip_sample_range
            )

        # 5. compute variance: "sigma_t(eta)" -> see formula (16)
        # sigma_t = sqrt((1 - alpha_t-1)/(1 - alpha_t)) * sqrt(1 - alpha_t/alpha_t-1)
        variance = self._get_variance(timestep, prev_timestep)
        std_dev_t = eta * variance ** (0.5)

        if use_clipped_model_output:
            # the pred_epsilon is always re-derived from the clipped x_0 in Glide
            pred_epsilon = (sample - alpha_prod_t ** (0.5) * pred_original_sample) / beta_prod_t ** (0.5)

        # 6. compute "direction pointing to x_t" of formula (12) from https://arxiv.org/pdf/2010.02502.pdf
        pred_sample_direction = (1 - alpha_prod_t_prev - std_dev_t**2) ** (0.5) * pred_epsilon

        # 7. compute x_t without "random noise" of formula (12) from https://arxiv.org/pdf/2010.02502.pdf
        prev_sample = alpha_prod_t_prev ** (0.5) * pred_original_sample + pred_sample_direction

        if eta > 0:
            if variance_noise is not None and generator is not None:
                raise ValueError(
                    "Cannot pass both generator and variance_noise. Please make sure that either `generator` or"
                    " `variance_noise` stays `None`."
                )

            if variance_noise is None:
                variance_noise = randn_tensor(
                    model_output.shape, generator=generator, device=model_output.device, dtype=model_output.dtype
                )
            variance = std_dev_t * variance_noise

            prev_sample = prev_sample + variance

        if not return_dict:
            return (
                prev_sample,
                pred_original_sample,
            )

        return DDIMSchedulerOutput(prev_sample=prev_sample, pred_original_sample=pred_original_sample)

    # Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler.add_noise
    def add_noise(
        self,
        original_samples: torch.Tensor,
        noise: torch.Tensor,
        timesteps: torch.IntTensor,
    ) -> torch.Tensor:
        # Make sure alphas_cumprod and timestep have same device and dtype as original_samples
        # Move the self.alphas_cumprod to device to avoid redundant CPU to GPU data movement
        # for the subsequent add_noise calls
        self.alphas_cumprod = self.alphas_cumprod.to(device=original_samples.device)
        alphas_cumprod = self.alphas_cumprod.to(dtype=original_samples.dtype)
        timesteps = timesteps.to(original_samples.device)

        sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
        sqrt_alpha_prod = sqrt_alpha_prod.flatten()
        while len(sqrt_alpha_prod.shape) < len(original_samples.shape):
            sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)

        sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5
        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()
        while len(sqrt_one_minus_alpha_prod.shape) < len(original_samples.shape):
            sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)

        noisy_samples = sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise
        return noisy_samples

    # Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler.get_velocity
    def get_velocity(self, sample: torch.Tensor, noise: torch.Tensor, timesteps: torch.IntTensor) -> torch.Tensor:
        # Make sure alphas_cumprod and timestep have same device and dtype as sample
        self.alphas_cumprod = self.alphas_cumprod.to(device=sample.device)
        alphas_cumprod = self.alphas_cumprod.to(dtype=sample.dtype)
        timesteps = timesteps.to(sample.device)

        sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
        sqrt_alpha_prod = sqrt_alpha_prod.flatten()
        while len(sqrt_alpha_prod.shape) < len(sample.shape):
            sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)

        sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5
        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()
        while len(sqrt_one_minus_alpha_prod.shape) < len(sample.shape):
            sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)

        velocity = sqrt_alpha_prod * noise - sqrt_one_minus_alpha_prod * sample
        return velocity

    def __len__(self):
        return self.config.num_train_timesteps
