In **Denoising Diffusion Probabilistic Models (DDPMs)**, conditioning the model on class information is a common way to guide it toward generating samples from a specific category. The most widely used techniques for class conditioning include:
### **1. Class Embedding via Concatenation**
- The class label is first converted into an embedding vector (e.g., using an **embedding layer**).
- This class embedding is then concatenated with the noisy input **\( x_t \)** (for images, typically broadcast spatially and joined along the channel dimension) before being passed into the model; see the sketch after this list.
- Example:
\[
\tilde{x}_t = \text{concat}(x_t, \text{class\_embedding}(y))
\]
- **Used in**: Some early conditional DDPMs.
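A minimal PyTorch sketch of this idea; the module name, embedding size, and spatial broadcasting scheme are illustrative assumptions, not taken from a specific paper:

```python
import torch
import torch.nn as nn

# Sketch: concatenate a class embedding with the noisy input x_t.
# The embedding is broadcast to the spatial size of x_t and appended
# as extra channels, so the denoiser sees the label at every location.
class ConcatConditioner(nn.Module):
    def __init__(self, num_classes: int, embed_dim: int = 8):
        super().__init__()
        self.embed = nn.Embedding(num_classes, embed_dim)

    def forward(self, x_t: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x_t.shape
        emb = self.embed(y)                               # (B, embed_dim)
        emb = emb[:, :, None, None].expand(b, -1, h, w)   # broadcast spatially
        return torch.cat([x_t, emb], dim=1)               # (B, C + embed_dim, H, W)

x_t = torch.randn(4, 3, 32, 32)   # noisy images at timestep t
y = torch.randint(0, 10, (4,))    # class labels
cond = ConcatConditioner(num_classes=10)
print(cond(x_t, y).shape)         # torch.Size([4, 11, 32, 32])
```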
### **2. Class Conditioning via Cross-Attention**
- The class information is injected through **cross-attention** layers: the spatial features of the noisy input act as queries, while the class (or text) embedding supplies the keys and values, so each location can attend to the conditioning signal. A sketch follows this list.
- This is the mechanism behind text conditioning in latent diffusion models, whose denoising U-Nets interleave cross-attention blocks with convolutions.
- **Used in**: **Stable Diffusion, Imagen**.
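A minimal PyTorch sketch of the arrangement above, with the feature map as queries and a single label token as keys/values; all layer names and sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Sketch: spatial features of the noisy input are the queries; the class
# embedding is a single key/value token. With only one conditioning token
# the attention weights are trivial; with many tokens (e.g. text tokens,
# as in Stable Diffusion) each location attends selectively.
class ClassCrossAttention(nn.Module):
    def __init__(self, num_classes: int, channels: int, num_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(num_classes, channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, h: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        b, c, height, width = h.shape
        q = h.flatten(2).transpose(1, 2)       # (B, H*W, C): spatial tokens
        kv = self.embed(y).unsqueeze(1)        # (B, 1, C): one label token
        out, _ = self.attn(q, kv, kv)          # queries attend to the label
        out = out.transpose(1, 2).reshape(b, c, height, width)
        return h + out                         # residual connection

h = torch.randn(4, 64, 16, 16)   # intermediate U-Net feature map
y = torch.randint(0, 10, (4,))
print(ClassCrossAttention(10, 64)(h, y).shape)  # torch.Size([4, 64, 16, 16])
```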
### **3. Class Conditioning via Adaptive Layer Modulation (FiLM)**
- The class embedding is used to **modulate** the feature maps inside the neural network.
- A **Feature-wise Linear Modulation (FiLM)** layer applies:
\[
\gamma(y) \cdot h + \beta(y)
\]
where \( \gamma(y) \) and \( \beta(y) \) are learned functions of the class label.
- **Used in**: conditional DDPM U-Nets; OpenAI's ADM applies the closely related adaptive group normalization (AdaGN), and DiT uses adaLN. A sketch follows this list.
### **4. Class Conditioning via Conditional Batch Normalization (CBN)**
- Instead of regular batch normalization, the model applies **class-conditional batch norm**: the batch statistics are computed as usual, but the affine parameters \( \gamma \) and \( \beta \) are learned per class (see the sketch after this list).
- **Used in**: class-conditional GANs such as **BigGAN**; it is less common in diffusion models, which tend to use group normalization instead.
### **5. Classifier-Free Guidance (CFG)**
- One of the most effective ways to strengthen class conditioning in DDPMs.
- The model is trained with both **class-conditional** and **unconditional** objectives.
- During inference, a weighted combination of the conditional and unconditional noise estimates steers generation (a sketch follows this list):
\[
\hat{\epsilon}_\theta(x_t, y) = w \cdot \epsilon_\theta(x_t, y) + (1-w) \cdot \epsilon_\theta(x_t)
\]
where \( w = 1 \) recovers the purely conditional estimate and \( w > 1 \) extrapolates beyond it, strengthening the class signal at some cost in diversity.
- **Used in**: **GLIDE, Imagen, Stable Diffusion** (introduced by Ho & Salimans; distinct from OpenAI's classifier-guided diffusion, which requires a separate classifier).
### **Conclusion**
The most common approaches for class conditioning in DDPM include:
1. **Concatenation** of class embeddings with the input.
2. **Cross-attention** using transformer-based conditioning.
3. **Feature-wise modulation (FiLM)** for adaptive feature scaling.
4. **Conditional Batch Norm (CBN)** for normalization-based conditioning.
5. **Classifier-Free Guidance (CFG)**, the most widely used method in modern diffusion models.
**Which one is best?**
- **Concatenation** is simple but less effective for complex models.
- **Cross-attention & FiLM** work well for large models like Stable Diffusion.
- **Classifier-Free Guidance** is **widely used in practice** because a single network learns both the conditional and unconditional estimates, and the guidance weight \( w \) trades fidelity against diversity; note that CFG is a sampling-time technique layered on top of one of the architectural conditioning methods above.