In **Denoising Diffusion Probabilistic Models (DDPMs)**, conditioning the model on class information is a common way to guide it toward generating samples from a specific category. The most widely used techniques for class conditioning include:
### **1. Class Embedding via Concatenation**
- The class label is first converted into an embedding vector (e.g., using an **embedding layer**).
- This class embedding is then concatenated with the noisy input **\( x_t \)** (for images, typically broadcast spatially and joined along the channel dimension) before being passed into the model; see the sketch after this list.
- Example:
\[
\tilde{x}_t = \text{concat}(x_t, \text{class\_embedding}(y))
\]
- **Used in**: Some early conditional DDPMs.
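A minimal PyTorch sketch of this idea; the module name, embedding size, and spatial broadcasting scheme are illustrative assumptions, not taken from a specific paper:

```python
import torch
import torch.nn as nn

# Sketch: concatenate a class embedding with the noisy input x_t.
# The embedding is broadcast to the spatial size of x_t and appended
# as extra channels, so the denoiser sees the label at every location.
class ConcatConditioner(nn.Module):
    def __init__(self, num_classes: int, embed_dim: int = 8):
        super().__init__()
        self.embed = nn.Embedding(num_classes, embed_dim)

    def forward(self, x_t: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x_t.shape
        emb = self.embed(y)                               # (B, embed_dim)
        emb = emb[:, :, None, None].expand(b, -1, h, w)   # broadcast spatially
        return torch.cat([x_t, emb], dim=1)               # (B, C + embed_dim, H, W)

x_t = torch.randn(4, 3, 32, 32)   # noisy images at timestep t
y = torch.randint(0, 10, (4,))    # class labels
cond = ConcatConditioner(num_classes=10)
print(cond(x_t, y).shape)         # torch.Size([4, 11, 32, 32])
```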
### **2. Class Conditioning via Cross-Attention**
- The class information is injected through **cross-attention** layers: the spatial features of the noisy input act as queries, while the class (or text) embedding supplies the keys and values, so each location can attend to the conditioning signal. A sketch follows this list.
- This is the mechanism behind text conditioning in latent diffusion models, whose denoising U-Nets interleave cross-attention blocks with convolutions.
- **Used in**: **Stable Diffusion, Imagen**.
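A minimal PyTorch sketch of the arrangement above, with the feature map as queries and a single label token as keys/values; all layer names and sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Sketch: spatial features of the noisy input are the queries; the class
# embedding is a single key/value token. With only one conditioning token
# the attention weights are trivial; with many tokens (e.g. text tokens,
# as in Stable Diffusion) each location attends selectively.
class ClassCrossAttention(nn.Module):
    def __init__(self, num_classes: int, channels: int, num_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(num_classes, channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, h: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        b, c, height, width = h.shape
        q = h.flatten(2).transpose(1, 2)       # (B, H*W, C): spatial tokens
        kv = self.embed(y).unsqueeze(1)        # (B, 1, C): one label token
        out, _ = self.attn(q, kv, kv)          # queries attend to the label
        out = out.transpose(1, 2).reshape(b, c, height, width)
        return h + out                         # residual connection

h = torch.randn(4, 64, 16, 16)   # intermediate U-Net feature map
y = torch.randint(0, 10, (4,))
print(ClassCrossAttention(10, 64)(h, y).shape)  # torch.Size([4, 64, 16, 16])
```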
### **3. Class Conditioning via Adaptive Layer Modulation (FiLM)**
- The class embedding is used to **modulate** the feature maps inside the neural network.
- A **Feature-wise Linear Modulation (FiLM)** layer applies:
\[
\gamma(y) \cdot h + \beta(y)
\]
where \( \gamma(y) \) and \( \beta(y) \) are learned functions of the class label.
- **Used in**: conditional DDPM U-Nets; OpenAI's ADM applies the closely related adaptive group normalization (AdaGN), and DiT uses adaLN. A sketch follows this list.
### **4. Class Conditioning via Conditional Batch Normalization (CBN)**
- Instead of regular batch normalization, the model applies **class-conditional batch norm**: the batch statistics are computed as usual, but the affine parameters \( \gamma \) and \( \beta \) are learned per class (see the sketch after this list).
- **Used in**: class-conditional GANs such as **BigGAN**; it is less common in diffusion models, which tend to use group normalization instead.
### **5. Classifier-Free Guidance (CFG)**
- One of the most effective ways to strengthen class conditioning in DDPMs.
- The model is trained with both **class-conditional** and **unconditional** objectives.
- During inference, a weighted combination of the conditional and unconditional noise estimates steers generation (a sketch follows this list):
\[
\hat{\epsilon}_\theta(x_t, y) = w \cdot \epsilon_\theta(x_t, y) + (1-w) \cdot \epsilon_\theta(x_t)
\]
where \( w = 1 \) recovers the purely conditional estimate and \( w > 1 \) extrapolates beyond it, strengthening the class signal at some cost in diversity.
- **Used in**: **GLIDE, Imagen, Stable Diffusion** (introduced by Ho & Salimans; distinct from OpenAI's classifier-guided diffusion, which requires a separate classifier).
### **Conclusion**
The most common approaches for class conditioning in DDPM include:
1. **Concatenation** of class embeddings with the input.
2. **Cross-attention** using transformer-based conditioning.
3. **Feature-wise modulation (FiLM)** for adaptive feature scaling.
4. **Conditional Batch Norm (CBN)** for normalization-based conditioning.
5. **Classifier-Free Guidance (CFG)**, the most widely used method in modern diffusion models.
**Which one is best?**
- **Concatenation** is simple but less effective for complex models.
- **Cross-attention & FiLM** work well for large models like Stable Diffusion.
- **Classifier-Free Guidance** is **widely used in practice** because a single network learns both the conditional and unconditional estimates, and the guidance weight \( w \) trades fidelity against diversity; note that CFG is a sampling-time technique layered on top of one of the architectural conditioning methods above.