CFG++: Manifold-constrained Classifier Free Guidance For Diffusion Models

KAIST
*Indicates Equal Contribution

CFG++ fixes CFG's issues with lower guidance scales, improving text-to-image quality and invertibility.

T2I Generation Results (SD v1.5, SDXL)

Abstract

Classifier-free guidance (CFG) is a fundamental tool in modern diffusion models for text-guided generation. Although effective, CFG's reliance on high guidance scales presents notable drawbacks. In response, we introduce a simple solution to this seemingly inherent limitation: CFG++. It addresses the off-manifold issue inherent in CFG, thereby enabling effective use of small guidance scales ($ 0 < \lambda < 1 $).

| | CFG 😓 | CFG++ 😁 |
|---|---|---|
| T2I Generation | Mode Collapse and Saturation | Better Sample Quality & Adherence to text |
| DDIM Inversion w/ CFG(++) | Breakdown | Improves and enables better image editing |
| PF-ODE trajectory | Unnatural, Curved | Smooth, Straighter |

Experimental results confirm that our method significantly enhances performance in text-to-image generation, DDIM inversion, editing, and solving inverse problems, suggesting a wide-ranging impact and potential applications in various fields that utilize text guidance.

Method

rs_algorithm

In CFG++, the renoising step that follows Tweedie’s formula uses the unconditional noise $ \hat\epsilon_\varnothing $ instead of the guided noise $ \hat\epsilon^w_c $. This surprisingly simple fix to the original CFG algorithm leads to a smoother generation trajectory, as the following visualization of the discrete evolution of the posterior mean also shows.
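The difference between the two updates can be sketched as a single deterministic DDIM step. This is a minimal illustration and not the released implementation: the function names, the scalar inputs, and the `use_cfgpp` flag are assumptions made for the example, and `abar_t`, `abar_prev` stand for the cumulative noise-schedule products at the current and next timestep.

```python
import math

def guided_eps(eps_uncond, eps_cond, scale):
    # Guided noise estimate: eps_uncond + scale * (eps_cond - eps_uncond).
    return eps_uncond + scale * (eps_cond - eps_uncond)

def ddim_step(x_t, eps_uncond, eps_cond, abar_t, abar_prev, scale, use_cfgpp):
    """One deterministic DDIM step (hypothetical helper, for illustration only).

    Inputs may be floats or array-likes that support arithmetic.
    """
    eps_guided = guided_eps(eps_uncond, eps_cond, scale)

    # Denoising via Tweedie's formula: both CFG and CFG++ use the guided noise.
    x0_hat = (x_t - math.sqrt(1.0 - abar_t) * eps_guided) / math.sqrt(abar_t)

    # Renoising: CFG reuses the guided noise, CFG++ uses the unconditional one.
    eps_renoise = eps_uncond if use_cfgpp else eps_guided
    x_prev = math.sqrt(abar_prev) * x0_hat + math.sqrt(1.0 - abar_prev) * eps_renoise
    return x_prev, x0_hat
```

Under these assumptions, the two variants produce the same posterior mean `x0_hat` from the same inputs and differ only in the renoising term, by `sqrt(1 - abar_prev) * scale * (eps_cond - eps_uncond)`.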

tweedie_progression

Experimental Results

1. T2I Generation

t2i_gen_w_interpolation

As the teaser images demonstrate, CFG++ yields a smoother generation trajectory and superior image quality. Additionally, we visualize multiple images generated by CFG++ as we increase the guidance scale $ \lambda $. The visualization shows a smooth transition from unconditional sampling towards highly conditional sampling.
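One way to read this transition, writing the guidance scale as $ \lambda $ as in the abstract and assuming the standard classifier-free form of the guided noise estimate, is as a convex interpolation between the unconditional and conditional predictions:

$$
\hat\epsilon^\lambda_c = \hat\epsilon_\varnothing + \lambda\,(\hat\epsilon_c - \hat\epsilon_\varnothing) = (1-\lambda)\,\hat\epsilon_\varnothing + \lambda\,\hat\epsilon_c, \qquad 0 \le \lambda \le 1 .
$$

At $ \lambda = 0 $ the denoised estimate is purely unconditional, and at $ \lambda = 1 $ it relies entirely on the conditional prediction $ \hat\epsilon_c $. Here $ \hat\epsilon_c $ denotes the conditional noise prediction, a symbol introduced only for this sketch.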

T2I_gen_lightning

We find that the gain from CFG++ is even more dramatic for distilled diffusion models such as SDXL-{turbo, lightning}: the quality of the generated images improves significantly, as the figure above shows.

2. Inversion and Editing

inv_algorithm inversion

We demonstrate the effect of CFG++ on the image inversion task. (a) Notably, DDIM inversion with CFG++ consistently reconstructs the source image across all guidance scales, whereas DDIM inversion with CFG fails to do so. (b) Quantitative results, including PSNR and RMSE, show a consistent improvement in reconstruction performance.
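To illustrate why swapping the renoising term can help invertibility, the sketch below pairs a CFG++ sampling step with its algebraic inverse. It is an illustration under stated assumptions, not the authors' inversion algorithm: the function names are hypothetical, and the noise estimates are treated as fixed across the step (the usual DDIM-inversion approximation that the prediction at the less noisy point stands in for the one at the noisier point).

```python
import math

def cfgpp_sample_step(x_t, eps_uncond, eps_guided, abar_t, abar_prev):
    # Denoise with the guided noise, renoise with the unconditional noise.
    x0_hat = (x_t - math.sqrt(1.0 - abar_t) * eps_guided) / math.sqrt(abar_t)
    return math.sqrt(abar_prev) * x0_hat + math.sqrt(1.0 - abar_prev) * eps_uncond

def cfgpp_invert_step(x_prev, eps_uncond, eps_guided, abar_t, abar_prev):
    # Algebraic inverse of the step above: the roles of the two noises swap.
    # Remove the unconditional noise, then add back the guided noise.
    x0_hat = (x_prev - math.sqrt(1.0 - abar_prev) * eps_uncond) / math.sqrt(abar_prev)
    return math.sqrt(abar_t) * x0_hat + math.sqrt(1.0 - abar_t) * eps_guided

# Round trip with frozen noise estimates (hypothetical scalar values).
x_start = 0.7
x_noisy = cfgpp_invert_step(x_start, 0.1, 0.25, abar_t=0.5, abar_prev=0.8)
x_back = cfgpp_sample_step(x_noisy, 0.1, 0.25, abar_t=0.5, abar_prev=0.8)
```

With frozen noise estimates the round trip is exact up to floating-point error. In a real model the estimates change between the two points, so the reconstruction is only approximate.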

t2i_gen_w_interpolation

The figures above compare image editing results obtained with CFG and CFG++ after image inversion. During the editing stage, a word in the source text is swapped with the target concept, and the modified text is used as the condition for sampling. Our algorithm works for both synthetic and real images.


3. Text-conditioned Inverse Problems w/ PSLD

ldis_main

We show that CFG++ enables the incorporation of text prompts into standard diffusion inverse solvers. Specifically, we compare the performance of PSLD combined with CFG and with CFG++ in solving linear inverse problems. CFG++ consistently delivers high-quality reconstructions across all tasks.