AccDiffusion

AccDiffusion: An Accurate Method for Higher-Resolution Image Generation

Xiamen University¹
Skywork AI²
Tencent³

Abstract

This paper addresses the object repetition issue in patch-wise higher-resolution image generation. We propose AccDiffusion, an accurate, training-free method for patch-wise higher-resolution image generation. Our in-depth analysis reveals that using an identical text prompt for different patches causes repeated object generation, while using no prompt compromises image details. AccDiffusion therefore proposes, for the first time, to decouple the vanilla image-content-aware prompt into a set of patch-content-aware prompts, each of which serves as a more precise description of an image patch. AccDiffusion also introduces dilated sampling with window interaction for better global consistency in higher-resolution image generation. Experimental comparison with existing methods demonstrates that AccDiffusion effectively addresses repeated object generation and leads to better performance in higher-resolution image generation.

An in-depth analysis of small object repetition generation

Our in-depth analysis of DemoFusion indicates that, as illustrated in figure (a) above, small object repetition is the outcome of two opposing forces: an identical text prompt applied to all patches, which encourages the generation of repeated objects, and global semantic information from the residual connection and dilated sampling, which suppresses it.

Patch-content-aware Prompt

To completely solve small object repetition, as illustrated in the above figure, we propose to decouple the vanilla image-content-aware prompt into a set of patch-content-aware substrings, each of which serves as a more precise prompt to describe the patch contents. Specifically, we utilize the cross-attention map from the low-resolution generation process to determine whether a word token should serve as the prompt for a patch. If a word token has a high response in the cross-attention map region corresponding to the patch, it should be included in the prompt, and vice versa.
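The token-selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `patch_prompt`, the per-token mean used to decide whether a pixel "responds", and the `threshold` on the responding fraction are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# One H x W cross-attention map per word token (taken from the
# low-resolution generation pass in the method described above).
AttentionMap = List[List[float]]


def patch_prompt(
    tokens: List[str],
    attention: Dict[str, AttentionMap],
    patch: Tuple[int, int, int, int],
    threshold: float = 0.3,
) -> str:
    """Build a patch-level prompt from the tokens that respond inside a patch.

    `patch` is (top, left, height, width) in attention-map coordinates.
    Assumption: a pixel "responds" to a token when its value exceeds the
    token's mean over the whole map, and a token is kept when the responding
    fraction of the patch is above `threshold`.
    """
    top, left, height, width = patch
    kept = []
    for token in tokens:
        amap = attention[token]
        values = [v for row in amap for v in row]
        mean = sum(values) / len(values)
        region = [
            v
            for row in amap[top:top + height]
            for v in row[left:left + width]
        ]
        ratio = sum(1 for v in region if v > mean) / len(region)
        if ratio > threshold:
            kept.append(token)
    return " ".join(kept)


# Toy 4 x 4 maps: "astronaut" responds on the left half, "moon" on the right.
tokens = ["astronaut", "moon"]
attention = {
    "astronaut": [[0.9, 0.9, 0.1, 0.1] for _ in range(4)],
    "moon": [[0.1, 0.1, 0.9, 0.9] for _ in range(4)],
}
left_prompt = patch_prompt(tokens, attention, (0, 0, 4, 2))
right_prompt = patch_prompt(tokens, attention, (0, 2, 4, 2))
```

In this toy case each patch receives only the word that actually describes its content, so the two patches are no longer asked to draw the same objects.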

Dilated Sampling With Window Interaction

Through visualization, we observe that the dilated sampling operation in DemoFusion generates globally inconsistent and noisy information, disrupting the generation of higher-resolution images. Such inconsistency stems from the independent denoising of dilation samples without interaction. To address this, we employ a position-wise bijection function to enable interaction between the noise from different dilation samples. Experimental results show that our dilated sampling with interaction leads to the generation of smoother global semantic information.
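One way to picture the position-wise bijection is as a random permutation of the positions inside each s x s window, applied before dilated sampling and inverted afterwards. The sketch below follows that reading. It is an assumption-laden illustration rather than the paper's code: the function name, the use of a fresh random permutation per window, and the `denoise` callback standing in for the diffusion model are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Grid = List[List[float]]
Pos = Tuple[int, int]


def dilated_sampling_with_interaction(
    latent: Grid,
    s: int,
    denoise: Callable[[Grid], Grid],
    seed: int = 0,
) -> Grid:
    """Dilated sampling where positions are permuted inside each s x s window.

    Assumption: the bijection is an independent random permutation per window.
    `denoise` is a placeholder for one denoising step on a dilation sample.
    """
    h, w = len(latent), len(latent[0])
    assert h % s == 0 and w % s == 0, "latent size must be a multiple of s"
    rng = random.Random(seed)
    offsets = [(di, dj) for di in range(s) for dj in range(s)]

    # source[(r, c)] is the original position whose value is moved to (r, c).
    source: Dict[Pos, Pos] = {}
    for r0 in range(0, h, s):
        for c0 in range(0, w, s):
            perm = offsets[:]
            rng.shuffle(perm)
            for (di, dj), (si, sj) in zip(offsets, perm):
                source[(r0 + di, c0 + dj)] = (r0 + si, c0 + sj)

    shuffled = [
        [latent[source[(r, c)][0]][source[(r, c)][1]] for c in range(w)]
        for r in range(h)
    ]

    # Plain dilated sampling on the shuffled latent: one sample per offset.
    merged = [[0.0] * w for _ in range(h)]
    for di, dj in offsets:
        sample = [row[dj::s] for row in shuffled[di::s]]
        denoised = denoise(sample)
        for a, row in enumerate(denoised):
            for b, value in enumerate(row):
                merged[di + a * s][dj + b * s] = value

    # Undo the bijection so every value returns to its original position.
    out = [[0.0] * w for _ in range(h)]
    for (r, c), (sr, sc) in source.items():
        out[sr][sc] = merged[r][c]
    return out


latent = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
restored = dilated_sampling_with_interaction(latent, s=2, denoise=lambda x: x)
```

With an identity `denoise`, the round trip returns the input unchanged, which checks that the mapping really is a bijection. With a real denoiser, each dilation sample would mix values that plain dilated sampling assigns to different samples, which is the interaction the paragraph above describes.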

Qualitative comparison of our AccDiffusion with existing training-free image generation extrapolation methods

AccDiffusion can successfully conduct higher-resolution image generation without object repetition.

Citation

If you find this paper useful in your research, please consider citing:

@inproceedings{lin2024accdiffusion,
  title={AccDiffusion: An Accurate Method for Higher-Resolution Image Generation},
  author={Lin, Zhihang and Lin, Mingbao and Meng, Zhao and Ji, Rongrong},
  booktitle={ECCV},
  year={2024}
}

References

Podell, Dustin, et al. "SDXL: Improving latent diffusion models for high-resolution image synthesis." arXiv preprint arXiv:2307.01952. 2023.
Zhang, Kai, et al. "Designing a practical degradation model for deep blind image super-resolution." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
Zhang, Lvmin, et al. "Adding conditional control to text-to-image diffusion models." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
Du, Ruoyi, et al. "DemoFusion: Democratising high-resolution image generation with no $$$." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

Acknowledgements

This work was supported by the National Science and Technology Major Project (No. 2022ZD0118202), the National Science Fund for Distinguished Young Scholars (No. 62025603), the National Natural Science Foundation of China (No. U21B2037, No. U22B2051, No. U23A20383, No. 62176222, No. 62176223, No. 62176226, No. 62072386, No. 62072387, No. 62072389, No. 62002305, and No. 62272401), and the Natural Science Foundation of Fujian Province of China (No. 2022J06001).