LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge

doi:10.48550/arXiv.2501.01197

Layers have become indispensable tools for professional artists, allowing them to build a hierarchical structure that enables independent control over individual visual elements. In this paper, we propose LayeringDiff, a novel pipeline for the synthesis of layered images, which begins by generating a composite image using an off-the-shelf image generative model and then disassembles the image into its constituent foreground and background layers. By extracting layers from a composite image, rather than generating them from scratch, LayeringDiff bypasses the need for large-scale training to develop generative capabilities for individual layers. Furthermore, by utilizing a pretrained off-the-shelf generative model, our method can produce diverse content and object scales in the synthesized layers. For effective layer decomposition, we adapt a large-scale pretrained generative prior to estimate the foreground and background layers. We also propose high-frequency alignment modules to refine the fine details of the estimated layers. Our comprehensive experiments demonstrate that our approach effectively synthesizes layered images and supports various practical applications.
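To make the relationship between a composite image and its layers concrete, the sketch below shows standard alpha ("over") compositing, which is the operation that a foreground/background decomposition would be expected to invert. The abstract does not state the compositing model, so this is an assumption for illustration only; the function name `composite` and the list-of-tuples pixel format are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB values in [0, 1]


def composite(fg: List[Pixel], alpha: List[float], bg: List[Pixel]) -> List[Pixel]:
    """Blend a foreground layer over a background layer, pixel by pixel.

    Assumed model (not confirmed by the abstract): C = alpha * F + (1 - alpha) * B,
    where alpha is the foreground opacity in [0, 1].
    """
    if not (len(fg) == len(alpha) == len(bg)):
        raise ValueError("fg, alpha, and bg must have the same number of pixels")
    out: List[Pixel] = []
    for f, a, b in zip(fg, alpha, bg):
        # Blend each colour channel independently.
        blended = tuple(a * fc + (1.0 - a) * bc for fc, bc in zip(f, b))
        out.append(blended)
    return out


# A red foreground over a blue background at three opacities.
red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
pixels = composite([red, red, red], [1.0, 0.0, 0.5], [blue, blue, blue])
```

Under this assumed model, recovering the foreground, its opacity, and the background from the composite alone is underdetermined, which is one reason a learned prior may be useful for the disassembly step.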

Publication:

arXiv e-prints

Pub Date:

January 2025

DOI:

10.48550/arXiv.2501.01197

arXiv:

arXiv:2501.01197

Bibcode:

2025arXiv250101197K

Keywords:

Computer Science - Computer Vision and Pattern Recognition
