CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation
Abstract
The synthesis of high-quality 3D assets from textual or visual inputs has become a central objective in modern generative modeling. Despite the proliferation of 3D generation algorithms, they frequently grapple with challenges such as multi-view inconsistency, slow generation times, low fidelity, and surface reconstruction problems. While some studies have addressed some of these issues, a comprehensive solution remains elusive. In this paper, we introduce \textbf{CaPa}, a carve-and-paint framework that generates high-fidelity 3D assets efficiently. CaPa employs a two-stage process, decoupling geometry generation from texture synthesis. Initially, a 3D latent diffusion model generates geometry guided by multi-view inputs, ensuring structural consistency across perspectives. Subsequently, leveraging a novel, model-agnostic Spatially Decoupled Attention, the framework synthesizes high-resolution textures (up to 4K) for a given geometry. Furthermore, we propose a 3D-aware occlusion inpainting algorithm that fills untextured regions, resulting in cohesive results across the entire model. This pipeline generates high-quality 3D assets in less than 30 seconds, providing ready-to-use outputs for commercial applications. Experimental results demonstrate that CaPa excels in both texture fidelity and geometric stability, establishing a new standard for practical, scalable 3D asset generation.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2025
- arXiv:
- arXiv:2501.09433
- Bibcode:
- 2025arXiv250109433H
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Graphics
- E-Print:
- project page: https://ncsoft.github.io/CaPa/