HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion

doi:10.48550/arXiv.2501.15008

We present HuGDiffusion, a generalizable 3D Gaussian splatting (3DGS) learning pipeline for novel view synthesis (NVS) of human characters from single-view input images. Existing approaches typically require monocular videos or calibrated multi-view images as inputs, which limits their applicability in real-world scenarios with arbitrary and/or unknown camera poses. In this paper, we aim to generate the set of 3DGS attributes via a diffusion-based framework conditioned on human priors extracted from a single image. Specifically, we begin with carefully integrated human-centric feature extraction procedures to derive informative conditioning signals. Because we observe empirically that jointly learning all 3DGS attributes is difficult to optimize, we design a multi-stage generation strategy to obtain the different types of 3DGS attributes. To facilitate training, we investigate the construction of proxy ground-truth 3D Gaussian attributes as high-quality attribute-level supervision signals. In extensive experiments, HuGDiffusion shows significant performance improvements over state-of-the-art methods. Our code will be made publicly available.

Publication: arXiv e-prints

Pub Date: January 2025

DOI: 10.48550/arXiv.2501.15008

arXiv: arXiv:2501.15008

Bibcode: 2025arXiv250115008T

Keywords: Computer Science - Computer Vision and Pattern Recognition
