HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

3D hand-object interaction data is scarce due to the hardware constraints in scaling up the data collection process. In this paper, we propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data. Our model is a conditional diffusion model that takes both the 3D hand-object geometric structure and text description as inputs for image synthesis. This offers a more controllable and realistic synthesis as we can specify the structure and style inputs in a disentangled manner. HOIDiffusion is trained by leveraging a diffusion model pre-trained on large-scale natural images and a few 3D human demonstrations. Beyond controllable image synthesis, we adopt the generated 3D data for learning 6D object pose estimation and show its effectiveness in improving perception systems. Project page: https://mq-zhang1.github.io/HOIDiffusion

Publication: arXiv e-prints
Pub Date: March 2024
DOI: 10.48550/arXiv.2403.12011
arXiv: arXiv:2403.12011
Bibcode: 2024arXiv240312011Z
Keywords: Computer Science - Computer Vision and Pattern Recognition
E-Print: Project page: https://mq-zhang1.github.io/HOIDiffusion
