Self-Supervised 3D Human Pose Estimation in Static Video Via Neural Rendering

doi:10.48550/arXiv.2210.04514

Inferring 3D human pose from 2D images is a challenging and long-standing problem in computer vision, with many applications including motion capture, virtual reality, surveillance, and gait analysis for sports and medicine. We present preliminary results for a method that estimates 3D pose from 2D video containing a single person and a static background, without the need for any manual landmark annotations. We achieve this by formulating a simple yet effective self-supervision task: our model is required to reconstruct a random frame of a video given a frame from another timepoint and a rendered image of a transformed human shape template. Crucially for optimisation, our ray-casting-based rendering pipeline is fully differentiable, enabling end-to-end training based solely on the reconstruction task.
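The idea of training through a differentiable renderer with only a reconstruction loss can be illustrated with a deliberately small toy. The sketch below is not the authors' pipeline: it assumes a soft Gaussian blob in place of the rendered human shape template, a single 2D translation in place of a 3D pose, a known constant background in place of the frame taken from another timepoint, and a hand-derived gradient in place of automatic differentiation through a network. All names (`render_blob`, `compose`, `loss_and_grad`, `fit_center`) are hypothetical.

```python
import numpy as np

def render_blob(center, size=32, sigma=3.0):
    """Soft rendering of a toy 'template': a Gaussian blob at center = (x, y)."""
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def compose(background, mask, fg=1.0):
    """Alpha-composite a foreground of constant intensity `fg` over the background."""
    return mask * fg + (1.0 - mask) * background

def loss_and_grad(center, background, target, sigma=3.0, fg=1.0):
    """Mean squared reconstruction error and its analytic gradient w.r.t. center."""
    size = target.shape[0]
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    mask = render_blob(center, size, sigma)
    residual = compose(background, mask, fg) - target
    loss = np.mean(residual ** 2)
    # Chain rule: dL/drecon = 2 * residual / N, drecon/dmask = fg - background,
    # dmask/dcx = mask * (xs - cx) / sigma^2 (and likewise for cy).
    common = 2.0 * residual * (fg - background) * mask / (sigma ** 2 * residual.size)
    grad = np.array([np.sum(common * (xs - center[0])),
                     np.sum(common * (ys - center[1]))])
    return loss, grad

def fit_center(init, background, target, lr=200.0, steps=300):
    """Plain gradient descent on the reconstruction loss; returns estimate and loss history."""
    center = np.array(init, dtype=float)
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(center, background, target)
        losses.append(loss)
        center -= lr * grad
    return center, losses

# Toy data: a static background and a target frame with the blob at an unknown position.
background = np.full((32, 32), 0.2)
true_center = np.array([20.0, 12.0])
target = compose(background, render_blob(true_center))

estimate, losses = fit_center([14.0, 16.0], background, target)
print("estimated center:", estimate, "final loss:", losses[-1])
```

Because every step from the position parameters to the reconstructed frame is smooth, the reconstruction error alone provides a usable training signal and no position labels are needed. In this toy the parameters are optimised directly for one frame, whereas the abstract describes training a model end to end on the same kind of objective.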

Publication:

arXiv e-prints

Pub Date:

October 2022

DOI:

10.48550/arXiv.2210.04514

arXiv:

arXiv:2210.04514

Bibcode:

2022arXiv221004514S

Keywords:

Computer Science - Computer Vision and Pattern Recognition;
Computer Science - Artificial Intelligence;
Computer Science - Graphics

E-Print:

CV4Metaverse Workshop @ ECCV 2022

NASA/ADS
