Egocentric Hand-object Interaction Detection
Abstract
In this paper, we propose a method to jointly determine the status of hand-object interaction. This is crucial for egocentric human activity understanding and interaction. From a computer vision perspective, we believe that determining whether a hand is interacting with an object depends on whether there is an interactive hand pose and whether the hand is touching the object. Thus, we extract the hand pose, hand-object masks to jointly determine the interaction status. In order to solve the problem of hand pose estimation due to in-hand object occlusion, we use a multi-cam system to capture hand pose data from multiple perspectives. We evaluate and compare our method with the most recent work from Shan et al. \cite{Shan20} on selected images from EPIC-KITCHENS \cite{damen2018scaling} dataset and achieve $89\%$ accuracy on HOI (hand-object interaction) detection which is comparative to Shan's ($92\%$). However, for real-time performance, our method can run over $\textbf{30}$ FPS which is much more efficient than Shan's ($\textbf{1}\sim\textbf{2}$ FPS). A demo can be found from https://www.youtube.com/watch?v=XVj3zBuynmQ
- Publication:
-
arXiv e-prints
- Pub Date:
- November 2022
- DOI:
- 10.48550/arXiv.2211.09067
- arXiv:
- arXiv:2211.09067
- Bibcode:
- 2022arXiv221109067L
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Human-Computer Interaction
- E-Print:
- arXiv admin note: substantial text overlap with arXiv:2109.14734