Spatio-Temporal Action Graph Networks
Abstract
Events defined by the interaction of objects in a scene are often of critical importance; yet important events may have insufficient labeled examples to train a conventional deep model to generalize to future object appearance. Activity recognition models that represent object interactions explicitly have the potential to learn in a more efficient manner than those that represent scenes with global descriptors. We propose a novel inter-object graph representation for activity recognition based on a disentangled graph embedding with direct observation of edge appearance. We employ a novel factored embedding of the graph structure, disentangling a representation hierarchy formed over spatial dimensions from that found over temporal variation. We demonstrate the effectiveness of our model on the Charades activity recognition benchmark, as well as on a new dataset of driving activities focusing on multi-object interactions with near-collision events. Our model offers significantly improved performance compared to baseline approaches that lack object-graph representations, as well as to previous graph-based models.
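The abstract describes a factored embedding: a spatial stage that aggregates per-object features together with directly observed edge (pairwise) appearance features into a frame-level graph descriptor, followed by a temporal stage that summarizes those descriptors across frames. The sketch below is a minimal NumPy illustration of that two-stage factorization, not the paper's actual architecture; all function names, weight shapes, and the exponential-moving-average temporal pooling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_graph_embed(node_feats, edge_feats, W_node, W_edge):
    """One frame: combine per-object features with pairwise edge
    appearance features into a frame-level graph descriptor.
    node_feats: (N, D) object features; edge_feats: (N, N, E) observed
    edge appearance. (Illustrative message-passing step, an assumption.)"""
    n = node_feats.shape[0]
    msgs = np.zeros_like(node_feats @ W_node)          # (N, H)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # message from object j to object i, gated by the
            # directly observed appearance of edge (i, j)
            msgs[i] += (node_feats[j] @ W_node) * np.tanh(edge_feats[i, j] @ W_edge)
    updated = np.tanh(node_feats @ W_node + msgs)      # (N, H)
    return updated.mean(axis=0)                        # frame-level descriptor (H,)

def temporal_embed(frame_embeds, decay=0.5):
    """Temporal stage: pool per-frame graph descriptors across time.
    An exponential moving average stands in for the temporal branch."""
    h = np.zeros_like(frame_embeds[0])
    for f in frame_embeds:
        h = decay * h + (1 - decay) * f
    return h

# Toy dimensions: T frames, N objects, node dim D, edge dim E, hidden H.
T, N, D, E, H = 4, 3, 8, 5, 8
W_node = rng.standard_normal((D, H)) * 0.1
W_edge = rng.standard_normal((E, H)) * 0.1
frames = [spatial_graph_embed(rng.standard_normal((N, D)),
                              rng.standard_normal((N, N, E)),
                              W_node, W_edge)
          for _ in range(T)]
video_embedding = temporal_embed(frames)
print(video_embedding.shape)  # (8,)
```

The key point the sketch mirrors is the disentanglement: spatial aggregation happens entirely within a frame, and only the resulting frame-level descriptors flow into the temporal stage.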
- Publication:
- arXiv e-prints
- Pub Date:
- December 2018
- DOI:
- 10.48550/arXiv.1812.01233
- arXiv:
- arXiv:1812.01233
- Bibcode:
- 2018arXiv181201233H
- Keywords:
- Computer Science - Computer Vision and Pattern Recognition
- E-Print:
- IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019