Temporal Action Localization Using Gated Recurrent Units

doi:10.48550/arXiv.2108.03375

Temporal Action Localization (TAL), the task of predicting the start and end times of each action in a video along with its class label, has numerous real-world applications. Because of the complexity of this task, however, acceptable accuracy has not yet been achieved, in contrast to the action recognition task. In this paper, we propose a new network based on the Gated Recurrent Unit (GRU) and two novel post-processing methods for the TAL task. Specifically, we propose a new design for the output layer of the conventional GRU, resulting in the so-called GRU-Split network. Moreover, linear interpolation is used to generate action proposals with precise start and end times. Finally, to rank the generated proposals appropriately, we use a Learn to Rank (LTR) approach. We evaluated the proposed method on the Thumos14 and ActivityNet-1.3 datasets. The results show that the proposed method outperforms state-of-the-art methods. Specifically, in mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 0.7 on Thumos14, we obtain 27.52%, which is 5.12% better than that of state-of-the-art methods.
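To make the evaluation criterion and the interpolation idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the function names `temporal_iou` and `interpolate_crossing` are hypothetical, and the assumption that a proposal boundary is placed where a per-frame score crosses a threshold between two sampled time steps is ours, since the abstract does not say how the interpolation is applied.

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU of two (start, end) segments given in seconds."""
    # Overlap is the gap between the later start and the earlier end, floored at 0.
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0


def interpolate_crossing(t0, s0, t1, s1, threshold):
    """Estimate when a score crosses `threshold` between two samples.

    (t0, s0) and (t1, s1) are consecutive (time, score) samples.
    Assumption: the score varies linearly between the two samples.
    """
    if s0 == s1:
        # Flat segment: no unique crossing, fall back to the first sample time.
        return t0
    return t0 + (threshold - s0) * (t1 - t0) / (s1 - s0)


# A predicted segment compared against a ground-truth segment.
iou = temporal_iou((0.0, 10.0), (5.0, 15.0))

# Sub-step boundary estimate between samples taken at t=1.0 s and t=2.0 s.
start = interpolate_crossing(1.0, 0.2, 2.0, 0.8, threshold=0.5)
```

Under the usual mAP protocol at an IoU threshold of 0.7, a predicted segment counts as a true positive only if its class label matches the ground truth and `temporal_iou` between the two segments is at least 0.7, which is why small errors in the start and end times matter so much at this threshold.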

Publication: arXiv e-prints
Pub Date: August 2021
DOI: 10.48550/arXiv.2108.03375
arXiv: arXiv:2108.03375
Bibcode: 2021arXiv210803375K
Keywords: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Multimedia
E-Print: 10 pages, 6 figures
