Temporal Action Localization Using Gated Recurrent Units
Abstract
Temporal Action Localization (TAL) task which is to predict the start and end of each action in a video along with the class label of the action has numerous applications in the real world. But due to the complexity of this task, acceptable accuracy rates have not been achieved yet, whereas this is not the case regarding the action recognition task. In this paper, we propose a new network based on Gated Recurrent Unit (GRU) and two novel post-processing methods for TAL task. Specifically, we propose a new design for the output layer of the conventionally GRU resulting in the so-called GRU-Split network. Moreover, linear interpolation is used to generate the action proposals with precise start and end times. Finally, to rank the generated proposals appropriately, we use a Learn to Rank (LTR) approach. We evaluated the performance of the proposed method on Thumos14 and ActivityNet-1.3 datasets. Results show the superiority of the performance of the proposed method compared to state-of-the-art. Specifically in the mean Average Precision (mAP) metric at Intersection over Union (IoU) of 0.7 on Thumos14, we get 27.52% accuracy which is 5.12% better than that of state-of-the-art methods.
- Publication:
-
arXiv e-prints
- Pub Date:
- August 2021
- DOI:
- 10.48550/arXiv.2108.03375
- arXiv:
- arXiv:2108.03375
- Bibcode:
- 2021arXiv210803375K
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Multimedia
- E-Print:
- 10 pages, 6 figures