Unsupervised Video Summarization via Reinforcement Learning and a Trained Evaluator
Abstract
This paper presents a novel approach for unsupervised video summarization using reinforcement learning. It aims to address the existing limitations of current unsupervised methods, including unstable training of adversarial generator-discriminator architectures and reliance on hand-crafted reward functions for quality evaluation. The proposed method is based on the concept that a concise and informative summary should result in a reconstructed video that closely resembles the original. The summarizer model assigns an importance score to each frame and generates a video summary. In the proposed scheme, reinforcement learning, coupled with a unique reward generation pipeline, is employed to train the summarizer model. The reward generation pipeline trains the summarizer to create summaries that lead to improved reconstructions. It comprises a generator model capable of reconstructing masked frames from a partially masked video, along with a reward mechanism that compares the reconstructed video from the summary against the original. The video generator is trained in a self-supervised manner to reconstruct randomly masked frames, enhancing its ability to generate accurate summaries. This training pipeline results in a summarizer model that better mimics human-generated video summaries compared to methods relying on hand-crafted rewards. The training process consists of two stable and isolated training steps, unlike adversarial architectures. Experimental results demonstrate promising performance, with F-scores of 62.3 and 54.5 on TVSum and SumMe datasets, respectively. Additionally, the inference stage is 300 times faster than our previously reported state-of-the-art method.
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2024
- DOI:
- 10.48550/arXiv.2407.04258
- arXiv:
- arXiv:2407.04258
- Bibcode:
- 2024arXiv240704258A
- Keywords:
-
- Computer Science - Multimedia;
- Computer Science - Artificial Intelligence;
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Machine Learning