Detection of Visual Events in Underwater Video Using a Neuromorphic Saliency-based Attention System
Abstract
The Monterey Bay Aquarium Research Institute (MBARI) uses high-resolution video equipment on remotely operated vehicles (ROVs) to obtain quantitative data on the distribution and abundance of oceanic animals. High-quality video data supplant the traditional approach of assessing the kinds and numbers of animals in the oceanic water column by towing collection nets behind ships. Tow nets are limited in spatial resolution and often destroy abundant gelatinous animals, resulting in undersampling of those species. Camera-based quantitative video transects (QVT) are taken through the ocean midwater, from 50 m to 4000 m depth, and provide high-resolution data at the scale of individual animals and their natural aggregation patterns. However, the current manual method of analyzing QVT video by trained scientists is labor intensive and poses a serious limitation on the amount of information that can be extracted from ROV dives. Presented here is an automated system for detecting marine animals (events) visible in the videos. Automated detection is difficult because many translucent animals have low contrast and because debris ("marine snow") clutters the scene. Video frames are processed with an artificial intelligence attention selection algorithm that has proven to be a robust means of target detection in a variety of natural terrestrial scenes. The candidate locations identified by the attention selection module are tracked across video frames using linear Kalman filters. Typically, the occurrence of visible animals in the video footage is sparse in space and time. A notion of "boring" video frames -- frames that do not contain any interesting candidate objects for animals -- is therefore developed. If objects can be tracked successfully over several frames, they are stored as potentially "interesting" events.
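The per-candidate tracking step described above can be sketched with a minimal linear Kalman filter. This is an illustrative simplification, not the authors' implementation: it assumes a constant-velocity motion model per image axis, scalar position measurements, and arbitrary noise settings (`q`, `r`).

```python
class KalmanAxis:
    """Minimal 1D constant-velocity linear Kalman filter (position + velocity).

    Illustrative sketch only: noise parameters are assumptions, not values
    from the system described in the abstract.
    """

    def __init__(self, pos, q=0.01, r=1.0):
        self.p, self.v = pos, 0.0            # state: position, velocity
        self.P = [[10.0, 0.0], [0.0, 10.0]]  # state covariance (uncertain start)
        self.q, self.r = q, r                # process / measurement noise

    def predict(self, dt=1.0):
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        self.p += self.v * dt
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and diagonal Q
        self.P = [[p00 + dt * (p01 + p10) + dt * dt * p11 + self.q,
                   p01 + dt * p11],
                  [p10 + dt * p11,
                   p11 + self.q]]

    def update(self, z):
        y = z - self.p                       # innovation (measurement residual)
        s = self.P[0][0] + self.r            # innovation variance, H = [1, 0]
        k0 = self.P[0][0] / s                # Kalman gain for position
        k1 = self.P[1][0] / s                # Kalman gain for velocity
        self.p += k0 * y
        self.v += k1 * y
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


class Track2D:
    """Track one candidate image location across frames, one filter per axis."""

    def __init__(self, x, y):
        self.fx, self.fy = KalmanAxis(x), KalmanAxis(y)

    def step(self, x, y):
        """Advance one frame and fuse the new detection (x, y)."""
        for f, z in ((self.fx, x), (self.fy, y)):
            f.predict()
            f.update(z)
        return self.fx.p, self.fy.p
```

A candidate that keeps matching the filter's predicted position over several consecutive frames would, in the scheme described above, be promoted to a potentially "interesting" event.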
Based on low-level properties, interesting events are identified and marked in the video frames. Presented here are performance data comparing the automated detection method with human annotation. The system enhances the productivity of human video annotators and/or cues a subsequent object classification module by omitting "boring" frames and marking candidate objects.
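The attention-selection stage rests on center-surround contrast, the core operation in saliency-based attention models. The sketch below is a deliberately simplified, single-channel version (box-filter means instead of the multi-scale, multi-feature pyramids such systems actually use); window radii and the input grid are illustrative assumptions.

```python
def box_mean(img, r, row, col):
    """Mean intensity in a (2r+1) x (2r+1) window, clipped to the image."""
    h, w = len(img), len(img[0])
    total, n = 0.0, 0
    for i in range(max(0, row - r), min(h, row + r + 1)):
        for j in range(max(0, col - r), min(w, col + r + 1)):
            total += img[i][j]
            n += 1
    return total / n


def saliency_peak(img, center_r=1, surround_r=4):
    """Return (row, col) of highest center-surround contrast.

    Toy stand-in for an attention-selection step: a small bright (or dark)
    region against a uniform background scores highest, while uniform
    "boring" regions score zero.
    """
    h, w = len(img), len(img[0])
    best, best_rc = -1.0, (0, 0)
    for i in range(h):
        for j in range(w):
            contrast = abs(box_mean(img, center_r, i, j)
                           - box_mean(img, surround_r, i, j))
            if contrast > best:
                best, best_rc = contrast, (i, j)
    return best_rc
```

In a pipeline of the kind the abstract describes, the highest-contrast locations per frame would become the candidate objects handed to the tracker, and frames whose peak contrast falls below a threshold could be flagged as "boring".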
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2003
- Bibcode: 2003AGUFM.H11F0912E
- Keywords: 1899 General or miscellaneous; 4815 Ecosystems, structure and dynamics; 4894 Instruments and techniques; 4899 General or miscellaneous