SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification
Abstract
This paper introduces the first public large-scale, long-span dataset with sea turtle photographs captured in the wild -- SeaTurtleID2022 (https://www.kaggle.com/datasets/wildlifedatasets/seaturtleid2022). The dataset contains 8729 photographs of 438 unique individuals collected within 13 years, making it the longest-spanned dataset for animal re-identification. All photographs include various annotations, e.g., identity, encounter timestamp, and body parts segmentation masks. Instead of standard "random" splits, the dataset allows for two realistic and ecologically motivated splits: (i) a time-aware closed-set with training, validation, and test data from different days/years, and (ii) a time-aware open-set with new unknown individuals in test and validation sets. We show that time-aware splits are essential for benchmarking re-identification methods, as random splits lead to performance overestimation. Furthermore, a baseline instance segmentation and re-identification performance over various body parts is provided. Finally, an end-to-end system for sea turtle re-identification is proposed and evaluated. The proposed system based on Hybrid Task Cascade for head instance segmentation and ArcFace-trained feature-extractor achieved an accuracy of 86.8%.
- Publication:
-
arXiv e-prints
- Pub Date:
- November 2023
- DOI:
- 10.48550/arXiv.2311.05524
- arXiv:
- arXiv:2311.05524
- Bibcode:
- 2023arXiv231105524A
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition
- E-Print:
- This version is essentially an updated version of the initial SeaTurtleID paper (arXiv:2211.10307) and from now on it can be found as a replacement of the latter paper. You can also find the published version here: https://openaccess.thecvf.com/content/WACV2024/html/Adam_SeaTurtleID2022_A_Long-Span_Dataset_for_Reliable_Sea_Turtle_Re-Identification_WACV_2024_paper.html