Training machines to improve species identification using GBIF-mediated datasets
Abstract
Machine vision technology now provides real-time identification of images of tens of thousands of species across a wide range of taxonomic groups—witness iNaturalist's suggestion of species IDs to users who create observation records.
As an international network and research infrastructure that provides free and open access to biodiversity data, GBIF (the Global Biodiversity Information Facility) can help advance this technology. Having assembled the world's largest index of biodiversity data, it has also amassed one of the largest available datasets of labelled species images, with more than 43 million records associated with one or more images. The GBIF network has implemented practices with requirements more cultural than technical: the adoption of open licences, guidance on data citation, and the development of a DOI-based system for tracking reuse of data. Applying lessons learned alongside a team of experts, GBIF is assisting research aimed at increasing machine vision's power and availability. Training datasets are critical to achieving species recognition capability in any such system. These datasets compile representative images containing explicit, verifiable identifications of the species they include. High-powered computers run algorithms to analyse the imagery, building complex models that characterize defining features for each species or taxonomic group. Researchers can then apply the models to new images, determining what species or group they likely contain. GBIF and its partners are exploring:
- the use of location and date information to further improve model results
- identification methods for fine-scale attributes, characters, traits, or partial IDs, with an eye toward human interpretability
- expertise modeling for improved determination of 'research grade' images and metadata
Machine vision models integrated into data collection tools can improve user experience and help novices contribute verified occurrence records.
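As a rough illustration of how location and date information might improve model results, the sketch below reweights an image classifier's per-species scores by a prior on how likely each species is at the observation's place and time. This is not a method described in the abstract: the function name `rerank_with_prior`, the `floor` parameter, and every number are hypothetical, chosen only to show the arithmetic.

```python
def rerank_with_prior(vision_scores, prior, floor=1e-6):
    """Multiply classifier scores by a location/date prior and renormalise.

    vision_scores: dict mapping species name -> classifier probability
    prior: dict mapping species name -> assumed probability of occurrence
           at the observation's location and date (hypothetical input)
    floor: small value used when a species is missing from the prior,
           so it is down-weighted rather than ruled out entirely
    """
    weighted = {
        species: score * max(prior.get(species, 0.0), floor)
        for species, score in vision_scores.items()
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("all weighted scores are zero; cannot normalise")
    return {species: w / total for species, w in weighted.items()}


# Illustrative numbers only: two visually similar butterflies that the
# image model alone cannot separate with confidence.
vision_scores = {"Danaus plexippus": 0.48, "Limenitis archippus": 0.52}
prior = {"Danaus plexippus": 0.30, "Limenitis archippus": 0.05}

posterior = rerank_with_prior(vision_scores, prior)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

With these made-up inputs the prior overturns the image-only ranking. In practice the prior would have to be estimated from occurrence records, and how to do that well is part of the open research question.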
To assist in developing and refining machine vision models, GBIF is providing training datasets, promoting proper license and citation practice, and linking citations of training datasets to the datasets that contribute to them. These steps help ensure that data is used responsibly and transparently, closing the gap between machine vision scientists, application developers and users.
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2019
- Bibcode:
- 2019AGUFMIN53C0758C
- Keywords:
- 1906 Computational models, algorithms; INFORMATICS
- 1916 Data and information discovery; INFORMATICS
- 1942 Machine learning; INFORMATICS
- 1956 Numerical algorithms; INFORMATICS