PetDB Match-Maker: A Machine Learning/Artificial Intelligence Experiment
Abstract
The utility of geochemical data compiled in public databases depends largely on how well the metadata are documented. In PetDB, expert curators manually process peer-reviewed articles to extract the details needed to link samples that have been analyzed by multiple researchers. Compiling and curating data for ingestion into PetDB therefore requires significant time and effort, because curators must obtain metadata that reveal commonalities between samples despite variations in naming convention. Yet much of the information that curators use to link ocean-floor samples already exists in the database. To determine how far the metadata curation effort can be reduced, we are analyzing the metadata and running machine learning experiments to assess how well automated methods can match metadata to unique samples, or find sibling samples from the same expedition, based on name, location, and other qualifiers in PetDB. Part of this approach is to build a view of the sample database in which each row is indexed by individual author/contributor and curator-identified samples, and the other attributes are treated as features. This view allows us to train a classifier that identifies pairs of rows with similar features and thereby predicts sample matches. Our goal is to reduce processing time, to automate procedures that currently rely on curator knowledge and experience, and to share the code via open-source platforms so that others can apply the technique to similar problems involving samples and their metadata.
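The pairwise-matching idea described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name a classifier, so a small hand-written logistic regression stands in for it, and the rows, sample names, coordinates, expedition codes, curated ids, and the three pair features (name similarity, location closeness, same-expedition flag) are all hypothetical.

```python
import math
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical rows (not real PetDB records).
# Fields: row id, reported name, latitude, longitude, expedition, curated sample id.
ROWS = [
    ("a1", "ALV-2384-3", 9.83, -104.29, "AT-07", "S1"),
    ("a2", "ALV2384-3", 9.83, -104.29, "AT-07", "S1"),
    ("a3", "Alvin 2384-3", 9.84, -104.29, "AT-07", "S1"),
    ("b1", "D12-4", -22.50, -13.10, "KN-145", "S2"),
    ("b2", "D-12-4", -22.50, -13.10, "KN-145", "S2"),
    ("c1", "RC28-07", 45.10, -130.00, "RC-28", "S3"),
    ("c2", "RC 28-07", 45.10, -130.01, "RC-28", "S3"),
    ("d1", "D12-9", -22.70, -13.40, "KN-145", "S4"),
]


def normalize(name):
    """Lowercase and drop punctuation and spaces so naming variants compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def pair_features(a, b):
    """Similarity features for a pair of rows (assumed feature set)."""
    name_sim = SequenceMatcher(None, normalize(a[1]), normalize(b[1])).ratio()
    dist_deg = math.hypot(a[2] - b[2], a[3] - b[3])  # crude distance in degrees
    closeness = math.exp(-dist_deg)                  # 1.0 when co-located
    same_expedition = 1.0 if a[4] == b[4] else 0.0
    return [name_sim, closeness, same_expedition]


def predict(weights, x):
    """Logistic model; weights[0] is the bias term."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, X, y):
    """Mean negative log-likelihood, clamped to avoid log(0)."""
    eps = 1e-12
    total = 0.0
    for x, label in zip(X, y):
        p = min(max(predict(weights, x), eps), 1.0 - eps)
        total -= label * math.log(p) + (1.0 - label) * math.log(1.0 - p)
    return total / len(X)


def train(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic loss."""
    weights = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, label in zip(X, y):
            err = predict(weights, x) - label
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Every pair of rows is one training example; the label is 1 when the
# curator assigned both rows to the same sample.
PAIRS = list(combinations(ROWS, 2))
X = [pair_features(a, b) for a, b in PAIRS]
y = [1.0 if a[5] == b[5] else 0.0 for a, b in PAIRS]
WEIGHTS = train(X, y)


def match_probability(a, b):
    """Estimated probability that two rows describe the same sample."""
    return predict(WEIGHTS, pair_features(a, b))
```

Calling `match_probability(ROWS[0], ROWS[1])` scores a candidate pair. In a realistic setting the pairs would come from the database view rather than a hard-coded list, and candidate pairs would likely need to be pre-filtered (for example by expedition) because the number of pairs grows quadratically with the number of rows.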
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2020
- Bibcode: 2020AGUFMED0440003A
- Keywords:
- 0412 Biogeochemical kinetics and reaction modeling; BIOGEOSCIENCES
- 0430 Computational methods and data processing; BIOGEOSCIENCES
- 4899 General or miscellaneous; OCEANOGRAPHY: BIOLOGICAL
- 4899 General or miscellaneous; OCEANOGRAPHY: CHEMICAL