Populating a Graph Database to Run a Usage-Based Discovery Tool
Abstract
Most dataset discovery tools for Earth Observation data rely on dataset descriptions and other metadata, using keyword searches or attribute filtering to determine relevance. However, these descriptions often do not mention the potential uses of the data. As a result, a user working on floods will see few if any rainfall datasets in such a search. The Usage-Based Discovery tool, by contrast, presents usage instances to the user, either research articles or applications, along with the datasets that those usage instances used. This allows a user, particularly one new to Earth Observation data, to investigate which datasets were used in similar cases. Usage-Based Discovery is powered by a graph database that links each usage instance to the datasets it used and to the topics it addresses, allowing the user to narrow the search to similar cases. To grow the graph to a size rich enough to provide a satisfactory user experience, we combine manual and automated processes to populate it. The initial content of the graph was seeded primarily through human-aided data curation, using sites such as Google Scholar. To scale up this effort, we've employed crowdsourcing: anyone can contribute to the graph by authorizing with their Open Researcher and Contributor ID (ORCID). We're now experimenting with Machine Learning and Natural Language Processing to help automate population of the graph, starting with the classification of research articles by topic. Finding adequate training data in the absence of a comprehensive, open research article API remains a significant challenge.
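The usage-to-dataset and usage-to-topic relationships described above can be illustrated with a small in-memory sketch. This is not the tool's implementation: the class `UsageGraph`, its methods, and the example identifiers are hypothetical, and a real deployment would store these edges in an actual graph database. The sketch shows how a topic query reaches datasets indirectly through usage instances, which is how a user searching on floods can discover a rainfall dataset that its own description would never surface.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class UsageGraph:
    """Hypothetical in-memory stand-in for a graph of usage->dataset and
    usage->topic edges. Illustrative only; not the tool's actual schema."""

    # usage id -> set of dataset names that the usage instance used
    uses_dataset: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    # usage id -> set of topics the usage instance is tagged with
    has_topic: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))

    def add_usage(self, usage: str, datasets: list[str], topics: list[str]) -> None:
        """Add a usage instance (article or application) and its edges."""
        self.uses_dataset[usage].update(datasets)
        self.has_topic[usage].update(topics)

    def usages_for_topic(self, topic: str) -> list[str]:
        """Return the usage instances tagged with the topic, sorted by id."""
        return sorted(u for u, topics in self.has_topic.items() if topic in topics)

    def datasets_for_topic(self, topic: str) -> list[tuple[str, int]]:
        """Rank datasets by how many usage instances on this topic used them."""
        counts: Counter = Counter()
        for usage in self.usages_for_topic(topic):
            counts.update(self.uses_dataset[usage])
        # Highest count first; ties broken by dataset name for a stable order.
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical example: two flood-related usage instances both used a
# precipitation dataset, so it surfaces for a "floods" query.
graph = UsageGraph()
graph.add_usage("flood_article_1", ["precip_A", "flood_extent_B"], ["floods"])
graph.add_usage("flood_app_2", ["precip_A", "soil_moisture_C"], ["floods", "hydrology"])
graph.add_usage("drought_article_3", ["soil_moisture_C"], ["drought"])
print(graph.datasets_for_topic("floods"))
```

Ranking datasets by the number of matching usage instances is an assumption made here for illustration; the abstract does not say how the tool orders results, and it may instead show the usage instances themselves and let the user browse their datasets.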
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2021
- Bibcode: 2021AGUFMIN45H0518I