Improve Earth Data Discovery through Deep Query Understanding
Abstract
Discovering Earth science data is challenging because of the growing quantity and decreasing latency of geographic data and the heterogeneity of the data across a wide variety of domains. One longstanding problem in Earth data discovery is understanding a user's search intent from the input query. A few existing libraries and APIs handle spatial and temporal parsing, e.g., CLAVIN for spatial parsing and SUTime for temporal parsing, and some Earth data search portals currently use these open source APIs to parse the spatial and temporal components of user queries. However, to our knowledge, no existing geoinformatics work has tried to parse and tag the non-spatial and non-temporal components of a query, which usually consist of entities such as geophysical variable, satellite name, instrument name, and processing level. Understanding the objectives behind user queries is difficult because (1) queries are usually not full sentences, (2) users tend to use many acronyms in lieu of full names, and (3) queries lack semantic context. Recent progress in deep learning and natural language processing (NLP) algorithms has achieved strong performance in query understanding. Building on that progress to fill this gap, we proposed to develop a query understanding tool that better interprets users' search intents for Earth data search engines by mining metadata and user query logs. The tool has four components: spatial and temporal parsing, phrase extraction, named entity recognition (NER), and semantic query expansion. To demonstrate the query understanding concept, we applied the tool to metadata and logs from NASA JPL's Physical Oceanography Distributed Active Archive Center (PO.DAAC) and found that it can improve search precision, recall, and ranking.
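The four components named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: plain dictionary lookup stands in for CLAVIN, SUTime, and any learned NER model, and every name here (`ENTITY_VOCAB`, `SYNONYMS`, `PLACES`, `understand`) and every vocabulary entry is a hypothetical placeholder for terms that would be mined from metadata and query logs.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy vocabularies (assumptions for illustration only).
ENTITY_VOCAB = {
    "sea surface temperature": "variable",
    "sst": "variable",
    "modis": "instrument",
    "aqua": "satellite",
    "level 3": "processing_level",
    "l3": "processing_level",
}
SYNONYMS = {
    "sst": ["sea surface temperature"],
    "sea surface temperature": ["sst"],
}
PLACES = {"pacific", "atlantic", "gulf of mexico"}
YEAR_RE = re.compile(r"\b(19|20)\d{2}\b")


@dataclass
class ParsedQuery:
    places: list = field(default_factory=list)
    years: list = field(default_factory=list)
    entities: dict = field(default_factory=dict)
    expansions: list = field(default_factory=list)


def _take(text, phrases):
    """Find known phrases in text, longest first; return (found, remainder)."""
    found = []
    for phrase in sorted(phrases, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            found.append(phrase)
            text = re.sub(pattern, " ", text)
    return found, text


def understand(query):
    """Run the four stages in order on a raw query string."""
    text = query.lower()
    result = ParsedQuery()
    # 1. Spatial and temporal parsing (toy stand-in: year regex + gazetteer).
    result.years = [m.group(0) for m in YEAR_RE.finditer(text)]
    text = YEAR_RE.sub(" ", text)
    result.places, text = _take(text, PLACES)
    # 2. Phrase extraction: match multi-word phrases before single tokens.
    phrases, _ = _take(text, ENTITY_VOCAB)
    # 3. Named entity recognition: tag each phrase with its entity type.
    result.entities = {p: ENTITY_VOCAB[p] for p in phrases}
    # 4. Semantic query expansion: add related terms for each phrase.
    for p in phrases:
        result.expansions.extend(SYNONYMS.get(p, []))
    return result
```

Under these assumptions, a call such as `understand("MODIS SST L3 Pacific 2015")` separates the place and year from the remaining terms, tags `modis`, `sst`, and `l3` by entity type, and expands the acronym `sst` to its full name, which is the kind of short, acronym-heavy query the abstract identifies as hard to interpret.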
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2018
- Bibcode: 2018AGUFMIN51A..07L
- Keywords:
  - 1916 Data and information discovery; INFORMATICS
  - 1930 Data and information governance; INFORMATICS
  - 1946 Metadata; INFORMATICS
  - 1976 Software tools and services; INFORMATICS