The ClearEarth Project: Preliminary Findings from Experiments in Applying the CLEARTK NLP Pipeline and Annotation Tools Developed for Biomedicine to the Earth Sciences
Abstract
The ability to quickly find, easily use and effortlessly integrate data from a variety of sources is a grand challenge in the Earth sciences, one around which entire research programs have been built. Many approaches to tackling components of this challenge have been demonstrated, often with some success. Yet finding, assessing, accessing, using and integrating data remains a major challenge for many researchers. A technology that has shown promise in nearly every aspect of the challenge is semantics. Semantics has been shown to improve data discovery and facilitate assessment of a data set and, through adoption of the W3C's Linked Data Platform, to improve data integration and use, at least for data amenable to that paradigm. Yet the creation of semantic resources has been slow. Why? Among other reasons: semantic expertise is rare in the Earth and space sciences; the creation of semantic resources for even a single discipline is labor intensive and requires agreement within the discipline; best practices, methods and tools for supporting the creation and maintenance of the resources generated are in flux; and the human and financial capital needed is rarely available in the Earth sciences. However, other fields, such as biomedicine, have made considerable progress in these areas. The NSF-funded ClearEarth project is adapting the methods and tools from these communities for the Earth sciences in the expectation that doing so will accelerate progress and increase the rate at which the needed semantic resources are created. We discuss progress and results to date and lessons learned from this adaptation process, and describe our upcoming efforts to extend this knowledge to the next generation of Earth and data scientists.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2016
- Bibcode: 2016AGUFMIN11B1625D
- Keywords:
  - 1914 Data mining; INFORMATICS
  - 1932 High-performance computing; INFORMATICS
  - 1942 Machine learning; INFORMATICS
  - 1980 Spatial analysis and representation; INFORMATICS