Lessons learned in deploying a cloud-based knowledge platform for the Earth Science Information Partners Federation (ESIP)
Abstract
Ontologies and semantic technologies are an essential infrastructure component of systems supporting knowledge integration in the Earth sciences. Numerous Earth science ontologies exist, but they are hard to discover because they tend to be hosted with the projects that develop them. These ontologies often come with few quality measures and sparse metadata, such as modification dates, versioning, purpose, and the number of classes and properties. Projects often develop ontologies for their own needs without considering existing ontology entities or derivations from formal and more basic ontologies. The result is mostly orthogonal ontologies, and ontologies that are not modular enough to reuse in part or adapt for new purposes, in spite of existing standards for ontology representation. Additional obstacles to sharing and reuse include a lack of maintenance once a project is completed. These obstacles prevent the full exploitation of semantic technologies in a context where they could become much-needed enablers for service discovery and for matching data with services. To start addressing this gap, we have deployed BioPortal, a mature, domain-independent ontology and semantic service system developed by the National Center for Biomedical Ontology (NCBO), on the ESIP Testbed under the governance of the ESIP Semantic Web cluster. ESIP provides a forum for a broad-based, distributed community of data and information technology practitioners and stakeholders to coordinate their efforts and develop new ideas for interoperability solutions. The Testbed provides an environment where innovations and best practices can be explored and evaluated. One objective of this deployment is to provide a community platform that harnesses the organizational and cyber infrastructure provided by ESIP at minimal cost. Another objective is to host ontology services on a scalable, public cloud and investigate the business case for crowdsourcing ontology maintenance.
We deployed the system on Amazon's Elastic Compute Cloud (EC2), where ESIP maintains an account. Our approach had three phases: 1) set up a private cloud environment at the University of South Carolina to become familiar with the complex architecture of the system and enable some basic customization, 2) coordinate the production of a Virtual Appliance for the system with NCBO and deploy it on the Amazon cloud, and 3) reach out to the ESIP community to solicit participation, populate the repository, and develop new use cases. Phase 2 is nearing completion and Phase 3 is underway. Ontologies were gathered during updates to the ESIP cluster. Discussion points included the criteria for a shareable ontology and how to determine the best size for an ontology to be reusable. Outreach highlighted that the system can start addressing the integration of discovery frameworks by linking data and services in a pull model (data and service casting), a key issue for the Discovery cluster. This work thus presents several contributions: 1) technology injection from another domain into the Earth sciences, 2) the deployment of a mature knowledge platform on the EC2 cloud, and 3) the successful engagement of the community through the ESIP clusters and Testbed model.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2012
- Bibcode: 2012AGUFMIN51A1680P
- Keywords:
  - 1902 INFORMATICS / Community modeling frameworks
  - 1958 INFORMATICS / Ontologies
  - 1970 INFORMATICS / Semantic web and semantic integration
  - 1978 INFORMATICS / Software re-use