Semi-automatic Data Integration using Karma
Abstract
Data integration applications are ubiquitous in scientific disciplines. A state-of-the-art data integration system accepts both a set of data sources and a target ontology as input, and semi-automatically maps the data sources in terms of concepts and relationships in the target ontology. Mappings can be both complex and highly domain-specific. Once such a semantic model, expressing the mapping using community-wide standard, is acquired, the source data can be stored in a single repository or database using the semantics of the target ontology. However, acquiring the mapping is a labor-prone process, and state-of-the-art artificial intelligence systems are unable to fully automate the process using heuristics and algorithms alone. Instead, a more realistic goal is to develop adaptive tools that minimize user feedback (e.g., by offering good mapping recommendations), while at the same time making it intuitive and easy for the user to both correct errors and to define complex mappings. We present Karma, a data integration system that has been developed over multiple years in the information integration group at the Information Sciences Institute, a research institute at the University of Southern California's Viterbi School of Engineering. Karma is a state-of-the-art data integration tool that supports an interactive graphical user interface, and has been featured in multiple domains over the last five years, including geospatial, biological, humanities and bibliographic applications. Karma allows a user to import their own ontology and datasets using widely used formats such as RDF, XML, CSV and JSON, can be set up either locally or on a server, supports a native backend database for prototyping queries, and can even be seamlessly integrated into external computational pipelines, including those ingesting data via streaming data sources, Web APIs and SQL databases. We illustrate a Karma workflow at a conceptual level, along with a live demo, and show use cases of Karma specifically for the geosciences. In particular, we show how Karma can be used intuitively to obtain the mapping model between case study data sources and a publicly available and expressive target ontology that has been designed to capture a broad set of concepts in geoscience with standardized, easily searchable names.
- Publication:
-
AGU Fall Meeting Abstracts
- Pub Date:
- December 2017
- Bibcode:
- 2017AGUFMIN13A0053G
- Keywords:
-
- 1920 Emerging informatics technologies;
- INFORMATICS;
- 1968 Scientific reasoning/inference;
- INFORMATICS;
- 1976 Software tools and services;
- INFORMATICS;
- 1999 General or miscellaneous;
- INFORMATICS