An IR-based Approach Towards Automated Integration of Geo-spatial Datasets in Map-based Software Systems
Abstract
Data is arguably the most valuable asset of the modern world. In this era, the success of any data-intensive solution relies on the quality of the data that drives it. Among the vast amounts of data that are captured, managed, and analyzed every day, geospatial data are one of the most interesting classes: they hold geographical information about real-world phenomena and can be visualized as digital maps. Geospatial data are the source of many enterprise solutions that provide local information and insights. To increase the quality of such solutions, companies continuously aggregate geospatial datasets from various sources. However, the lack of a global standard model for geospatial datasets makes merging and integrating datasets difficult and error-prone. Traditionally, domain experts manually validate the data integration process, checking new data sources and/or new versions of previous data for conflicts and other requirement violations as they are merged. However, this approach does not scale and hinders rapid release when dealing with large, frequently changing datasets. Thus, more automated approaches that require only limited interaction with domain experts are needed. As a first step toward tackling this problem, in this paper we leverage Information Retrieval (IR) and geospatial search techniques to propose a systematic and automated conflict identification approach. To evaluate our approach, we conduct a case study in which we measure its accuracy in several real-world scenarios and interview software developers at Localintel Inc. (our industry partner) to gather their feedback.
- Publication: arXiv e-prints
- Pub Date: June 2019
- DOI: 10.48550/arXiv.1906.06331
- arXiv: arXiv:1906.06331
- Bibcode: 2019arXiv190606331M
- Keywords: Computer Science - Databases
- E-Print: ESEC/FSE 2019 - Industry track