A Methodology for Empirical Analysis of LOD Datasets
Abstract
CoCoE, which stands for Complexity, Coherence and Entropy, is an extensible methodology for empirical analysis of Linked Open Data (i.e., RDF graphs). CoCoE can offer answers to questions such as: Is dataset A better than B for knowledge discovery since it is more complex and informative? Is dataset X better than Y for simple value lookups due to its flatter structure? To address such questions, we introduce a set of well-founded measures based on complementary notions from distributional semantics, network analysis and information theory. These measures are part of a specific implementation of the CoCoE methodology that is available for download. Finally, we illustrate CoCoE by applying it to selected biomedical RDF datasets.
- Publication: arXiv e-prints
- Pub Date: June 2014
- DOI: 10.48550/arXiv.1406.1061
- arXiv: arXiv:1406.1061
- Bibcode: 2014arXiv1406.1061N
- Keywords: Computer Science - Artificial Intelligence; Computer Science - Social and Information Networks
- E-Print: A current working draft of the paper submitted to the ISWC'14 conference (track information available here: http://iswc2014.semanticweb.org/call-replication-benchmark-data-software-papers)