Knowledge Graph for Microdata of Statistics Netherlands
Abstract
Statistics Netherlands (CBS) hosts a large amount of data, not only at the statistical level but also at the individual level. With the development of data science technologies, more and more researchers request to conduct their research using high-quality individual-level data from CBS (called CBS Microdata), alone or combined with other data sources. Making good use of these data for research and scientific purposes can greatly benefit society as a whole. However, CBS Microdata has been collected and maintained in different ways by different departments inside and outside CBS, and the representation, quality, and metadata of the datasets are not sufficiently harmonized. This project converts the descriptions of all CBS Microdata sets into a single knowledge graph with comprehensive metadata in Dutch and English, using text mining and semantic web technologies. Researchers can easily query the metadata, explore the relations among multiple datasets, and find the variables they need. For example, if a researcher searches for the dataset "Age at Death" in the Health and Well-being category, all information related to this dataset appears, including keywords and variable names. The "Age at Death" dataset has the keyword "Death", which leads to other datasets such as "Date of Death", "Cause of Death", and "Production statistics Health and welfare" from the Population, Business, and Health and Well-being categories. This saves considerable time and cost for both data requesters and data maintainers.
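The keyword-based navigation in the example above can be illustrated with a minimal Python sketch. It is not the project's implementation, which the abstract describes only as using text mining and semantic web technologies. The dictionary layout, the function names `build_keyword_index` and `related_datasets`, and the category assigned to each dataset are assumptions made for illustration. Only the dataset names, the category names, and the shared keyword "Death" come from the abstract.

```python
from collections import defaultdict

# Hypothetical metadata records. The abstract names these datasets and the
# keyword "Death"; which category each dataset belongs to is an assumption.
DATASETS = {
    "Age at Death": {
        "category": "Health and well-being",
        "keywords": {"Death"},
    },
    "Date of Death": {
        "category": "Population",  # assumed
        "keywords": {"Death"},
    },
    "Cause of Death": {
        "category": "Health and well-being",  # assumed
        "keywords": {"Death"},
    },
    "Production statistics Health and welfare": {
        "category": "Business",  # assumed
        "keywords": {"Death"},
    },
}


def build_keyword_index(datasets):
    """Map each keyword to the set of dataset names that carry it."""
    index = defaultdict(set)
    for name, meta in datasets.items():
        for keyword in meta["keywords"]:
            index[keyword].add(name)
    return index


def related_datasets(name, datasets, index):
    """Return datasets sharing at least one keyword with `name`, sorted by name."""
    related = set()
    for keyword in datasets[name]["keywords"]:
        related |= index[keyword]
    related.discard(name)  # a dataset is not related to itself
    return sorted(related)


if __name__ == "__main__":
    index = build_keyword_index(DATASETS)
    for other in related_datasets("Age at Death", DATASETS, index):
        print(other, "|", DATASETS[other]["category"])
```

In a real knowledge graph the same lookup would more likely be a graph query over dataset and keyword nodes; the in-memory index here only shows how one shared keyword connects datasets across categories.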
- Publication: arXiv e-prints
- Pub Date: January 2021
- DOI: 10.48550/arXiv.2101.07622
- arXiv: arXiv:2101.07622
- Bibcode: 2021arXiv210107622S
- Keywords: Computer Science - Digital Libraries; Computer Science - Databases