Towards Automatic Comparison of Data Privacy Documents: A Preliminary Experiment on GDPR-like Laws
Abstract
General Data Protection Regulation (GDPR) becomes a standard law for data protection in many countries. Currently, twelve countries adopt the regulation and establish their GDPR-like regulation. However, to evaluate the differences and similarities of these GDPR-like regulations is time-consuming and needs a lot of manual effort from legal experts. Moreover, GDPR-like regulations from different countries are written in their languages leading to a more difficult task since legal experts who know both languages are essential. In this paper, we investigate a simple natural language processing (NLP) approach to tackle the problem. We first extract chunks of information from GDPR-like documents and form structured data from natural language. Next, we use NLP methods to compare documents to measure their similarity. Finally, we manually label a small set of data to evaluate our approach. The empirical result shows that the BERT model with cosine similarity outperforms other baselines. Our data and code are publicly available.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2021
- DOI:
- 10.48550/arXiv.2105.10117
- arXiv:
- arXiv:2105.10117
- Bibcode:
- 2021arXiv210510117K
- Keywords:
-
- Computer Science - Computation and Language;
- Computer Science - Information Retrieval
- E-Print:
- This is a preliminary work, see repo at https://github.com/kornosk/GDPR-similarity-comparison