Anonymously Analyzing Clinical Datasets
Abstract
This paper takes on the problem of automatically identifying clinically relevant patterns in medical datasets without compromising patient privacy. To achieve this goal, we treat datasets as a black box for both internal and external data users, which lets us handle clinical data queries directly and far more efficiently. The novelty of the approach lies in avoiding the data de-identification process that is often used to preserve patient privacy. The implemented toolkit combines software engineering technologies such as Java EE and RESTful web services to exchange medical data in an unidentifiable XML format and to restrict users according to the need-to-know principle. Our technique also inhibits retrospective processing of data, such as attacks in which an adversary applies advanced computational methods to a medical dataset to reveal Protected Health Information (PHI). The approach is validated on an endoscopic reporting application based on the openEHR and MST standards. From a usability perspective, clinical researchers and governmental or non-governmental organizations can use the approach to query datasets when monitoring health care services to improve the quality of care.
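As a rough illustration of the black-box idea, the following minimal sketch answers queries with aggregate counts serialized as XML, so that no individual record or PHI field ever leaves the dataset, and it enforces a need-to-know policy per user role. It is not the authors' toolkit, which uses Java EE, RESTful web services and openEHR-based data, none of which is reproduced here. The records, the field names, the `ROLE_PERMISSIONS` policy and the `query_counts` function are all hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical endoscopy records held inside the "black box".
# The identifying fields (PHI) are stored but never returned to callers.
RECORDS = [
    {"patient_name": "A. Patient", "birth_date": "1950-01-01", "finding": "polyp", "procedure": "colonoscopy"},
    {"patient_name": "B. Patient", "birth_date": "1962-05-12", "finding": "normal", "procedure": "colonoscopy"},
    {"patient_name": "C. Patient", "birth_date": "1971-09-30", "finding": "polyp", "procedure": "gastroscopy"},
]

PHI_FIELDS = {"patient_name", "birth_date"}

# Hypothetical need-to-know policy: each role may query only the listed attributes.
ROLE_PERMISSIONS = {
    "researcher": {"finding", "procedure"},
    "quality_monitor": {"procedure"},
}


def query_counts(role: str, attribute: str) -> str:
    """Return per-value counts of one clinical attribute as an XML string.

    Raises PermissionError if the attribute is a PHI field or is outside
    the caller's need-to-know set.
    """
    allowed = ROLE_PERMISSIONS.get(role, set())
    if attribute in PHI_FIELDS or attribute not in allowed:
        raise PermissionError(f"role {role!r} may not query {attribute!r}")

    # Aggregate inside the box; only counts are exposed, never rows.
    counts = Counter(record[attribute] for record in RECORDS)

    root = ET.Element("result", attrib={"attribute": attribute})
    for value, n in sorted(counts.items()):
        ET.SubElement(root, "count", attrib={"value": value}).text = str(n)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(query_counts("researcher", "finding"))
    # prints: <result attribute="finding"><count value="normal">1</count><count value="polyp">2</count></result>
```

In this sketch, calling `query_counts("quality_monitor", "finding")` or `query_counts("researcher", "patient_name")` raises `PermissionError`, because the first attribute is outside that role's need-to-know set and the second is a PHI field. Returning only aggregates is one simple way to keep raw records out of users' hands; the paper's actual query model may differ.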
- Publication: arXiv e-prints
- Pub Date: November 2014
- DOI: 10.48550/arXiv.1501.05916
- arXiv: arXiv:1501.05916
- Bibcode: 2015arXiv150105916Q
- Keywords: Computer Science - Software Engineering; Computer Science - Cryptography and Security; Computer Science - Databases