A multi-language toolkit for supporting automated checking of research outputs
Abstract
This article presents the automatic checking of research outputs package acro, which assists researchers and data governance teams by automatically applying best-practice principles-based statistical disclosure control (SDC) techniques on-the-fly as researchers conduct their analyses. acro distinguishes between: research output that is safe to publish; output that requires further analysis; and output that cannot be published because it creates substantial risk of disclosing private data. This is achieved through the use of a lightweight Python wrapper that sits over well-known analysis tools that produce outputs such as tables, plots, and statistical models. This adds functionality to (i) identify potentially disclosive outputs against a range of commonly used disclosure tests; (ii) apply disclosure mitigation strategies where required; (iii) report reasons for applying SDC; and (iv) produce simple summary documents trusted research environment staff can use to streamline their workflow. The major analytical programming languages used by researchers are supported: Python, R, and Stata. The acro code and documentation are available under an MIT license at https://github.com/AI-SDC/ACRO
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2022
- DOI:
- 10.48550/arXiv.2212.02935
- arXiv:
- arXiv:2212.02935
- Bibcode:
- 2022arXiv221202935P
- Keywords:
-
- Computer Science - Cryptography and Security;
- Computer Science - Information Retrieval;
- Computer Science - Software Engineering;
- Statistics - Applications;
- Statistics - Methodology