A Feasibility Study of Differentially Private Summary Statistics and Regression Analyses for Administrative Tax Data
Federal administrative tax data are invaluable for research, but because of privacy concerns, access to these data is typically limited to select agencies and a few individuals. An alternative to sharing microlevel data is a validation server, which allows individuals to query statistics without directly accessing the confidential data. This paper studies the feasibility of using differentially private (DP) methods to implement such a server. We provide an extensive study of existing DP methods for releasing tabular statistics, means, quantiles, and regression estimates. We also present new methodological adaptations of existing DP regression methods that accommodate new data types and return standard error estimates. We evaluate the selected methods on the accuracy of their output for statistical analyses, using real administrative tax data obtained from the Internal Revenue Service. Our findings show that a validation server is feasible for simple, univariate statistics but struggles to produce accurate regression estimates and confidence intervals. We outline challenges and offer recommendations for future work on validation server frameworks. This is the first comprehensive statistical study of DP regression methodology on a real, complex dataset, and it has significant implications for the direction of a growing research field and for public policy.