Full-Information Estimation For Hierarchical Data
Abstract
The U.S. Census Bureau's 2020 Disclosure Avoidance System (DAS) bases its output on noisy measurements, which are population tabulations added to realizations of mean-zero random variables. These noisy measurements are observed in a set of hierarchical geographic units, e.g., the U.S. as a whole, states, counties, census tracts, and census blocks. The noisy measurements from the 2020 Redistricting Data File and Demographic and Housing Characteristics File statistical data products are now public. The purpose of this paper is to describe a method that leverages the hierarchical structure within these noisy measurements to compute confidence intervals for arbitrary tabulations in arbitrary geographic entities composed of census blocks. This method is based on computing a weighted least squares (WLS) estimator and its variance matrix. Because of the high dimension of this estimator, the standard approach is not feasible: it would require evaluating products with the inverse of a dense matrix with several billion (or even several trillion) rows and columns. In contrast, the approach we describe in this paper computes the required estimate and its variance with time complexity and memory requirements that scale linearly in the number of census blocks.
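To make the WLS idea concrete, the following is a minimal sketch on a toy two-level hierarchy (one parent geography containing three blocks). It uses the standard dense approach, which the abstract describes as infeasible at full scale, so it illustrates only the estimator and its variance matrix, not the linear-time method of the paper. The function name `wls_estimate`, the toy counts, the noise variances, and the normal-approximation multiplier 1.96 are all assumptions made for illustration.

```python
import numpy as np


def wls_estimate(A, y, variances):
    """Dense WLS: returns (estimate, variance matrix) for y = A @ beta + noise.

    `variances` holds the known variance of each independent noisy measurement.
    """
    w = 1.0 / np.asarray(variances, dtype=float)   # inverse-variance weights
    precision = A.T @ (w[:, None] * A)             # A' W A
    cov = np.linalg.inv(precision)                 # variance matrix of the estimator
    beta_hat = cov @ (A.T @ (w * y))               # (A' W A)^{-1} A' W y
    return beta_hat, cov


# Toy hierarchy (hypothetical): row 0 measures the parent total,
# rows 1-3 measure each of the three child blocks directly.
A = np.vstack([np.ones((1, 3)), np.eye(3)])
true_counts = np.array([10.0, 20.0, 30.0])         # illustrative values only
variances = np.array([4.0, 1.0, 1.0, 1.0])         # assumed noise variances

rng = np.random.default_rng(0)
y = A @ true_counts + rng.normal(0.0, np.sqrt(variances))

beta_hat, cov = wls_estimate(A, y, variances)

# Confidence interval for an arbitrary tabulation: here, blocks 0 and 1 combined.
c = np.array([1.0, 1.0, 0.0])
estimate = c @ beta_hat
std_err = np.sqrt(c @ cov @ c)
# 1.96 assumes the estimator is approximately normal (an assumption of this sketch).
ci_95 = (estimate - 1.96 * std_err, estimate + 1.96 * std_err)
print(estimate, ci_95)
```

In this toy case the parent measurement tightens the block-level estimates: the variance of the estimated total is 12/7, smaller than either the parent measurement's variance (4) or the sum of the three block variances (3).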
- Publication: arXiv e-prints
- Pub Date: April 2024
- DOI: 10.48550/arXiv.2404.13164
- arXiv: arXiv:2404.13164
- Bibcode: 2024arXiv240413164C
- Keywords: Statistics - Computation; Computer Science - Cryptography and Security