Detecting Correlations with Little Memory and Communication
Abstract
We study the problem of identifying correlations in multivariate data, under information constraints: either on the amount of memory that can be used by the algorithm, or on the amount of communication when the data is distributed across several machines. We prove a tight tradeoff between the memory/communication complexity and the sample complexity, implying (for example) that to detect pairwise correlations with optimal sample complexity, the number of required memory/communication bits is at least quadratic in the dimension. Our results substantially improve those of Shamir [2014], which studied a similar question in a much more restricted setting. To the best of our knowledge, these are the first provable sample/memory/communication tradeoffs for a practical estimation problem, using standard distributions, and in the natural regime where the memory/communication budget is larger than the size of a single data point. To derive our theorems, we prove a new information-theoretic result, which may be relevant for studying other information-constrained learning problems.
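To see why a quadratic-in-dimension memory budget arises naturally, consider the straightforward streaming estimator for all pairwise correlations: it must maintain the d-by-d matrix of cross-products, i.e. O(d^2) numbers. The sketch below is an illustrative baseline in Python (not the paper's lower-bound construction or algorithm); the function and variable names are our own.

```python
import numpy as np

def streaming_correlations(stream, d):
    """Estimate all pairwise correlations from a stream of d-dimensional
    samples, keeping only O(d^2) numbers in memory (a running sum and a
    running matrix of cross-products)."""
    n = 0
    s = np.zeros(d)          # running sum of each coordinate
    cp = np.zeros((d, d))    # running sum of outer products x x^T
    for x in stream:
        n += 1
        s += x
        cp += np.outer(x, x)
    mean = s / n
    cov = cp / n - np.outer(mean, mean)   # empirical covariance
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)       # normalize to correlations

# Usage: a synthetic stream where coordinates 0 and 1 have correlation ~0.9
rng = np.random.default_rng(0)
d = 5

def make_stream(n):
    for _ in range(n):
        x = rng.standard_normal(d)
        x[1] = 0.9 * x[0] + np.sqrt(1 - 0.9**2) * x[1]
        yield x

corr = streaming_correlations(make_stream(20000), d)
```

The tradeoff proved in the paper says, roughly, that this quadratic memory footprint cannot be beaten without paying in sample complexity.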
Publication: arXiv e-prints
Pub Date: March 2018
arXiv: arXiv:1803.01420
Bibcode: 2018arXiv180301420D
Keywords: Computer Science - Machine Learning; Statistics - Machine Learning
E-Print: Accepted for presentation at Conference on Learning Theory (COLT) 2018. Changes: Added a comparison to Raz [2016]