People Mover's Distance: Class level geometry using fast pairwise data adaptive transportation costs
Abstract
We address the problem of defining a network graph on a large collection of classes. Each class consists of a collection of data points, sampled in a non-i.i.d. way from some unknown underlying distribution. The application we consider in this paper is a large-scale, high-dimensional survey of people living in the US, and the question of how similar or different the counties in which these people live are. We use a coclustering diffusion metric to learn the underlying distribution of people, and build an approximate earth mover's distance algorithm using this data-adaptive transportation cost.
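To make the idea of an earth mover's distance under a learned transportation cost concrete, here is a minimal sketch. It assumes the data-adaptive cost can be represented as a tree metric over groups of respondents, which the abstract does not state. Under that assumption the earth mover's distance has a closed form: sum, over every tree edge, the edge length times the absolute difference in mass that the two histograms place below that edge. The function name `tree_emd`, the `parent` and `edge_weight` arrays, and the toy county histograms are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def tree_emd(parent: Sequence[int], edge_weight: Sequence[float],
             p: Sequence[float], q: Sequence[float]) -> float:
    """Earth mover's distance between histograms p and q on a rooted tree.

    Hypothetical layout (an assumption for this sketch):
      - node 0 is the root and parent[0] == -1
      - parent[i] < i for every other node i
      - edge_weight[i] is the length of the edge joining i to parent[i]
    """
    n = len(parent)
    if not (len(edge_weight) == len(p) == len(q) == n):
        raise ValueError("all inputs must have one entry per tree node")
    if any(parent[i] >= i for i in range(1, n)):
        raise ValueError("nodes must be ordered so that parent[i] < i")

    # diff[i] accumulates (mass of p) - (mass of q) in the subtree rooted at i.
    diff = [p[i] - q[i] for i in range(n)]
    total = 0.0
    # Visit children before parents so each subtree sum is complete when used.
    for i in range(n - 1, 0, -1):
        total += edge_weight[i] * abs(diff[i])
        diff[parent[i]] += diff[i]
    return total


# Toy tree: root 0, two groups (1, 2), four leaves (3, 4 under 1; 5, 6 under 2).
parent = [-1, 0, 0, 1, 1, 2, 2]
edge_weight = [0.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5]

# Hypothetical counties, each a histogram of respondents over the leaves.
county_a = [0, 0, 0, 0.5, 0.5, 0.0, 0.0]
county_b = [0, 0, 0, 0.0, 0.0, 0.5, 0.5]
county_c = [0, 0, 0, 0.0, 1.0, 0.0, 0.0]

print(tree_emd(parent, edge_weight, county_a, county_b))  # 3.0
print(tree_emd(parent, edge_weight, county_a, county_c))  # 0.5
```

Counties A and B put their mass in different groups, so all of it must cross both group-level edges, while A and C differ only within one group. Pairwise distances of this kind could then serve as edge weights of a graph on the classes.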
 Publication:
 arXiv e-prints
 Pub Date:
 July 2017
 arXiv:
 arXiv:1707.00514
 Bibcode:
 2017arXiv170700514C
 Keywords:
 Statistics - Machine Learning;
 Statistics - Applications