Stochastic Optimization of Sorting Networks via Continuous Relaxations
Abstract
Sorting input objects is an important step in many machine learning pipelines. However, the sorting operator is non-differentiable with respect to its inputs, which prohibits end-to-end gradient-based optimization. In this work, we propose NeuralSort, a general-purpose continuous relaxation of the output of the sorting operator from permutation matrices to the set of unimodal row-stochastic matrices, where every row sums to one and has a distinct arg max. This relaxation permits straight-through optimization of any computational graph involve a sorting operation. Further, we use this relaxation to enable gradient-based stochastic optimization over the combinatorially large space of permutations by deriving a reparameterized gradient estimator for the Plackett-Luce family of distributions over permutations. We demonstrate the usefulness of our framework on three tasks that require learning semantic orderings of high-dimensional objects, including a fully differentiable, parameterized extension of the k-nearest neighbors algorithm.
- Publication:
-
arXiv e-prints
- Pub Date:
- March 2019
- DOI:
- arXiv:
- arXiv:1903.08850
- Bibcode:
- 2019arXiv190308850G
- Keywords:
-
- Statistics - Machine Learning;
- Computer Science - Machine Learning;
- Computer Science - Neural and Evolutionary Computing
- E-Print:
- ICLR 2019