Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
Abstract
In this work, we investigate the margin-maximization bias exhibited by gradient-based algorithms in classifying linearly separable data. We present an in-depth analysis of the specific properties of the velocity field associated with (normalized) gradients, focusing on their role in margin maximization. Inspired by this analysis, we propose a novel algorithm called Progressive Rescaling Gradient Descent (PRGD) and show that PRGD can maximize the margin at an {\em exponential rate}. This stands in stark contrast to all existing algorithms, which maximize the margin at a slow {\em polynomial rate}. Specifically, we identify mild conditions on data distribution under which existing algorithms such as gradient descent (GD) and normalized gradient descent (NGD) {\em provably fail} in maximizing the margin efficiently. To validate our theoretical findings, we present both synthetic and real-world experiments. Notably, PRGD also shows promise in enhancing the generalization performance when applied to linearly non-separable datasets and deep neural networks.
- Publication:
-
arXiv e-prints
- Pub Date:
- November 2023
- DOI:
- arXiv:
- arXiv:2311.14387
- Bibcode:
- 2023arXiv231114387W
- Keywords:
-
- Computer Science - Machine Learning;
- Mathematics - Optimization and Control
- E-Print:
- 37 pages, accepted by ICML 2024