Hidden Synergy: $L_1$ Weight Normalization and 1-Path-Norm Regularization
Abstract
We present PSiLON Net, an MLP architecture that uses $L_1$ weight normalization for each weight vector and shares the length parameter across the layer. The 1-path-norm provides a bound for the Lipschitz constant of a neural network and reflects on its generalizability, and we show how PSiLON Net's design drastically simplifies the 1-path-norm, while providing an inductive bias towards efficient learning and near-sparse parameters. We propose a pruning method to achieve exact sparsity in the final stages of training, if desired. To exploit the inductive bias of residual networks, we present a simplified residual block, leveraging concatenated ReLU activations. For networks constructed with such blocks, we prove that considering only a subset of possible paths in the 1-path-norm is sufficient to bound the Lipschitz constant. Using the 1-path-norm and this improved bound as regularizers, we conduct experiments in the small data regime using overparameterized PSiLON Nets and PSiLON ResNets, demonstrating reliable optimization and strong performance.
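As a concrete reading of the core design described above, the sketch below shows what an $L_1$-weight-normalized linear layer with a layer-shared length parameter might look like in PyTorch. The class name, the row-wise normalization convention, and the initialization are assumptions made for illustration; this is not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class L1SharedNormLinear(nn.Module):
    """Hypothetical sketch of a PSiLON-style linear layer:
    L1 weight normalization per weight vector (here, per row) with a
    single length parameter g shared across the whole layer."""

    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        # Unnormalized direction parameters, one row per output unit.
        self.v = nn.Parameter(torch.randn(out_features, in_features) / in_features)
        # One shared length parameter for the entire layer.
        self.g = nn.Parameter(torch.ones(1))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rescale each row so its L1 norm equals |g|; the layer's
        # contribution to the 1-path-norm is then governed by g alone.
        w = self.g * self.v / self.v.abs().sum(dim=1, keepdim=True)
        return F.linear(x, w, self.bias)
```

Under this row-normalization convention (and ignoring biases), every row of the effective weight matrix has $L_1$ norm $|g|$, so the 1-path-norm of a stack of such layers collapses to the product of the per-layer length parameters times the output dimension. This is the kind of drastic simplification the abstract refers to, though the exact expression and conventions used in the paper may differ from the assumptions made here.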
- Publication: arXiv e-prints
- Pub Date: April 2024
- DOI: 10.48550/arXiv.2404.19112
- arXiv: arXiv:2404.19112
- Bibcode: 2024arXiv240419112B
- Keywords: Computer Science - Machine Learning; Statistics - Machine Learning
- E-Print: 8 pages body, 2 tables, 1 figure, 3 appendices