Efficient gradient-based methods for bilevel learning via recycling Krylov subspaces
Abstract
Many optimization problems require hyperparameters, i.e., parameters that must be pre-specified in advance, such as regularization parameters and parametric regularizers in variational regularization methods for inverse problems, and dictionaries in compressed sensing. A data-driven approach to determine appropriate hyperparameter values is via a nested optimization framework known as bilevel learning. Even when it is possible to employ a gradient-based solver to the bilevel optimization problem, construction of the gradients, known as hypergradients, is computationally challenging, each one requiring both a solution of a minimization problem and a linear system solve. These systems do not change much during the iterations, which motivates us to apply recycling Krylov subspace methods, wherein information from one linear system solve is re-used to solve the next linear system. Existing recycling strategies often employ eigenvector approximations called Ritz vectors. In this work we propose a novel recycling strategy based on a new concept, Ritz generalized singular vectors, which acknowledge the bilevel setting. Additionally, while existing iterative methods primarily terminate according to the residual norm, this new concept allows us to define a new stopping criterion that directly approximates the error of the associated hypergradient. The proposed approach is validated through extensive numerical testing in the context of an inverse problem in imaging.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- DOI:
- arXiv:
- arXiv:2412.08264
- Bibcode:
- 2024arXiv241208264E
- Keywords:
-
- Mathematics - Optimization and Control;
- Mathematics - Numerical Analysis;
- 65F10
- E-Print:
- 27 pages, 11 figures