Equivalence of the Empirical Risk Minimization to Regularization on the Family of f-Divergences
Abstract
The solution to empirical risk minimization with $f$-divergence regularization (ERM-$f$DR) is presented under mild conditions on $f$. Under such conditions, the optimal measure is shown to be unique. Examples of the solution for particular choices of the function $f$ are presented. Previously known solutions to common regularization choices are obtained by leveraging the flexibility of the family of $f$-divergences. These include the unique solutions to empirical risk minimization with relative entropy regularization (Type-I and Type-II). The analysis of the solution unveils the following properties of $f$-divergences when used in the ERM-$f$DR problem: (i) $f$-divergence regularization forces the support of the solution to coincide with the support of the reference measure, which introduces a strong inductive bias that dominates the evidence provided by the training data; and (ii) any $f$-divergence regularization is equivalent to a different $f$-divergence regularization with an appropriate transformation of the empirical risk function.
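The support property in the abstract can be illustrated numerically for one member of the family. The sketch below is not taken from the paper: it assumes a finite set of four models, made-up risk values, and that "Type-I" refers to the regularizer $D(P\|Q)$, whose minimizer is the well-known Gibbs measure $P^\star(i) \propto Q(i)\exp(-\mathsf{L}(i)/\lambda)$. The names `gibbs_solution` and `objective` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def gibbs_solution(risk, reference, lam):
    """Minimizer of sum_i p_i*risk_i + lam*KL(p || reference) on a finite set.

    Assumption: the regularizer is the relative entropy D(P || Q),
    for which the minimizer is the Gibbs measure.
    """
    risk = np.asarray(risk, dtype=float)
    reference = np.asarray(reference, dtype=float)
    support = reference > 0
    # Shift risks by their minimum on the support for numerical stability;
    # the shift cancels in the normalization.
    shifted = risk - risk[support].min()
    weights = np.where(support, reference * np.exp(-shifted / lam), 0.0)
    return weights / weights.sum()

def objective(p, risk, reference, lam):
    """Expected empirical risk plus lam times KL(p || reference)."""
    p = np.asarray(p, dtype=float)
    # Mass outside the support of the reference makes the divergence infinite.
    if np.any((p > 0) & (reference == 0)):
        return float("inf")
    mask = p > 0
    kl = np.sum(p[mask] * np.log(p[mask] / reference[mask]))
    return float(np.dot(p, risk) + lam * kl)

# Illustrative values (assumptions): the fourth model has the lowest risk
# but zero mass under the reference measure.
risk = np.array([0.9, 0.1, 0.5, 0.0])
reference = np.array([0.5, 0.3, 0.2, 0.0])
lam = 0.25

p_star = gibbs_solution(risk, reference, lam)
best_value = objective(p_star, risk, reference, lam)
```

In this toy setting the solution assigns zero probability to the fourth model even though it has the smallest empirical risk, because the reference measure excludes it. This is the inductive bias described in property (i), shown here only for the relative entropy case.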
- Publication: arXiv e-prints
- Pub Date: February 2024
- DOI: 10.48550/arXiv.2402.00501
- arXiv: arXiv:2402.00501
- Bibcode: 2024arXiv240200501D
- Keywords: Statistics - Machine Learning; Computer Science - Information Theory; Computer Science - Machine Learning
- E-Print: Submitted to the IEEE Symposium in Information Theory 2024. arXiv admin note: text overlap with arXiv:2306.07123