Extended convexity and smoothness and their applications in deep learning
Abstract
The mechanism by which simple gradient-based iterative algorithms effectively handle the non-convex problem of deep model training remains incompletely understood within the traditional convex and non-convex analysis frameworks, which often require Lipschitz smoothness of the gradient and strong convexity. In this paper, we introduce $\mathcal{H}(\phi)$-convexity and $\mathcal{H}(\Phi)$-smoothness, which broaden the existing concepts of convexity and smoothness, and we delineate their fundamental properties. Building on these concepts, we introduce the high-order gradient descent and high-order stochastic gradient descent methods, which extend the traditional gradient descent and stochastic gradient descent methods, respectively. Furthermore, we establish descent lemmas for $\mathcal{H}(\phi)$-convex and $\mathcal{H}(\Phi)$-smooth objective functions under all four methods (gradient descent, stochastic gradient descent, and their high-order counterparts). On the basis of these findings, we develop the gradient structure control algorithm to address non-convex optimization objectives, encompassing both the functions represented by machine learning models and common loss functions in deep learning. The effectiveness of the proposed methodology is validated empirically through experiments.
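For orientation, the traditional framework that the abstract contrasts against rests on the classical descent lemma: if $\nabla f$ is $L$-Lipschitz, one gradient step with step size $1/L$ guarantees $f(x_{k+1}) \le f(x_k) - \|\nabla f(x_k)\|^2/(2L)$. The sketch below checks that classical inequality numerically on a small quadratic. It is only an illustration of the standard $L$-smooth baseline, not the paper's $\mathcal{H}(\Phi)$-smooth descent lemmas or its high-order methods, whose definitions are not given here; the matrix `A`, the starting point, and the helper name `gd_with_descent_check` are hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical test objective: f(x) = 0.5 * x^T A x with symmetric positive definite A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])

def f(x):
    return 0.5 * x @ A @ x

def grad_f(x):
    return A @ x

# For this quadratic, the Lipschitz constant of the gradient is the largest eigenvalue of A.
L = float(np.linalg.eigvalsh(A).max())

def gd_with_descent_check(x0, iters=20):
    """Run plain gradient descent with step size 1/L and record, at each step,
    whether the classical descent lemma f(x_next) <= f(x) - ||g||^2 / (2L) holds."""
    x = np.asarray(x0, dtype=float)
    checks = []
    for _ in range(iters):
        g = grad_f(x)
        x_next = x - g / L
        bound = f(x) - (g @ g) / (2 * L)
        checks.append(bool(f(x_next) <= bound + 1e-12))  # small tolerance for rounding
        x = x_next
    return x, checks

x_final, checks = gd_with_descent_check([1.0, -2.0])
print(all(checks), f(x_final))
```

The per-step check is exactly the guarantee that the abstract describes as requiring Lipschitz smoothness of the gradient; the paper's contribution is to replace that assumption with the broader $\mathcal{H}(\Phi)$-smoothness.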
- Publication: arXiv e-prints
- Pub Date: October 2024
- DOI: 10.48550/arXiv.2410.05807
- arXiv: arXiv:2410.05807
- Bibcode: 2024arXiv241005807Q
- Keywords: Computer Science - Machine Learning; Computer Science - Data Structures and Algorithms; Mathematics - Optimization and Control