Optimisation & Generalisation in Networks of Neurons

Bernstein, Jeremy

The goal of this thesis is to develop the optimisation and generalisation theoretic foundations of learning in artificial neural networks. On optimisation, a new theoretical framework is proposed for deriving architecture-dependent first-order optimisation algorithms. The approach works by combining a "functional majorisation" of the loss function with "architectural perturbation bounds" that encode an explicit dependence on neural architecture. The framework yields optimisation methods that transfer hyperparameters across learning problems. On generalisation, a new correspondence is proposed between ensembles of networks and individual networks. It is argued that, as network width and normalised margin are taken large, the space of networks that interpolate a particular training set concentrates on an aggregated Bayesian method known as a "Bayes point machine". This correspondence provides a route for transferring PAC-Bayesian generalisation theorems over to individual networks. More broadly, the correspondence presents a fresh perspective on the role of regularisation in networks with vastly more parameters than data.

Publication:

arXiv e-prints

Pub Date:

October 2022

DOI:

10.48550/arXiv.2210.10101

arXiv:

arXiv:2210.10101

Bibcode:

2022arXiv221010101B

Keywords:

Computer Science - Neural and Evolutionary Computing;
Computer Science - Artificial Intelligence;
Computer Science - Information Theory;
Computer Science - Machine Learning;
Mathematics - Numerical Analysis

E-Print:

PhD thesis

NASA/ADS
