Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

doi:10.48550/arXiv.2303.04143

Pretraining a neural network on a large dataset is becoming a cornerstone of machine learning, yet it is within reach of only a few communities with large resources. We pursue the ambitious goal of democratizing pretraining. Toward that goal, we train and release a single neural network that can predict high-quality ImageNet parameters of other neural networks. By using the predicted parameters for initialization, we are able to boost training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models initialized with predicted parameters also converge faster and reach competitive final performance.

Publication:

arXiv e-prints

Pub Date:

March 2023

DOI:

10.48550/arXiv.2303.04143

arXiv:

arXiv:2303.04143

Bibcode:

2023arXiv230304143K

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Computer Vision and Pattern Recognition;
Statistics - Machine Learning

E-Print:

ICML 2023, camera ready (7 tables with extra results added), code and models are at https://github.com/SamsungSAILMontreal/ghn3
