Fast Feedforward Networks

doi:10.48550/arXiv.2308.14711

We break the linear link between the layer size and its inference cost by introducing the fast feedforward (FFF) architecture, a log-time alternative to feedforward networks. We demonstrate that FFFs are up to 220x faster than feedforward networks, up to 6x faster than mixture-of-experts networks, and exhibit better training properties than mixtures of experts thanks to noiseless conditional execution. Pushing FFFs to the limit, we show that they can use as little as 1% of layer neurons for inference in vision transformers while preserving 94.2% of predictive performance.
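
The log-time claim can be illustrated with a minimal sketch. It assumes the FFF layer is organized as a balanced binary tree whose internal node neurons make hard left/right decisions and whose leaves are small feedforward blocks. This record does not spell out that structure, so the layout, the names make_fff and fff_infer, and the random weights are illustrative assumptions, not the authors' implementation.

```python
import random


def make_fff(depth, dim, leaf_width, seed=0):
    """Build a toy FFF-style tree with random weights (illustrative only)."""
    rng = random.Random(seed)

    def rand_vec():
        return [rng.uniform(-1, 1) for _ in range(dim)]

    # Internal node neurons in heap order: children of node i are 2i+1 and 2i+2.
    nodes = [(rand_vec(), rng.uniform(-1, 1)) for _ in range(2 ** depth - 1)]
    # Each leaf holds `leaf_width` ReLU neurons: (input weights, bias, output weight).
    leaves = [
        [(rand_vec(), rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(leaf_width)]
        for _ in range(2 ** depth)
    ]
    return nodes, leaves


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def fff_infer(nodes, leaves, x):
    """Route x down the tree and evaluate one leaf; return (output, neurons evaluated)."""
    i = 0
    evaluated = 0
    while i < len(nodes):
        w, b = nodes[i]
        evaluated += 1
        # Hard decision: equivalent to thresholding a sigmoid at 0.5.
        i = 2 * i + 2 if dot(w, x) + b >= 0 else 2 * i + 1
    leaf = leaves[i - len(nodes)]  # heap index -> leaf index
    out = 0.0
    for w, b, v in leaf:
        evaluated += 1
        out += v * max(0.0, dot(w, x) + b)
    return out, evaluated


nodes, leaves = make_fff(depth=10, dim=8, leaf_width=2)
y, used = fff_infer(nodes, leaves, [0.5] * 8)
print(used)  # 12: 10 routing neurons plus 2 leaf neurons
```

In this toy setup the leaves together hold 1024 x 2 = 2048 neurons, yet a single input touches only 12, so the per-input cost grows with the tree depth rather than with the total layer width. Because the routing is a deterministic threshold, the same input always reaches the same leaf.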

Publication:

arXiv e-prints

Pub Date:

August 2023

DOI:

10.48550/arXiv.2308.14711

arXiv:

arXiv:2308.14711

Bibcode:

2023arXiv230814711B

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Performance

E-Print:

12 pages, 6 figures, 4 tables

NASA/ADS
