Gaussian Error Linear Units (GELUs)
Abstract
We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
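For concreteness, below is a minimal sketch of the GELU as defined in the abstract, computing $x\Phi(x)$ exactly via the error function, with ReLU alongside for comparison; the function names and the small driver are illustrative, not part of the paper.

```python
import math

def gelu(x: float) -> float:
    """Gaussian Error Linear Unit: x * Phi(x), where Phi is the
    standard Gaussian cumulative distribution function."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x: float) -> float:
    """ReLU for comparison: gates the input by its sign."""
    return x if x > 0.0 else 0.0

if __name__ == "__main__":
    # Illustrative values only: GELU is smooth and non-monotonic near zero,
    # while ReLU hard-gates negative inputs to zero.
    for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={v:+.1f}  gelu={gelu(v):+.4f}  relu={relu(v):+.1f}")
```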
- Publication: arXiv e-prints
- Pub Date: June 2016
- DOI: 10.48550/arXiv.1606.08415
- arXiv: arXiv:1606.08415
- Bibcode: 2016arXiv160608415H
- Keywords: Computer Science - Machine Learning
- E-Print: Trimmed version of 2016 draft