Gaussian Error Linear Units (GELUs)
Abstract
We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
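For concreteness, below is a minimal sketch of the GELU as defined in the abstract, computing $x\Phi(x)$ exactly via the error function, with ReLU alongside for comparison; the function names and the small driver are illustrative, not part of the paper.

```python
import math

def gelu(x: float) -> float:
    """Gaussian Error Linear Unit: x * Phi(x), where Phi is the
    standard Gaussian cumulative distribution function."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x: float) -> float:
    """ReLU for comparison: gates the input by its sign."""
    return x if x > 0.0 else 0.0

if __name__ == "__main__":
    # Illustrative values only: GELU is smooth and non-monotonic near zero,
    # while ReLU hard-gates negative inputs to zero.
    for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={v:+.1f}  gelu={gelu(v):+.4f}  relu={relu(v):+.1f}")
```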
- Publication: arXiv e-prints
- Pub Date: June 2016
- DOI: 10.48550/arXiv.1606.08415
- arXiv: arXiv:1606.08415
- Bibcode: 2016arXiv160608415H
- Keywords: Computer Science - Machine Learning
- E-Print: Trimmed version of 2016 draft