Gaussian Error Linear Units (GELUs)

GELU, a new activation function, outperforms ReLU and ELU across various computer vision, NLP, and speech tasks.

Year: 2016
Venue: arXiv 2016
Authors: 2

Attribution

Abstract & full text: arxiv.org/abs/1606.08415v5
TL;DR: Semantic Scholar

Abstract

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
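The two definitions in the abstract can be compared directly. Below is a minimal Python sketch that assumes scalar float inputs and uses the standard-library `math.erf`. It computes $\Phi(x)$ through the identity $\Phi(x) = \tfrac{1}{2}\bigl(1 + \operatorname{erf}(x/\sqrt{2})\bigr)$, so the GELU is $x\Phi(x)$ and the ReLU is $x\mathbf{1}_{x>0}$. The helper names `std_normal_cdf`, `gelu`, and `relu` are illustrative only and do not come from the paper.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard Gaussian CDF Phi(x), expressed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu(x: float) -> float:
    """GELU(x) = x * Phi(x): the input weighted by the Gaussian CDF at x."""
    return x * std_normal_cdf(x)


def relu(x: float) -> float:
    """ReLU(x) = x * 1[x > 0]: the input gated by its sign."""
    return x if x > 0 else 0.0


if __name__ == "__main__":
    # Print both activations side by side on a few sample inputs.
    for x in (-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):+.4f}  gelu={gelu(x):+.4f}")
```

The ReLU maps every negative input to exactly zero. The GELU instead scales each input by the probability mass of $\Phi$ below it, so moderately negative inputs produce small negative outputs (for example, `gelu(-1.0)` is about $-0.1587$). Large positive inputs pass through almost unchanged, because $\Phi(x) \to 1$.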