The loss curve

GELU

Smooth approximation of ReLU used in transformers. Defined as x·Φ(x), where Φ is the standard Gaussian CDF; in practice it is often computed with a tanh-based approximation.
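A minimal sketch of the definition in plain Python, assuming scalar inputs. The function names `gelu`, `gelu_tanh`, and `relu` are illustrative, not taken from any particular library. It writes Φ in terms of the error function and includes the common tanh approximation for comparison.

```python
import math


def gelu(x: float) -> float:
    """GELU as x * Phi(x), with Phi the standard normal CDF.

    Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    """
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh-based approximation of GELU (avoids evaluating erf)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def relu(x: float) -> float:
    """ReLU, for comparison."""
    return max(0.0, x)


if __name__ == "__main__":
    # Print the three functions side by side on a few sample points.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  "
              f"gelu={gelu(x):.4f}  gelu_tanh={gelu_tanh(x):.4f}")
```

Unlike ReLU, GELU is slightly negative for small negative inputs and approaches ReLU as |x| grows, which is where the "smooth approximation" description comes from.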