Skip to content
The loss curve

Softmax

Normalizes a vector of real numbers into a probability distribution: exp(x_i) / Σ exp(x_j). Used at the end of attention and at the model's output.