Softmax
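Normalizes a vector of real numbers into a probability distribution: softmax(x)_i = exp(x_i) / Σ_j exp(x_j). Every output is positive and the outputs sum to 1. Applied to the attention scores to turn them into attention weights, and to the model's output logits to turn them into probabilities.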
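The formula can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: the function name `softmax` and the max-subtraction step are assumptions added here. Subtracting the maximum is a common numerical-stability trick that does not change the result, because the common factor exp(-max) cancels between numerator and denominator.

```python
import math

def softmax(xs):
    """Return exp(x_i) / sum_j exp(x_j) for a non-empty list of floats."""
    # Assumption for illustration: shift by the maximum so the largest
    # exponent is exp(0) = 1, which keeps exp() from overflowing.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)
```

The returned values are positive, sum to 1 up to floating-point rounding, and keep the ordering of the inputs, so the largest input gets the largest probability.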