The loss curve

Sampling

Drawing a random value from a probability distribution. In a language model: picking the next token by rolling a random number against the model's output distribution, so that likelier tokens are chosen more often.

Sampling strategies (greedy, temperature, top-k, top-p) all answer the same question: given a probability distribution over the vocabulary, which token do we actually emit? They differ in how they trade determinism for diversity.
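A minimal NumPy sketch of all four strategies in one function (the function name and signature are illustrative, not from any library):

```python
import numpy as np

def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Pick a token index from a logit vector using common sampling strategies."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    if temperature == 0.0:          # greedy: deterministically take the argmax
        return int(np.argmax(logits))

    logits = logits / temperature   # <1 sharpens, >1 flattens the distribution
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    if top_k is not None:           # keep only the k most likely tokens
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    if top_p is not None:           # nucleus: smallest set with cumulative mass >= p
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cum, top_p) + 1]
        kept = np.zeros_like(probs)
        kept[keep] = probs[keep]
        probs = kept / kept.sum()

    # "roll a number against" the distribution: draw uniformly in [0, 1)
    # and land in some token's slice of the cumulative probability
    return int(rng.choice(len(probs), p=probs))
```

Note that the filters compose: temperature reshapes the distribution first, then top-k or top-p truncates it, and the final draw is random over whatever mass remains.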