The loss curve

Language model

A model that assigns a probability to the next token given the tokens that came before. Generation is repeated sampling from that distribution.

Every modern LLM is a language model: input is a sequence of tokens, output is a probability distribution over the vocabulary. The bigram in chapter 1, the transformer in chapter 10, and GPT-4 all share that interface — only the function in the middle differs.
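That shared interface can be sketched with the simplest possible "function in the middle" — a count-based bigram model. The toy corpus and function names here are illustrative, not the book's own code:

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus; any tokenized text works.
corpus = "the cat sat on the mat the cat sat".split()

# A bigram model's "function in the middle" is just a count table.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_distribution(tokens):
    """The language-model interface: tokens in, a probability
    distribution over the vocabulary out. A bigram model only
    looks at the last token of the context."""
    c = counts[tokens[-1]]
    total = sum(c.values())
    return {tok: n / total for tok, n in c.items()}

def generate(start, length, seed=0):
    """Generation: repeatedly sample from the model's distribution
    and feed each sampled token back in as context."""
    rng = random.Random(seed)
    tokens = [start]
    for _ in range(length):
        dist = next_token_distribution(tokens)
        toks, probs = zip(*dist.items())
        tokens.append(rng.choices(toks, weights=probs)[0])
    return tokens

dist = next_token_distribution(["the"])
sample = generate("the", 5)
```

Swapping the count table for a transformer changes only `next_token_distribution`; the interface and the sampling loop stay the same.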