Language model
A model that assigns a probability to each possible next token given the tokens that came before. Generation is repeated sampling from that distribution.
Every modern LLM is a language model: input is a sequence of tokens, output is a probability distribution over the vocabulary. The bigram in chapter 1, the transformer in chapter 10, and GPT-4 all share that interface — only the function in the middle differs.
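That shared interface can be sketched with the simplest instance, a bigram model: the "function in the middle" is just a count table, yet the input/output contract is the same as a transformer's. The tiny corpus below is an illustrative assumption, not from the book.

```python
import random
from collections import Counter, defaultdict

# Illustrative toy corpus; any tokenized text works.
corpus = "the cat sat on the mat the cat ran".split()

# Bigram "function in the middle": count how often each token follows another.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_distribution(prev):
    """The language-model interface: previous token in,
    probability distribution over the vocabulary out."""
    c = counts[prev]
    total = sum(c.values())
    return {tok: n / total for tok, n in c.items()}

def generate(start, length, seed=0):
    """Generation = repeatedly sampling from the next-token distribution."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        dist = next_token_distribution(out[-1])
        if not dist:  # token never seen with a successor; stop early
            break
        tokens, probs = zip(*dist.items())
        out.append(rng.choices(tokens, weights=probs)[0])
    return " ".join(out)

print(next_token_distribution("the"))  # 'cat' gets 2/3, 'mat' gets 1/3
print(generate("the", 5))
```

Swapping `next_token_distribution` for a neural network changes the quality of the probabilities, not the interface: generation is the same sampling loop either way.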