Bigram
A pair of consecutive tokens. The simplest unit of context a language model can use.
A bigram model assigns a probability to a token based only on the token immediately before it: P(w_t | w_{t-1}). It cannot capture anything past one position back, but it's already a working language model.
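A count-based bigram model can be sketched in a few lines: count each adjacent token pair, then estimate P(w_t | w_{t-1}) as the pair count divided by the context count. The toy corpus and whitespace tokenization below are illustrative assumptions, not part of the definition.

```python
from collections import Counter

# Toy corpus; assumption: tokens come from a simple whitespace split.
corpus = "the cat sat on the mat the cat ran".split()

# Count adjacent token pairs and how often each context token appears.
bigram_counts = Counter(zip(corpus, corpus[1:]))
context_counts = Counter(corpus[:-1])

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]

# "the" is followed by "cat" twice and "mat" once in this corpus.
print(bigram_prob("the", "cat"))  # 2/3
```

Unseen pairs get probability zero under this estimate, which is why practical bigram models add smoothing.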