Byte-Pair Encoding (BPE)
A tokenization scheme that starts from individual characters and repeatedly merges the most frequent adjacent pair of symbols until a target vocabulary size is reached. Produces subword tokens.
Originally a 1994 data-compression algorithm (Philip Gage); adapted for NLP subword tokenization by Sennrich et al. in 2016. Used in GPT-2/3/4 and most modern LLMs. The learned merges capture morphology: common suffixes like "ing" and "ed" emerge naturally as subword tokens.
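A minimal sketch of the training loop, assuming a toy corpus where each word is pre-split into space-separated characters and mapped to its frequency (the word list and all names here are illustrative, not any library's API):

```python
from collections import Counter

def most_frequent_pair(words):
    # Count adjacent symbol pairs across the corpus, weighted by word frequency.
    pairs = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(pair, words):
    # Replace every occurrence of the pair with its concatenation.
    a, b = pair
    merged = {}
    for word, freq in words.items():
        symbols = word.split()
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)  # merge the pair into one new symbol
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[" ".join(out)] = freq
    return merged

# Toy corpus: characters separated by spaces, values are word frequencies.
words = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
for _ in range(3):
    pair = most_frequent_pair(words)
    words = merge_pair(pair, words)
    print(pair)  # each merge becomes one rule in the learned vocabulary
```

Each iteration adds one merge rule; running to a few thousand merges yields a vocabulary where frequent fragments ("es", "est", "ing") are single tokens.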