Training
Adjusting a model's parameters so that it does its job better on a given dataset. For a language model, that usually means lowering the average next-token (cross-entropy) loss across the training set.
Training a counts-table bigram is just incrementing counters; training a neural network is running gradient descent on millions to trillions of parameters. The underlying loop is the same: measure how wrong you are, change something to be less wrong, repeat.
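A minimal sketch of both styles, side by side. The function names (`train_bigram`, `train_gd`) and the toy one-parameter model are illustrative, not from any particular library; the point is that the gradient-descent version runs the same measure-adjust-repeat loop, just with calculus instead of counting.

```python
from collections import defaultdict

def train_bigram(tokens):
    """Counts-table bigram 'training': one observation, one increment."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(tokens, tokens[1:]):
        counts[a][b] += 1
    return counts

def train_gd(xs, ys, steps=1000, lr=0.01):
    """Gradient-descent 'training' of a toy one-parameter model y = w * x,
    minimizing mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # measure how wrong you are: gradient of MSE with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # change something to be less wrong
        w -= lr * grad
    return w

if __name__ == "__main__":
    counts = train_bigram("the cat sat on the mat".split())
    print(dict(counts["the"]))              # {'cat': 1, 'mat': 1}
    print(train_gd([1, 2, 3], [2, 4, 6]))   # converges toward w = 2.0
```

The bigram version finishes in one pass and never revisits its counts; the gradient version has to loop because each parameter update only moves a small step toward less wrong.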