The loss curve

Parameter

One of the model's learnable numbers. Modern LLMs have billions to trillions of them; the bigram model in chapter 1 has |vocab|² (one per cell of the counts table).
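
To make the |vocab|² figure concrete, here is a minimal sketch that builds a bigram counts table and counts its cells. It assumes a character-level model and a toy corpus; the names `text`, `vocab`, `counts`, and `num_parameters` are illustrative and may not match what chapter 1 uses.

```python
# Toy corpus (an assumption for illustration only).
text = "hello world"

# The vocabulary is the set of distinct characters in the corpus.
vocab = sorted(set(text))

# One cell per (previous character, next character) pair.
# Each cell is one parameter of the bigram model.
counts = {(a, b): 0 for a in vocab for b in vocab}

# Fill the table by counting adjacent character pairs.
for a, b in zip(text, text[1:]):
    counts[(a, b)] += 1

num_parameters = len(counts)
print(len(vocab), num_parameters)
```

Most cells stay at zero for a corpus this small, but each one still counts as a parameter: the table has a slot for every possible pair, whether or not that pair was seen.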