The loss curve

Block size

The maximum context length the model attends over during training. It sets how many tokens of history the model can see when predicting the next token.
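To make the definition concrete, here is a minimal sketch of one common way block size shows up in a training loop. It assumes the training data is a flat list of integer token IDs. Random windows of `block_size` tokens are cut from that list, and each target window is the input shifted by one token. The helper names `get_batch` and `context_target_pairs` are hypothetical and do not come from any particular library. Random-window sampling is an illustrative choice, not necessarily the method used by the model this page describes.

```python
import random


def get_batch(tokens, block_size, batch_size, rng=None):
    """Sample batch_size (input, target) windows, each block_size tokens long.

    Hypothetical helper: assumes `tokens` is a flat list of integer token IDs.
    """
    rng = rng or random.Random()
    if len(tokens) <= block_size:
        raise ValueError("need more than block_size tokens to form a target")
    xs, ys = [], []
    for _ in range(batch_size):
        # Choose a start index so the target window, shifted by one, still fits.
        i = rng.randrange(len(tokens) - block_size)
        xs.append(tokens[i:i + block_size])
        ys.append(tokens[i + 1:i + block_size + 1])
    return xs, ys


def context_target_pairs(x, y):
    """Yield the (context, next-token) predictions packed into one window.

    Position t sees tokens x[0..t], so context length ranges from 1 up to
    block_size and never exceeds it.
    """
    for t in range(len(x)):
        yield x[:t + 1], y[t]


if __name__ == "__main__":
    data = list(range(20))  # stand-in token IDs
    xs, ys = get_batch(data, block_size=4, batch_size=2, rng=random.Random(0))
    for ctx, nxt in context_target_pairs(xs[0], ys[0]):
        print(ctx, "->", nxt)
```

A single window of `block_size` tokens therefore provides `block_size` next-token predictions, and the longest history any of them sees is `block_size` tokens.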