Scaling law

Empirical observation that model quality improves predictably with parameters, training tokens, and compute. The Chinchilla rule: optimal tokens ≈ 20 × parameters.

Continue

← All terms Browse chapters