The loss curve

Learning rate

Scalar that sets the size of each gradient-descent step by multiplying the gradient. Too small and training crawls; too large and the loss diverges. Often considered the single most important hyperparameter to tune.
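
To make the three regimes concrete, here is a minimal sketch that runs plain gradient descent on the toy loss f(x) = x**2. The function name `gradient_descent`, the starting point, the step count, and the three learning rates are all illustrative assumptions, not values from any particular model or library.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize the toy loss f(x) = x**2; return the loss after each step."""
    x = x0
    losses = []
    for _ in range(steps):
        grad = 2.0 * x         # derivative of x**2
        x = x - lr * grad      # the learning rate scales the step
        losses.append(x * x)
    return losses


# Hypothetical learning rates chosen to show each regime on this toy loss.
for lr in (0.001, 0.1, 1.1):
    final_loss = gradient_descent(lr)[-1]
    print(f"lr={lr}: loss after 20 steps = {final_loss:.3g}")
```

On this toy problem the smallest rate barely moves the loss from its starting value of 1, the middle rate drives it close to zero, and the largest rate overshoots the minimum on every step so the loss grows. The thresholds are specific to this function; real models have their own and usually need tuning.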