Learning rate
Scalar η that scales each gradient-descent step: θ ← θ − η∇L(θ). Too small and training crawls; too large and the loss diverges. The single most important hyperparameter.
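A minimal sketch of both failure modes, using plain gradient descent on f(x) = x² (the function, starting point, and step counts are illustrative choices, not from the original):

```python
def gradient_descent(eta, x0=5.0, steps=50):
    """Run `steps` updates x <- x - eta * f'(x) on f(x) = x**2."""
    x = x0
    for _ in range(steps):
        x -= eta * 2 * x  # gradient of x**2 is 2x
    return x

print(gradient_descent(0.4))    # well-chosen: converges toward the minimum at 0
print(gradient_descent(0.001))  # too small: barely moves after 50 steps
print(gradient_descent(1.1))    # too large: |x| grows every step, so it diverges
```

With η = 1.1 each update multiplies x by (1 − 2η) = −1.2, so the iterate oscillates with growing magnitude; with η = 0.001 the multiplier 0.998 shrinks x by under 10% over all 50 steps.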