Learning rate
Scalar η that scales each gradient-descent step: θ ← θ − η∇L(θ). Too small and training crawls; too large and the loss diverges. The single most important hyperparameter.
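A minimal sketch of both failure modes, using plain gradient descent on f(x) = x² (the function, starting point, and step counts are illustrative choices, not from the original):

```python
def gradient_descent(eta, x0=5.0, steps=50):
    """Run `steps` updates x <- x - eta * f'(x) on f(x) = x**2."""
    x = x0
    for _ in range(steps):
        x -= eta * 2 * x  # gradient of x**2 is 2x
    return x

print(gradient_descent(0.4))    # well-chosen: converges toward the minimum at 0
print(gradient_descent(0.001))  # too small: barely moves after 50 steps
print(gradient_descent(1.1))    # too large: |x| grows every step, so it diverges
```

With η = 1.1 each update multiplies x by (1 − 2η) = −1.2, so the iterate oscillates with growing magnitude; with η = 0.001 the multiplier 0.998 shrinks x by under 10% over all 50 steps.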