Smoothing
A family of techniques that assigns a positive probability to every possible token transition, including ones never seen during training, by redistributing some probability mass from observed counts to unobserved ones.
Without smoothing, an n-gram model assigns zero probability to any unseen transition, so its perplexity becomes infinite the moment the validation set contains one. Add-α (called Laplace smoothing when α = 1) and Kneser-Ney are the two classic methods.
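A minimal sketch of add-α smoothing for a bigram model, using a hypothetical toy corpus to show that an unseen transition still receives positive probability:

```python
from collections import Counter

# Toy training corpus (illustrative assumption, not from any real dataset).
corpus = "the cat sat on the mat the cat ran".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
V = len(set(corpus))  # vocabulary size

def p_add_alpha(w_prev, w, alpha=1.0):
    """Add-alpha smoothed bigram probability P(w | w_prev).

    With alpha = 1 this is Laplace smoothing: every count is bumped
    by alpha, and the denominator grows by alpha * V so the
    distribution still sums to 1 over the vocabulary.
    """
    return (bigrams[(w_prev, w)] + alpha) / (unigrams[w_prev] + alpha * V)

# ("mat", "ran") never occurs in training, yet its probability is
# (0 + 1) / (1 + 6) = 1/7 rather than 0 — so perplexity stays finite.
print(p_add_alpha("mat", "ran"))
# A transition seen twice, ("the", "cat"), still scores higher: 3/9.
print(p_add_alpha("the", "cat"))
```

Kneser-Ney goes further: instead of adding a uniform α, it discounts observed counts and backs off to a continuation probability based on how many distinct contexts a word appears in.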