Smoothing
A family of techniques that assigns a positive probability to every possible token transition, including ones never seen during training, by redistributing some probability mass from observed counts to unobserved ones.
Without smoothing, an n-gram model assigns zero probability to any unseen transition, so its perplexity becomes infinite the moment the validation set contains one. Add-α (called Laplace smoothing when α = 1) and Kneser-Ney are the two classic methods.
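A minimal sketch of add-α smoothing for a bigram model, using a hypothetical toy corpus to show that an unseen transition still receives positive probability:

```python
from collections import Counter

# Toy training corpus (illustrative assumption, not from any real dataset).
corpus = "the cat sat on the mat the cat ran".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
V = len(set(corpus))  # vocabulary size

def p_add_alpha(w_prev, w, alpha=1.0):
    """Add-alpha smoothed bigram probability P(w | w_prev).

    With alpha = 1 this is Laplace smoothing: every count is bumped
    by alpha, and the denominator grows by alpha * V so the
    distribution still sums to 1 over the vocabulary.
    """
    return (bigrams[(w_prev, w)] + alpha) / (unigrams[w_prev] + alpha * V)

# ("mat", "ran") never occurs in training, yet its probability is
# (0 + 1) / (1 + 6) = 1/7 rather than 0 — so perplexity stays finite.
print(p_add_alpha("mat", "ran"))
# A transition seen twice, ("the", "cat"), still scores higher: 3/9.
print(p_add_alpha("the", "cat"))
```

Kneser-Ney goes further: instead of adding a uniform α, it discounts observed counts and backs off to a continuation probability based on how many distinct contexts a word appears in.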