A word n-gram language model is a purely statistical model of language. It has been superseded by recurrent neural network–based models, which have in turn been superseded by large language models. It is based on the assumption that the probability of the next word in a sequence depends only on a fixed-size window of previous words. If only one previous word is considered, it is called a bigram model; if two words, a trigram model; if n − 1 words, an n-gram model. Special tokens are introduced to denote the start and end of a sentence, ⟨s⟩ and ⟨/s⟩.
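As an illustration of the fixed-window assumption in the bigram case, the following Python sketch (the toy corpus, function names, and variable names are illustrative, not drawn from any particular library) estimates bigram probabilities by maximum likelihood after padding each sentence with ⟨s⟩ and ⟨/s⟩ markers:

```python
from collections import Counter

# Toy corpus; each sentence is padded with start/end markers before counting.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence + ["</s>"]
    unigram_counts.update(tokens[:-1])            # counts of each context word
    bigram_counts.update(zip(tokens, tokens[1:]))  # counts of adjacent word pairs

def bigram_prob(word, prev):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("cat", "the"))   # 0.5: "the" is followed by "cat" in 1 of 2 cases
print(bigram_prob("sat", "cat"))   # 1.0
print(bigram_prob("bird", "the"))  # 0.0: an unseen bigram receives zero probability
```

The last line shows the problem that smoothing, discussed next, is meant to solve: any word pair absent from the corpus is assigned probability zero.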
To prevent a zero probability being assigned to unseen words, each word's probability is set slightly lower than its observed relative frequency in the corpus, leaving some probability mass for unseen n-grams. Various methods have been used to do this, from simple "add-one" smoothing (assigning a count of 1 to unseen n-grams, as an uninformative prior) to more sophisticated models, such as Good–Turing discounting or back-off models.
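A minimal sketch of add-one (Laplace) smoothing for the bigram case, reusing the toy counts from the sketch above (the vocabulary construction and function name are assumptions made for the example, not a prescribed implementation):

```python
from collections import Counter

# Counts as produced by the unsmoothed sketch above (same toy corpus).
unigram_counts = Counter({"<s>": 2, "the": 2, "sat": 2, "cat": 1, "dog": 1})
bigram_counts = Counter({("<s>", "the"): 2, ("sat", "</s>"): 2,
                         ("the", "cat"): 1, ("cat", "sat"): 1,
                         ("the", "dog"): 1, ("dog", "sat"): 1})

# Vocabulary of possible next words (includes </s>, excludes <s>).
vocab = {w for (_, w) in bigram_counts}
V = len(vocab)  # 5

def smoothed_bigram_prob(word, prev):
    """Add-one (Laplace) estimate: (count(prev, word) + 1) / (count(prev) + V)."""
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + V)

print(smoothed_bigram_prob("cat", "the"))  # (1 + 1) / (2 + 5) ≈ 0.286
print(smoothed_bigram_prob("sat", "the"))  # (0 + 1) / (2 + 5) ≈ 0.143, unseen but nonzero
```

Adding one to every count makes every bigram over the vocabulary possible, at the cost of lowering the probabilities of observed bigrams; Good–Turing discounting and back-off models redistribute the reserved probability mass in more refined ways.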