Inside–outside algorithm
Parameter estimation method for probabilistic context-free grammars

In computer science, the inside–outside algorithm is a way of re-estimating production probabilities in a probabilistic context-free grammar. It was introduced by James K. Baker in 1979 as a generalization, from hidden Markov models to stochastic context-free grammars, of the forward–backward algorithm for parameter estimation. It is used to compute expectations, for example as part of the expectation–maximization algorithm (an unsupervised learning algorithm).


Inside and outside probabilities

The inside probability $\beta_j(p,q)$ is the total probability of generating the words $w_p \cdots w_q$, given the root nonterminal $N^j$ and a grammar $G$:[1]

$\beta_j(p,q) = P(w_{pq} \mid N^j_{pq}, G)$

The outside probability $\alpha_j(p,q)$ is the total probability of beginning with the start symbol $N^1$ and generating the nonterminal $N^j_{pq}$ and all the words outside $w_p \cdots w_q$, given a grammar $G$ (here $n$ is the length of the sentence):[2]

$\alpha_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)n} \mid G)$

Computing inside probabilities

Base Case:

$\beta_j(p,p) = P(w_p \mid N^j, G)$

General case:

Suppose there is a rule $N^j \rightarrow N^r N^s$ in the grammar. Then the probability of generating $w_p \cdots w_q$ starting with a subtree rooted at $N^j$, with $N^r$ covering $w_p \cdots w_k$ and $N^s$ covering $w_{k+1} \cdots w_q$, is:

$\sum_{k=p}^{q-1} P(N^j \rightarrow N^r N^s)\, \beta_r(p,k)\, \beta_s(k+1,q)$

The inside probability $\beta_j(p,q)$ is just the sum over all such possible rules:

$\beta_j(p,q) = \sum_{N^r, N^s} \sum_{k=p}^{q-1} P(N^j \rightarrow N^r N^s)\, \beta_r(p,k)\, \beta_s(k+1,q)$
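The base case and recursion above amount to a small dynamic program over spans. The following sketch runs them on a toy grammar in Chomsky normal form; the grammar, its probabilities, and the nonterminal names are illustrative assumptions, not taken from the text, and spans are 0-indexed here whereas the formulas use 1-indexed word positions.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (illustrative, not from the text).
# Binary rules: P(N^j -> N^r N^s); lexical rules: P(N^j -> word).
binary_rules = {("S", "A", "B"): 1.0}
lexical_rules = {("A", "a"): 0.4, ("B", "b"): 0.5}

def inside_probabilities(words, binary_rules, lexical_rules):
    """beta[(j, p, q)] = P(words[p..q] | nonterminal j roots the span p..q)."""
    n = len(words)
    beta = defaultdict(float)  # missing entries are probability 0
    # Base case: beta_j(p, p) = P(w_p | N^j, G)
    for p, w in enumerate(words):
        for (j, word), prob in lexical_rules.items():
            if word == w:
                beta[(j, p, p)] = prob
    # General case: sum over rules N^j -> N^r N^s and split points k,
    # filling shorter spans before the longer spans that depend on them.
    for span in range(2, n + 1):
        for p in range(n - span + 1):
            q = p + span - 1
            for (j, r, s), prob in binary_rules.items():
                for k in range(p, q):
                    beta[(j, p, q)] += prob * beta[(r, p, k)] * beta[(s, k + 1, q)]
    return beta

beta = inside_probabilities(["a", "b"], binary_rules, lexical_rules)
# beta_S(0,1) = P(S -> A B) * beta_A(0,0) * beta_B(1,1) = 1.0 * 0.4 * 0.5
print(beta[("S", 0, 1)])
```

For the start symbol spanning the whole sentence, the inside probability is the total probability of the sentence under the grammar.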

Computing outside probabilities

Base Case:

$\alpha_j(1,n) = \begin{cases} 1 & \text{if } j = 1 \\ 0 & \text{otherwise} \end{cases}$

Here the start symbol is $N^1$.

General case:

Suppose there is a rule $N^r \rightarrow N^j N^s$ in the grammar. Then the left contribution of that rule to the outside probability $\alpha_j(p,q)$ is:

$\sum_{k=q+1}^{n} P(N^r \rightarrow N^j N^s)\, \alpha_r(p,k)\, \beta_s(q+1,k)$

Now suppose there is a rule $N^r \rightarrow N^s N^j$ in the grammar. Then the right contribution of that rule to the outside probability $\alpha_j(p,q)$ is:

$\sum_{k=1}^{p-1} P(N^r \rightarrow N^s N^j)\, \alpha_r(k,q)\, \beta_s(k,p-1)$

The outside probability $\alpha_j(p,q)$ is the sum of the left and right contributions over all such rules:

$\alpha_j(p,q) = \sum_{N^r, N^s} \sum_{k=q+1}^{n} P(N^r \rightarrow N^j N^s)\, \alpha_r(p,k)\, \beta_s(q+1,k) + \sum_{N^r, N^s} \sum_{k=1}^{p-1} P(N^r \rightarrow N^s N^j)\, \alpha_r(k,q)\, \beta_s(k,p-1)$
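The outside recursion can be sketched as a second dynamic program that consumes a precomputed inside table and works from the longest span (the base case) down to the shortest. The toy grammar, its probabilities, and the hand-computed inside values below are illustrative assumptions, not from the text; spans are 0-indexed here.

```python
from collections import defaultdict

# Illustrative toy CNF grammar (assumed, not from the text): S -> A B,
# A -> 'a' (0.4), B -> 'b' (0.5); sentence "a b".
binary_rules = {("S", "A", "B"): 1.0}  # P(parent -> left right)
start = "S"

# Inside probabilities for "a b", precomputed by hand:
# beta_A(0,0) = 0.4, beta_B(1,1) = 0.5, beta_S(0,1) = 1.0 * 0.4 * 0.5 = 0.2.
beta = defaultdict(float, {("A", 0, 0): 0.4, ("B", 1, 1): 0.5, ("S", 0, 1): 0.2})

def outside_probabilities(n, binary_rules, start, beta):
    """alpha[(j, p, q)] = P(words outside p..q and N^j spanning p..q | G)."""
    alpha = defaultdict(float)
    alpha[(start, 0, n - 1)] = 1.0  # base case: start symbol spans the sentence
    # Process spans from longest to shortest, so every alpha we read
    # (always for a strictly larger outer span) is already final.
    for span in range(n - 1, 0, -1):
        for p in range(n - span + 1):
            q = p + span - 1
            for (r, left, right), prob in binary_rules.items():
                # Left contribution: N^j is the left child of N^r -> N^j N^s,
                # with the sibling N^s covering words q+1 .. k.
                for k in range(q + 1, n):
                    alpha[(left, p, q)] += prob * alpha[(r, p, k)] * beta[(right, q + 1, k)]
                # Right contribution: N^j is the right child of N^r -> N^s N^j,
                # with the sibling N^s covering words k .. p-1.
                for k in range(p):
                    alpha[(right, p, q)] += prob * alpha[(r, k, q)] * beta[(left, k, p - 1)]
    return alpha

alpha = outside_probabilities(2, binary_rules, start, beta)
# alpha_A(0,0) = 1.0 * alpha_S(0,1) * beta_B(1,1) = 0.5
# alpha_B(1,1) = 1.0 * alpha_S(0,1) * beta_A(0,0) = 0.4
print(alpha[("A", 0, 0)], alpha[("B", 1, 1)])
```

As a sanity check, $\alpha_j(p,q)\,\beta_j(p,q)$ is the probability of the sentence with $N^j$ spanning $p..q$; here $\alpha_A(0,0)\,\beta_A(0,0) = 0.5 \times 0.4 = 0.2$, matching the total sentence probability $\beta_S(0,1)$.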

References

  1. Manning, Christopher D.; Schütze, Hinrich (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. pp. 388–402. ISBN 0-262-13360-1.

  2. Manning, Christopher D.; Schütze, Hinrich (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. pp. 388–402. ISBN 0-262-13360-1.