Scoring algorithm
Form of Newton's method used in statistics

The scoring algorithm, also known as Fisher's scoring, is a form of Newton's method used in statistics to solve maximum likelihood equations numerically. It is named after Ronald Fisher.


Sketch of derivation

Let $Y_1, \ldots, Y_n$ be independent and identically distributed random variables with twice-differentiable p.d.f. $f(y;\theta)$, and suppose we wish to calculate the maximum likelihood estimator (M.L.E.) $\theta^*$ of $\theta$. First, suppose we have a starting point for our algorithm $\theta_0$, and consider a Taylor expansion of the score function $V(\theta)$ (the gradient of the log-likelihood) about $\theta_0$:

$$V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0),$$

where

$$\mathcal{J}(\theta_0) = -\sum_{i=1}^{n} \left. \nabla \nabla^{\top} \right|_{\theta=\theta_0} \log f(Y_i;\theta)$$

is the observed information matrix at $\theta_0$. Now, setting $\theta = \theta^*$, using that $V(\theta^*) = 0$ and rearranging gives us:

$$\theta^* \approx \theta_0 + \mathcal{J}^{-1}(\theta_0) V(\theta_0).$$

We therefore use the algorithm

$$\theta_{m+1} = \theta_m + \mathcal{J}^{-1}(\theta_m) V(\theta_m),$$

and under certain regularity conditions, it can be shown that $\theta_m \rightarrow \theta^*$.
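As a concrete illustration of this iteration, the sketch below applies it to the location parameter of a Cauchy sample; the model choice is an assumption made for this example, not part of the article, and all function names are hypothetical. The score and observed information follow from $\log f(y;\theta) = -\log \pi - \log(1 + (y-\theta)^2)$.

```python
import numpy as np

def newton_scoring(y, theta0, tol=1e-10, max_iter=100):
    """Newton iteration theta <- theta + J(theta)^{-1} V(theta) for the
    location of a standard Cauchy sample (illustrative model only).

    With u_i = y_i - theta:
      score:         V(theta) = sum_i 2 u_i / (1 + u_i^2)
      observed info: J(theta) = sum_i 2 (1 - u_i^2) / (1 + u_i^2)^2
    """
    theta = theta0
    for _ in range(max_iter):
        u = y - theta
        score = np.sum(2 * u / (1 + u ** 2))                      # V(theta)
        obs_info = np.sum(2 * (1 - u ** 2) / (1 + u ** 2) ** 2)   # J(theta)
        step = score / obs_info
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
y = rng.standard_cauchy(500) + 3.0                # true location is 3.0
theta_hat = newton_scoring(y, theta0=np.median(y))
```

For this model $\mathcal{J}(\theta)$ can be small or even negative far from the optimum, so a consistent starting value such as the sample median keeps the iteration inside the region where the quadratic approximation behaves well.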

Fisher scoring

In practice, $\mathcal{J}(\theta)$ is usually replaced by $\mathcal{I}(\theta) = \mathrm{E}[\mathcal{J}(\theta)]$, the Fisher information, thus giving us the Fisher scoring algorithm:

$$\theta_{m+1} = \theta_m + \mathcal{I}^{-1}(\theta_m) V(\theta_m).$$

Under some regularity conditions, if $\theta_m$ is a consistent estimator, then $\theta_{m+1}$ (the correction after a single step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum likelihood estimate.[2]
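Continuing the same illustrative Cauchy example (again an assumption made for this sketch, not part of the article), Fisher scoring replaces the observed information with its expectation, which for the standard Cauchy location model is the constant $\mathcal{I}(\theta) = n/2$; the snippet also shows the one-step correction from a consistent starting estimator, the sample median.

```python
import numpy as np

def fisher_scoring(y, theta0, tol=1e-10, max_iter=100):
    """Fisher scoring theta <- theta + I(theta)^{-1} V(theta) for the
    Cauchy location model, where I(theta) = n/2 for every theta
    (a standard closed-form result for this model)."""
    n = len(y)
    theta = theta0
    for _ in range(max_iter):
        u = y - theta
        step = np.sum(2 * u / (1 + u ** 2)) / (n / 2)  # I^{-1}(theta) V(theta)
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(1)
y = rng.standard_cauchy(1000) + 3.0

# One-step correction from a consistent start: by the optimality result
# above, asymptotically as efficient as the fully iterated M.L.E.
theta0 = np.median(y)
u = y - theta0
one_step = theta0 + np.sum(2 * u / (1 + u ** 2)) / (len(y) / 2)
```

Because $\mathcal{I}(\theta)$ does not depend on the data here, each step costs only one evaluation of the score; this illustrates the usual practical appeal of replacing $\mathcal{J}(\theta)$ with its expectation, which is often simpler and is always positive semi-definite.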

References

1. Longford, Nicholas T. (1987). "A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects". Biometrika. 74 (4): 817–827. doi:10.1093/biomet/74.4.817.

2. Li, Bing; Babu, G. Jogesh (2019). "Bayesian Inference". Springer Texts in Statistics. New York, NY: Springer. Theorem 9.4. doi:10.1007/978-1-4939-9761-9_6. ISBN 978-1-4939-9759-6. S2CID 239322258.