Scoring algorithm
Form of Newton's method used in statistics

The scoring algorithm, also known as Fisher's scoring, is a form of Newton's method used in statistics to solve maximum likelihood equations numerically. It is named after Ronald Fisher.


Sketch of derivation

Let $Y_1, \ldots, Y_n$ be independent and identically distributed random variables with twice-differentiable p.d.f. $f(y;\theta)$, and suppose we wish to calculate the maximum likelihood estimator (M.L.E.) $\theta^*$ of $\theta$. First, suppose we have a starting point for our algorithm $\theta_0$, and consider a Taylor expansion of the score function $V(\theta)$ (the gradient of the log-likelihood) about $\theta_0$:

$$V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0),$$

where

$$\mathcal{J}(\theta_0) = -\sum_{i=1}^{n} \left. \nabla \nabla^{\top} \right|_{\theta=\theta_0} \log f(Y_i;\theta)$$

is the observed information matrix at $\theta_0$. Now, setting $\theta = \theta^*$, using that $V(\theta^*) = 0$ and rearranging gives us:

$$\theta^* \approx \theta_0 + \mathcal{J}^{-1}(\theta_0) V(\theta_0).$$

We therefore use the algorithm

$$\theta_{m+1} = \theta_m + \mathcal{J}^{-1}(\theta_m) V(\theta_m),$$

and under certain regularity conditions, it can be shown that $\theta_m \rightarrow \theta^*$.
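As a concrete illustration of this iteration, the sketch below applies it to the location parameter of a Cauchy sample; the model choice is an assumption made for this example, not part of the article, and all function names are hypothetical. The score and observed information follow from $\log f(y;\theta) = -\log \pi - \log(1 + (y-\theta)^2)$.

```python
import numpy as np

def newton_scoring(y, theta0, tol=1e-10, max_iter=100):
    """Newton iteration theta <- theta + J(theta)^{-1} V(theta) for the
    location of a standard Cauchy sample (illustrative model only).

    With u_i = y_i - theta:
      score:         V(theta) = sum_i 2 u_i / (1 + u_i^2)
      observed info: J(theta) = sum_i 2 (1 - u_i^2) / (1 + u_i^2)^2
    """
    theta = theta0
    for _ in range(max_iter):
        u = y - theta
        score = np.sum(2 * u / (1 + u ** 2))                      # V(theta)
        obs_info = np.sum(2 * (1 - u ** 2) / (1 + u ** 2) ** 2)   # J(theta)
        step = score / obs_info
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
y = rng.standard_cauchy(500) + 3.0                # true location is 3.0
theta_hat = newton_scoring(y, theta0=np.median(y))
```

For this model $\mathcal{J}(\theta)$ can be small or even negative far from the optimum, so a consistent starting value such as the sample median keeps the iteration inside the region where the quadratic approximation behaves well.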

Fisher scoring

In practice, $\mathcal{J}(\theta)$ is usually replaced by $\mathcal{I}(\theta) = \mathrm{E}[\mathcal{J}(\theta)]$, the Fisher information, thus giving us the Fisher scoring algorithm:

$$\theta_{m+1} = \theta_m + \mathcal{I}^{-1}(\theta_m) V(\theta_m).$$

Under some regularity conditions, if $\theta_m$ is a consistent estimator, then $\theta_{m+1}$ (the correction after a single step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum likelihood estimate.[2]
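Continuing the same illustrative Cauchy example (again an assumption made for this sketch, not part of the article), Fisher scoring replaces the observed information with its expectation, which for the standard Cauchy location model is the constant $\mathcal{I}(\theta) = n/2$; the snippet also shows the one-step correction from a consistent starting estimator, the sample median.

```python
import numpy as np

def fisher_scoring(y, theta0, tol=1e-10, max_iter=100):
    """Fisher scoring theta <- theta + I(theta)^{-1} V(theta) for the
    Cauchy location model, where I(theta) = n/2 for every theta
    (a standard closed-form result for this model)."""
    n = len(y)
    theta = theta0
    for _ in range(max_iter):
        u = y - theta
        step = np.sum(2 * u / (1 + u ** 2)) / (n / 2)  # I^{-1}(theta) V(theta)
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(1)
y = rng.standard_cauchy(1000) + 3.0

# One-step correction from a consistent start: by the optimality result
# above, asymptotically as efficient as the fully iterated M.L.E.
theta0 = np.median(y)
u = y - theta0
one_step = theta0 + np.sum(2 * u / (1 + u ** 2)) / (len(y) / 2)
```

Because $\mathcal{I}(\theta)$ does not depend on the data here, each step costs only one evaluation of the score; this illustrates the usual practical appeal of replacing $\mathcal{J}(\theta)$ with its expectation, which is often simpler and is always positive semi-definite.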

References

1. Longford, Nicholas T. (1987). "A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects". Biometrika. 74 (4): 817–827. doi:10.1093/biomet/74.4.817.

2. Li, Bing; Babu, G. Jogesh (2019). "Bayesian Inference". Springer Texts in Statistics. New York, NY: Springer. Theorem 9.4. doi:10.1007/978-1-4939-9761-9_6. ISBN 978-1-4939-9759-6. S2CID 239322258.