Let $L$ be the likelihood function, which depends on a univariate parameter $\theta$, and let $x$ be the data. The score $U(\theta)$ is defined as

$$U(\theta) = \frac{\partial \log L(\theta \mid x)}{\partial \theta}.$$
The Fisher information is[6]

$$I(\theta) = -\operatorname{E}\left[\left.\frac{\partial^{2}}{\partial\theta^{2}}\log f(X;\theta)\,\right|\,\theta\right],$$

where $f$ is the probability density.
The statistic to test $\mathcal{H}_0\colon \theta = \theta_0$ is

$$S(\theta_0) = \frac{U(\theta_0)^{2}}{I(\theta_0)},$$

which has an asymptotic $\chi_{1}^{2}$ distribution when $\mathcal{H}_0$ is true. While asymptotically identical, calculating the LM statistic using the outer-gradient-product estimator of the Fisher information matrix can lead to bias in small samples.[7]
Note that some texts use an alternative notation, in which the statistic $S^{*}(\theta) = \sqrt{S(\theta)}$ is tested against a normal distribution. This approach is equivalent and gives identical results.
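The statistic can be computed directly whenever the score and the Fisher information are available in closed form. The following Python sketch (an illustration, not part of the original article; the data values are hypothetical) applies the definitions above to test a binomial proportion against a null value $p_0$:

```python
# Minimal sketch of the single-parameter score test for a binomial proportion,
# H0: p = p0.  For X ~ Binomial(n, p) the log-likelihood is
#   log L(p) = x*log(p) + (n - x)*log(1 - p),
# so U(p0) = x/p0 - (n - x)/(1 - p0) and I(p0) = n / (p0*(1 - p0)).
from scipy.stats import chi2

def binomial_score_test(x, n, p0):
    """Return S(p0) = U(p0)^2 / I(p0) and its asymptotic chi-squared(1) p-value."""
    score = x / p0 - (n - x) / (1.0 - p0)      # U(p0)
    fisher_info = n / (p0 * (1.0 - p0))        # I(p0)
    stat = score ** 2 / fisher_info            # S(p0)
    return stat, chi2.sf(stat, df=1)

stat, pval = binomial_score_test(x=62, n=100, p0=0.5)
print(f"S = {stat:.3f}, p-value = {pval:.4f}")  # S = 5.760, p-value ~ 0.016
```

In this case the statistic reduces to $n(\hat p - p_0)^2 / (p_0(1 - p_0))$, the familiar one-sample proportion test evaluated under the null.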
Viewed as a test of small deviations from the null, the score test rejects $\mathcal{H}_0$ for large values of the score, that is, when

$$\left.\frac{\partial \log L(\theta \mid x)}{\partial \theta}\right|_{\theta = \theta_0} \geq C,$$

where $L$ is the likelihood function, $\theta_0$ is the value of the parameter of interest under the null hypothesis, and $C$ is a constant chosen according to the size of the test desired (i.e. the probability of rejecting $H_0$ if $H_0$ is true; see Type I error).
The score test is the most powerful test for small deviations from $H_0$. To see this, consider testing $\theta = \theta_0$ versus $\theta = \theta_0 + h$. By the Neyman–Pearson lemma, the most powerful test has the form

$$\frac{L(\theta_0 + h \mid x)}{L(\theta_0 \mid x)} \geq K.$$

Taking the log of both sides yields

$$\log L(\theta_0 + h \mid x) - \log L(\theta_0 \mid x) \geq \log K.$$

The score test follows upon making the substitution (by Taylor series expansion)

$$\log L(\theta_0 + h \mid x) \approx \log L(\theta_0 \mid x) + h \left.\frac{\partial \log L(\theta \mid x)}{\partial \theta}\right|_{\theta = \theta_0}$$

and identifying the $C$ above with $\log(K)$.
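A quick numerical check of the Taylor step is straightforward. The sketch below (hypothetical data, with a normal-mean model assumed purely for convenience) compares the exact log-likelihood difference with its first-order approximation $h\,U(\theta_0)$ for a small $h$:

```python
# For an i.i.d. N(theta, 1) sample, log L(theta) = -0.5 * sum((x - theta)^2) + const,
# and U(theta0) = sum(x - theta0).  For small h the exact difference
# log L(theta0 + h) - log L(theta0) is close to h * U(theta0).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.1, scale=1.0, size=50)   # hypothetical data
theta0, h = 0.0, 0.01

def log_lik(theta):                            # up to an additive constant
    return -0.5 * np.sum((x - theta) ** 2)

score = np.sum(x - theta0)                     # U(theta0) for the normal mean
print(log_lik(theta0 + h) - log_lik(theta0))   # exact difference
print(h * score)                               # first-order (Taylor) approximation
```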
If the null hypothesis is true, the likelihood ratio test, the Wald test, and the score test are asymptotically equivalent tests of hypotheses.[8][9] When testing nested models, the statistics for each test then converge to a chi-squared distribution with degrees of freedom equal to the difference in degrees of freedom in the two models. If the null hypothesis is not true, however, the statistics converge to a noncentral chi-squared distribution with possibly different noncentrality parameters.
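As a hedged illustration of this equivalence (hypothetical counts, binomial model), the sketch below computes all three statistics for $H_0\colon p = 0.5$; with a moderately large sample they take similar values:

```python
# Likelihood ratio, Wald and score statistics for H0: p = p0 in a binomial model.
# Under H0 all three are asymptotically chi-squared with 1 degree of freedom.
import numpy as np
from scipy.stats import chi2

x, n, p0 = 540, 1000, 0.5
phat = x / n

def log_lik(p):
    return x * np.log(p) + (n - x) * np.log(1 - p)

lr    = 2 * (log_lik(phat) - log_lik(p0))            # likelihood ratio statistic
wald  = n * (phat - p0) ** 2 / (phat * (1 - phat))   # Wald: information at phat
score = n * (phat - p0) ** 2 / (p0 * (1 - p0))       # score: information at p0

for name, s in [("LR", lr), ("Wald", wald), ("Score", score)]:
    print(f"{name:5s} {s:6.3f}  p = {chi2.sf(s, df=1):.4f}")
```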
A more general score test can be derived when there is more than one parameter. Suppose that $\widehat{\theta}_0$ is the maximum likelihood estimate of $\theta$ under the null hypothesis $H_0$, while $U$ and $I$ are respectively the score vector and the Fisher information matrix. Then

$$S(\widehat{\theta}_0) = U(\widehat{\theta}_0)^{\mathsf{T}}\, I(\widehat{\theta}_0)^{-1}\, U(\widehat{\theta}_0) \sim \chi^{2}_{k}$$

asymptotically under $H_0$, where $k$ is the number of constraints imposed by the null hypothesis and

$$U(\widehat{\theta}_0) = \left.\frac{\partial \log L(\theta \mid x)}{\partial \theta}\right|_{\theta = \widehat{\theta}_0}$$

and

$$I(\widehat{\theta}_0) = -\operatorname{E}\left[\left.\frac{\partial^{2} \log L(\theta \mid x)}{\partial \theta\, \partial \theta'}\right|_{\theta = \widehat{\theta}_0}\right].$$

This can be used to test $H_0$.
The actual formula for the test statistic depends on which estimator of the Fisher information matrix is being used.[10]
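In code, the choice of information estimator only changes which matrix is plugged into the same quadratic form $U^{\mathsf{T}} I^{-1} U$. A minimal sketch with hypothetical inputs (the score vector and information matrix below are illustrative placeholders, not taken from any particular model):

```python
# Multi-parameter score (Lagrange multiplier) statistic: S = U' I^{-1} U,
# with U and I evaluated at the restricted MLE, compared against chi-squared(k).
import numpy as np
from scipy.stats import chi2

def score_test(score_vec, info_matrix, k):
    """Return the score statistic and its asymptotic p-value with k constraints."""
    stat = float(score_vec @ np.linalg.solve(info_matrix, score_vec))
    return stat, chi2.sf(stat, df=k)

# Hypothetical U(theta_hat_0) and I(theta_hat_0) with k = 2 constraints
U = np.array([1.8, -0.7])
I = np.array([[4.0, 0.5],
              [0.5, 2.0]])
print(score_test(U, I, k=2))
```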
In many situations, the score statistic reduces to another commonly used statistic.[11]
In linear regression, the Lagrange multiplier test can be expressed as a function of the F-test.[12]
When the data follows a normal distribution, the score statistic is the same as the t statistic.
When the data consist of binary observations, the score statistic is the same as the chi-squared statistic in Pearson's chi-squared test.
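As a hedged numerical check of the binary case (hypothetical counts), the score statistic for $H_0\colon p = p_0$ can be compared directly with Pearson's chi-squared statistic over the two cells:

```python
# For binary data with H0: p = p0, the score statistic
#   n * (phat - p0)^2 / (p0 * (1 - p0))
# equals Pearson's chi-squared statistic sum((O - E)^2 / E) over the two cells.
import numpy as np
from scipy.stats import chisquare

x, n, p0 = 37, 120, 0.25
phat = x / n

score_stat = n * (phat - p0) ** 2 / (p0 * (1 - p0))
observed = np.array([x, n - x])
expected = np.array([n * p0, n * (1 - p0)])
pearson_stat, _ = chisquare(observed, expected)

print(score_stat, pearson_stat)   # identical up to floating-point error
```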
1. Rao, C. Radhakrishna (1948). "Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation". Mathematical Proceedings of the Cambridge Philosophical Society. 44 (1): 50–57. Bibcode:1948PCPS...44...50R. doi:10.1017/S0305004100023987.
2. Silvey, S. D. (1959). "The Lagrangian Multiplier Test". Annals of Mathematical Statistics. 30 (2): 389–407. doi:10.1214/aoms/1177706259. JSTOR 2237089.
3. Breusch, T. S.; Pagan, A. R. (1980). "The Lagrange Multiplier Test and its Applications to Model Specification in Econometrics". Review of Economic Studies. 47 (1): 239–253. doi:10.2307/2297111. JSTOR 2297111.
4. Fahrmeir, Ludwig; Kneib, Thomas; Lang, Stefan; Marx, Brian (2013). Regression: Models, Methods and Applications. Berlin: Springer. pp. 663–664. ISBN 978-3-642-34332-2.
5. Kennedy, Peter (1998). A Guide to Econometrics (Fourth ed.). Cambridge: MIT Press. p. 68. ISBN 0-262-11235-3.
6. Lehmann and Casella, eq. (2.5.16).
7. Davidson, Russell; MacKinnon, James G. (1983). "Small sample properties of alternative forms of the Lagrange Multiplier test". Economics Letters. 12 (3–4): 269–275. doi:10.1016/0165-1765(83)90048-4.
8. Engle, Robert F. (1983). "Wald, Likelihood Ratio, and Lagrange Multiplier Tests in Econometrics". In Intriligator, M. D.; Griliches, Z. (eds.). Handbook of Econometrics. Vol. II. Elsevier. pp. 796–801. ISBN 978-0-444-86185-6.
9. Gałecki, Andrzej; Burzykowski, Tomasz (2013). Linear Mixed-Effects Models Using R: A Step-by-Step Approach. New York, NY: Springer. ISBN 978-1-4614-3899-1.
10. Taboga, Marco. "Lectures on Probability Theory and Mathematical Statistics". statlect.com. Retrieved 31 May 2022. https://www.statlect.com/fundamentals-of-statistics/score-test
11. Cook, T. D.; DeMets, D. L., eds. (2007). Introduction to Statistical Methods for Clinical Trials. Chapman and Hall. pp. 296–297. ISBN 978-1-58488-027-1.
12. Vandaele, Walter (1981). "Wald, likelihood ratio, and Lagrange multiplier tests as an F test". Economics Letters. 8 (4): 361–365. doi:10.1016/0165-1765(81)90026-4.