Formally speaking, an estimator Tn of parameter θ is said to be weakly consistent if it converges in probability to the true value of the parameter:[1]

$\underset{n\to\infty}{\operatorname{plim}}\; T_n = \theta,$

i.e. if, for all ε > 0,

$\lim_{n\to\infty} \Pr\big(|T_n - \theta| > \varepsilon\big) = 0.$
An estimator Tn of parameter θ is said to be strongly consistent if it converges almost surely to the true value of the parameter:

$\Pr\big(\lim_{n\to\infty} T_n = \theta\big) = 1.$
A more rigorous definition takes into account the fact that θ is actually unknown, and thus, the convergence in probability must take place for every possible value of this parameter. Suppose {pθ : θ ∈ Θ} is a family of distributions (the parametric model), and Xθ = {X1, X2, … : Xi ~ pθ} is an infinite sample from the distribution pθ. Let {Tn(Xθ)} be a sequence of estimators for some parameter g(θ). Usually, Tn will be based on the first n observations of a sample. Then this sequence {Tn} is said to be (weakly) consistent if[2]

$\underset{n\to\infty}{\operatorname{plim}}\; T_n(X^{\theta}) = g(\theta) \quad \text{for all } \theta \in \Theta.$
This definition uses g(θ) instead of simply θ, because often one is interested in estimating a certain function or a sub-vector of the underlying parameter. In the next example, we estimate the location parameter of the model, but not the scale:
Suppose one has a sequence of statistically independent observations {X1, X2, ...} from a normal N(μ, σ2) distribution. To estimate μ based on the first n observations, one can use the sample mean: Tn = (X1 + ... + Xn)/n. This defines a sequence of estimators, indexed by the sample size n.
From the properties of the normal distribution, we know the sampling distribution of this statistic: Tn is itself normally distributed, with mean μ and variance σ2/n. Equivalently, $(T_n - \mu)/(\sigma/\sqrt{n})$ has a standard normal distribution:

$\Pr\big(|T_n - \mu| \geq \varepsilon\big) = \Pr\!\left(\frac{\sqrt{n}\,|T_n - \mu|}{\sigma} \geq \frac{\sqrt{n}\,\varepsilon}{\sigma}\right) = 2\left(1 - \Phi\!\left(\frac{\sqrt{n}\,\varepsilon}{\sigma}\right)\right) \to 0$

as n tends to infinity, for any fixed ε > 0. Therefore, the sequence Tn of sample means is consistent for the population mean μ (recalling that $\Phi$ is the cumulative distribution function of the standard normal distribution).
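The shrinking of Pr(|Tn − μ| ≥ ε) can also be checked numerically. The following Python sketch (the values μ = 2, σ = 3, ε = 0.1 and the sample sizes are arbitrary illustrative choices, not taken from the text) estimates this probability by simulation for increasing n:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, eps = 2.0, 3.0, 0.1   # illustrative values, not from the text
reps = 2_000                     # Monte Carlo replications per sample size

for n in [10, 100, 1_000, 10_000]:
    # draw `reps` samples of size n and compute the sample mean of each
    samples = rng.normal(mu, sigma, size=(reps, n))
    T_n = samples.mean(axis=1)
    # empirical estimate of Pr(|T_n - mu| >= eps)
    p_hat = np.mean(np.abs(T_n - mu) >= eps)
    print(f"n = {n:6d}   Pr(|T_n - mu| >= eps) ~ {p_hat:.4f}")
```

The estimated probability falls toward zero as n grows, in line with 2(1 − Φ(√n ε/σ)) → 0.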
The notion of asymptotic consistency is very close to, and almost synonymous with, the notion of convergence in probability. As such, any theorem, lemma, or property which establishes convergence in probability may be used to prove consistency. Many such tools exist: for example, to demonstrate consistency directly from the definition one can use the inequality[3]

$\Pr\big(h(T_n - \theta) \geq \varepsilon\big) \leq \frac{\operatorname{E}\big[h(T_n - \theta)\big]}{\varepsilon},$

the most common choice for the function h being either the absolute value (in which case it is known as the Markov inequality), or the quadratic function (respectively Chebyshev's inequality).
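As a worked illustration (a standard application, not spelled out above), take the quadratic choice h(x) = x2 and the sample mean Tn of an i.i.d. sample with mean μ and finite variance σ2; applying the inequality with threshold ε2 gives Chebyshev's bound

$\Pr\big(|T_n - \mu| \geq \varepsilon\big) = \Pr\big((T_n - \mu)^2 \geq \varepsilon^2\big) \leq \frac{\operatorname{E}\big[(T_n - \mu)^2\big]}{\varepsilon^2} = \frac{\sigma^2}{n\,\varepsilon^2} \;\to\; 0,$

so the sample mean is weakly consistent for μ whenever the variance is finite, without any normality assumption.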
An estimator can be unbiased but not consistent. For example, for an iid sample {x1, ..., xn} one can use Tn(X) = xn as the estimator of the mean E[X]. Note that here the sampling distribution of Tn is the same as the underlying distribution (for any n, as it ignores all points but the last), so E[Tn(X)] = E[X] for every n: the estimator is unbiased. However, the sequence does not converge in probability to E[X] (or to any constant, unless the distribution is degenerate), so it is not consistent.
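A quick simulation makes the contrast visible. This sketch (the Exponential(1) distribution, with mean 1, is an arbitrary illustrative choice) compares the last-observation estimator with the sample mean as n grows:

```python
import numpy as np

rng = np.random.default_rng(1)
# Exponential(1) has mean E[X] = 1; chosen only for illustration
for n in [10, 100, 1_000, 10_000]:
    x = rng.exponential(scale=1.0, size=n)
    last_obs = x[-1]        # T_n(X) = x_n: unbiased, but not consistent
    sample_mean = x.mean()  # consistent estimator of E[X]
    print(f"n = {n:6d}   last observation = {last_obs:6.3f}   sample mean = {sample_mean:6.3f}")
```

No matter how large n is, the last observation keeps fluctuating with the full spread of the underlying distribution, while the sample mean settles near E[X] = 1.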
However, if a sequence of estimators is unbiased and converges to a value, then it is consistent, as it must converge to the correct value.
Alternatively, an estimator can be biased but consistent. For example, if the mean is estimated by $\frac{1}{n}\sum x_i + \frac{1}{n}$, it is biased, but as $n \to \infty$ it approaches the correct value, and so it is consistent.
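Written out (a one-line check of the claim above), the bias of this estimator is

$\operatorname{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} x_i + \frac{1}{n}\right] - \mu = \frac{1}{n} \;\to\; 0,$

and since the sample mean converges in probability to the true mean while the added term 1/n vanishes, the estimator as a whole converges in probability to the true mean.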
Important examples include the sample variance and sample standard deviation. Without Bessel's correction (that is, when using the sample size n {\displaystyle n} instead of the degrees of freedom n − 1 {\displaystyle n-1} ), these are both negatively biased but consistent estimators. With the correction, the corrected sample variance is unbiased, while the corrected sample standard deviation is still biased, but less so, and both are still consistent: the correction factor converges to 1 as sample size grows.
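As a brief check of the variance claim (a standard computation, not carried out in the text), the uncorrected sample variance of an i.i.d. sample with variance σ2 satisfies

$\operatorname{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} \big(x_i - \bar{x}\big)^2\right] = \frac{n-1}{n}\,\sigma^2,$

so it underestimates σ2 by the factor (n − 1)/n, which converges to 1 as n → ∞; multiplying by the Bessel correction factor n/(n − 1) removes the bias of the variance estimator while leaving consistency intact.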
Here is another example. Let $T_n$ be a sequence of estimators for $\theta$ defined, for a fixed constant δ, by

$T_n = \begin{cases} \theta, & \text{with probability } 1 - \tfrac{1}{n}, \\ n\delta + \theta, & \text{with probability } \tfrac{1}{n}. \end{cases}$

We can see that $T_n \xrightarrow{p} \theta$, $\operatorname{E}[T_n] = \theta + \delta$, and the bias does not converge to zero.
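Spelling out the two claims (a short verification of the example above):

$\operatorname{E}[T_n] = \left(1 - \frac{1}{n}\right)\theta + \frac{1}{n}\,(n\delta + \theta) = \theta + \delta, \qquad \Pr\big(|T_n - \theta| > \varepsilon\big) \leq \Pr(T_n \neq \theta) = \frac{1}{n} \;\to\; 0,$

so the estimator is consistent even though its bias equals δ for every n.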
1. Amemiya 1985, Definition 3.4.2. - Amemiya, Takeshi (1985). Advanced Econometrics. Harvard University Press. ISBN 0-674-00560-0. https://archive.org/details/advancedeconomet00amem
2. Lehmann & Casella 1998, p. 332. - Lehmann, E. L.; Casella, G. (1998). Theory of Point Estimation (2nd ed.). Springer. ISBN 0-387-98502-6.
3. Amemiya 1985, equation (3.2.5). - Amemiya, Takeshi (1985). Advanced Econometrics. Harvard University Press. ISBN 0-674-00560-0. https://archive.org/details/advancedeconomet00amem
4. Amemiya 1985, Theorem 3.2.6. - Amemiya, Takeshi (1985). Advanced Econometrics. Harvard University Press. ISBN 0-674-00560-0. https://archive.org/details/advancedeconomet00amem
5. Amemiya 1985, Theorem 3.2.7. - Amemiya, Takeshi (1985). Advanced Econometrics. Harvard University Press. ISBN 0-674-00560-0. https://archive.org/details/advancedeconomet00amem
6. Newey & McFadden 1994, Chapter 2. - Newey, W. K.; McFadden, D. (1994). "Chapter 36: Large sample estimation and hypothesis testing". In Robert F. Engle; Daniel L. McFadden (eds.). Handbook of Econometrics. Vol. 4. Elsevier Science. ISBN 0-444-88766-0. S2CID 29436457. https://api.semanticscholar.org/CorpusID:29436457