Consider a binary hypothesis testing problem in which observations are modeled as independent and identically distributed random variables under each hypothesis. Let $Y_1, Y_2, \ldots, Y_n$ denote the observations. Let $f_0$ denote the probability density function of each observation $Y_i$ under the null hypothesis $H_0$, and let $f_1$ denote the probability density function of each observation $Y_i$ under the alternative hypothesis $H_1$.
In this case there are two possible error events. A type 1 error, also called a false positive, occurs when the null hypothesis is true but is wrongly rejected. A type 2 error, also called a false negative, occurs when the alternative hypothesis is true but the null hypothesis is not rejected. The probability of a type 1 error is denoted $P(\mathrm{error} \mid H_0)$ and the probability of a type 2 error is denoted $P(\mathrm{error} \mid H_1)$.
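The two error probabilities can be made concrete with a small simulation. The following sketch is only an illustration, not part of the original text: it assumes $f_0$ and $f_1$ are unit-variance Gaussian densities with means 0 and 1, and uses a simple threshold test on the sample mean; the sample size, threshold, and number of trials are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model (an assumption, not from the text):
# under H0 each Y_i ~ N(0, 1), under H1 each Y_i ~ N(1, 1).
mu0, mu1, sigma = 0.0, 1.0, 1.0
n = 20                  # sample size
threshold = 0.5         # reject H0 when the sample mean exceeds this
trials = 100_000

# Type 1 error: data generated under H0, test wrongly rejects H0.
means_h0 = rng.normal(mu0, sigma, size=(trials, n)).mean(axis=1)
p_err_h0 = np.mean(means_h0 > threshold)

# Type 2 error: data generated under H1, test fails to reject H0.
means_h1 = rng.normal(mu1, sigma, size=(trials, n)).mean(axis=1)
p_err_h1 = np.mean(means_h1 <= threshold)

print(f"P(error | H0) = {p_err_h0:.4f}")   # false-positive rate
print(f"P(error | H1) = {p_err_h1:.4f}")   # false-negative rate
```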
In the Neyman–Pearson[1] version of binary hypothesis testing, one is interested in minimizing the probability of a type 2 error $P(\mathrm{error} \mid H_1)$ subject to the constraint that the probability of a type 1 error $P(\mathrm{error} \mid H_0)$ is less than or equal to a pre-specified level $\alpha$. In this setting, the optimal testing procedure is a likelihood-ratio test.[2] Furthermore, the optimal test guarantees that the type 2 error probability decays exponentially in the sample size $n$ according to $\lim_{n \to \infty} \frac{-\ln P(\mathrm{error} \mid H_1)}{n} = D(f_0 \parallel f_1)$.[3] The error exponent $D(f_0 \parallel f_1)$ is the Kullback–Leibler divergence between the probability distributions of the observations under the two hypotheses. This exponent is also referred to as the Chernoff–Stein lemma exponent.
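For the same illustrative Gaussian model as above, this limit can be checked numerically: for equal-variance Gaussians, $D(f_0 \parallel f_1) = (\mu_0 - \mu_1)^2 / (2\sigma^2)$, and for this model the likelihood-ratio test reduces to a threshold on the sample mean, so the type 2 error of the level-$\alpha$ test is available in closed form through the normal CDF. The sketch below rests on those assumptions and is meant only as an illustration of the Chernoff–Stein exponent.

```python
import numpy as np
from scipy.stats import norm

# Illustrative model (an assumption): f0 = N(0, 1), f1 = N(1, 1).
mu0, mu1, sigma = 0.0, 1.0, 1.0
alpha = 0.05

# KL divergence D(f0 || f1) for equal-variance Gaussians.
kl = (mu0 - mu1) ** 2 / (2 * sigma ** 2)
print(f"D(f0 || f1) = {kl:.4f}")

# For this model the likelihood-ratio test rejects H0 when the sample
# mean exceeds a threshold chosen so that P(error | H0) = alpha.
for n in (10, 100, 1_000, 10_000, 100_000):
    t = mu0 + norm.ppf(1 - alpha) * sigma / np.sqrt(n)       # level-alpha threshold
    log_beta = norm.logcdf((t - mu1) * np.sqrt(n) / sigma)   # ln P(error | H1)
    print(f"n = {n:7d}: -ln P(error | H1) / n = {-log_beta / n:.4f}")
# The printed values approach D(f0 || f1) = 0.5 as n grows.
```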
In the Bayesian version of binary hypothesis testing, one is interested in minimizing the average error probability under both hypotheses, assuming a prior probability of occurrence for each hypothesis. Let $\pi_0$ denote the prior probability of hypothesis $H_0$. In this case the average error probability is given by $P_{\text{ave}} = \pi_0 P(\mathrm{error} \mid H_0) + (1 - \pi_0) P(\mathrm{error} \mid H_1)$. In this setting again a likelihood-ratio test is optimal, and the optimal error decays as $\lim_{n \to \infty} \frac{-\ln P_{\text{ave}}}{n} = C(f_0, f_1)$, where $C(f_0, f_1)$ is the Chernoff information between the two distributions, defined as $C(f_0, f_1) = \max_{\lambda \in [0,1]} \left[ -\ln \int (f_0(x))^{\lambda} (f_1(x))^{1-\lambda} \, dx \right]$.[4]
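The Chernoff information can be evaluated numerically by maximizing over $\lambda$. The sketch below, again under the assumed equal-variance Gaussian model used above, computes the integral by quadrature and compares the result with the known closed form $(\mu_0 - \mu_1)^2 / (8\sigma^2)$, attained at $\lambda = 1/2$; it is an illustration, not a general-purpose implementation.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Illustrative densities (an assumption): f0 = N(0, 1), f1 = N(1, 1).
mu0, mu1, sigma = 0.0, 1.0, 1.0
f0 = lambda x: norm.pdf(x, mu0, sigma)
f1 = lambda x: norm.pdf(x, mu1, sigma)

def neg_log_integral(lam):
    """Return -ln of the integral of f0(x)^lam * f1(x)^(1-lam) over the real line."""
    val, _ = quad(lambda x: f0(x) ** lam * f1(x) ** (1 - lam), -np.inf, np.inf)
    return -np.log(val)

# Chernoff information: maximize over lambda in [0, 1]
# (equivalently, minimize the negated objective).
res = minimize_scalar(lambda lam: -neg_log_integral(lam),
                      bounds=(0.0, 1.0), method="bounded")
chernoff = -res.fun
print(f"C(f0, f1) = {chernoff:.4f}  at lambda = {res.x:.3f}")
print(f"Closed form (mu0 - mu1)^2 / (8 sigma^2) = {(mu0 - mu1) ** 2 / (8 * sigma ** 2):.4f}")
```

For this example both numbers come out to 0.125, one quarter of the Stein exponent $D(f_0 \parallel f_1) = 0.5$, reflecting the fact that the Bayesian criterion must control errors under both hypotheses rather than only the type 2 error.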
[1] Neyman, J.; Pearson, E. S. (1933). "On the problem of the most efficient tests of statistical hypotheses". Philosophical Transactions of the Royal Society of London A. 231 (694–706): 289–337. Bibcode:1933RSPTA.231..289N. doi:10.1098/rsta.1933.0009. JSTOR 91247.
[2] Lehmann, E. L.; Romano, Joseph P. (2005). Testing Statistical Hypotheses (3rd ed.). New York: Springer. ISBN 978-0-387-98864-1.
[3] Cover, Thomas M.; Thomas, Joy A. (2006). Elements of Information Theory (2nd ed.). New York: Wiley-Interscience.