Minimax estimator

<h2 id="definition">Definition</h2>
Definition: An estimator \(\delta^{M} : \mathcal{X} \rightarrow \Theta\) is called minimax with respect to a risk function \(R(\theta, \delta)\) if it achieves the smallest maximum risk among all estimators, that is, if it satisfies

\[ \sup_{\theta \in \Theta} R(\theta, \delta^{M}) = \inf_{\delta} \sup_{\theta \in \Theta} R(\theta, \delta). \]
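The inf-sup in the definition can be approximated numerically when both the estimator class and the parameter set are replaced by finite grids. The sketch below is a hypothetical toy problem, not taken from the text: a scalar observation \(x \sim N(\theta, 1)\) with \(|\theta| \leq m\), squared error loss, and the search restricted to linear estimators \(\delta_c(x) = c\,x\), whose risk is \(c^{2} + (1-c)^{2}\theta^{2}\). Because of that restriction, the result is minimax only within the linear class.

```python
def linear_risk(c: float, theta: float) -> float:
    """Squared-error risk of delta_c(x) = c * x when x ~ N(theta, 1):
    variance c**2 plus squared bias ((1 - c) * theta)**2."""
    return c ** 2 + ((1.0 - c) * theta) ** 2


def grid(lo: float, hi: float, k: int) -> list[float]:
    """k evenly spaced points from lo to hi, endpoints included."""
    return [lo + (hi - lo) * i / (k - 1) for i in range(k)]


def minimax_linear(m: float, k: int = 401) -> tuple[float, float]:
    """Approximate inf over c of sup over |theta| <= m of R(theta, delta_c).

    Both the inf and the sup are taken over finite grids, so the result is
    only an approximation. Returns (best c, its worst-case risk)."""
    thetas = grid(-m, m, k)
    best_c, best_worst = 0.0, float("inf")
    for c in grid(0.0, 1.0, k):
        worst = max(linear_risk(c, t) for t in thetas)  # sup over theta
        if worst < best_worst:                           # inf over c
            best_c, best_worst = c, worst
    return best_c, best_worst


# For this toy class the closed form is c* = m^2 / (1 + m^2) with worst-case risk m^2 / (1 + m^2).
print(minimax_linear(1.0))
```

For \(m = 1\) the grid search lands on \(c = 0.5\) with worst-case risk \(0.5\), matching the closed form.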

<h2 id="problem-setup">Problem setup</h2>
An example is the problem of estimating a deterministic (not <a href="/facts/Bayes_estimator/BpFRxYBg">Bayesian</a>) parameter \(\theta \in \Theta\) from noisy or corrupt data \(x \in \mathcal{X}\) related through the <a href="/facts/Conditional_probability_distribution/0eGm3P9W">conditional probability distribution</a> \(P(x \mid \theta)\). The goal is to find a "good" estimator \(\delta(x)\) for the parameter \(\theta\), one that minimizes some given <a href="/facts/Risk_function/xv5ozuhl">risk function</a> \(R(\theta, \delta)\). The risk function (technically a <a href="/facts/Functional_(mathematics)/3Cp3PHBR">functional</a> or <a href="/facts/Operator_(mathematics)/1VWgEdjS">operator</a>, since \(R\) is a function of a function, not a function composition) is the <a href="/facts/Expected_value/1XV0JKL8">expectation</a> of some <a href="/facts/Loss_function/xv5ozuhl">loss function</a> \(L(\theta, \delta)\) with respect to \(P(x \mid \theta)\), that is, \(R(\theta, \delta) = E_{x \mid \theta}\left[L(\theta, \delta(x))\right]\). A popular example of a loss function<a class="footnote-ref" id="fnref:1" href="#fn:1">1</a> is the squared error loss \(L(\theta, \delta) = \|\theta - \delta\|^{2}\), and the risk function for this loss is the <a href="/facts/Mean_squared_error/kz3TR7bv">mean squared error</a> (MSE).
In general, the risk cannot be minimized uniformly over \(\theta\), because it depends on the unknown parameter \(\theta\) itself; if the actual value of \(\theta\) were known, there would be no need to estimate it. Therefore, additional criteria are required for finding an estimator that is optimal in some sense. One such criterion is the minimax criterion.
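Because the risk is the expectation of the loss under \(P(x \mid \theta)\), it can be approximated by simulation at any fixed \(\theta\). This sketch assumes, for illustration only, a scalar Gaussian model \(x \sim N(\theta, \sigma^{2})\) and squared error loss; the rule `half_shrink` is a hypothetical estimator not discussed in the text. It also shows why the risk cannot be minimized uniformly: which of the two estimators is better depends on the unknown \(\theta\).

```python
import random


def mc_risk(estimator, theta: float, sigma: float = 1.0,
            n_sims: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of R(theta, delta) = E[(delta(x) - theta)^2]
    for x ~ N(theta, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        x = rng.gauss(theta, sigma)            # one draw from P(x | theta)
        total += (estimator(x) - theta) ** 2   # squared error loss
    return total / n_sims


def identity(x: float) -> float:
    """delta(x) = x; exact risk sigma^2."""
    return x


def half_shrink(x: float) -> float:
    """Hypothetical rule delta(x) = x / 2; exact risk sigma^2 / 4 + theta^2 / 4."""
    return 0.5 * x


for theta in (0.0, 3.0):
    print(theta, mc_risk(identity, theta), mc_risk(half_shrink, theta))
```

At \(\theta = 0\) the shrinking rule has roughly a quarter of the risk of \(\delta(x) = x\), while at \(\theta = 3\) its risk is about 2.5 times larger.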

<h2 id="least-favorable-distribution">Least favorable distribution</h2>
Intuitively, an estimator is minimax when it is the best in the worst case. Following this intuition, a minimax estimator can often be found as a <a href="/facts/Bayes_estimator/BpFRxYBg">Bayes estimator</a> with respect to a least favorable <a href="/facts/Prior_probability/JQKAD4o0">prior distribution</a> of \(\theta\), that is, a prior under which the parameter is hardest to estimate. To make this notion precise, denote the average (Bayes) risk of the Bayes estimator \(\delta_{\pi}\) with respect to a prior distribution \(\pi\) by

\[ r_{\pi} = \int R(\theta, \delta_{\pi})\, d\pi(\theta). \]

Definition: A prior distribution \(\pi\) is called least favorable if for every other distribution \(\pi'\) the average risk satisfies \(r_{\pi} \geq r_{\pi'}\).
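The average risk \(r_{\pi}\) can be computed numerically, which makes the least-favorable comparison concrete in the binomial setting of Example 1 below. This sketch assumes squared error loss and Beta(a, b) priors, for which the Bayes estimator is the posterior mean \((x + a)/(n + a + b)\); its risk is written as variance plus squared bias and integrated against the prior with a midpoint rule. Comparing two priors is only consistent with least favorability, not a proof of it.

```python
import math


def bayes_rule_risk(theta: float, n: int, a: float, b: float) -> float:
    """Squared-error risk at theta of delta(x) = (x + a) / (n + a + b), x ~ Bin(n, theta):
    variance n*theta*(1 - theta) / s^2 plus squared bias (a - (a + b)*theta)^2 / s^2."""
    s = n + a + b
    return (n * theta * (1 - theta) + (a - (a + b) * theta) ** 2) / s ** 2


def beta_pdf(t: float, a: float, b: float) -> float:
    """Density of the Beta(a, b) distribution at 0 < t < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


def average_risk(n: int, a: float, b: float, cells: int = 20_000) -> float:
    """r_pi = integral of R(theta, delta_pi) d pi(theta) for pi = Beta(a, b),
    where delta_pi is that prior's Bayes rule; midpoint rule on (0, 1)."""
    h = 1.0 / cells
    total = 0.0
    for i in range(cells):
        t = (i + 0.5) * h
        total += bayes_rule_risk(t, n, a, b) * beta_pdf(t, a, b) * h
    return total


n = 16
a_lf = math.sqrt(n) / 2                    # Beta(sqrt(n)/2, sqrt(n)/2), the prior of Example 1
r_candidate = average_risk(n, a_lf, a_lf)  # close to 1 / (4 (1 + sqrt(n))^2) = 0.01
r_uniform = average_risk(n, 1.0, 1.0)      # uniform prior; exact value 1 / (6 (n + 2))
print(r_candidate, r_uniform)
```

For \(n = 16\) the Beta(2, 2) prior gives \(r_{\pi} = 0.01\), larger than the uniform prior's \(1/108 \approx 0.00926\), as the definition requires of a least favorable prior.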
Theorem 1: If \(r_{\pi} = \sup_{\theta} R(\theta, \delta_{\pi})\), then:

<ol><li>\(\delta_{\pi}\) is minimax.</li>
<li>If \(\delta_{\pi}\) is the unique Bayes estimator, it is also the unique minimax estimator.</li>
<li>\(\pi\) is least favorable.</li></ol>
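Corollary: If a Bayes estimator has constant risk, it is minimax, because a constant risk equals both its average \(r_{\pi}\) and its supremum, so Theorem 1 applies. Constant risk is not a necessary condition for minimaxity.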
Example 1: Unfair coin<a class="footnote-ref" id="fnref:2" href="#fn:2">2</a><a class="footnote-ref" id="fnref:3" href="#fn:3">3</a>: Consider the problem of estimating the "success" rate \(\theta\) of a <a href="/facts/Binomial_distribution/UMoFMjDj">binomial</a> variable, \(x \sim B(n, \theta)\). This may be viewed as estimating the rate at which an <a href="/facts/Fair_coin/vTBk29k7">unfair coin</a> falls on "heads" or "tails". In this case the Bayes estimator with respect to a <a href="/facts/Beta_distribution/DbJk4eeV">Beta</a>-distributed prior, \(\theta \sim \text{Beta}(\sqrt{n}/2, \sqrt{n}/2)\), is

\[ \delta^{M} = \frac{x + 0.5\sqrt{n}}{n + \sqrt{n}}, \]

with constant risk (which therefore also equals its Bayes risk)

\[ R(\theta, \delta^{M}) = r_{\pi} = \frac{1}{4(1+\sqrt{n})^{2}} \quad \text{for all } \theta, \]

and so, according to the Corollary, it is minimax.
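The constant-risk claim can be checked exactly by summing the squared error over the binomial distribution rather than using the closed form. This sketch evaluates \(\delta^{M}(x) = (x + 0.5\sqrt{n})/(n + \sqrt{n})\) from the example, together with the ML estimator \(x/n\) for comparison, on a grid of \(\theta\) values; the choice \(n = 25\) is arbitrary.

```python
import math


def exact_risk(estimator, theta: float, n: int) -> float:
    """R(theta, delta) = sum_x P(x | theta) * (delta(x) - theta)^2 for x ~ Bin(n, theta)."""
    return sum(
        math.comb(n, x) * theta ** x * (1 - theta) ** (n - x) * (estimator(x, n) - theta) ** 2
        for x in range(n + 1)
    )


def minimax_rule(x: int, n: int) -> float:
    """delta^M(x) = (x + 0.5 sqrt(n)) / (n + sqrt(n))."""
    return (x + 0.5 * math.sqrt(n)) / (n + math.sqrt(n))


def sample_proportion(x: int, n: int) -> float:
    """The ML estimator x / n, for comparison."""
    return x / n


n = 25
thetas = [i / 20 for i in range(21)]
worst_minimax = max(exact_risk(minimax_rule, t, n) for t in thetas)
worst_mle = max(exact_risk(sample_proportion, t, n) for t in thetas)
print(worst_minimax, worst_mle)
```

For \(n = 25\) the risk of \(\delta^{M}\) is \(1/144 \approx 0.00694\) at every grid point, while \(x/n\) peaks at \(0.01\) when \(\theta = 1/2\) (though it has smaller risk near \(\theta = 0\) or \(1\)).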
Definition: A sequence of prior distributions \(\pi_{n}\) is called least favorable if for any other distribution \(\pi'\),

\[ \lim_{n \rightarrow \infty} r_{\pi_{n}} \geq r_{\pi'}. \]

Theorem 2: If there exist a sequence of priors \(\pi_{n}\) and an estimator \(\delta\) such that

\[ \sup_{\theta} R(\theta, \delta) = \lim_{n \rightarrow \infty} r_{\pi_{n}}, \]

then:

<ol><li>\(\delta\) is minimax.</li>
<li>The sequence \(\pi_{n}\) is least favorable.</li></ol>

No uniqueness is guaranteed. For example, the ML estimator of the Gaussian mean in Example 2 below may be attained as the limit of Bayes estimators with respect to a <a href="/facts/Uniform_distribution_(continuous)/XbnlVljT">uniform</a> prior \(\pi_{n} \sim U[-n, n]\) with increasing support, and also with respect to a zero-mean normal prior \(\pi_{n} \sim N(0, n\sigma^{2})\) with increasing variance. So neither is the resulting ML estimator guaranteed to be the unique minimax estimator, nor is the least favorable sequence of priors unique.
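For the normal prior sequence the limit required by Theorem 2 has a closed form. Assuming each coordinate of \(\theta\) independently has prior \(N(0, \tau^{2})\) with \(\tau^{2} = n\sigma^{2}\) and \(x \sim N(\theta, I_{p}\sigma^{2})\), the Bayes estimator is the posterior mean \(\frac{\tau^{2}}{\tau^{2}+\sigma^{2}}x = \frac{n}{n+1}x\), and its Bayes risk is \(p\) times the posterior variance, \(p\sigma^{2}\frac{n}{n+1}\). The sketch below tabulates both; the values \(p = 5\), \(\sigma^{2} = 2\) are arbitrary.

```python
def normal_prior_bayes(n: int, p: int, sigma2: float) -> tuple[float, float]:
    """Prior theta_i ~ N(0, n * sigma2) i.i.d., data x ~ N(theta, sigma2 * I_p),
    squared error loss. Returns (shrinkage factor, Bayes risk r_{pi_n})."""
    tau2 = n * sigma2
    shrink = tau2 / (tau2 + sigma2)                   # Bayes rule: delta_n(x) = shrink * x
    bayes_risk = p * sigma2 * tau2 / (tau2 + sigma2)  # p times the posterior variance
    return shrink, bayes_risk


p, sigma2 = 5, 2.0
for n in (1, 10, 100, 10_000):
    shrink, r = normal_prior_bayes(n, p, sigma2)
    print(n, shrink, r)  # shrink -> 1 (delta_n -> x) and r -> p * sigma2 = 10.0
```

The Bayes risks increase toward \(p\sigma^{2}\), the constant risk of \(\delta(x) = x\), which is exactly the condition Theorem 2 needs.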
Example 2: Consider the problem of estimating the mean of a \(p\)-dimensional <a href="/facts/Normal_distribution/UapjjPyQ">Gaussian</a> random vector, \(x \sim N(\theta, I_{p}\sigma^{2})\). The <a href="/facts/Maximum_likelihood/0Yq2dpQD">maximum likelihood</a> (ML) estimator for \(\theta\) in this case is \(\delta_{\text{ML}} = x\), and its risk is

\[ R(\theta, \delta_{\text{ML}}) = E\|\delta_{\text{ML}} - \theta\|^{2} = \sum_{i=1}^{p} E(x_{i} - \theta_{i})^{2} = p\sigma^{2}. \]

The risk is constant, but the ML estimator is not a Bayes estimator, so the Corollary of Theorem 1 does not apply. However, the ML estimator is the limit of the Bayes estimators \(\delta_{n}(x) = \frac{n}{n+1}x\) with respect to the prior sequence \(\pi_{n} \sim N(0, n\sigma^{2})\), whose Bayes risks \(r_{\pi_{n}} = p\sigma^{2}\frac{n}{n+1}\) converge to \(p\sigma^{2} = \sup_{\theta} R(\theta, \delta_{\text{ML}})\); hence it is minimax according to Theorem 2. Minimaxity does not always imply <a href="/facts/Admissible_decision_rule/Kmw4R8Gz">admissibility</a>. In this example, the ML estimator is known to be inadmissible whenever \(p > 2\): the <a href="/facts/James%25E2%2580%2593Stein_estimator/o7tcLXmt">James–Stein estimator</a> dominates it. Both estimators are minimax, and both have risk approaching \(p\sigma^{2}\) as \(\|\theta\| \rightarrow \infty\), but the James–Stein estimator has strictly smaller risk for every finite \(\|\theta\|\).
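The domination of the ML estimator by the James–Stein estimator can be seen in a small simulation. This sketch assumes the standard (non-positive-part) James–Stein form \(\delta_{\text{JS}}(x) = \left(1 - \frac{(p-2)\sigma^{2}}{\|x\|^{2}}\right)x\), which shrinks toward the origin, with \(p = 10\) and \(\sigma^{2} = 1\) chosen arbitrarily; the risks are Monte Carlo estimates, not exact values.

```python
import random


def james_stein(x: list[float], sigma2: float) -> list[float]:
    """James-Stein estimator (1 - (p - 2) * sigma2 / ||x||^2) * x, shrinking toward 0."""
    p = len(x)
    norm2 = sum(v * v for v in x)
    factor = 1.0 - (p - 2) * sigma2 / norm2
    return [factor * v for v in x]


def mc_risks(theta: list[float], sigma2: float = 1.0,
             n_sims: int = 20_000, seed: int = 1) -> tuple[float, float]:
    """Monte Carlo estimates of (R(theta, delta_ML), R(theta, delta_JS)) under
    squared error loss, with x ~ N(theta, sigma2 * I_p) and delta_ML(x) = x."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    ml_total = js_total = 0.0
    for _ in range(n_sims):
        x = [rng.gauss(t, sigma) for t in theta]
        js = james_stein(x, sigma2)
        ml_total += sum((xi - ti) ** 2 for xi, ti in zip(x, theta))
        js_total += sum((ji - ti) ** 2 for ji, ti in zip(js, theta))
    return ml_total / n_sims, js_total / n_sims


p = 10
for scale in (0.0, 1.0, 5.0):
    print(scale, mc_risks([scale] * p))  # ML stays near p = 10; JS is lower
```

At \(\theta = 0\) the exact James–Stein risk is \(p\sigma^{2} - (p-2)\sigma^{2} = 2\), far below the ML risk of 10; as \(\|\theta\|\) grows the gap shrinks but does not close.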

<h2 id="examples">Examples</h2>
Although it is generally difficult, and often impossible, to determine the minimax estimator, a minimax estimator has been determined in many cases.
Example 3: Bounded normal mean: Consider estimating the mean of a normal vector \(x \sim N(\theta, I_{n}\sigma^{2})\) when it is known that \(\|\theta\|^{2} \leq M\). The Bayes estimator with respect to a prior which is uniformly distributed on the boundary of the bounding <a href="/facts/Sphere/4Ivb1cTj">sphere</a> (the sphere of radius \(\sqrt{M}\)) is known to be minimax whenever \(M \leq n\) (with \(\sigma^{2} = 1\); by rescaling, \(M \leq n\sigma^{2}\) in general). The analytical expression for this estimator is

\[ \delta^{M}(x) = \frac{\sqrt{M}\, I_{n/2}\!\left(\sqrt{M}\,\|x\|/\sigma^{2}\right)}{\|x\|\, I_{n/2-1}\!\left(\sqrt{M}\,\|x\|/\sigma^{2}\right)}\, x, \]

where \(I_{\nu}(t)\) is the modified <a href="/facts/Bessel_function/hSyvFPqC">Bessel function</a> of the first kind of order \(\nu\).
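The bounded-mean estimator can be evaluated with the Python standard library alone. This sketch assumes the posterior mean under a prior uniform on the sphere \(\|\theta\| = \sqrt{M}\), written as \(\delta(x) = \sqrt{M}\,\frac{I_{n/2}(t)}{I_{n/2-1}(t)}\,\frac{x}{\|x\|}\) with \(t = \sqrt{M}\|x\|/\sigma^{2}\), and computes the Bessel ratio from the power series \(I_{\nu}(t) = \sum_{k} (t/2)^{2k+\nu}/(k!\,\Gamma(k+\nu+1))\). The series is only suitable for moderate \(t\); a library routine such as SciPy's `scipy.special.ive` would be the more robust choice. For \(n = 1\) the rule reduces to \(\sqrt{M}\tanh(\sqrt{M}x/\sigma^{2})\), the Bayes rule for a two-point prior, which gives a simple check.

```python
import math


def _scaled_series(nu: float, t: float) -> float:
    """sum_k (t/2)^(2k) / (k! * Gamma(k + nu + 1)), i.e. I_nu(t) / (t/2)^nu, for nu >= -1/2."""
    q = (t / 2.0) ** 2
    term = 1.0 / math.gamma(nu + 1.0)
    total = term
    k = 0
    while True:
        term *= q / ((k + 1) * (k + 1 + nu))      # ratio of consecutive series terms
        total += term
        k += 1
        if k > t / 2.0 and term <= 1e-17 * total:  # past the largest term and negligible
            return total


def bessel_ratio(nu: float, t: float) -> float:
    """I_{nu+1}(t) / I_nu(t) for t >= 0 via the power series (moderate t only:
    the series terms overflow a float for very large t)."""
    return (t / 2.0) * _scaled_series(nu + 1.0, t) / _scaled_series(nu, t)


def bounded_mean_bayes(x: list[float], M: float, sigma2: float = 1.0) -> list[float]:
    """Posterior mean of theta for x ~ N(theta, sigma2 * I_n) under a prior
    uniform on the sphere ||theta|| = sqrt(M)."""
    n = len(x)
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0] * n                  # limit of the estimator as ||x|| -> 0
    radius = math.sqrt(M)
    t = radius * norm / sigma2
    scale = radius * bessel_ratio(n / 2.0 - 1.0, t) / norm
    return [scale * v for v in x]


# n = 1 check against sqrt(M) * tanh(sqrt(M) * x / sigma2)
print(bounded_mean_bayes([0.7], M=0.81), 0.9 * math.tanh(0.9 * 0.7))
```

Note that the estimate always has norm below \(\sqrt{M}\), since the Bessel ratio is less than 1, so it never leaves the known parameter set.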

<h2 id="asymptotic-minimax-estimator">Asymptotic minimax estimator</h2>
The difficulty of determining the exact minimax estimator has motivated the study of asymptotically minimax estimators. An estimator \(\delta'\) is called \(c\)-asymptotic (or approximate) minimax if

\[ \sup_{\theta \in \Theta} R(\theta, \delta') \leq c \inf_{\delta} \sup_{\theta \in \Theta} R(\theta, \delta). \]

For many estimation problems, especially in the non-parametric estimation setting, various approximate minimax estimators have been established. The design of the approximate minimax estimator is intimately related to the geometry of \(\Theta\), such as its <a href="/facts/Measure-preserving_dynamical_system/acIdlpqt">metric entropy number</a>.
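As a worked instance of this definition (added for illustration, under squared error loss), take the binomial problem of Example 1. The ML estimator \(x/n\) has risk \(\theta(1-\theta)/n\), largest at \(\theta = 1/2\), while Example 1 gives the minimax risk \(1/(4(1+\sqrt{n})^{2})\). Their ratio is

```latex
\frac{\sup_{\theta} R(\theta, x/n)}{\inf_{\delta} \sup_{\theta} R(\theta, \delta)}
  = \frac{1/(4n)}{1/\bigl(4(1+\sqrt{n})^{2}\bigr)}
  = \Bigl(1 + \frac{1}{\sqrt{n}}\Bigr)^{2}
  \;\to\; 1 \quad \text{as } n \to \infty ,
```

so for any fixed \(c > 1\) the ML estimator is \(c\)-approximate minimax as soon as \(n \geq 1/(\sqrt{c} - 1)^{2}\), even though it is not exactly minimax for any finite \(n\).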

<h2 id="randomized-minimax-estimator">Randomized minimax estimator</h2>

Sometimes, a minimax estimator may take the form of a <a href="/facts/Randomized_decision_rule/GjkPYSX9">randomized decision rule</a>. In the example illustrated, the parameter space has two elements, and each point on the graph corresponds to the risk of a decision rule: the x-coordinate is the risk when the parameter is \(\theta_{1}\) and the y-coordinate is the risk when the parameter is \(\theta_{2}\). In this decision problem, the minimax estimator lies on a line segment connecting two deterministic estimators: choosing \(\delta_{1}\) with probability \(1-p\) and \(\delta_{2}\) with probability \(p\), for a suitable \(p\), minimizes the supremum risk.
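The choice of the mixing probability \(p\) can be sketched for a two-point parameter space. This assumes, as is standard, that the risk of a randomized rule is the expected risk of its components, so the risk point moves linearly along the segment between the two deterministic risk points; the numerical risk values are hypothetical and only two given rules are mixed.

```python
def mixed_risk(p: float, r1: tuple[float, float],
               r2: tuple[float, float]) -> tuple[float, float]:
    """Risk point (R(theta_1, .), R(theta_2, .)) of the randomized rule that uses
    delta_2 with probability p and delta_1 otherwise: the convex combination."""
    return ((1 - p) * r1[0] + p * r2[0], (1 - p) * r1[1] + p * r2[1])


def minimax_mixture(r1: tuple[float, float],
                    r2: tuple[float, float]) -> tuple[float, float]:
    """Return (p, supremum risk) minimizing max(R(theta_1), R(theta_2)) over p in [0, 1].

    The max of two linear functions of p is convex, so the optimum is at an
    endpoint or where the two risks cross."""
    candidates = [0.0, 1.0]
    slope1 = r2[0] - r1[0]   # d/dp of R(theta_1, .)
    slope2 = r2[1] - r1[1]   # d/dp of R(theta_2, .)
    if slope1 != slope2:
        p_cross = (r1[1] - r1[0]) / (slope1 - slope2)
        if 0.0 < p_cross < 1.0:
            candidates.append(p_cross)
    best = min(candidates, key=lambda q: max(mixed_risk(q, r1, r2)))
    return best, max(mixed_risk(best, r1, r2))


# Hypothetical risk points of two deterministic rules delta_1 and delta_2
print(minimax_mixture((1.0, 4.0), (3.0, 0.0)))  # mixing beats both pure rules
```

With these risk points the pure rules have supremum risks 4 and 3, while the equal mixture \(p = 0.5\) has supremum risk 2. When the risks do not cross inside \((0, 1)\), the search returns a deterministic rule.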

<h2 id="relationship-to-robust-optimization">Relationship to robust optimization</h2>
<a href="/facts/Robust_optimization/KvYDuUuh">Robust optimization</a> is an approach to solving optimization problems under uncertainty in the knowledge of underlying parameters.<a class="footnote-ref" id="fnref:4" href="#fn:4">4</a><a class="footnote-ref" id="fnref:5" href="#fn:5">5</a> For instance, the <a href="/facts/Minimum_mean_square_error/hI2lqEMh">MMSE Bayesian estimation</a> of a parameter requires knowledge of the parameter correlation function. If this correlation function is not perfectly known, a popular minimax robust optimization approach<a class="footnote-ref" id="fnref:6" href="#fn:6">6</a> is to define a set characterizing the uncertainty about the correlation function and then to pursue a minimax optimization over the uncertainty set and the estimator, respectively. Similar minimax optimizations can be pursued to make estimators robust to certain imprecisely known parameters. One study of such techniques in signal processing is Nisar (2011).<a class="footnote-ref" id="fnref:7" href="#fn:7">7</a>
R. Fandom Noubiap and W. Seidel (2001) developed an algorithm for calculating a Γ-minimax (Gamma-minimax) decision rule when Γ, a set of prior distributions, is given by a finite number of generalized moment conditions. Such a decision rule minimizes the maximum, over all distributions in Γ, of the integral of the risk function with respect to that distribution. Γ-minimax decision rules are of interest in robustness studies in <a href="/facts/Bayesian_statistics/9w7b1Bw4">Bayesian statistics</a>.
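The Γ-minimax criterion itself (not the Noubiap and Seidel algorithm, which handles sets of priors defined by moment conditions) can be illustrated by brute force when Γ is a finite set. This sketch assumes the binomial problem of Example 1, a hypothetical Γ of three Beta priors, and candidate rules \(\delta_a(x) = (x + a)/(n + 2a)\) for \(a\) on a grid; for Beta priors the integrated risk needs only the first two prior moments.

```python
def integrated_risk(a: float, n: int, alpha: float, beta: float) -> float:
    """Integral of R(theta, delta_a) against a Beta(alpha, beta) prior, where
    delta_a(x) = (x + a) / (n + 2a), x ~ Bin(n, theta), and
    R(theta, delta_a) = [n theta (1 - theta) + a^2 (1 - 2 theta)^2] / (n + 2a)^2."""
    m1 = alpha / (alpha + beta)                                       # E[theta]
    m2 = alpha * (alpha + 1) / ((alpha + beta) * (alpha + beta + 1))  # E[theta^2]
    var_part = n * (m1 - m2)                    # n * E[theta (1 - theta)]
    bias_part = a ** 2 * (1 - 4 * m1 + 4 * m2)  # a^2 * E[(1 - 2 theta)^2]
    return (var_part + bias_part) / (n + 2 * a) ** 2


def gamma_minimax(n: int, priors: list[tuple[float, float]],
                  a_grid: list[float]) -> tuple[float, float]:
    """Brute force: the a in a_grid minimizing the maximum integrated risk over priors."""
    def worst(a: float) -> float:
        return max(integrated_risk(a, n, al, be) for al, be in priors)
    best_a = min(a_grid, key=worst)
    return best_a, worst(best_a)


n = 16
priors = [(1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]  # hypothetical finite Gamma of Beta priors
a_grid = [i / 100 for i in range(1001)]         # candidate a values in [0, 10]
best_a, value = gamma_minimax(n, priors, a_grid)
print(best_a, value)
```

Here Γ happens to contain Beta(2, 2), which for \(n = 16\) is the Beta(\(\sqrt{n}/2, \sqrt{n}/2\)) prior of Example 1, so the search returns \(a = 2\) (the minimax rule) with value 0.01. With Γ consisting of Beta(1, 1) alone, it instead returns that prior's Bayes rule, \(a = 1\).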

<ul><li><a href="/facts/Erich_Leo_Lehmann/hun7hqhM">E. L. Lehmann</a> and <a href="/facts/George_Casella/CTjcaJCt">G. Casella</a> (1998), Theory of Point Estimation, 2nd ed. New York: Springer-Verlag.</li>
<li>F. Perron and E. Marchand (2002), "On the minimax estimator of a bounded normal mean," Statistics and Probability Letters 58: 327–333.</li>
<li>R. Fandom Noubiap and <a href="/facts/Wladimir_Seidel/8h22Kqgw">W. Seidel</a> (2001), "An Algorithm for Calculating Gamma-Minimax Decision Rules under Generalized Moment Conditions," Annals of Statistics, August, 2001, vol. 29, no. 4, pp. 1094–1116</li>
<li><a href="/facts/Charles_Stein_(statistician)/zoz8RXWn">Stein, C.</a> (1981). <a href="https://doi.org/10.1214%2Faos%2F1176345632">"Estimation of the mean of a multivariate normal distribution"</a>. <a href="/facts/Annals_of_Statistics/frNL5W9E">Annals of Statistics</a>. 9 (6): 1135–1151. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1214%2Faos%2F1176345632">10.1214/aos/1176345632</a>. <a href="/facts/MR_(identifier)/uP137L11">MR</a> <a href="https://mathscinet.ams.org/mathscinet-getitem?mr=0630098">0630098</a>. <a href="/facts/Zbl_(identifier)/P6rFxKKx">Zbl</a> <a href="https://zbmath.org/?format=complete&q=an:0476.62035">0476.62035</a>.</li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1">Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis (2 ed.). New York: Springer-Verlag. pp. xv+425. ISBN 0-387-96098-8. MR 0580664. <a href="#fnref:1" class="footnote-back-ref">↩</a></li>
<li id="fn:2">Hodges, Jr., J.L.; Lehmann, E.L. (1950). "Some problems in minimax point estimation". Ann. Math. Statist. 21 (2): 182–197. doi:10.1214/aoms/1177729838. JSTOR 2236900. MR 0035949. Zbl 0038.09802. <a href="#fnref:2" class="footnote-back-ref">↩</a></li>
<li id="fn:3">Steinhaus, Hugon (1957). "The problem of estimation". Ann. Math. Statist. 28 (3): 633–648. doi:10.1214/aoms/1177706876. JSTOR 2237224. MR 0092313. Zbl 0088.35503. <a href="#fnref:3" class="footnote-back-ref">↩</a></li>
<li id="fn:4">S. A. Kassam and H. V. Poor (1985), "Robust Techniques for Signal Processing: A Survey," Proceedings of the IEEE, vol. 73, pp. 433–481, March 1985. <a href="#fnref:4" class="footnote-back-ref">↩</a></li>
<li id="fn:5">A. Ben-Tal, L. El Ghaoui, and A. Nemirovski (2009), "Robust Optimization", Princeton University Press, 2009. <a href="#fnref:5" class="footnote-back-ref">↩</a></li>
<li id="fn:6">S. Verdu and H. V. Poor (1984), "On Minimax Robustness: A general approach and applications," IEEE Transactions on Information Theory, vol. 30, pp. 328–340, March 1984. <a href="#fnref:6" class="footnote-back-ref">↩</a></li>
<li id="fn:7">M. Danish Nisar (2011), "Minimax Robustness in Signal Processing for Communications," Shaker Verlag, ISBN 978-3-8440-0332-1, August 2011. <a href="http://www.shaker.eu/shop/978-3-8440-0332-1" target="_blank">http://www.shaker.eu/shop/978-3-8440-0332-1</a> <a href="#fnref:7" class="footnote-back-ref">↩</a></li>
</ol>
