Let the unknown parameter of interest be $\mu$, and assume we have a statistic $m$ such that the expected value of $m$ is $\mu$: $\mathbb{E}[m] = \mu$, i.e. $m$ is an unbiased estimator for $\mu$. Suppose we calculate another statistic $t$ such that $\mathbb{E}[t] = \tau$ is a known value. Then

$$m^{\star} = m + c\,(t - \tau)$$

is also an unbiased estimator for $\mu$ for any choice of the coefficient $c$. The variance of the resulting estimator $m^{\star}$ is

$$\operatorname{Var}(m^{\star}) = \operatorname{Var}(m) + c^{2}\,\operatorname{Var}(t) + 2c\,\operatorname{Cov}(m,t).$$
By differentiating the above expression with respect to $c$ and setting the derivative, $2c\,\operatorname{Var}(t) + 2\,\operatorname{Cov}(m,t)$, to zero, it can be shown that choosing the optimal coefficient

$$c^{\star} = -\frac{\operatorname{Cov}(m,t)}{\operatorname{Var}(t)}$$

minimizes the variance of $m^{\star}$. (Note that this coefficient is the same as the coefficient obtained from a linear regression.) With this choice,

$$\operatorname{Var}(m^{\star}) = \operatorname{Var}(m) - \frac{\operatorname{Cov}(m,t)^{2}}{\operatorname{Var}(t)} = \left(1 - \rho_{m,t}^{2}\right)\operatorname{Var}(m),$$
where

$$\rho_{m,t} = \operatorname{Corr}(m,t) = \frac{\operatorname{Cov}(m,t)}{\sqrt{\operatorname{Var}(m)\,\operatorname{Var}(t)}}$$

is the correlation coefficient of $m$ and $t$. The greater the value of $|\rho_{m,t}|$, the greater the variance reduction achieved.
In the case that $\operatorname{Cov}(m,t)$, $\operatorname{Var}(t)$, and/or $\rho_{m,t}$ are unknown, they can be estimated across the Monte Carlo replicates. This is equivalent to solving a certain least squares system; therefore this technique is also known as regression sampling.
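For illustration, here is a minimal Python sketch of this regression-sampling step, assuming paired Monte Carlo replicates of $m$ and $t$ are already available as arrays (the function and argument names are hypothetical, not from the source):

```python
import numpy as np

def control_variate_estimate(m_samples, t_samples, tau):
    """Combine paired replicates of m and t into m + c*(t - tau),
    with the coefficient c estimated from the sample covariance."""
    m = np.asarray(m_samples, dtype=float)
    t = np.asarray(t_samples, dtype=float)
    cov = np.cov(m, t)               # 2x2 sample covariance matrix of (m, t)
    c_star = -cov[0, 1] / cov[1, 1]  # c* = -Cov(m, t) / Var(t)
    return np.mean(m + c_star * (t - tau))
```

Estimating $c^{\star}$ from the same samples it is applied to introduces a small bias, which vanishes as the number of replicates grows.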
When the expectation of the control variable, $\mathbb{E}[t] = \tau$, is not known analytically, it is still possible to increase the precision in estimating $\mu$ (for a given fixed simulation budget), provided that two conditions are met: 1) evaluating $t$ is significantly cheaper than computing $m$; 2) the magnitude of the correlation coefficient $|\rho_{m,t}|$ is close to unity.[4]
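One way to realize this is to estimate $\tau$ from a separate, large sample of the cheap statistic $t$ alone. The following is a sketch under that assumption; the sampling functions draw_pair and draw_t are hypothetical placeholders for the user's simulation:

```python
import numpy as np

def cv_estimate_unknown_tau(draw_pair, draw_t, n_pairs, n_cheap):
    """Control-variate estimate when tau = E[t] is not known analytically.

    draw_pair(n) -> (m, t): n expensive paired replicates of m and t
    draw_t(n)    -> t:      n cheap, independent replicates of t only
    """
    m, t = draw_pair(n_pairs)
    tau_hat = np.mean(draw_t(n_cheap))  # cheap estimate of tau; take n_cheap >> n_pairs
    cov = np.cov(m, t)
    c_star = -cov[0, 1] / cov[1, 1]     # estimated optimal coefficient
    return np.mean(m + c_star * (t - tau_hat))
```

Because $t$ is cheap, n_cheap can be taken much larger than n_pairs, so the extra error from replacing $\tau$ with $\hat{\tau}$ stays small while the strong correlation still cancels most of the variance of $m$.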
We would like to estimate

$$I = \int_{0}^{1} \frac{1}{1+x}\,\mathrm{d}x$$

using Monte Carlo integration. This integral is the expected value of $f(U)$, where

$$f(U) = \frac{1}{1+U}$$

and $U$ follows a uniform distribution on $[0, 1]$. Using a sample of size $n$, denote the points in the sample as $u_1, \cdots, u_n$. Then the estimate is given by

$$I \approx \frac{1}{n} \sum_{i=1}^{n} f(u_i).$$
Now we introduce $g(U) = 1 + U$ as a control variate with a known expected value $\mathbb{E}\left[g(U)\right] = \int_{0}^{1}(1+x)\,\mathrm{d}x = \tfrac{3}{2}$, and combine the two into a new estimate

$$I \approx \frac{1}{n} \sum_{i=1}^{n} f(u_i) + c\left(\frac{1}{n} \sum_{i=1}^{n} g(u_i) - \tfrac{3}{2}\right).$$
Using $n = 1500$ realizations and an estimated optimal coefficient $c^{\star} \approx 0.4773$, the variance is significantly reduced by the control variates technique relative to the classical Monte Carlo estimate. (The exact result is $I = \ln 2 \approx 0.69314718$.)
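This example can be reproduced with a short Python sketch (the random seed is arbitrary, so the printed estimates will vary slightly from run to run):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1500
u = rng.uniform(0.0, 1.0, n)

f = 1.0 / (1.0 + u)  # integrand samples; E[f(U)] = ln 2
g = 1.0 + u          # control variate; E[g(U)] = 3/2 exactly

# Estimated optimal coefficient c* = -Cov(f, g) / Var(g), approx. 0.4773.
cov = np.cov(f, g)
c_star = -cov[0, 1] / cov[1, 1]

controlled = f + c_star * (g - 1.5)  # control-variate estimator terms

print("classical:  mean %.5f  variance %.5f" % (f.mean(), f.var(ddof=1)))
print("controlled: mean %.5f  variance %.5f" % (controlled.mean(), controlled.var(ddof=1)))
print("exact:      ln 2 = %.8f" % np.log(2.0))
```

Note that the variance reported here is the per-sample variance of the terms being averaged; the variance of the mean itself is smaller by a factor of $n$.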
Lemieux, C. (2017). "Control Variates". Wiley StatsRef: Statistics Reference Online: 1–8. doi:10.1002/9781118445112.stat07947. ISBN 9781118445112.
Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering. New York: Springer. ISBN 0-387-00451-3. (p. 185)
Botev, Z.; Ridder, A. (2017). "Variance Reduction". Wiley StatsRef: Statistics Reference Online: 1–6. doi:10.1002/9781118445112.stat07975. ISBN 9781118445112.