The coefficient of multiple correlation, denoted R, is a scalar that is defined as the Pearson correlation coefficient between the predicted and the actual values of the dependent variable in a linear regression model that includes an intercept.
The square of the coefficient of multiple correlation can be computed using the vector $\mathbf{c} = (r_{x_1 y}, r_{x_2 y}, \dots, r_{x_N y})^\top$ of correlations $r_{x_n y}$ between the predictor variables $x_n$ (independent variables) and the target variable $y$ (dependent variable), and the correlation matrix $R_{xx}$ of correlations between the predictor variables. It is given by

$$R^2 = \mathbf{c}^\top R_{xx}^{-1}\,\mathbf{c},$$

where $\mathbf{c}^\top$ is the transpose of $\mathbf{c}$, and $R_{xx}^{-1}$ is the inverse of the matrix $R_{xx}$.
If all the predictor variables are uncorrelated, the matrix $R_{xx}$ is the identity matrix and $R^2$ simply equals $\mathbf{c}^\top \mathbf{c}$, the sum of the squared correlations with the dependent variable. If the predictor variables are correlated among themselves, the inverse of the correlation matrix $R_{xx}$ accounts for this.
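As a numerical sanity check, the identity above can be verified directly. The sketch below is a minimal illustration assuming NumPy and a synthetic data set made up here: it computes $R^2$ once as $\mathbf{c}^\top R_{xx}^{-1}\mathbf{c}$ and once as the squared Pearson correlation between the fitted and actual values of an ordinary least-squares regression with intercept; the two agree up to floating-point error.

```python
# Minimal sketch (NumPy only, synthetic data) of R^2 = c^T R_xx^{-1} c.
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
X[:, 1] += 0.5 * X[:, 0]                      # make the predictors correlated
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)

# Correlations of each predictor with y, and correlations among the predictors.
c = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
R_xx = np.corrcoef(X, rowvar=False)

R2_formula = c @ np.linalg.solve(R_xx, c)     # c^T R_xx^{-1} c

# Cross-check: R is the Pearson correlation between fitted and actual y
# in an OLS regression that includes an intercept.
X1 = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta
R2_regression = np.corrcoef(y_hat, y)[0, 1] ** 2

print(R2_formula, R2_regression)              # the two values agree
```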
The squared coefficient of multiple correlation can also be computed as the fraction of variance of the dependent variable that is explained by the independent variables, which in turn is 1 minus the unexplained fraction. The unexplained fraction can be computed as the sum of squares of residuals—that is, the sum of the squares of the prediction errors—divided by the sum of squares of deviations of the values of the dependent variable from its expected value.
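Written out, with $\hat{y}_i$ denoting the fitted values and $\bar{y}$ the sample mean of the dependent variable (standing in for its expected value), this is

$$R^2 = 1 - \frac{\sum_{i}\left(y_i - \hat{y}_i\right)^2}{\sum_{i}\left(y_i - \bar{y}\right)^2}.$$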
With more than two variables being related to each other, the value of the coefficient of multiple correlation depends on the choice of dependent variable: a regression of $y$ on $x$ and $z$ will in general have a different $R$ than will a regression of $z$ on $x$ and $y$. For example, suppose that in a particular sample the variable $z$ is uncorrelated with both $x$ and $y$, while $x$ and $y$ are linearly related to each other. Then a regression of $z$ on $y$ and $x$ will yield an $R$ of zero, while a regression of $y$ on $x$ and $z$ will yield a strictly positive $R$. This follows since the correlation of $y$ with its best predictor based on $x$ and $z$ is in all cases at least as large as the correlation of $y$ with its best predictor based on $x$ alone, and in this case, with $z$ providing no explanatory power, it will be exactly as large.
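This asymmetry can be illustrated with a small experiment. The sketch below again assumes NumPy, with synthetic data constructed so that $z$ is exactly uncorrelated with $x$ and $y$ in the sample; it prints an $R$ of essentially zero for the regression of $z$ on $y$ and $x$, and a clearly positive $R$ for the regression of $y$ on $x$ and $z$.

```python
# Illustrative sketch of the asymmetry described above (NumPy only, synthetic data).
import numpy as np

def multiple_R(target, predictors):
    """R for an OLS regression of `target` on `predictors` plus an intercept,
    computed as sqrt(1 - SS_res / SS_tot); with an intercept this equals the
    correlation between fitted and actual values."""
    X1 = np.column_stack([np.ones(len(target)), predictors])
    resid = target - X1 @ np.linalg.lstsq(X1, target, rcond=None)[0]
    ss_res = resid @ resid
    ss_tot = ((target - target.mean()) ** 2).sum()
    return np.sqrt(max(1.0 - ss_res / ss_tot, 0.0))

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)              # y linearly related to x

# Construct z to be exactly uncorrelated with x and y in this sample
# by removing its least-squares projection onto (1, x, y).
z_raw = rng.normal(size=n)
B = np.column_stack([np.ones(n), x, y])
z = z_raw - B @ np.linalg.lstsq(B, z_raw, rcond=None)[0]

print(multiple_R(z, np.column_stack([x, y])))  # ~0: x and y explain nothing about z
print(multiple_R(y, np.column_stack([x, z])))  # strictly positive (≈ |corr(x, y)|)
```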