In a model with a single explanatory variable, RSS is given by:[1]

$$\operatorname{RSS} = \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2,$$
where $y_i$ is the $i$th value of the variable to be predicted, $x_i$ is the $i$th value of the explanatory variable, and $f(x_i)$ is the predicted value of $y_i$ (also termed $\hat{y_i}$). In the standard simple linear regression model, $y_i = \alpha + \beta x_i + \varepsilon_i$, where $\alpha$ and $\beta$ are coefficients, $y$ and $x$ are the regressand and the regressor, respectively, and $\varepsilon$ is the error term. The sum of squares of residuals is the sum of squares of the estimated residuals $\hat{\varepsilon}_i$; that is,

$$\operatorname{RSS} = \sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2} = \sum_{i=1}^{n} \left(y_i - (\hat{\alpha} + \hat{\beta} x_i)\right)^2,$$
where $\hat{\alpha}$ is the estimated value of the constant term $\alpha$ and $\hat{\beta}$ is the estimated value of the slope coefficient $\beta$.
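As a concrete illustration, the following is a minimal Python/NumPy sketch that fits a simple linear regression by ordinary least squares and computes the RSS as the sum of squared estimated residuals. The data and the variable names (`x`, `y`, `alpha_hat`, `beta_hat`) are hypothetical and not taken from the source.

```python
import numpy as np

# Hypothetical data: y depends roughly linearly on x.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=50)

# OLS estimates of the intercept and slope in y_i = alpha + beta*x_i + eps_i.
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# RSS: sum of squared estimated residuals.
residuals = y - (alpha_hat + beta_hat * x)
rss = np.sum(residuals ** 2)
print(rss)
```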
The general regression model with $n$ observations and $k$ explanators, the first of which is a constant unit vector whose coefficient is the regression intercept, is

$$y = X\beta + e,$$
where $y$ is an $n \times 1$ vector of dependent variable observations, each column of the $n \times k$ matrix $X$ is a vector of observations on one of the $k$ explanators, $\beta$ is a $k \times 1$ vector of true coefficients, and $e$ is an $n \times 1$ vector of the true underlying errors. The ordinary least squares estimator for $\beta$ is

$$\hat{\beta} = (X^{\operatorname{T}} X)^{-1} X^{\operatorname{T}} y.$$
The residual vector is $\hat{e} = y - X\hat{\beta} = y - X(X^{\operatorname{T}} X)^{-1} X^{\operatorname{T}} y$, so the residual sum of squares is

$$\operatorname{RSS} = \hat{e}^{\operatorname{T}} \hat{e} = \|\hat{e}\|^2$$
(equivalent to the square of the norm of residuals). In full:

$$\operatorname{RSS} = y^{\operatorname{T}} y - y^{\operatorname{T}} X (X^{\operatorname{T}} X)^{-1} X^{\operatorname{T}} y = y^{\operatorname{T}} \left[I - X (X^{\operatorname{T}} X)^{-1} X^{\operatorname{T}}\right] y = y^{\operatorname{T}} (I - H)\, y,$$
where $H = X(X^{\operatorname{T}} X)^{-1} X^{\operatorname{T}}$ is the hat matrix, or the projection matrix, in linear regression.
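As a sketch of the matrix form, the following Python/NumPy snippet computes the OLS estimator and checks that the squared norm of the residual vector agrees with the quadratic form $y^{\operatorname{T}}(I - H)\,y$. The design matrix and data are hypothetical, introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3

# Design matrix whose first column is the constant unit vector.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# OLS estimator beta_hat = (X^T X)^{-1} X^T y, computed via a linear solve.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# RSS as the squared norm of the residual vector e_hat = y - X beta_hat.
e_hat = y - X @ beta_hat
rss_residual = e_hat @ e_hat

# RSS via the hat (projection) matrix: y^T (I - H) y.
H = X @ np.linalg.solve(X.T @ X, X.T)
rss_hat = y @ (np.eye(n) - H) @ y

print(rss_residual, rss_hat)  # the two expressions agree up to floating-point error
```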
The least-squares regression line is given by

$$y = ax + b,$$
where $b = \bar{y} - a\bar{x}$ and $a = \frac{S_{xy}}{S_{xx}}$, with $S_{xy} = \sum_{i=1}^{n} (\bar{x} - x_i)(\bar{y} - y_i)$ and $S_{xx} = \sum_{i=1}^{n} (\bar{x} - x_i)^2$.
Therefore,

$$\operatorname{RSS} = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2 = \sum_{i=1}^{n} \left(a(\bar{x} - x_i) - (\bar{y} - y_i)\right)^2 = a^2 S_{xx} - 2a S_{xy} + S_{yy} = S_{yy} - a S_{xy} = S_{yy}\left(1 - \frac{S_{xy}^2}{S_{xx} S_{yy}}\right),$$
where $S_{yy} = \sum_{i=1}^{n} (\bar{y} - y_i)^2$.
The Pearson product-moment correlation coefficient is given by $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$; therefore, $\operatorname{RSS} = S_{yy}(1 - r^2)$.
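This identity can be checked numerically. The following Python/NumPy sketch, using hypothetical data, computes $S_{xy}$, $S_{xx}$, $S_{yy}$ and the fitted line, then compares the directly computed RSS with $S_{yy}(1 - r^2)$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 3.0 - 1.5 * x + rng.normal(scale=0.5, size=200)

# Sums of squares and cross-products about the means.
Sxy = np.sum((x.mean() - x) * (y.mean() - y))
Sxx = np.sum((x.mean() - x) ** 2)
Syy = np.sum((y.mean() - y) ** 2)

# Least-squares regression line y = a*x + b.
a = Sxy / Sxx
b = y.mean() - a * x.mean()

# Direct RSS versus the identity RSS = Syy * (1 - r^2).
rss_direct = np.sum((y - (a * x + b)) ** 2)
r = Sxy / np.sqrt(Sxx * Syy)
print(rss_direct, Syy * (1 - r ** 2))  # agree up to floating-point error
```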
1. Archdeacon, Thomas J. (1994). Correlation and Regression Analysis: A Historian's Guide. University of Wisconsin Press. pp. 161–162. ISBN 0-299-13650-7. OCLC 27266095.