Squared deviations from the mean (SDM) result from squaring deviations, that is, the differences between individual values and their mean. In probability theory and statistics, the definition of variance is either the expected value of the SDM (when considering a theoretical distribution) or its average value (for actual experimental data). Computations for analysis of variance involve the partitioning of a sum of SDM.
Background
An understanding of the computations involved is greatly enhanced by a study of the statistical value E(X²), where E is the expected value operator.

For a random variable X with mean μ and variance σ²,

\sigma^{2} = \operatorname{E}(X^{2}) - \mu^{2}.[1]

(This follows from expanding E((X − μ)²) = E(X²) − 2μE(X) + μ².) Therefore,

\operatorname{E}(X^{2}) = \sigma^{2} + \mu^{2}.

From the above, the following can be derived:

\operatorname{E}\left(\sum\left(X^{2}\right)\right) = n\sigma^{2} + n\mu^{2},

\operatorname{E}\left(\left(\sum X\right)^{2}\right) = n\sigma^{2} + n^{2}\mu^{2}.
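As a quick numerical check of the two derived expectations, the following minimal Python sketch (standard library only) averages Σ X² and (Σ X)² over many simulated samples and compares the results with nσ² + nμ² and nσ² + n²μ². The normal distribution and the particular values of μ, σ, n and the trial count are arbitrary illustrative choices.

```python
import random

# Arbitrary illustrative parameters (any distribution with finite mean and
# variance would do; a normal distribution is used here for convenience).
mu, sigma, n, trials = 2.0, 3.0, 5, 200_000

sum_sq_total = 0.0   # running total of sum(X^2) across trials
sq_sum_total = 0.0   # running total of (sum X)^2 across trials
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    sum_sq_total += sum(x * x for x in xs)
    sq_sum_total += sum(xs) ** 2

# Monte Carlo averages versus the theoretical values.
print(sum_sq_total / trials, n * sigma**2 + n * mu**2)      # both ≈ 65
print(sq_sum_total / trials, n * sigma**2 + n**2 * mu**2)   # both ≈ 145
```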
Sample variance
Main article: Sample variance
The sum of squared deviations needed to calculate sample variance (before deciding whether to divide by n or n − 1) is most easily calculated as
S = \sum x^{2} - \frac{\left(\sum x\right)^{2}}{n}

From the two derived expectations above, the expected value of this sum is

\operatorname{E}(S) = n\sigma^{2} + n\mu^{2} - \frac{n\sigma^{2} + n^{2}\mu^{2}}{n},

which implies

\operatorname{E}(S) = (n - 1)\sigma^{2}.

This effectively proves the use of the divisor n − 1 in the calculation of an unbiased sample estimate of σ².
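The shortcut formula can be checked directly. The following minimal Python sketch computes S both as Σx² − (Σx)²/n and as the sum of squared deviations from the sample mean, using the five observations that appear in the example further below, and compares S/(n − 1) with the standard library's unbiased sample variance.

```python
import statistics

x = [1, 2, 3, 4, 6]   # the five observations used in the example below
n = len(x)

# Shortcut form: S = sum(x^2) - (sum x)^2 / n
S_shortcut = sum(v * v for v in x) - sum(x) ** 2 / n

# Direct form: sum of squared deviations from the sample mean
mean = sum(x) / n
S_direct = sum((v - mean) ** 2 for v in x)

print(S_shortcut, S_direct)      # both 14.8 (up to floating-point rounding)
print(S_shortcut / (n - 1))      # ≈ 3.7, the unbiased sample variance
print(statistics.variance(x))    # 3.7, also uses the divisor n - 1
```

The two forms agree algebraically, so any difference in the printed values is only floating-point rounding.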
Partition — analysis of variance
Main article: Partition of sums of squares
When data is available for k different treatment groups of size n_i, where i varies from 1 to k, it is assumed that the expected mean of each group is

\operatorname{E}(\mu_{i}) = \mu + T_{i}

and that the variance of each treatment group is unchanged from the population variance σ².

Under the null hypothesis that the treatments have no effect, each of the T_i will be zero.
It is now possible to calculate three sums of squares:
Individual:

I = \sum x^{2}

\operatorname{E}(I) = n\sigma^{2} + n\mu^{2}

Treatments (where the inner sum runs over the observations in treatment group i):

T = \sum_{i=1}^{k}\left(\left(\sum x\right)^{2}/n_{i}\right)

\operatorname{E}(T) = k\sigma^{2} + \sum_{i=1}^{k}n_{i}(\mu + T_{i})^{2}

\operatorname{E}(T) = k\sigma^{2} + n\mu^{2} + 2\mu\sum_{i=1}^{k}(n_{i}T_{i}) + \sum_{i=1}^{k}n_{i}(T_{i})^{2}

Under the null hypothesis that the treatments cause no differences and all the T_i are zero, the expectation simplifies to

\operatorname{E}(T) = k\sigma^{2} + n\mu^{2}.

Combination:

C = \left(\sum x\right)^{2}/n

\operatorname{E}(C) = \sigma^{2} + n\mu^{2}

Sums of squared deviations
Under the null hypothesis, the difference of any pair of I, T, and C does not contain any dependency on μ, only on σ².

\operatorname{E}(I - C) = (n - 1)\sigma^{2}  (total squared deviations, also known as the total sum of squares)

\operatorname{E}(T - C) = (k - 1)\sigma^{2}  (treatment squared deviations, also known as the explained sum of squares)

\operatorname{E}(I - T) = (n - k)\sigma^{2}  (residual squared deviations, also known as the residual sum of squares)

The constants (n − 1), (k − 1), and (n − k) are normally referred to as the number of degrees of freedom.
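The following minimal Python sketch (standard library only) simulates repeated samples under the null hypothesis and checks that the averages of I − C, T − C and I − T come out close to (n − 1)σ², (k − 1)σ² and (n − k)σ². The group sizes, μ, σ and the number of trials are arbitrary illustrative choices.

```python
import random

sizes = [3, 2]                       # group sizes n_i, so n = 5 and k = 2
mu, sigma, trials = 2.0, 3.0, 100_000
n, k = sum(sizes), len(sizes)

tot_ic = tot_tc = tot_it = 0.0
for _ in range(trials):
    # Under the null hypothesis every group shares the same mean mu.
    groups = [[random.gauss(mu, sigma) for _ in range(m)] for m in sizes]
    all_x = [v for g in groups for v in g]
    I = sum(v * v for v in all_x)                    # individual sum of squares
    T = sum(sum(g) ** 2 / len(g) for g in groups)    # treatment sum of squares
    C = sum(all_x) ** 2 / n                          # combination
    tot_ic += I - C
    tot_tc += T - C
    tot_it += I - T

print(tot_ic / trials, (n - 1) * sigma**2)   # both ≈ 36
print(tot_tc / trials, (k - 1) * sigma**2)   # both ≈ 9
print(tot_it / trials, (n - k) * sigma**2)   # both ≈ 27
```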
Example
In a very simple example, 5 observations arise from two treatments. The first treatment gives the three values 1, 2, and 3, and the second treatment gives the two values 4 and 6.
I = \frac{1^{2}}{1} + \frac{2^{2}}{1} + \frac{3^{2}}{1} + \frac{4^{2}}{1} + \frac{6^{2}}{1} = 66

T = \frac{(1+2+3)^{2}}{3} + \frac{(4+6)^{2}}{2} = 12 + 50 = 62

C = \frac{(1+2+3+4+6)^{2}}{5} = 256/5 = 51.2

Giving

Total squared deviations = 66 − 51.2 = 14.8 with 4 degrees of freedom.

Treatment squared deviations = 62 − 51.2 = 10.8 with 1 degree of freedom.

Residual squared deviations = 66 − 62 = 4 with 3 degrees of freedom.
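The same arithmetic can be reproduced with a short Python sketch; the group layout matches the example above, and the variable names are chosen only for illustration.

```python
groups = [[1, 2, 3], [4, 6]]       # the two treatments from the example
all_x = [v for g in groups for v in g]
n, k = len(all_x), len(groups)

I = sum(v * v for v in all_x)                    # 66
T = sum(sum(g) ** 2 / len(g) for g in groups)    # 62.0
C = sum(all_x) ** 2 / n                          # 51.2

print(I - C, "total,     df =", n - 1)   # ≈ 14.8, df = 4
print(T - C, "treatment, df =", k - 1)   # ≈ 10.8, df = 1
print(I - T, "residual,  df =", n - k)   # 4.0,   df = 3
```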
Two-way analysis of variance

This section is an excerpt from Two-way analysis of variance.
In statistics, the two-way analysis of variance (ANOVA) is an extension of the one-way ANOVA that examines the influence of two different categorical independent variables on one continuous dependent variable. The two-way ANOVA not only aims to assess the main effect of each independent variable but also to determine whether there is any interaction between them.

See also
- Absolute deviation
- Algorithms for calculating variance
- Errors and residuals
- Least squares
- Mean squared error
- Residual sum of squares
- Root mean square deviation
- Variance decomposition of forecast errors
References
1. Mood & Graybill, An Introduction to the Theory of Statistics, McGraw-Hill.