Yates's correction for continuity

In statistics, Yates's correction for continuity (or Yates's chi-squared test) is used in certain situations when testing for independence in a contingency table. It corrects for the error introduced by assuming that the discrete probabilities of frequencies in the table can be approximated by a continuous distribution (chi-squared). Unlike the standard Pearson chi-squared statistic, it is approximately unbiased.


Correction for approximation error

Using the chi-squared distribution to interpret Pearson's chi-squared statistic requires one to assume that the discrete probability of observed binomial frequencies in the table can be approximated by the continuous chi-squared distribution. This assumption is not quite correct, and introduces some error.

To reduce the error in approximation, Frank Yates, an English statistician, suggested a correction for continuity that adjusts the formula for Pearson's chi-squared test by subtracting 0.5 from the absolute difference between each observed value and its expected value in a 2 × 2 contingency table.1 This reduces the chi-squared value obtained and thus increases its p-value.

The effect of Yates's correction is to prevent overestimation of statistical significance for small data sets. This formula is chiefly used when at least one cell of the table has an expected count smaller than 5, or when the total number of observations {\displaystyle \textstyle \sum _{i=1}^{N}O_{i}} is small (for example, 20).

The following is Yates's corrected version of Pearson's chi-squared statistic:

{\displaystyle \chi _{\text{Yates}}^{2}=\sum _{i=1}^{N}{\frac {(|O_{i}-E_{i}|-0.5)^{2}}{E_{i}}}}

where:

Oi = an observed frequency
Ei = an expected (theoretical) frequency, asserted by the null hypothesis
N = number of distinct events
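
As an illustration (not from the source article), here is a minimal Python sketch of the corrected statistic for a 2 × 2 table, with expected counts derived from the marginal totals under the null hypothesis of independence; the function name yates_chi2 and the example counts are hypothetical.

    import numpy as np
    from scipy.stats import chi2

    def yates_chi2(table):
        """Yates-corrected chi-squared statistic and p-value for a 2 x 2 table of counts."""
        obs = np.asarray(table, dtype=float)
        row_totals = obs.sum(axis=1, keepdims=True)
        col_totals = obs.sum(axis=0, keepdims=True)
        n = obs.sum()
        expected = row_totals @ col_totals / n        # expected counts under independence
        stat = np.sum((np.abs(obs - expected) - 0.5) ** 2 / expected)
        p_value = chi2.sf(stat, df=1)                 # a 2 x 2 table has one degree of freedom
        return stat, p_value

    stat, p = yates_chi2([[7, 3], [2, 8]])            # hypothetical observed counts
    print(stat, p)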

2 × 2 table

As a short-cut, for a 2 × 2 table with the following entries:

        S        F        Total
A       a        b        a + b
B       c        d        c + d
Total   a + c    b + d    N
{\displaystyle \chi _{\text{Yates}}^{2}={\frac {N(|ad-bc|-N/2)^{2}}{(a+b)(c+d)(a+c)(b+d)}}.}
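
Continuing the hypothetical example, a sketch of this shortcut; for a 2 × 2 table it equals the cell-by-cell version above, since every cell has the same absolute deviation |ad − bc| / N.

    def yates_chi2_shortcut(a, b, c, d):
        """Shortcut form of the corrected statistic for the 2 x 2 table [[a, b], [c, d]]."""
        n = a + b + c + d
        numerator = n * (abs(a * d - b * c) - n / 2) ** 2
        denominator = (a + b) * (c + d) * (a + c) * (b + d)
        return numerator / denominator

    print(yates_chi2_shortcut(7, 3, 2, 8))            # same value as yates_chi2([[7, 3], [2, 8]])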

If |ad − bc| is less than N/2, subtracting N/2 overshoots zero, and squaring the now-negative quantity would make the statistic larger rather than smaller. The following variant, which clips the corrected difference at zero, avoids this (here N_A = a + b, N_B = c + d, N_S = a + c, and N_F = b + d are the row and column totals):

{\displaystyle \chi _{\text{Yates}}^{2}={\frac {N(\max(0,|ad-bc|-N/2))^{2}}{N_{S}N_{F}N_{A}N_{B}}}.}
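
A sketch of the clipped variant; the only change from the shortcut above is that the corrected difference is floored at zero, so the correction cannot make the statistic grow instead of shrink.

    def yates_chi2_clipped(a, b, c, d):
        """Clipped variant: the corrected difference |ad - bc| - n/2 is floored at zero."""
        n = a + b + c + d
        n_s, n_f, n_a, n_b = a + c, b + d, a + b, c + d   # column and row totals
        numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
        denominator = n_s * n_f * n_a * n_b
        return numerator / denominator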

Yates's correction should always be applied, as it will tend to improve the accuracy of the p-value obtained. However, in situations with large sample sizes, using the correction will have little effect on the value of the test statistic, and hence the p-value.
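
In practice, SciPy's scipy.stats.chi2_contingency applies a continuity correction to 2 × 2 tables by default (correction=True). A brief check against the sketches above; exact agreement for tables with very small cell deviations may depend on how a given SciPy version clips the adjustment.

    from scipy.stats import chi2_contingency

    observed = [[7, 3], [2, 8]]                       # hypothetical observed counts
    chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=True)
    print(chi2_stat, p_value)                         # compare with yates_chi2(observed)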

References

  1. Yates, F. (1934). "Contingency tables involving small numbers and the χ2 test". Supplement to the Journal of the Royal Statistical Society. 1 (2): 217–235. JSTOR 2983604.