For X1, X2, ..., Xn independent and identically distributed random variables in R with common cumulative distribution function F(x), the empirical distribution function is defined by
$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I_{(-\infty, x]}(X_i),$$
where IC is the indicator function of the set C.
For every fixed x, Fn(x) is a sequence of random variables that converges to F(x) almost surely by the strong law of large numbers; that is, Fn converges to F pointwise. Glivenko and Cantelli strengthened this result by proving that Fn converges to F uniformly, a result known as the Glivenko–Cantelli theorem.[2]
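The uniform convergence described above can be checked numerically. The following is a minimal sketch, assuming samples drawn from the Uniform(0, 1) distribution so that F(x) = x; the function name `ecdf_sup_distance_uniform` and the sample sizes are illustrative choices, not part of the original text.

```python
import random

def ecdf_sup_distance_uniform(sample):
    """Return sup_x |F_n(x) - F(x)| assuming F is the Uniform(0, 1) CDF, F(x) = x."""
    xs = sorted(sample)
    n = len(xs)
    # F_n is a step function jumping by 1/n at each order statistic, so the
    # supremum is attained at a jump or just before one.
    return max(
        max((i + 1) / n - x, x - i / n)
        for i, x in enumerate(xs)
    )

rng = random.Random(0)  # fixed seed so the run is reproducible
distances = {
    n: ecdf_sup_distance_uniform([rng.random() for _ in range(n)])
    for n in (100, 10_000)
}
for n, d in distances.items():
    print(f"n = {n:>6}: sup |F_n - F| = {d:.4f}")
```

For larger n the printed distance is typically much smaller, roughly on the order of 1/√n.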
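A centered and scaled version of the empirical measure $P_n$ is the signed measure
$$G_n(A) = \sqrt{n}\,(P_n(A) - P(A)).$$
It induces a map on measurable functions f given by
$$f \mapsto G_n f = \sqrt{n}\,(P_n - P)f = \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} f(X_i) - \mathbb{E}f\right).$$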
By the central limit theorem, $G_n(A)$ converges in distribution to a normal random variable $N(0, P(A)(1 - P(A)))$ for a fixed measurable set A. Similarly, for a fixed function f, $G_n f$ converges in distribution to a normal random variable $N(0, \mathbb{E}(f - \mathbb{E}f)^2)$, provided that $\mathbb{E}f$ and $\mathbb{E}f^2$ exist.
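The limiting variance P(A)(1 − P(A)) can be illustrated by simulation. This is a sketch under stated assumptions: the data are Uniform(0, 1), the set is A = (−∞, 0.3] so that P(A) = 0.3, and G_n(A) is taken to be √n (P_n(A) − P(A)) with P_n(A) the fraction of sample points in A. The helper `g_n_of_set` is a hypothetical name introduced for the example.

```python
import math
import random
import statistics

def g_n_of_set(sample, indicator, p_a):
    """Return sqrt(n) * (P_n(A) - P(A)), where P_n(A) is the sample fraction in A."""
    n = len(sample)
    p_n = sum(1 for x in sample if indicator(x)) / n
    return math.sqrt(n) * (p_n - p_a)

rng = random.Random(1)  # fixed seed so the run is reproducible
p_a = 0.3               # assumption: A = (-inf, 0.3] under Uniform(0, 1)

def in_a(x):
    return x <= p_a

n, replications = 500, 2000
draws = [
    g_n_of_set([rng.random() for _ in range(n)], in_a, p_a)
    for _ in range(replications)
]

sample_mean = statistics.fmean(draws)
sample_var = statistics.pvariance(draws)
limit_var = p_a * (1 - p_a)  # variance of the normal limit, here 0.21
print(f"mean {sample_mean:.3f}, variance {sample_var:.3f}, limit variance {limit_var:.3f}")
```

The sample mean should be close to 0 and the sample variance close to 0.21, up to Monte Carlo error.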
Definition
A significant result in the area of empirical processes is Donsker's theorem. It has led to a study of Donsker classes: sets of functions with the useful property that empirical processes indexed by these classes converge weakly to a certain Gaussian process. While it can be shown that Donsker classes are Glivenko–Cantelli classes, the converse is not true in general.
As an example, consider empirical distribution functions. For real-valued iid random variables X1, X2, ..., Xn they are given by
$$F_n(x) = P_n((-\infty, x]) = P_n I_{(-\infty, x]}.$$
In this case, empirical processes are indexed by the class $\mathcal{C} = \{(-\infty, x] : x \in \mathbb{R}\}$. It has been shown that $\mathcal{C}$ is a Donsker class; in particular,
$$\sqrt{n}\,(F_n(x) - F(x))$$
converges weakly in $\ell^\infty(\mathbb{R})$ to a Brownian bridge $B(F(x))$.
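One consequence of the Donsker property for this class is that, for continuous F, the statistic √n sup_x |F_n(x) − F(x)| converges in distribution to the supremum of the absolute value of a Brownian bridge, whose law is the Kolmogorov distribution. The sketch below compares a simulated probability with that limit. It assumes Uniform(0, 1) data, so F(x) = x, and the function names, sample size, and threshold are illustrative choices.

```python
import math
import random

def kolmogorov_cdf(x, terms=100):
    """CDF of sup_t |B(t)| for a standard Brownian bridge B, valid for x > 0."""
    return 1.0 - 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
        for k in range(1, terms + 1)
    )

def scaled_sup_deviation(sample):
    """Return sqrt(n) * sup_x |F_n(x) - x| for a sample assumed Uniform(0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    sup = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    return math.sqrt(n) * sup

rng = random.Random(2)  # fixed seed so the run is reproducible
n, replications, threshold = 400, 1000, 1.0
hits = sum(
    scaled_sup_deviation([rng.random() for _ in range(n)]) <= threshold
    for _ in range(replications)
)
simulated = hits / replications       # Monte Carlo estimate at finite n
limiting = kolmogorov_cdf(threshold)  # value under the Brownian bridge limit
print(f"simulated {simulated:.3f} vs limiting {limiting:.3f}")
```

The two numbers should agree up to Monte Carlo error and a small finite-sample bias.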
Mojirsheibani, M. (2007). "Nonparametric curve estimation with missing data: A general empirical process approach". Journal of Statistical Planning and Inference. 137 (9): 2733–2758. doi:10.1016/j.jspi.2006.02.016.
Wolfowitz, J. (1954). "Generalization of the Theorem of Glivenko-Cantelli". The Annals of Mathematical Statistics. 25: 131–138. doi:10.1214/aoms/1177728852.