In mathematics, the disintegration theorem is a result in measure theory and probability theory. It rigorously defines the idea of a non-trivial "restriction" of a measure to a measure zero subset of the measure space in question. It is related to the existence of conditional probability measures. In a sense, "disintegration" is the opposite process to the construction of a product measure.
Motivation
Consider the unit square $S = [0,1] \times [0,1]$ in the Euclidean plane $\mathbb{R}^2$. Consider the probability measure $\mu$ defined on $S$ by the restriction of two-dimensional Lebesgue measure $\lambda^2$ to $S$. That is, the probability of a measurable event $E \subseteq S$ is simply the area of $E$.
Consider a one-dimensional subset of $S$ such as the line segment $L_x = \{x\} \times [0,1]$. It has $\mu$-measure zero; moreover, since the Lebesgue measure space is a complete measure space, every subset of $L_x$ is a $\mu$-null set: $E \subseteq L_x \implies \mu(E) = 0$.
While true, this is somewhat unsatisfying. It would be nice to say that $\mu$ "restricted to" $L_x$ is the one-dimensional Lebesgue measure $\lambda^1$, rather than the zero measure. The probability of a "two-dimensional" event $E$ could then be obtained as an integral of the one-dimensional probabilities of the vertical "slices" $E \cap L_x$: more formally, if $\mu_x$ denotes one-dimensional Lebesgue measure on $L_x$, then
$$\mu(E) = \int_{[0,1]} \mu_x(E \cap L_x)\,\mathrm{d}x$$
for any "nice" $E \subseteq S$. The disintegration theorem makes this argument rigorous in the context of measures on metric spaces.
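As a minimal numerical sketch of this slice identity, the snippet below takes the quarter disk $E = \{(x,y) \in S : x^2 + y^2 \le 1\}$ as the event (an illustrative choice, not part of the article) and checks that integrating the lengths of the vertical slices over $x$ reproduces the area $\pi/4$; SciPy's quadrature routine is used purely for convenience.

```python
# Numerical check of  mu(E) = ∫_[0,1] mu_x(E ∩ L_x) dx  for one concrete event E.
# Here E is the quarter disk {(x, y) in S : x^2 + y^2 <= 1}; mu is area on the unit square
# and mu_x is one-dimensional Lebesgue measure on the vertical slice L_x = {x} × [0, 1].
import math
from scipy.integrate import quad

def slice_length(x: float) -> float:
    """Length of the slice E ∩ L_x, i.e. mu_x(E ∩ L_x)."""
    return math.sqrt(max(1.0 - x * x, 0.0))  # the slice is {x} × [0, sqrt(1 - x^2)]

slice_integral, _ = quad(slice_length, 0.0, 1.0)   # ∫_[0,1] mu_x(E ∩ L_x) dx
exact_area = math.pi / 4                           # mu(E) computed directly

print(f"slice integral ≈ {slice_integral:.6f}, exact area = {exact_area:.6f}")
# Both are 0.785398..., illustrating that the slice measures recover mu(E).
```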
Statement of the theorem
(Hereafter, $\mathcal{P}(X)$ will denote the collection of Borel probability measures on a topological space $(X, T)$.) The assumptions of the theorem are as follows:
- Let $Y$ and $X$ be two Radon spaces, i.e. topological spaces such that every Borel probability measure on them is inner regular (for example, Polish spaces, that is, separable completely metrizable spaces); in particular, every probability measure on such a space is a Radon measure.
- Let $\mu \in \mathcal{P}(Y)$.
- Let $\pi : Y \to X$ be a Borel-measurable function. Here one should think of $\pi$ as a way to "disintegrate" $Y$, in the sense of partitioning $Y$ into $\{\pi^{-1}(x) \mid x \in X\}$. For example, for the motivating example above, one can define $\pi((a,b)) = a$ for $(a,b) \in [0,1] \times [0,1]$, which gives $\pi^{-1}(a) = \{a\} \times [0,1]$, the slice we want to capture.
- Let $\nu \in \mathcal{P}(X)$ be the pushforward measure $\nu = \pi_*(\mu) = \mu \circ \pi^{-1}$. This measure provides the distribution of $x$ (which corresponds to the events $\pi^{-1}(x)$); a small numerical sketch of this pushforward, for the motivating example, follows this list.
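The following is a small Monte Carlo sketch of the pushforward in the motivating example, where $\mu$ is the uniform measure on the unit square and $\pi((a,b)) = a$; the sample size and the use of NumPy are illustrative choices. It checks that $\nu = \pi_*(\mu)$ behaves like the uniform distribution on $[0,1]$.

```python
# Sketch of the pushforward nu = pi_*(mu) for the motivating example:
# mu is uniform on the unit square and pi((a, b)) = a, so nu should be uniform on [0, 1].
import numpy as np

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(100_000, 2))   # draws from mu
projected = samples[:, 0]                             # pi applied sample-wise

# Compare nu([0, t]) (empirical) with the uniform CDF value t at a few test points.
for t in (0.25, 0.5, 0.9):
    empirical = np.mean(projected <= t)               # estimate of nu([0, t])
    print(f"nu([0, {t}]) ≈ {empirical:.3f}  (uniform CDF gives {t})")
```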
The conclusion of the theorem: There exists a $\nu$-almost everywhere uniquely determined family of probability measures $\{\mu_x\}_{x \in X} \subseteq \mathcal{P}(Y)$, which provides a "disintegration" of $\mu$ into $\{\mu_x\}_{x \in X}$, such that:
- the function $x \mapsto \mu_x$ is Borel measurable, in the sense that $x \mapsto \mu_x(B)$ is a Borel-measurable function for each Borel-measurable set $B \subseteq Y$;
- $\mu_x$ "lives on" the fiber $\pi^{-1}(x)$: for $\nu$-almost all $x \in X$, $\mu_x\left(Y \setminus \pi^{-1}(x)\right) = 0$, and so $\mu_x(E) = \mu_x(E \cap \pi^{-1}(x))$;
- for every Borel-measurable function $f : Y \to [0, \infty]$, $$\int_Y f(y)\,\mathrm{d}\mu(y) = \int_X \int_{\pi^{-1}(x)} f(y)\,\mathrm{d}\mu_x(y)\,\mathrm{d}\nu(x).$$ In particular, for any event $E \subseteq Y$, taking $f$ to be the indicator function of $E$,[1] $$\mu(E) = \int_X \mu_x(E)\,\mathrm{d}\nu(x).$$
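In the purely discrete case the disintegration reduces to conditioning on the fibers of $\pi$, which the following sketch illustrates; the toy set $Y$, the labelling map, and the weights are arbitrary illustrative choices. It verifies the identity $\mu(E) = \int_X \mu_x(E)\,\mathrm{d}\nu(x)$ for one event.

```python
# Finite sketch of the disintegration: Y is a finite set, pi maps each point to a label,
# nu is the pushforward of mu, and mu_x is mu conditioned on the fiber pi^{-1}(x).

Y = ["a1", "a2", "b1", "b2", "b3"]
pi = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "b3": "B"}      # the map to disintegrate along
mu = {"a1": 0.1, "a2": 0.3, "b1": 0.2, "b2": 0.25, "b3": 0.15}    # a probability measure on Y

# Pushforward nu = pi_*(mu): nu({x}) = mu(pi^{-1}(x)).
nu = {}
for y, p in mu.items():
    nu[pi[y]] = nu.get(pi[y], 0.0) + p

# Disintegration: mu_x is supported on the fiber pi^{-1}(x) and normalised to total mass 1.
mu_x = {x: {y: mu[y] / nu[x] for y in Y if pi[y] == x} for x in nu}

# Check  mu(E) = ∫_X mu_x(E) dnu(x)  for an arbitrary event E ⊆ Y.
E = {"a2", "b1", "b3"}
lhs = sum(mu[y] for y in E)
rhs = sum(nu[x] * sum(p for y, p in mu_x[x].items() if y in E) for x in nu)
print(lhs, rhs)   # both 0.65
```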
Applications
Product spaces
The original example was a special case of the problem of product spaces, to which the disintegration theorem applies.
When $Y$ is written as a Cartesian product $Y = X_1 \times X_2$ and $\pi_i : Y \to X_i$ is the natural projection, then each fibre $\pi_1^{-1}(x_1)$ can be canonically identified with $X_2$ and there exists a Borel family of probability measures $\{\mu_{x_1}\}_{x_1 \in X_1}$ in $\mathcal{P}(X_2)$ (which is $(\pi_1)_*(\mu)$-almost everywhere uniquely determined) such that
$$\mu = \int_{X_1} \mu_{x_1}\,\mu\left(\pi_1^{-1}(\mathrm{d}x_1)\right) = \int_{X_1} \mu_{x_1}\,\mathrm{d}(\pi_1)_*(\mu)(x_1);$$
in particular,
$$\int_{X_1 \times X_2} f(x_1, x_2)\,\mu(\mathrm{d}x_1, \mathrm{d}x_2) = \int_{X_1} \left( \int_{X_2} f(x_1, x_2)\,\mu(\mathrm{d}x_2 \mid x_1) \right) \mu\left(\pi_1^{-1}(\mathrm{d}x_1)\right)$$
and
$$\mu(A \times B) = \int_A \mu\left(B \mid x_1\right)\,\mu\left(\pi_1^{-1}(\mathrm{d}x_1)\right).$$
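A discrete sketch of these product-space identities, assuming a small $2 \times 3$ table of joint weights chosen for illustration: the first marginal plays the role of $(\pi_1)_*(\mu)$, the row-normalised table gives $\mu(\cdot \mid x_1)$, and the identity $\mu(A \times B) = \int_A \mu(B \mid x_1)\,\mu(\pi_1^{-1}(\mathrm{d}x_1))$ is checked for one choice of $A$ and $B$.

```python
# Discrete sketch of the product-space case: mu is a joint distribution on X1 × X2,
# nu = (pi_1)_*(mu) is the first marginal, and mu(· | x1) is the conditional on the
# fibre {x1} × X2.  The 2 × 3 table of weights is an illustrative example only.
import numpy as np

joint = np.array([[0.10, 0.05, 0.15],     # mu({x1} × {x2}) for x1 in {0, 1}, x2 in {0, 1, 2}
                  [0.20, 0.30, 0.20]])
marginal_x1 = joint.sum(axis=1)                # nu({x1}) = mu(pi_1^{-1}(x1))
conditional = joint / marginal_x1[:, None]     # mu({x2} | x1); each row sums to 1

# Check  mu(A × B) = ∫_A mu(B | x1) dnu(x1)  for A = {1}, B = {0, 2}.
A, B = [1], [0, 2]
lhs = joint[np.ix_(A, B)].sum()
rhs = sum(marginal_x1[x1] * conditional[x1, B].sum() for x1 in A)
print(lhs, rhs)   # both 0.4
```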
The relation to conditional expectation is given by the identities
$$\operatorname{E}(f \mid \pi_1)(x_1) = \int_{X_2} f(x_1, x_2)\,\mu(\mathrm{d}x_2 \mid x_1),$$
$$\mu(A \times B \mid \pi_1)(x_1) = 1_A(x_1) \cdot \mu(B \mid x_1).$$
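Continuing the same illustrative discrete setting, the sketch below computes $\operatorname{E}(f \mid \pi_1)$ from the conditionals and checks that averaging it against the marginal recovers $\operatorname{E}[f]$, as the identity above requires; the tabulated function $f$ is an arbitrary choice.

```python
# Sketch of the conditional-expectation identity in the discrete setting above:
# E(f | pi_1)(x1) = sum_{x2} f(x1, x2) mu({x2} | x1), and averaging against nu recovers E[f].
import numpy as np

joint = np.array([[0.10, 0.05, 0.15],
                  [0.20, 0.30, 0.20]])
marginal_x1 = joint.sum(axis=1)                # nu({x1})
conditional = joint / marginal_x1[:, None]     # mu({x2} | x1)

f = np.array([[1.0, 2.0, 3.0],                 # f(x1, x2) tabulated on X1 × X2
              [4.0, 5.0, 6.0]])

cond_exp = (f * conditional).sum(axis=1)       # E(f | pi_1)(x1) for each x1
total_1 = (cond_exp * marginal_x1).sum()       # ∫_X1 E(f | pi_1) dnu
total_2 = (f * joint).sum()                    # E[f] computed directly from mu
print(cond_exp, total_1, total_2)              # total_1 == total_2 == 4.15
```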
Vector calculus
The disintegration theorem can also be seen as justifying the use of a "restricted" measure in vector calculus. For instance, in Stokes' theorem as applied to a vector field flowing through a compact surface $\Sigma \subset \mathbb{R}^3$, it is implicit that the "correct" measure on $\Sigma$ is the disintegration of three-dimensional Lebesgue measure $\lambda^3$ on $\Sigma$, and that the disintegration of this measure on $\partial\Sigma$ is the same as the disintegration of $\lambda^3$ on $\partial\Sigma$.[2]
Conditional distributions
The disintegration theorem can be applied to give a rigorous treatment of conditional probability distributions in statistics, while avoiding purely abstract formulations of conditional probability.[3] The theorem is related to the Borel–Kolmogorov paradox, for example.
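As a sampling sketch of conditioning-as-disintegration, assume the joint law is a standard bivariate normal with correlation $\rho$; its disintegration along the first coordinate is the familiar family of conditionals $N(\rho x, 1 - \rho^2)$. Since the fibre $\{x_0\} \times \mathbb{R}$ has measure zero, the snippet approximates it by a thin vertical slab (the choice of slab is exactly the kind of issue the Borel–Kolmogorov paradox highlights); $\rho$, $x_0$, the slab width, and the sample size are illustrative.

```python
# Sketch of disintegration as conditioning: for a standard bivariate normal with correlation
# rho, the disintegration along the first coordinate is the family of conditionals
# N(rho * x, 1 - rho^2).  The fibre {x0} × R is approximated by a thin slab of samples.
import numpy as np

rng = np.random.default_rng(1)
rho = 0.6
cov = [[1.0, rho], [rho, 1.0]]
xy = rng.multivariate_normal([0.0, 0.0], cov, size=500_000)

x0, width = 0.8, 0.02                     # thin slab around the fibre {x0} × R
slab = xy[np.abs(xy[:, 0] - x0) < width]
print(slab[:, 1].mean(), rho * x0)        # empirical mean of the slice  vs  rho * x0
print(slab[:, 1].var(), 1 - rho**2)       # empirical variance of the slice  vs  1 - rho^2
```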
See also
- Ionescu-Tulcea theorem – Probability theorem
- Joint probability distribution – Type of probability distribution
- Copula (statistics) – Statistical distribution for dependence between random variables
- Conditional expectation – Expected value of a random variable given that certain conditions are known to occur
- Borel–Kolmogorov paradox – Conditional probability paradox
- Regular conditional probability
References
[1] Dellacherie, C.; Meyer, P.-A. (1978). Probabilities and Potential. North-Holland Mathematics Studies. Amsterdam: North-Holland. ISBN 0-7204-0701-X.
[2] Ambrosio, L.; Gigli, N.; Savaré, G. (2005). Gradient Flows in Metric Spaces and in the Space of Probability Measures. ETH Zürich, Birkhäuser Verlag, Basel. ISBN 978-3-7643-2428-5.
[3] Chang, J. T.; Pollard, D. (1997). "Conditioning as disintegration" (PDF). Statistica Neerlandica. 51 (3): 287. CiteSeerX 10.1.1.55.7544. doi:10.1111/1467-9574.00056. S2CID 16749932. http://www.stat.yale.edu/~jtc5/papers/ConditioningAsDisintegration.pdf