The data processing inequality is an information-theoretic concept stating that the information content of a signal cannot be increased via a local physical operation. This can be expressed concisely as 'post-processing cannot increase information'.
Statement
Let three random variables form the Markov chain X → Y → Z, implying that the conditional distribution of Z depends only on Y and is conditionally independent of X. Specifically, we have such a Markov chain if the joint probability mass function can be written as
p(x, y, z) = p(x) p(y | x) p(z | y) = p(y) p(x | y) p(z | y)

In this setting, no processing of Y, deterministic or random, can increase the information that Y contains about X. Using the mutual information, this can be written as:
I(X; Y) ≥ I(X; Z),

with equality I(X; Y) = I(X; Z) if and only if I(X; Y ∣ Z) = 0. That is, Z and Y contain the same information about X, and X → Z → Y also forms a Markov chain.[2]
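As a numerical illustration (not from the source; the channel probabilities below are hypothetical), one can build a small discrete chain X → Y → Z out of two binary symmetric channels and check the inequality directly:

```python
import numpy as np

def mutual_information(pab):
    """I(A;B) in nats, from a joint distribution given as a 2-D array."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    m = pab > 0
    return float(np.sum(pab[m] * np.log(pab[m] / (pa * pb)[m])))

# Hypothetical chain: X uniform on {0, 1}, Y a noisy copy of X,
# Z a further-degraded copy of Y (two binary symmetric channels).
px = np.array([0.5, 0.5])
p_y_given_x = np.array([[0.9, 0.1],
                        [0.1, 0.9]])  # crossover probability 0.1
p_z_given_y = np.array([[0.8, 0.2],
                        [0.2, 0.8]])  # crossover probability 0.2

pxy = px[:, None] * p_y_given_x                  # joint of (X, Y)
pxz = (px[:, None] * p_y_given_x) @ p_z_given_y  # joint of (X, Z), summing out Y

# Post-processing Y through the second channel cannot add information about X:
assert mutual_information(pxy) >= mutual_information(pxz)
```

Because the second channel strictly degrades Y here, the inequality is strict: Z tells us less about X than Y does.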
Proof
One can apply the chain rule for mutual information to obtain two different decompositions of I(X; Y, Z):
I(X; Z) + I(X; Y ∣ Z) = I(X; Y, Z) = I(X; Y) + I(X; Z ∣ Y)

By the Markov relationship X → Y → Z, X and Z are conditionally independent given Y, which means the conditional mutual information I(X; Z ∣ Y) = 0. The data processing inequality then follows from the non-negativity of the remaining conditional term, I(X; Y ∣ Z) ≥ 0.
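The two decompositions can also be checked numerically. The sketch below (using hypothetical binary-channel probabilities, not values from the source) builds the joint p(x, y, z) for a Markov chain and verifies that I(X; Z ∣ Y) vanishes, so both decompositions agree and the inequality follows:

```python
import numpy as np

def mi(pab):
    """I(A;B) in nats from a joint distribution pab[a, b]."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    m = pab > 0
    return float(np.sum(pab[m] * np.log(pab[m] / (pa * pb)[m])))

def cmi(pabc):
    """I(A;B|C) in nats from a joint distribution pabc[a, b, c]."""
    pc = pabc.sum(axis=(0, 1))   # p(c)
    pac = pabc.sum(axis=1)       # p(a, c)
    pbc = pabc.sum(axis=0)       # p(b, c)
    total = 0.0
    for a, b, c in np.ndindex(pabc.shape):
        p = pabc[a, b, c]
        if p > 0:
            total += p * np.log(p * pc[c] / (pac[a, c] * pbc[b, c]))
    return total

# Hypothetical binary chain X -> Y -> Z: p(x, y, z) = p(x) p(y|x) p(z|y).
px = np.array([0.5, 0.5])
p_y_given_x = np.array([[0.9, 0.1], [0.1, 0.9]])
p_z_given_y = np.array([[0.8, 0.2], [0.2, 0.8]])
pxyz = px[:, None, None] * p_y_given_x[:, :, None] * p_z_given_y[None, :, :]

I_xy = mi(pxyz.sum(axis=2))                  # I(X;Y), marginalizing out Z
I_xz = mi(pxyz.sum(axis=1))                  # I(X;Z), marginalizing out Y
I_xy_given_z = cmi(pxyz)                     # I(X;Y|Z), conditioning on axis 2
I_xz_given_y = cmi(pxyz.transpose(0, 2, 1))  # I(X;Z|Y), conditioning on Y

# Markov property: I(X;Z|Y) = 0 (up to floating-point rounding).
assert abs(I_xz_given_y) < 1e-12
# Both chain-rule decompositions give the same I(X; Y,Z):
assert abs((I_xz + I_xy_given_z) - (I_xy + I_xz_given_y)) < 1e-12
# Hence the data processing inequality:
assert I_xy >= I_xz
```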
References
Beaudry, Normand (2012), "An intuitive proof of the data processing inequality", Quantum Information & Computation, 12 (5–6): 432–441, arXiv:1107.0740, doi:10.26421/QIC12.5-6-4.
Cover, Thomas M.; Thomas, Joy A. (2012). Elements of Information Theory. John Wiley & Sons.