In information theory, dual total correlation, information rate, excess entropy, or binding information is one of several known non-negative generalizations of mutual information. While total correlation is bounded above by the sum of the entropies of the n elements, the dual total correlation is bounded above by the joint entropy of the n elements. Although well behaved, dual total correlation has received much less attention than the total correlation. A measure known as "TSE-complexity" defines a continuum between the total correlation and dual total correlation.
Definition
For a set of $n$ random variables $\{X_1, \ldots, X_n\}$, the dual total correlation $D(X_1, \ldots, X_n)$ is given by
$$D(X_1, \ldots, X_n) = H(X_1, \ldots, X_n) - \sum_{i=1}^{n} H(X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n),$$

where $H(X_1, \ldots, X_n)$ is the joint entropy of the variable set $\{X_1, \ldots, X_n\}$ and $H(X_i \mid \cdots)$ is the conditional entropy of variable $X_i$ given the rest.
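As a concrete illustration, the following minimal Python sketch (not part of the source; the pmf representation and the names entropy, marginal, and dual_total_correlation are assumptions made for this example) computes the dual total correlation of a small discrete joint distribution directly from this definition, using $H(X_i \mid \text{rest}) = H(X_1, \ldots, X_n) - H(\text{rest})$:

```python
# Minimal sketch (assumed representation): a joint pmf is a dict mapping
# outcome tuples to probabilities. All names here are illustrative.
from math import log2

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(joint, keep):
    """Marginal pmf over the coordinate indices listed in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def dual_total_correlation(joint, n):
    """D = H(X_1,...,X_n) - sum_i H(X_i | X_{-i}),
    with H(X_i | X_{-i}) = H(X_1,...,X_n) - H(X_{-i})."""
    h_joint = entropy(joint)
    d = h_joint
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        d -= h_joint - entropy(marginal(joint, rest))  # subtract H(X_i | rest)
    return d

# Example: three perfectly correlated fair bits; D equals the joint entropy, 1 bit.
copies = {(b, b, b): 0.5 for b in (0, 1)}
print(dual_total_correlation(copies, 3))  # ~1.0
```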
Normalized
The dual total correlation normalized to lie between 0 and 1 is simply the dual total correlation divided by its maximum value, $H(X_1, \ldots, X_n)$:
$$ND(X_1, \ldots, X_n) = \frac{D(X_1, \ldots, X_n)}{H(X_1, \ldots, X_n)}.$$
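Continuing the sketch above (same assumed helper functions), the normalized value is just this ratio, defined whenever the joint entropy is nonzero:

```python
# Normalized dual total correlation, reusing entropy() and
# dual_total_correlation() from the sketch above; undefined if H(joint) == 0.
def normalized_dtc(joint, n):
    return dual_total_correlation(joint, n) / entropy(joint)

print(normalized_dtc(copies, 3))  # ~1.0 for the three copied fair bits
```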
Relationship with Total Correlation

Dual total correlation is non-negative and bounded above by the joint entropy $H(X_1, \ldots, X_n)$:
$$0 \leq D(X_1, \ldots, X_n) \leq H(X_1, \ldots, X_n).$$

Secondly, dual total correlation has a close relationship with total correlation, $C(X_1, \ldots, X_n)$, and can be written in terms of differences between the total correlation of the whole and the total correlations of all subsets of size $n - 1$:[7]
$$D(\mathbf{X}) = (n-1)\,C(\mathbf{X}) - \sum_{i=1}^{n} C(\mathbf{X}^{-i}),$$

where $\mathbf{X} = \{X_1, \ldots, X_n\}$ and $\mathbf{X}^{-i} = \{X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n\}$.
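Under the same assumptions as the sketches above, this identity can be checked numerically by also computing the total correlation $C = \sum_i H(X_i) - H(X_1, \ldots, X_n)$ on index subsets (the function names are again illustrative):

```python
# Total correlation restricted to an index subset, reusing entropy() and
# marginal() from the first sketch.
def total_correlation(joint, indices):
    joint_sub = marginal(joint, indices)
    return sum(entropy(marginal(joint, [i])) for i in indices) - entropy(joint_sub)

def dtc_from_tc(joint, n):
    """D(X) = (n - 1) C(X) - sum_i C(X^{-i})."""
    idx = list(range(n))
    c_full = total_correlation(joint, idx)
    c_leave_one_out = sum(
        total_correlation(joint, [j for j in idx if j != i]) for i in idx)
    return (n - 1) * c_full - c_leave_one_out

# Agrees with the direct computation on the toy example (both ~1.0).
print(dtc_from_tc(copies, 3), dual_total_correlation(copies, 3))
```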
Furthermore, the total correlation and dual total correlation are related by the following bounds:
$$\frac{C(X_1, \ldots, X_n)}{n-1} \leq D(X_1, \ldots, X_n) \leq (n-1)\,C(X_1, \ldots, X_n).$$
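Both sets of bounds can be verified numerically on the toy distribution used in the sketches above (names still illustrative):

```python
# Numerical check of 0 <= D <= H(joint) and C/(n-1) <= D <= (n-1)*C,
# reusing the illustrative helpers defined earlier.
n = 3
D = dual_total_correlation(copies, n)
C = total_correlation(copies, list(range(n)))
H = entropy(copies)
assert 0.0 <= D <= H + 1e-12
assert C / (n - 1) - 1e-12 <= D <= (n - 1) * C + 1e-12
```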
Finally, the difference between the total correlation and the dual total correlation defines a novel measure of higher-order information-sharing, the O-information:[8]

$$\Omega(\mathbf{X}) = C(\mathbf{X}) - D(\mathbf{X}).$$

The O-information (first introduced as the "enigmatic information" by James and Crutchfield[9]) is a signed measure that quantifies the extent to which the information in a multivariate random variable is dominated by synergistic interactions (in which case $\Omega(\mathbf{X}) < 0$) or redundant interactions (in which case $\Omega(\mathbf{X}) > 0$).
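As a sketch of how the sign behaves (reusing the illustrative helpers above): three copied bits are redundancy-dominated, while an XOR triple, in which the third bit is the exclusive-or of two independent fair bits, is synergy-dominated:

```python
# O-information Omega = C - D on two toy cases.
def o_information(joint, n):
    return (total_correlation(joint, list(range(n)))
            - dual_total_correlation(joint, n))

xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
print(o_information(copies, 3))  # ~ +1.0: redundant interactions dominate
print(o_information(xor, 3))     # ~ -1.0: synergistic interactions dominate
```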
History
Han (1978) originally defined the dual total correlation as,
$$D(X_1, \ldots, X_n) \equiv \left[\sum_{i=1}^{n} H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)\right] - (n-1)\,H(X_1, \ldots, X_n).$$

However, Abdallah and Plumbley (2010) showed its equivalence to the easier-to-understand form of the joint entropy minus the sum of conditional entropies via the following:
$$\begin{aligned}
D(X_1, \ldots, X_n) \equiv {} & \left[\sum_{i=1}^{n} H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)\right] - (n-1)\,H(X_1, \ldots, X_n) \\
= {} & \left[\sum_{i=1}^{n} H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)\right] + (1-n)\,H(X_1, \ldots, X_n) \\
= {} & H(X_1, \ldots, X_n) + \sum_{i=1}^{n} \left[ H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) - H(X_1, \ldots, X_n) \right] \\
= {} & H(X_1, \ldots, X_n) - \sum_{i=1}^{n} H(X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n).
\end{aligned}$$
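The equivalence can also be confirmed numerically with the illustrative helpers from the sketches above, comparing Han's original expression with the conditional-entropy form:

```python
# Han's original form: sum_i H(X_{-i}) - (n - 1) * H(X_1,...,X_n).
def dtc_han(joint, n):
    h_joint = entropy(joint)
    leave_one_out = sum(
        entropy(marginal(joint, [j for j in range(n) if j != i]))
        for i in range(n))
    return leave_one_out - (n - 1) * h_joint

for pmf in (copies, xor):
    assert abs(dtc_han(pmf, 3) - dual_total_correlation(pmf, 3)) < 1e-12
```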
See also

Bibliography
- Fujishige, Satoru (1978). "Polymatroidal dependence structure of a set of random variables". Information and Control. 39: 55–72. doi:10.1016/S0019-9958(78)91063-X.
- Varley, Thomas; Pope, Maria; Faskowitz, Joshua; Sporns, Olaf (2023). "Multivariate information theory uncovers synergistic subsystems of the human cerebral cortex". Communications Biology. 6: 451. doi:10.1038/s42003-023-04843-w. PMC 10125999. PMID 37095282.
References
1. Han, Te Sun (1978). "Nonnegative entropy measures of multivariate symmetric correlations". Information and Control. 36 (2): 133–156. doi:10.1016/S0019-9958(78)90275-9.
2. Dubnov, Shlomo (2006). "Spectral Anticipations". Computer Music Journal. 30 (2): 63–83. doi:10.1162/comj.2006.30.2.63. S2CID 2202704.
3. Nihat Ay, E. Olbrich, N. Bertschinger (2001). "A unifying framework for complexity measures of finite systems". European Conference on Complex Systems. http://www.cabdyn.ox.ac.uk/complexity_PDFs/ECCS06/Conference_Proceedings/PDF/p202.pdf
4. Olbrich, E.; Bertschinger, N.; Ay, N.; Jost, J. (2008). "How should complexity scale with system size?". The European Physical Journal B. 63 (3): 407–415. Bibcode:2008EPJB...63..407O. doi:10.1140/epjb/e2008-00134-9. S2CID 120391127.
5. Abdallah, Samer A.; Plumbley, Mark D. (2010). "A measure of statistical complexity based on predictive information". arXiv:1012.1890v1 [math.ST].
6. Nihat Ay, E. Olbrich, N. Bertschinger (2001). "A unifying framework for complexity measures of finite systems". European Conference on Complex Systems. http://www.cabdyn.ox.ac.uk/complexity_PDFs/ECCS06/Conference_Proceedings/PDF/p202.pdf
7. Varley, Thomas F.; Pope, Maria; Faskowitz, Joshua; Sporns, Olaf (2023). "Multivariate information theory uncovers synergistic subsystems of the human cerebral cortex". Communications Biology. 6 (1): 451. doi:10.1038/s42003-023-04843-w. PMC 10125999. PMID 37095282.
8. Rosas, Fernando E.; Mediano, Pedro A. M.; Gastpar, Michael; Jensen, Henrik J. (2019). "Quantifying high-order interdependencies via multivariate extensions of the mutual information". Physical Review E. 100 (3): 032305. arXiv:1902.11239. Bibcode:2019PhRvE.100c2305R. doi:10.1103/PhysRevE.100.032305. PMID 31640038.
9. James, Ryan G.; Ellison, Christopher J.; Crutchfield, James P. (2011). "Anatomy of a bit: Information in a time series observation". Chaos: An Interdisciplinary Journal of Nonlinear Science. 21 (3): 037109. arXiv:1105.2988. Bibcode:2011Chaos..21c7109J. doi:10.1063/1.3637494. PMID 21974672.