Entropy rate
Time density of the average information in a stochastic process

In the mathematical theory of probability, the entropy rate, also known as the source information rate, assigns an entropy measure to a stochastic process. For a strongly stationary process, the conditional entropy of the latest random variable given all the preceding ones converges to this entropy rate, which thus reflects the average uncertainty per time step of the process. This concept is fundamental to understanding the informational properties and complexity of stochastic systems.


Definition

A process $X$ with a countable index gives rise to the sequence of its joint entropies $H_n(X_1, X_2, \dots, X_n)$. If the limit exists, the entropy rate is defined as

$$H(X) := \lim_{n\to\infty} \tfrac{1}{n} H_n.$$

Note that given any sequence $(a_n)_n$ with $a_0 = 0$ and letting $\Delta a_k := a_k - a_{k-1}$, by telescoping one has $a_n = \sum_{k=1}^{n} \Delta a_k$. The entropy rate thus computes the mean of the first $n$ such entropy changes, with $n$ going to infinity. The $n$th entropy change is itself the conditional entropy $H(X_n \mid X_{n-1}, X_{n-2}, \dots, X_1)$, so the entropy rate is the average entropy of the distribution of the $n$th variable once the previous ones are known. The behaviour of joint entropies from one index to the next also appears explicitly in some characterizations of entropy.
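As an illustration of this definition, the sketch below computes the block entropies $H_n$ of a hypothetical two-state, first-order Markov source by enumerating every length-$n$ sequence, and prints $\tfrac{1}{n} H_n$; the transition probabilities are made-up values chosen only so that the convergence is visible.

```python
import numpy as np
from itertools import product

# Hypothetical two-state, first-order Markov source (illustrative values only).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])      # P[i, j] = Pr(X_{t+1} = j | X_t = i)
mu = np.array([2/3, 1/3])       # its stationary distribution (mu @ P == mu)

def joint_entropy(n):
    """H_n = H(X_1, ..., X_n) in bits, by enumerating all length-n sequences."""
    H = 0.0
    for xs in product(range(2), repeat=n):
        p = mu[xs[0]]
        for a, b in zip(xs, xs[1:]):
            p *= P[a, b]
        if p > 0:
            H -= p * np.log2(p)
    return H

# The per-symbol average (1/n) H_n should settle toward the entropy rate as n grows.
for n in range(1, 11):
    print(f"n = {n:2d}   H_n / n = {joint_entropy(n) / n:.5f}")
```

With these particular numbers the printed averages decrease toward roughly 0.55 bits per symbol, matching the closed-form Markov-chain expression discussed further below.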

Discussion

While $X$ may be understood as a sequence of random variables, the entropy rate $H(X)$ represents the average entropy change per random variable, in the long term.

It can be thought of as a general property of stochastic sources; this is the subject of the asymptotic equipartition property.

For strongly stationary processes

A stochastic process also gives rise to a sequence of conditional entropies, each conditioning on more and more of the preceding random variables. For strongly stationary stochastic processes, the entropy rate equals the limit of that sequence

$$H(X) = \lim_{n\to\infty} H(X_n \mid X_{n-1}, X_{n-2}, \dots, X_1)$$

The quantity given by the limit on the right is also denoted $H'(X)$; the notation is motivated by the fact that, for stationary processes, it is again a rate associated with the process in the sense above.
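To see why the two limits coincide for stationary processes, note that by the chain rule the joint entropy is the sum of the conditional entropies,

$$H_n = \sum_{k=1}^{n} H(X_k \mid X_{k-1}, \dots, X_1),$$

so $\tfrac{1}{n} H_n$ is exactly the Cesàro mean of the terms on the right. For a strongly stationary process these conditional entropies are non-increasing (conditioning cannot increase entropy, and stationarity allows the indices to be shifted) and bounded below by zero, hence they converge; their Cesàro mean then converges to the same limit, giving $H(X) = H'(X)$.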

For Markov chains

Since a stochastic process defined by a Markov chain that is irreducible and aperiodic has a stationary distribution, the entropy rate is independent of the initial distribution.[1]

For example, consider a Markov chain defined on a countable number of states. Given its right stochastic transition matrix $P_{ij}$ and an entropy

$$h_i := -\sum_j P_{ij} \log P_{ij}$$

associated with each state, one finds

$$H(X) = \sum_i \mu_i h_i,$$

where $\mu_i$ is the asymptotic distribution of the chain.
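The sketch below evaluates this formula numerically for a hypothetical two-state chain: it extracts the stationary distribution $\mu$ as the left eigenvector of the transition matrix for eigenvalue 1 and then forms the weighted sum $\sum_i \mu_i h_i$; the transition probabilities are illustrative, not taken from any source.

```python
import numpy as np

# Hypothetical two-state chain; rows of the right-stochastic matrix P sum to 1.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Stationary distribution mu: left eigenvector of P for eigenvalue 1, normalized.
evals, evecs = np.linalg.eig(P.T)
mu = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
mu /= mu.sum()

# Per-state entropies h_i = -sum_j P_ij * log2 P_ij (all entries here are positive;
# zero-probability transitions would contribute nothing and must be skipped in general).
h = -np.sum(P * np.log2(P), axis=1)

entropy_rate = mu @ h           # H(X) = sum_i mu_i * h_i
print("stationary distribution:", mu)
print(f"entropy rate: {entropy_rate:.5f} bits per step")
```

For this chain the stationary distribution is $(2/3, 1/3)$ and the entropy rate comes out to roughly 0.55 bits per step, in agreement with the block-entropy estimate sketched in the Definition section (assuming the same transition matrix).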

In particular, it follows that the entropy rate of an i.i.d. stochastic process is the same as the entropy of any individual member in the process.
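This follows directly from the definition: independence makes the joint entropy additive, so

$$H_n = H(X_1, \dots, X_n) = \sum_{k=1}^{n} H(X_k) = n\,H(X_1) \quad\Longrightarrow\quad H(X) = \lim_{n\to\infty} \tfrac{1}{n} H_n = H(X_1).$$

For example, a sequence of independent fair coin flips has entropy rate $H(X_1) = 1$ bit per symbol.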

For hidden Markov models

The entropy rate of hidden Markov models (HMMs) has no known closed-form expression, but upper and lower bounds are known. Let the underlying Markov chain $X_{1:\infty}$ be stationary, and let $Y_{1:\infty}$ be the observable states; then

$$H(Y_n \mid X_1, Y_{1:n-1}) \leq H(Y) \leq H(Y_n \mid Y_{1:n-1}),$$

and in the limit $n \to \infty$ both bounds converge to the entropy rate $H(Y)$.[2]
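The sketch below evaluates these bounds by brute force for a hypothetical two-state, two-symbol HMM: the block entropies $H(Y_{1:n})$ and $H(X_1, Y_{1:n})$ are computed by exhaustive enumeration of observation sequences and hidden paths, which is only feasible for very small $n$ and alphabets; all parameter values are made up for illustration.

```python
import numpy as np
from itertools import product

# Hypothetical two-state HMM with two observable symbols (illustrative values only).
P = np.array([[0.9, 0.1],      # hidden-state transition matrix
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],      # B[i, j] = Pr(Y_t = j | X_t = i)
              [0.1, 0.9]])
mu = np.array([2/3, 1/3])      # stationary distribution of P; the chain starts in it

def prob(y, x1=None):
    """Pr(Y_{1:n} = y), optionally jointly with X_1 = x1, by summing over hidden paths."""
    total = 0.0
    for xs in product(range(2), repeat=len(y)):
        if x1 is not None and xs[0] != x1:
            continue
        p = mu[xs[0]] * B[xs[0], y[0]]
        for t in range(1, len(y)):
            p *= P[xs[t - 1], xs[t]] * B[xs[t], y[t]]
        total += p
    return total

def block_entropy(n, with_x1=False):
    """H(Y_{1:n}) or H(X_1, Y_{1:n}) in bits, by enumerating all observation sequences."""
    terms = []
    for y in product(range(2), repeat=n):
        if with_x1:
            terms.extend(prob(y, x1) for x1 in range(2))
        else:
            terms.append(prob(y))
    return -sum(p * np.log2(p) for p in terms if p > 0)

for n in range(2, 8):
    upper = block_entropy(n) - block_entropy(n - 1)              # H(Y_n | Y_{1:n-1})
    lower = block_entropy(n, True) - block_entropy(n - 1, True)  # H(Y_n | X_1, Y_{1:n-1})
    print(f"n = {n}:  {lower:.5f} <= H(Y) <= {upper:.5f}")
```

As $n$ increases, the two printed bounds tighten around the entropy rate of the observed process, in line with the sandwich result cited above.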

Applications

The entropy rate may be used to estimate the complexity of stochastic processes. It is used in diverse applications, from characterizing the complexity of languages and blind source separation to optimizing quantizers and data compression algorithms. For example, a maximum entropy rate criterion may be used for feature selection in machine learning.[3]

References

  1. Cover, Thomas M.; Thomas, Joy A. (2006). Elements of Information Theory (2nd ed.). Hoboken, NJ: Wiley-Interscience. p. 78. ISBN 978-0-471-24195-9.

  2. Cover, Thomas M.; Thomas, Joy A. (2006). "4.5. Functions of Markov chains". Elements of Information Theory (2nd ed.). Hoboken, NJ: Wiley-Interscience. ISBN 978-0-471-24195-9.

  3. Einicke, G. A. (2018). "Maximum-Entropy Rate Selection of Features for Classifying Changes in Knee and Ankle Dynamics During Running". IEEE Journal of Biomedical and Health Informatics. 28 (4): 1097–1103. doi:10.1109/JBHI.2017.2711487. hdl:10810/68978. PMID 29969403. S2CID 49555941.