In mathematical statistics, the Kullback–Leibler (KL) divergence (also called relative entropy and I-divergence), denoted $D_{\text{KL}}(P\parallel Q)$, is a type of statistical distance: a measure of how much a model probability distribution Q differs from a true probability distribution P. Mathematically, it is defined as
$$D_{\text{KL}}(P\parallel Q)=\sum_{x\in\mathcal{X}}P(x)\,\log\frac{P(x)}{Q(x)}.$$
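For the finite, discrete case, this definition translates directly into code. The following is a minimal Python sketch; the function name kl_divergence and the choice of the natural logarithm (so the result is in nats) are illustrative choices made here, not part of the definition:

```python
import math

def kl_divergence(p, q):
    """Compute D_KL(P || Q) for two discrete distributions given as
    sequences of probabilities over the same finite sample space.

    Uses the convention 0 * log 0 = 0; if some outcome has P(x) > 0
    but Q(x) = 0, the divergence is infinite.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue          # 0 * log(0 / q) contributes nothing
        if qx == 0.0:
            return math.inf   # P puts mass where Q puts none
        total += px * math.log(px / qx)
    return total
```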
A simple interpretation of the KL divergence of P from Q is the expected excess surprise from using Q as a model instead of P when the actual distribution is P. While it is a measure of how different two distributions are and is thus a "distance" in some sense, it is not actually a metric, which is the most familiar and formal type of distance. In particular, it is not symmetric in the two distributions (in contrast to variation of information), and does not satisfy the triangle inequality. Instead, in terms of information geometry, it is a type of divergence, a generalization of squared distance, and for certain classes of distributions (notably an exponential family), it satisfies a generalized Pythagorean theorem (which applies to squared distances).
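The asymmetry is easy to check numerically with the sketch above; the two coin distributions used here are purely illustrative:

```python
p = [0.5, 0.5]  # fair coin
q = [0.9, 0.1]  # heavily biased coin
print(kl_divergence(p, q))  # ~0.5108 nats
print(kl_divergence(q, p))  # ~0.3681 nats, so D_KL(P||Q) != D_KL(Q||P)
```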
Relative entropy is always non-negative (and may be infinite when Q assigns zero probability to an outcome that P does not), with value 0 if and only if the two distributions in question are identical. It has diverse applications, both theoretical, such as characterizing the relative (Shannon) entropy in information systems, randomness in continuous time series, and information gain when comparing statistical models of inference; and practical, such as applied statistics, fluid mechanics, neuroscience, bioinformatics, and machine learning.
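Non-negativity (Gibbs' inequality) follows in one line from Jensen's inequality applied to the concave logarithm; this standard argument, not spelled out above, is sketched here for completeness:

$$D_{\text{KL}}(P\parallel Q)=-\sum_{x}P(x)\,\log\frac{Q(x)}{P(x)}\;\ge\;-\log\sum_{x}P(x)\,\frac{Q(x)}{P(x)}=-\log\sum_{x:\,P(x)>0}Q(x)\;\ge\;-\log 1=0,$$

with equality if and only if P = Q.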