In Bayesian statistics, the posterior predictive distribution is the distribution of possible unobserved values conditional on the observed values.
Given a set of $N$ i.i.d. observations $\mathbf{X} = \{x_1, \dots, x_N\}$, a new value $\tilde{x}$ will be drawn from a distribution that depends on a parameter $\theta \in \Theta$, where $\Theta$ is the parameter space.
It may seem tempting to plug in a single best estimate $\hat{\theta}$ for $\theta$, but this ignores the uncertainty about $\theta$, and because a source of uncertainty is ignored, the predictive distribution will be too narrow. Put another way, predictions of extreme values of $\tilde{x}$ will have a lower probability than if the uncertainty in the parameters, as given by their posterior distribution, is accounted for.
A posterior predictive distribution accounts for uncertainty about $\theta$. The posterior distribution of possible $\theta$ values depends on $\mathbf{X}$:

$$ p(\theta \mid \mathbf{X}) = \frac{p(\mathbf{X} \mid \theta)\, p(\theta)}{p(\mathbf{X})} $$
The posterior predictive distribution of $\tilde{x}$ given $\mathbf{X}$ is then calculated by marginalizing the distribution of $\tilde{x}$ given $\theta$ over the posterior distribution of $\theta$ given $\mathbf{X}$:

$$ p(\tilde{x} \mid \mathbf{X}) = \int_{\Theta} p(\tilde{x} \mid \theta)\, p(\theta \mid \mathbf{X})\, d\theta $$
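In practice this integral is often approximated by Monte Carlo: draw samples of $\theta$ from the posterior and average $p(\tilde{x} \mid \theta)$ over them. The sketch below illustrates this for an assumed Beta-Bernoulli model (a Beta(1, 1) prior and binary observations, chosen here purely for illustration and not taken from the text), where the conjugate posterior is available in closed form and the exact predictive probability can be checked against the Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Beta-Bernoulli model (assumed for this sketch, not from the text):
# prior theta ~ Beta(1, 1), observations x_i ~ Bernoulli(theta).
X = np.array([1, 0, 1, 1, 0, 1, 1, 1])            # observed data
a_post = 1 + X.sum()                               # posterior Beta parameters
b_post = 1 + len(X) - X.sum()

# Marginalize over the posterior by Monte Carlo:
# p(x_new = 1 | X) ≈ (1/S) * sum_s p(x_new = 1 | theta_s), with theta_s ~ p(theta | X).
theta_samples = rng.beta(a_post, b_post, size=100_000)
p_new_is_one = theta_samples.mean()                # p(x_new = 1 | theta) = theta

# For this conjugate model the exact answer is the posterior mean a/(a+b).
print(p_new_is_one, a_post / (a_post + b_post))
```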
Because it accounts for uncertainty about $\theta$, the posterior predictive distribution will in general be wider than a predictive distribution which plugs in a single best estimate for $\theta$.
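The widening can be seen numerically in a minimal sketch, assuming a Normal model with known standard deviation, unknown mean, and a flat prior on the mean (again an illustrative choice, not taken from the text). The plug-in predictive fixes the mean at the sample average, while the posterior predictive also propagates the posterior uncertainty in the mean, inflating the standard deviation by a factor of $\sqrt{1 + 1/N}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative Normal model (assumed for this sketch): known sigma, unknown mean mu,
# flat prior on mu, so the posterior of mu is N(xbar, sigma^2 / N).
sigma = 2.0
X = rng.normal(loc=5.0, scale=sigma, size=10)      # observed data
xbar, N = X.mean(), len(X)

# Plug-in predictive: fix mu at its point estimate xbar.
plugin_draws = rng.normal(loc=xbar, scale=sigma, size=200_000)

# Posterior predictive: first draw mu from its posterior, then draw x_new given mu.
mu_draws = rng.normal(loc=xbar, scale=sigma / np.sqrt(N), size=200_000)
posterior_pred_draws = rng.normal(loc=mu_draws, scale=sigma)

# The posterior predictive standard deviation is sigma * sqrt(1 + 1/N) > sigma.
print(plugin_draws.std(), posterior_pred_draws.std(), sigma * np.sqrt(1 + 1 / N))
```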