In Bayesian statistics, the posterior predictive distribution is the distribution of possible unobserved values conditional on the observed values.
Given a set of $N$ i.i.d. observations $\mathbf{X} = \{x_1, \dots, x_N\}$, a new value $\tilde{x}$ will be drawn from a distribution that depends on a parameter $\theta \in \Theta$, where $\Theta$ is the parameter space.
It may seem tempting to plug in a single best estimate $\hat{\theta}$ for $\theta$, but this ignores the uncertainty about $\theta$, and because a source of uncertainty is ignored, the predictive distribution will be too narrow. Put another way, predictions of extreme values of $\tilde{x}$ will have a lower probability than if the uncertainty in the parameters, as given by their posterior distribution, is accounted for.
A posterior predictive distribution accounts for uncertainty about $\theta$. The posterior distribution of possible $\theta$ values depends on $\mathbf{X}$:

$$ p(\theta \mid \mathbf{X}) = \frac{p(\mathbf{X} \mid \theta)\, p(\theta)}{p(\mathbf{X})} $$
The posterior predictive distribution of $\tilde{x}$ given $\mathbf{X}$ is then calculated by marginalizing the distribution of $\tilde{x}$ given $\theta$ over the posterior distribution of $\theta$ given $\mathbf{X}$:

$$ p(\tilde{x} \mid \mathbf{X}) = \int_{\Theta} p(\tilde{x} \mid \theta)\, p(\theta \mid \mathbf{X})\, d\theta $$
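In practice this integral is often approximated by Monte Carlo: draw samples of $\theta$ from the posterior and average $p(\tilde{x} \mid \theta)$ over them. The sketch below illustrates this for an assumed Beta-Bernoulli model (a Beta(1, 1) prior and binary observations, chosen here purely for illustration and not taken from the text), where the conjugate posterior is available in closed form and the exact predictive probability can be checked against the Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Beta-Bernoulli model (assumed for this sketch, not from the text):
# prior theta ~ Beta(1, 1), observations x_i ~ Bernoulli(theta).
X = np.array([1, 0, 1, 1, 0, 1, 1, 1])            # observed data
a_post = 1 + X.sum()                               # posterior Beta parameters
b_post = 1 + len(X) - X.sum()

# Marginalize over the posterior by Monte Carlo:
# p(x_new = 1 | X) ≈ (1/S) * sum_s p(x_new = 1 | theta_s), with theta_s ~ p(theta | X).
theta_samples = rng.beta(a_post, b_post, size=100_000)
p_new_is_one = theta_samples.mean()                # p(x_new = 1 | theta) = theta

# For this conjugate model the exact answer is the posterior mean a/(a+b).
print(p_new_is_one, a_post / (a_post + b_post))
```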
Because it accounts for uncertainty about $\theta$, the posterior predictive distribution will in general be wider than a predictive distribution which plugs in a single best estimate for $\theta$.
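The widening can be seen numerically in a minimal sketch, assuming a Normal model with known standard deviation, unknown mean, and a flat prior on the mean (again an illustrative choice, not taken from the text). The plug-in predictive fixes the mean at the sample average, while the posterior predictive also propagates the posterior uncertainty in the mean, inflating the standard deviation by a factor of $\sqrt{1 + 1/N}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative Normal model (assumed for this sketch): known sigma, unknown mean mu,
# flat prior on mu, so the posterior of mu is N(xbar, sigma^2 / N).
sigma = 2.0
X = rng.normal(loc=5.0, scale=sigma, size=10)      # observed data
xbar, N = X.mean(), len(X)

# Plug-in predictive: fix mu at its point estimate xbar.
plugin_draws = rng.normal(loc=xbar, scale=sigma, size=200_000)

# Posterior predictive: first draw mu from its posterior, then draw x_new given mu.
mu_draws = rng.normal(loc=xbar, scale=sigma / np.sqrt(N), size=200_000)
posterior_pred_draws = rng.normal(loc=mu_draws, scale=sigma)

# The posterior predictive standard deviation is sigma * sqrt(1 + 1/N) > sigma.
print(plugin_draws.std(), posterior_pred_draws.std(), sigma * np.sqrt(1 + 1 / N))
```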