Given a set of independent identically distributed data points $\mathbf{X} = (x_1, \ldots, x_n)$, where $x_i \sim p(x \mid \theta)$ according to some probability distribution parameterized by $\theta$, and where $\theta$ itself is a random variable described by a distribution, i.e. $\theta \sim p(\theta \mid \alpha)$, the marginal likelihood in general asks what the probability $p(\mathbf{X} \mid \alpha)$ is, where $\theta$ has been marginalized out (integrated out):

$$p(\mathbf{X} \mid \alpha) = \int_\theta p(\mathbf{X} \mid \theta)\, p(\theta \mid \alpha)\, d\theta .$$
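As a minimal sketch of this definition (not part of the article), consider Bernoulli coin-flip data with a Beta prior on the success probability; the hyperparameters `a, b` play the role of $\alpha$, and the integral over $\theta$ is done numerically:

```python
# Sketch: marginal likelihood p(X | alpha) for Bernoulli data under a Beta(a, b) prior,
# computed by numerically integrating theta out. Data and hyperparameters are illustrative.
import numpy as np
from scipy import integrate, stats

x = np.array([1, 0, 1, 1, 0, 1])   # observed Bernoulli data X
a, b = 2.0, 2.0                    # hyperparameters of the prior p(theta | alpha)

def integrand(theta):
    likelihood = theta**x.sum() * (1 - theta)**(len(x) - x.sum())  # p(X | theta)
    prior = stats.beta.pdf(theta, a, b)                            # p(theta | alpha)
    return likelihood * prior

marginal, _ = integrate.quad(integrand, 0.0, 1.0)                  # p(X | alpha)
print(marginal)
```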
The above definition is phrased in the context of Bayesian statistics, in which case $p(\theta \mid \alpha)$ is called the prior density and $p(\mathbf{X} \mid \theta)$ is the likelihood. Recognizing that the marginal likelihood is the normalizing constant of the Bayesian posterior density $p(\theta \mid \mathbf{X}, \alpha)$, one also has the alternative expression[2]

$$p(\mathbf{X} \mid \alpha) = \frac{p(\mathbf{X} \mid \theta)\, p(\theta \mid \alpha)}{p(\theta \mid \mathbf{X}, \alpha)},$$
which is an identity in $\theta$. The marginal likelihood quantifies the agreement between data and prior in a geometric sense made precise in de Carvalho et al. (2019). In classical (frequentist) statistics, the concept of marginal likelihood occurs instead in the context of a joint parameter $\theta = (\psi, \lambda)$, where $\psi$ is the actual parameter of interest and $\lambda$ is a non-interesting nuisance parameter. If there exists a probability distribution for $\lambda$, it is often desirable to consider the likelihood function only in terms of $\psi$, by marginalizing out $\lambda$:

$$\mathcal{L}(\psi; \mathbf{X}) = p(\mathbf{X} \mid \psi) = \int_\lambda p(\mathbf{X} \mid \psi, \lambda)\, p(\lambda \mid \psi)\, d\lambda .$$
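The identity stated above can be checked numerically. A minimal sketch, reusing the illustrative Beta-Bernoulli setup from before (where the posterior is Beta$(a+k,\, b+n-k)$ by conjugacy): evaluating likelihood $\times$ prior $/$ posterior at any value of $\theta$ returns the same marginal likelihood, which is the idea exploited by Chib (1995).

```python
# Sketch: the identity p(X | alpha) = p(X | theta) p(theta | alpha) / p(theta | X, alpha)
# holds at every theta; here it is evaluated at two arbitrary points for a conjugate model.
import numpy as np
from scipy import stats

x = np.array([1, 0, 1, 1, 0, 1])
a, b = 2.0, 2.0
k, n = x.sum(), len(x)

def log_marginal_at(theta):
    log_lik = k * np.log(theta) + (n - k) * np.log(1 - theta)   # log p(X | theta)
    log_prior = stats.beta.logpdf(theta, a, b)                   # log p(theta | alpha)
    log_post = stats.beta.logpdf(theta, a + k, b + n - k)        # log p(theta | X, alpha)
    return log_lik + log_prior - log_post

print(log_marginal_at(0.3), log_marginal_at(0.7))                # identical up to rounding
```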
Unfortunately, marginal likelihoods are generally difficult to compute. Exact solutions are known for a small class of distributions, particularly when the prior placed on the marginalized-out parameter is conjugate to the distribution of the data. In other cases, some kind of numerical integration method is needed, either a general method such as Gaussian integration or a Monte Carlo method, or a method specialized to statistical problems such as the Laplace approximation, Gibbs/Metropolis sampling, or the EM algorithm.
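A minimal sketch of the two situations just described, again assuming the illustrative Beta-Bernoulli model: a simple Monte Carlo estimate that averages the likelihood over draws from the prior, compared with the exact conjugate answer $B(a+k,\, b+n-k)/B(a, b)$.

```python
# Sketch: Monte Carlo estimate of the marginal likelihood (average of p(X | theta_s) over
# prior draws theta_s) versus the closed-form conjugate result for a Beta-Bernoulli model.
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(0)
x = np.array([1, 0, 1, 1, 0, 1])
a, b = 2.0, 2.0
k, n = x.sum(), len(x)

theta_draws = rng.beta(a, b, size=100_000)                             # theta_s ~ p(theta | alpha)
mc_estimate = np.mean(theta_draws**k * (1 - theta_draws)**(n - k))     # mean of p(X | theta_s)

exact = np.exp(betaln(a + k, b + n - k) - betaln(a, b))                # B(a+k, b+n-k) / B(a, b)
print(mc_estimate, exact)
```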
It is also possible to apply the above considerations to a single random variable (data point) x {\displaystyle x} , rather than a set of observations. In a Bayesian context, this is equivalent to the prior predictive distribution of a data point.
In Bayesian model comparison, the marginalized variables $\theta$ are parameters for a particular type of model, and the remaining variable $M$ is the identity of the model itself. In this case, the marginalized likelihood is the probability of the data given the model type, not assuming any particular model parameters. Writing $\theta$ for the model parameters, the marginal likelihood for the model $M$ is

$$p(\mathbf{X} \mid M) = \int p(\mathbf{X} \mid \theta, M)\, p(\theta \mid M)\, d\theta .$$
It is in this context that the term model evidence is normally used. This quantity is important because the posterior odds ratio for a model $M_1$ against another model $M_2$ involves a ratio of marginal likelihoods, called the Bayes factor:

$$\frac{p(M_1 \mid \mathbf{X})}{p(M_2 \mid \mathbf{X})} = \frac{p(M_1)}{p(M_2)} \, \frac{p(\mathbf{X} \mid M_1)}{p(\mathbf{X} \mid M_2)},$$
which can be stated schematically as

posterior odds = prior odds × Bayes factor.
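A minimal sketch of a Bayes factor computation (assumptions: the same illustrative coin-flip data; $M_1$ fixes a fair coin, $M_2$ places a uniform Beta(1, 1) prior on $\theta$ so that its evidence is $B(k+1,\, n-k+1)$):

```python
# Sketch: Bayes factor p(X | M1) / p(X | M2) for two simple coin models, and the resulting
# posterior odds under equal prior model probabilities.
import numpy as np
from scipy.special import betaln

x = np.array([1, 0, 1, 1, 0, 1])
k, n = x.sum(), len(x)

evidence_m1 = 0.5**n                                             # p(X | M1): theta fixed at 1/2
evidence_m2 = np.exp(betaln(1 + k, 1 + n - k) - betaln(1, 1))    # p(X | M2): theta integrated out

bayes_factor = evidence_m1 / evidence_m2
prior_odds = 1.0                                                 # equal prior model probabilities
posterior_odds = prior_odds * bayes_factor
print(bayes_factor, posterior_odds)
```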
1. Šmídl, Václav; Quinn, Anthony (2006). "Bayesian Theory". The Variational Bayes Method in Signal Processing. Springer. pp. 13–23. doi:10.1007/3-540-28820-1_2.
2. Chib, Siddhartha (1995). "Marginal likelihood from the Gibbs output". Journal of the American Statistical Association. 90 (432): 1313–1321. doi:10.1080/01621459.1995.10476635.