Normal-gamma distribution

<h2 id="definition">Definition</h2>
<p>For a pair of <a href="/facts/Random_variables/TwTBXnLT">random variables</a>, (<i>X</i>,<i>T</i>), suppose that the <a href="/facts/Conditional_distribution/0eGm3P9W">conditional distribution</a> of <i>X</i> given <i>T</i> is given by
</p>

\[X\mid T\sim N(\mu ,1/(\lambda T)),\]

<p>meaning that the conditional distribution is a <a href="/facts/Normal_distribution/UapjjPyQ">normal distribution</a> with <a href="/facts/Mean/swcEd4Pg">mean</a> \(\mu\) and <a href="/facts/Precision_(statistics)/xsMYkkjA">precision</a> \(\lambda T\); equivalently, with <a href="/facts/Variance/ULBJKXD1">variance</a> \(1/(\lambda T)\).

</p><p>Suppose also that the marginal distribution of <i>T</i> is given by
</p>

\[T\mid \alpha ,\beta \sim \operatorname{Gamma}(\alpha ,\beta ),\]

<p>meaning that <i>T</i> has a <a href="/facts/Gamma_distribution/lczcdJmw">gamma distribution</a> with shape <i>α</i> and rate <i>β</i>. Here <i>μ</i>, <i>λ</i>, <i>α</i> and <i>β</i> are parameters of the joint distribution.
</p><p>Then  (<i>X</i>,<i>T</i>) has a normal-gamma distribution, and this is denoted by
</p>

\[(X,T)\sim \operatorname{NormalGamma}(\mu ,\lambda ,\alpha ,\beta ).\]
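<p>The two-stage definition translates directly into a sampling procedure: draw <i>T</i> from the gamma distribution, then draw <i>X</i> from the normal distribution with precision <i>λT</i>. The following is a minimal sketch using only the Python standard library; the function name <code>sample_normal_gamma</code> and its <code>seed</code> argument are illustrative choices, not part of any established API.</p>

```python
import math
import random


def sample_normal_gamma(mu, lam, alpha, beta, size, seed=None):
    """Draw `size` pairs (x, tau) with tau ~ Gamma(alpha, rate=beta)
    and x | tau ~ N(mu, 1 / (lam * tau))."""
    rng = random.Random(seed)
    xs, taus = [], []
    for _ in range(size):
        # random.gammavariate takes a scale parameter, so scale = 1 / rate.
        tau = rng.gammavariate(alpha, 1.0 / beta)
        # random.gauss takes a standard deviation: sqrt(variance) = 1 / sqrt(lam * tau).
        x = rng.gauss(mu, 1.0 / math.sqrt(lam * tau))
        xs.append(x)
        taus.append(tau)
    return xs, taus


xs, taus = sample_normal_gamma(mu=1.0, lam=2.0, alpha=3.0, beta=4.0, size=10_000, seed=0)
print(sum(taus) / len(taus))  # should be close to alpha / beta
```

<p>Sampling <i>T</i> first is what makes the construction simple: the conditional normal draw only needs the precision <i>λT</i> that has just been generated.</p>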

<h2 id="properties">Properties</h2>
<h3>Probability density function</h3>
<p>The joint <a href="/facts/Probability_density_function/zvfybna4">probability density function</a> of (<i>X</i>,<i>T</i>) is
</p>

\[f(x,\tau \mid \mu ,\lambda ,\alpha ,\beta )={\frac {\beta ^{\alpha }{\sqrt {\lambda }}}{\Gamma (\alpha ){\sqrt {2\pi }}}}\,\tau ^{\alpha -{\frac {1}{2}}}\,e^{-\beta \tau }\exp \left(-{\frac {\lambda \tau (x-\mu )^{2}}{2}}\right),\]

<p>which follows from writing the joint density as the product of the <a href="/facts/Conditional_probability/QcN2UERV">conditional</a> density of \(x\) given \(\tau\) and the marginal density of \(\tau\): \(f(x,\tau \mid \mu ,\lambda ,\alpha ,\beta )=f(x\mid \tau ,\mu ,\lambda ,\alpha ,\beta )\,f(\tau \mid \mu ,\lambda ,\alpha ,\beta )\).
</p>
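<p>As a sketch of how the joint density can be evaluated, the code below computes its logarithm term by term and compares it with the sum of a normal log-density (precision <i>λτ</i>) and a gamma log-density (shape <i>α</i>, rate <i>β</i>). All function names are illustrative and only the standard library is assumed.</p>

```python
import math


def normal_gamma_logpdf(x, tau, mu, lam, alpha, beta):
    """Log of f(x, tau | mu, lam, alpha, beta), valid for tau > 0."""
    return (alpha * math.log(beta) + 0.5 * math.log(lam)
            - math.lgamma(alpha) - 0.5 * math.log(2.0 * math.pi)
            + (alpha - 0.5) * math.log(tau)
            - beta * tau
            - 0.5 * lam * tau * (x - mu) ** 2)


def normal_logpdf(x, mean, precision):
    """Log-density of a normal distribution parameterised by its precision."""
    return (0.5 * math.log(precision) - 0.5 * math.log(2.0 * math.pi)
            - 0.5 * precision * (x - mean) ** 2)


def gamma_logpdf(tau, alpha, beta):
    """Log-density of a gamma distribution with shape alpha and rate beta."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(tau) - beta * tau)


x, tau = 0.3, 1.7
mu, lam, alpha, beta = 1.0, 2.0, 3.0, 4.0
joint = normal_gamma_logpdf(x, tau, mu, lam, alpha, beta)
factored = normal_logpdf(x, mu, lam * tau) + gamma_logpdf(tau, alpha, beta)
print(abs(joint - factored))  # difference is at floating-point rounding level
```

<p>Working on the log scale avoids underflow when \(\tau\) or \((x-\mu)^2\) is large.</p>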
<h3>Marginal distributions</h3>
<p>By construction, the <a href="/facts/Marginal_distribution/U9XBWAd1">marginal distribution</a> of \(\tau\) is a <a href="/facts/Gamma_distribution/lczcdJmw">gamma distribution</a>, and the <a href="/facts/Conditional_distribution/0eGm3P9W">conditional distribution</a> of \(x\) given \(\tau\) is a <a href="/facts/Gaussian_distribution/UapjjPyQ">Gaussian distribution</a>. The <a href="/facts/Marginal_distribution/U9XBWAd1">marginal distribution</a> of \(x\) is a three-parameter non-standardized <a href="/facts/Student%27s_t-distribution/DeT1SDqH">Student's t-distribution</a> with parameters \((\nu ,\mu ,\sigma ^{2})=(2\alpha ,\mu ,\beta /(\lambda \alpha ))\).
</p>
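<p>The Student's t marginal can be checked numerically by integrating the joint density over \(\tau\). The sketch below uses a plain midpoint rule; the integration limit <code>upper</code> and the number of steps are assumptions chosen for the example parameters (with \(\alpha > 1/2\)), and all function names are illustrative.</p>

```python
import math


def joint_pdf(x, tau, mu, lam, alpha, beta):
    """Normal-gamma joint density f(x, tau | mu, lam, alpha, beta)."""
    log_f = (alpha * math.log(beta) + 0.5 * math.log(lam)
             - math.lgamma(alpha) - 0.5 * math.log(2.0 * math.pi)
             + (alpha - 0.5) * math.log(tau)
             - beta * tau - 0.5 * lam * tau * (x - mu) ** 2)
    return math.exp(log_f)


def marginal_x_numeric(x, mu, lam, alpha, beta, upper=40.0, steps=40_000):
    """Midpoint-rule approximation of the integral of the joint density
    over tau in (0, upper); `upper` must cover the bulk of the gamma mass."""
    h = upper / steps
    return h * sum(joint_pdf(x, (k + 0.5) * h, mu, lam, alpha, beta)
                   for k in range(steps))


def student_t_pdf(x, nu, loc, scale_sq):
    """Density of the non-standardized Student's t with parameters (nu, loc, sigma^2)."""
    log_c = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi * scale_sq))
    return math.exp(log_c - 0.5 * (nu + 1.0) * math.log1p((x - loc) ** 2 / (nu * scale_sq)))


mu, lam, alpha, beta = 1.0, 2.0, 3.0, 4.0
numeric = marginal_x_numeric(2.5, mu, lam, alpha, beta)
closed = student_t_pdf(2.5, nu=2.0 * alpha, loc=mu, scale_sq=beta / (lam * alpha))
print(numeric, closed)  # the two values should agree closely
```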
<h3>Exponential family</h3>
<p>The normal-gamma distribution is a four-parameter <a href="/facts/Exponential_family/1LkkqEIf">exponential family</a> with <a href="/facts/Natural_parameters/1LkkqEIf">natural parameters</a> \(\alpha -1/2,\ -\beta -\lambda \mu ^{2}/2,\ \lambda \mu ,\ -\lambda /2\) and <a href="/facts/Natural_statistics/1LkkqEIf">natural statistics</a> \(\ln \tau ,\ \tau ,\ \tau x,\ \tau x^{2}\).
</p>
<h3>Moments of the natural statistics</h3>
<p>The following moments can be easily computed using the <a href="/facts/Exponential_family/1LkkqEIf">moment generating function of the sufficient statistic</a>:<a class="footnote-ref" id="fnref:2" href="#fn:2"><sup>2</sup></a>
</p>

\[\operatorname{E}(\ln T)=\psi \left(\alpha \right)-\ln \beta ,\]

<p>where \(\psi \left(\alpha \right)\) is the <a href="/facts/Digamma_function/nDTNu86e">digamma function</a>,
</p>

\[{\begin{aligned}\operatorname{E}(T)&={\frac {\alpha }{\beta }},\\[5pt]\operatorname{E}(TX)&=\mu {\frac {\alpha }{\beta }},\\[5pt]\operatorname{E}(TX^{2})&={\frac {1}{\lambda }}+\mu ^{2}{\frac {\alpha }{\beta }}.\end{aligned}}\]
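<p>These closed-form moments can be compared against Monte Carlo averages. The following sketch assumes only the Python standard library; because the standard library has no digamma function, <code>digamma_numeric</code> approximates it by a central difference of <code>math.lgamma</code>, which is an illustrative shortcut rather than a production implementation. All names are hypothetical.</p>

```python
import math
import random


def digamma_numeric(a, h=1e-5):
    """Central-difference approximation of psi(a) = d/da ln Gamma(a)."""
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2.0 * h)


def natural_statistic_moments(mu, lam, alpha, beta):
    """Closed-form expectations of ln T, T, T X and T X^2."""
    return {
        "ln T": digamma_numeric(alpha) - math.log(beta),
        "T": alpha / beta,
        "T X": mu * alpha / beta,
        "T X^2": 1.0 / lam + mu ** 2 * alpha / beta,
    }


def monte_carlo_moments(mu, lam, alpha, beta, n, seed=0):
    """Sample averages of the same four statistics."""
    rng = random.Random(seed)
    totals = {"ln T": 0.0, "T": 0.0, "T X": 0.0, "T X^2": 0.0}
    for _ in range(n):
        t = rng.gammavariate(alpha, 1.0 / beta)       # scale = 1 / rate
        x = rng.gauss(mu, 1.0 / math.sqrt(lam * t))   # std dev from precision lam * t
        totals["ln T"] += math.log(t)
        totals["T"] += t
        totals["T X"] += t * x
        totals["T X^2"] += t * x * x
    return {name: total / n for name, total in totals.items()}


exact = natural_statistic_moments(1.0, 2.0, 3.0, 4.0)
approx = monte_carlo_moments(1.0, 2.0, 3.0, 4.0, n=100_000)
for name in exact:
    print(name, exact[name], approx[name])
```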

<h3>Scaling</h3>
<p>If \((X,T)\sim \operatorname{NormalGamma}(\mu ,\lambda ,\alpha ,\beta )\), then for any \(b>0\), \((bX,bT)\) is distributed as \(\operatorname{NormalGamma}(b\mu ,\lambda /b^{3},\alpha ,\beta /b)\).

</p>
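<p>The scaling rule can be verified with the change-of-variables formula: the density of \((bX,bT)\) at \((y,s)\) equals \(f(y/b,s/b)/b^{2}\), which should match the normal-gamma density with the scaled parameters. A minimal sketch, with illustrative function names:</p>

```python
import math


def normal_gamma_pdf(x, tau, mu, lam, alpha, beta):
    """Normal-gamma joint density f(x, tau | mu, lam, alpha, beta)."""
    log_f = (alpha * math.log(beta) + 0.5 * math.log(lam)
             - math.lgamma(alpha) - 0.5 * math.log(2.0 * math.pi)
             + (alpha - 0.5) * math.log(tau)
             - beta * tau - 0.5 * lam * tau * (x - mu) ** 2)
    return math.exp(log_f)


def scaled_parameters(b, mu, lam, alpha, beta):
    """Parameters of (bX, bT) when (X, T) ~ NormalGamma(mu, lam, alpha, beta), b > 0."""
    return b * mu, lam / b ** 3, alpha, beta / b


b = 2.5
mu, lam, alpha, beta = 1.0, 2.0, 3.0, 4.0
y, s = 1.8, 0.9  # a point in the (bX, bT) coordinates

# Change of variables: the Jacobian of (x, tau) -> (b x, b tau) is b^2.
transformed = normal_gamma_pdf(y / b, s / b, mu, lam, alpha, beta) / b ** 2
direct = normal_gamma_pdf(y, s, *scaled_parameters(b, mu, lam, alpha, beta))
print(transformed, direct)  # the two values should agree
```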
<h2 id="posterior-distribution-of-the-parameters">Posterior distribution of the parameters</h2>
<p>Assume that <i>x</i> is distributed according to a normal distribution with unknown mean \(\mu\) and precision \(\tau\),
</p>

\[x\sim {\mathcal {N}}(\mu ,\tau ^{-1})\]

<p>and that the prior distribution on \(\mu\) and \(\tau\), \((\mu ,\tau )\), has a normal-gamma distribution
</p>

\[(\mu ,\tau )\sim \operatorname{NormalGamma}(\mu _{0},\lambda _{0},\alpha _{0},\beta _{0}),\]

<p>for which the density π satisfies
</p>

\[\pi (\mu ,\tau )\propto \tau ^{\alpha _{0}-{\frac {1}{2}}}\,\exp[-\beta _{0}\tau ]\,\exp \left[-{\frac {\lambda _{0}\tau (\mu -\mu _{0})^{2}}{2}}\right].\]

<p>Suppose
</p>

\[x_{1},\ldots ,x_{n}\mid \mu ,\tau \ \sim \ \text{i.i.d. }\operatorname{N}\left(\mu ,\tau ^{-1}\right),\]

<p>i.e. the components of \(\mathbf {X} =(x_{1},\ldots ,x_{n})\) are conditionally independent given \(\mu ,\tau\) and the conditional distribution of each of them given \(\mu ,\tau\) is normal with expected value \(\mu\) and variance \(1/\tau\). The posterior distribution of \(\mu\) and \(\tau\) given this dataset \(\mathbf {X}\) can be determined analytically by <a href="/facts/Bayes%27_theorem/wQqFB2Dt">Bayes' theorem</a><a class="footnote-ref" id="fnref:3" href="#fn:3"><sup>3</sup></a>:
</p>

\[\mathbf {P} (\tau ,\mu \mid \mathbf {X} )\propto \mathbf {L} (\mathbf {X} \mid \tau ,\mu )\,\pi (\tau ,\mu ),\]

<p>where \(\mathbf {L}\) is the likelihood of the parameters given the data.
</p><p>Since the data are i.i.d., the likelihood of the entire dataset is equal to the product of the likelihoods of the individual data samples:
</p>

\[\mathbf {L} (\mathbf {X} \mid \tau ,\mu )=\prod _{i=1}^{n}\mathbf {L} (x_{i}\mid \tau ,\mu ).\]

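<p>This expression can be simplified as follows; when the square is expanded, the cross term \(2({\bar {x}}-\mu )\sum _{i=1}^{n}(x_{i}-{\bar {x}})\) vanishes because \(\sum _{i=1}^{n}(x_{i}-{\bar {x}})=0\):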
</p>

\[{\begin{aligned}\mathbf {L} (\mathbf {X} \mid \tau ,\mu )&\propto \prod _{i=1}^{n}\tau ^{1/2}\exp \left[{\frac {-\tau }{2}}(x_{i}-\mu )^{2}\right]\\[5pt]
&\propto \tau ^{n/2}\exp \left[{\frac {-\tau }{2}}\sum _{i=1}^{n}(x_{i}-\mu )^{2}\right]\\[5pt]
&\propto \tau ^{n/2}\exp \left[{\frac {-\tau }{2}}\sum _{i=1}^{n}(x_{i}-{\bar {x}}+{\bar {x}}-\mu )^{2}\right]\\[5pt]
&\propto \tau ^{n/2}\exp \left[{\frac {-\tau }{2}}\sum _{i=1}^{n}\left((x_{i}-{\bar {x}})^{2}+({\bar {x}}-\mu )^{2}\right)\right]\\[5pt]
&\propto \tau ^{n/2}\exp \left[{\frac {-\tau }{2}}\left(ns+n({\bar {x}}-\mu )^{2}\right)\right],\end{aligned}}\]

<p>where \({\bar {x}}={\frac {1}{n}}\sum _{i=1}^{n}x_{i}\) is the mean of the data samples and \(s={\frac {1}{n}}\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}\) is the (biased, \(1/n\)-normalized) sample variance.
</p><p>The posterior distribution of the parameters is proportional to the prior times the likelihood:
</p>

\[{\begin{aligned}\mathbf {P} (\tau ,\mu \mid \mathbf {X} )&\propto \mathbf {L} (\mathbf {X} \mid \tau ,\mu )\,\pi (\tau ,\mu )\\
&\propto \tau ^{n/2}\exp \left[{\frac {-\tau }{2}}\left(ns+n({\bar {x}}-\mu )^{2}\right)\right]\tau ^{\alpha _{0}-{\frac {1}{2}}}\,\exp[-\beta _{0}\tau ]\,\exp \left[-{\frac {\lambda _{0}\tau (\mu -\mu _{0})^{2}}{2}}\right]\\
&\propto \tau ^{{\frac {n}{2}}+\alpha _{0}-{\frac {1}{2}}}\exp \left[-\tau \left({\frac {1}{2}}ns+\beta _{0}\right)\right]\exp \left[-{\frac {\tau }{2}}\left(\lambda _{0}(\mu -\mu _{0})^{2}+n({\bar {x}}-\mu )^{2}\right)\right]\end{aligned}}\]

<p>The final exponential term is simplified by completing the square.
</p>

    {\displaystyle {\begin{aligned}\lambda _{0}(\mu -\mu _{0})^{2}+n({\bar {x}}-\mu )^{2}&=\lambda _{0}\mu ^{2}-2\lambda _{0}\mu \mu _{0}+\lambda _{0}\mu _{0}^{2}+n\mu ^{2}-2n{\bar {x}}\mu +n{\bar {x}}^{2}\\&=(\lambda _{0}+n)\mu ^{2}-2(\lambda _{0}\mu _{0}+n{\bar {x}})\mu +\lambda _{0}\mu _{0}^{2}+n{\bar {x}}^{2}\\&=(\lambda _{0}+n)\left(\mu ^{2}-2{\frac {\lambda _{0}\mu _{0}+n{\bar {x}}}{\lambda _{0}+n}}\mu \right)+\lambda _{0}\mu _{0}^{2}+n{\bar {x}}^{2}\\&=(\lambda _{0}+n)\left(\mu -{\frac {\lambda _{0}\mu _{0}+n{\bar {x}}}{\lambda _{0}+n}}\right)^{2}+\lambda _{0}\mu _{0}^{2}+n{\bar {x}}^{2}-{\frac {\left(\lambda _{0}\mu _{0}+n{\bar {x}}\right)^{2}}{\lambda _{0}+n}}\\&=(\lambda _{0}+n)\left(\mu -{\frac {\lambda _{0}\mu _{0}+n{\bar {x}}}{\lambda _{0}+n}}\right)^{2}+{\frac {\lambda _{0}n({\bar {x}}-\mu _{0})^{2}}{\lambda _{0}+n}}\end{aligned}}}
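<p>The completed-square identity can be spot-checked numerically. The following is a minimal sketch rather than a proof: it compares the first and last lines of the derivation at randomly drawn values. The function names <code>lhs</code> and <code>rhs</code> and the sampling ranges are illustrative assumptions, not part of the derivation.
</p>

```python
import random


def lhs(mu, mu0, lam0, n, xbar):
    # First line of the derivation: the two quadratic terms before completing the square.
    return lam0 * (mu - mu0) ** 2 + n * (xbar - mu) ** 2


def rhs(mu, mu0, lam0, n, xbar):
    # Last line: one quadratic in mu plus a remainder that does not depend on mu.
    centre = (lam0 * mu0 + n * xbar) / (lam0 + n)
    remainder = lam0 * n * (xbar - mu0) ** 2 / (lam0 + n)
    return (lam0 + n) * (mu - centre) ** 2 + remainder


def max_relative_gap(trials=1000, seed=0):
    # Largest relative difference between the two sides over random inputs.
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        mu, mu0, xbar = (rng.uniform(-5.0, 5.0) for _ in range(3))
        lam0 = rng.uniform(0.1, 10.0)
        n = rng.randint(1, 50)
        a = lhs(mu, mu0, lam0, n, xbar)
        b = rhs(mu, mu0, lam0, n, xbar)
        worst = max(worst, abs(a - b) / max(1.0, abs(a)))
    return worst


print(max_relative_gap())
```

<p>A gap at the level of floating-point rounding is consistent with the two sides being algebraically equal.
</p>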

<p>On inserting this back into the expression above,
</p>

    {\displaystyle {\begin{aligned}\mathbf {P} (\tau ,\mu \mid \mathbf {X} )&\propto \tau ^{{\frac {n}{2}}+\alpha _{0}-{\frac {1}{2}}}\exp \left[-\tau \left({\frac {1}{2}}ns+\beta _{0}\right)\right]\exp \left[-{\frac {\tau }{2}}\left(\left(\lambda _{0}+n\right)\left(\mu -{\frac {\lambda _{0}\mu _{0}+n{\bar {x}}}{\lambda _{0}+n}}\right)^{2}+{\frac {\lambda _{0}n({\bar {x}}-\mu _{0})^{2}}{\lambda _{0}+n}}\right)\right]\\&\propto \tau ^{{\frac {n}{2}}+\alpha _{0}-{\frac {1}{2}}}\exp \left[-\tau \left({\frac {1}{2}}ns+\beta _{0}+{\frac {\lambda _{0}n({\bar {x}}-\mu _{0})^{2}}{2(\lambda _{0}+n)}}\right)\right]\exp \left[-{\frac {\tau }{2}}\left(\lambda _{0}+n\right)\left(\mu -{\frac {\lambda _{0}\mu _{0}+n{\bar {x}}}{\lambda _{0}+n}}\right)^{2}\right]\end{aligned}}}

<p>This final expression is in exactly the same form as a Normal-Gamma distribution, i.e.,
</p>

    {\displaystyle \mathbf {P} (\tau ,\mu \mid \mathbf {X} )={\text{NormalGamma}}\left({\frac {\lambda _{0}\mu _{0}+n{\bar {x}}}{\lambda _{0}+n}},\lambda _{0}+n,\alpha _{0}+{\frac {n}{2}},\beta _{0}+{\frac {1}{2}}\left(ns+{\frac {\lambda _{0}n({\bar {x}}-\mu _{0})^{2}}{\lambda _{0}+n}}\right)\right)}
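<p>The posterior parameters above can be computed directly from a sample. The sketch below is one possible implementation, not a reference one; the names <code>NormalGammaParams</code> and <code>posterior</code> are hypothetical. It assumes, as the likelihood term in the derivation implies, that <i>s</i> is the sample variance with divisor <i>n</i>, so that <i>ns</i> is the sum of squared deviations from the sample mean.
</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalGammaParams:
    mu: float     # location parameter for the mean
    lam: float    # pseudo-count scaling the precision of the mean
    alpha: float  # gamma shape
    beta: float   # gamma rate


def posterior(prior, xs):
    """Return the normal-gamma posterior after observing the samples xs."""
    n = len(xs)
    if n == 0:
        return prior  # no data: the posterior equals the prior
    xbar = sum(xs) / n
    # Sample variance with divisor n, so that n * s is the sum of squared deviations.
    s = sum((x - xbar) ** 2 for x in xs) / n
    lam_n = prior.lam + n
    mu_n = (prior.lam * prior.mu + n * xbar) / lam_n
    alpha_n = prior.alpha + n / 2
    beta_n = prior.beta + 0.5 * (
        n * s + prior.lam * n * (xbar - prior.mu) ** 2 / lam_n
    )
    return NormalGammaParams(mu_n, lam_n, alpha_n, beta_n)


prior = NormalGammaParams(mu=0.0, lam=1.0, alpha=1.0, beta=1.0)
print(posterior(prior, [1.0, 2.0, 3.0]))
```

<p>Because the prior is conjugate, updating on one batch and then on a second should give the same parameters, up to rounding, as a single update on the combined data.
</p>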

<h3>Interpretation of parameters</h3>
<p>The interpretation of parameters in terms of pseudo-observations is as follows:
</p>
<ul><li>The new mean takes a weighted average of the old pseudo-mean and the observed mean, weighted by the number of associated (pseudo-)observations.</li>
<li>The precision was estimated from {\displaystyle 2\alpha } pseudo-observations (i.e. possibly a different number of pseudo-observations, to allow the variance of the mean and precision to be controlled separately) with sample mean {\displaystyle \mu } and sample variance {\displaystyle {\frac {\beta }{\alpha }}} (i.e. with sum of <a href="/facts/Squared_deviations/qMoU6JGp">squared deviations</a> {\displaystyle 2\beta }).</li>
<li>The posterior updates the number of pseudo-observations ({\displaystyle \lambda _{0}}) simply by adding the corresponding number of new observations ({\displaystyle n}).</li>
<li>The new sum of squared deviations is computed by adding the previous respective sums of squared deviations.  However, a third "interaction term" is needed because the two sets of squared deviations were computed with respect to different means, and hence the sum of the two underestimates the actual total squared deviation.</li></ul>
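<p>The need for the interaction term can be illustrated with two ordinary batches of data. The sketch below uses hypothetical helper names and made-up data values. It pools two sums of squared deviations that were each computed about their own batch mean, and compares the result with the sum of squared deviations of the combined data. In the posterior update, the prior's pseudo-observations play the role of the first batch, with size {\displaystyle \lambda _{0}} and mean {\displaystyle \mu _{0}}.
</p>

```python
def mean(xs):
    return sum(xs) / len(xs)


def sum_sq_dev(xs):
    # Sum of squared deviations about the batch's own mean.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def pooled_sum_sq_dev(n_a, mean_a, ss_a, n_b, mean_b, ss_b):
    # The two sums were taken about different means, so their plain sum
    # underestimates the total; the interaction term makes up the difference.
    interaction = n_a * n_b * (mean_a - mean_b) ** 2 / (n_a + n_b)
    return ss_a + ss_b + interaction


a = [1.0, 2.0, 3.0]   # illustrative data only
b = [10.0, 12.0]
pooled = pooled_sum_sq_dev(len(a), mean(a), sum_sq_dev(a),
                           len(b), mean(b), sum_sq_dev(b))
print(pooled, sum_sq_dev(a + b))
```

<p>The two printed values should agree up to rounding, whereas the plain sum of the two batch totals is smaller whenever the batch means differ.
</p>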
<p>As a consequence, if one has a prior mean of {\displaystyle \mu _{0}} from {\displaystyle n_{\mu }} samples and a prior precision of {\displaystyle \tau _{0}} from {\displaystyle n_{\tau }} samples, the prior distribution over {\displaystyle \mu } and {\displaystyle \tau } is
</p>

\[\mathbf{P}(\tau ,\mu \mid \mathbf{X})=\operatorname{NormalGamma}\left(\mu_{0},n_{\mu},\frac{n_{\tau}}{2},\frac{n_{\tau}}{2\tau_{0}}\right)\]

<p>and after observing \(n\) samples with mean \(\mu\) and variance \(s\), the posterior probability is
</p>

\[\mathbf{P}(\tau ,\mu \mid \mathbf{X})=\operatorname{NormalGamma}\left(\frac{n_{\mu}\mu_{0}+n\mu}{n_{\mu}+n},\;n_{\mu}+n,\;\frac{1}{2}(n_{\tau}+n),\;\frac{1}{2}\left(\frac{n_{\tau}}{\tau_{0}}+ns+\frac{n_{\mu}n(\mu -\mu_{0})^{2}}{n_{\mu}+n}\right)\right)\]
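<p>The prior and posterior parameters above can be computed directly. This is a minimal sketch under stated assumptions: the names <code>NormalGammaParams</code>, <code>prior_params</code> and <code>posterior_params</code> are hypothetical, and <code>s</code> is taken to be the sample variance with divisor \(n\), so that \(ns\) is the observed sum of squared deviations.</p>

```python
from typing import NamedTuple


class NormalGammaParams(NamedTuple):
    """Hypothetical container for the four parameters (mu, lambda, alpha, beta)."""
    mu: float
    lam: float
    alpha: float
    beta: float


def prior_params(mu0, n_mu, tau0, n_tau):
    """Prior built from a mean mu0 (n_mu samples) and a precision tau0 (n_tau samples)."""
    return NormalGammaParams(mu0, n_mu, n_tau / 2, n_tau / (2 * tau0))


def posterior_params(mu0, n_mu, tau0, n_tau, n, mean, s):
    """Posterior after n observations with sample mean `mean` and variance `s`.

    Assumes s uses divisor n, so n * s is the sum of squared deviations.
    """
    total = n_mu + n
    # Interaction term: the prior and the data were centred on different means.
    interaction = n_mu * n * (mean - mu0) ** 2 / total
    return NormalGammaParams(
        (n_mu * mu0 + n * mean) / total,           # weighted average of the means
        total,                                     # pseudo-observations for the mean
        (n_tau + n) / 2,                           # half the observations for the precision
        (n_tau / tau0 + n * s + interaction) / 2,  # half the total squared deviation
    )
```

<p>For example, <code>posterior_params(0.0, 1, 1.0, 2, 3, 2.0, 2.0 / 3.0)</code> updates a weak prior centred at 0 with three observations whose mean is 2.</p>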

<p>Note that in some programming languages, such as <a href="/facts/Matlab/qPjLISCk">Matlab</a>, the gamma distribution is parameterized by the inverse of \(\beta\) (a scale rather than a rate), so the fourth argument of the normal-gamma distribution becomes \(2\tau_{0}/n_{\tau}\).
</p>
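<p>The two conventions describe the same density, which can be verified pointwise. The sketch below is illustrative only: the function names are hypothetical, and the values of <code>tau0</code> and <code>n_tau</code> are arbitrary assumptions. It evaluates the gamma density once with the rate \(n_{\tau}/(2\tau_{0})\) and once with the scale \(2\tau_{0}/n_{\tau}\).</p>

```python
import math


def gamma_pdf_rate(x, alpha, beta):
    """Gamma density with shape alpha and rate beta."""
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)


def gamma_pdf_scale(x, alpha, theta):
    """Gamma density with shape alpha and scale theta = 1 / rate."""
    return x ** (alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta ** alpha)


# Arbitrary illustrative prior: precision tau0 estimated from n_tau samples.
tau0, n_tau = 4.0, 10
alpha = n_tau / 2
rate = n_tau / (2 * tau0)    # fourth argument in the rate convention
scale = 2 * tau0 / n_tau     # fourth argument in the scale convention

values_rate = [gamma_pdf_rate(x, alpha, rate) for x in (0.5, 2.0, 4.0, 8.0)]
values_scale = [gamma_pdf_scale(x, alpha, scale) for x in (0.5, 2.0, 4.0, 8.0)]
```

<p>The two lists match up to rounding. Passing a rate where a library expects a scale (or the reverse) silently yields a different distribution, so it is worth checking which convention a given routine uses.</p>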
<h2 id="generating-normal-gamma-random-variates">Generating normal-gamma random variates</h2>
<p>Generation of random variates is straightforward:
</p>
<ol><li>Sample \(\tau\) from a gamma distribution with parameters \(\alpha\) and \(\beta\).</li>
<li>Sample \(x\) from a normal distribution with mean \(\mu\) and variance \(1/(\lambda \tau )\).</li></ol>
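<p>The two steps above can be sketched with the Python standard library. This is a minimal illustration, assuming \(\beta\) is a rate parameter; the function name <code>sample_normal_gamma</code> is hypothetical. Because <code>random.gammavariate</code> takes a scale as its second argument, the rate is inverted before the call.</p>

```python
import math
import random


def sample_normal_gamma(mu, lam, alpha, beta, rng):
    """Draw one (x, tau) pair from NormalGamma(mu, lam, alpha, beta).

    Assumes beta is a rate; gammavariate expects a scale, hence 1 / beta.
    """
    # Step 1: precision tau from a gamma distribution with shape alpha, rate beta.
    tau = rng.gammavariate(alpha, 1.0 / beta)
    # Step 2: x from a normal with mean mu and variance 1 / (lam * tau).
    x = rng.gauss(mu, 1.0 / math.sqrt(lam * tau))
    return x, tau


rng = random.Random(0)  # seeded only so that the sketch is repeatable
draws = [sample_normal_gamma(1.0, 2.0, 3.0, 2.0, rng) for _ in range(20000)]
```

<p>With enough draws, the average of the sampled <code>tau</code> values approaches \(\alpha/\beta\) and the average of the sampled <code>x</code> values approaches \(\mu\).</p>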
<h2 id="related-distributions">Related distributions</h2>
<ul><li>The <a href="/facts/Normal-inverse-gamma_distribution/2koIxNwD">normal-inverse-gamma distribution</a> is the same distribution parameterized by variance rather than precision</li>
<li>The <a href="/facts/Normal-exponential-gamma_distribution/Cpvo1zmz">normal-exponential-gamma distribution</a></li></ul>
<h2 id="notes">Notes</h2>

<ul><li>Bernardo, J.M.; Smith, A.F.M. (1993) <i>Bayesian Theory</i>, Wiley. <a href="/facts/ISBN_(identifier)/15AdSPa9">ISBN</a> 0-471-49464-X</li>
<li>Dearden et al. <a href="http://www.aaai.org/Papers/AAAI/1998/AAAI98-108.pdf">"Bayesian Q-learning"</a>, <i>Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98)</i>, July 26–30, 1998, Madison, Wisconsin, USA.</li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1"><p>Bernardo &amp; Smith (1993, pages 136, 268, 434) <a href="#fnref:1" class="footnote-back-ref">↩</a></p></li>
<li id="fn:2"><p>Wasserman, Larry (2004), "Parametric Inference", Springer Texts in Statistics, New York, NY: Springer New York, pp. 119–148, ISBN 978-1-4419-2322-6, retrieved 2023-12-08 <a href="#fnref:2" class="footnote-back-ref">↩</a></p></li>
<li id="fn:3"><p>"Bayes' Theorem: Introduction". Archived from the original on 2014-08-07. Retrieved 2014-08-05. <a href="http://www.trinity.edu/cbrown/bayesweb/" target="_blank">http://www.trinity.edu/cbrown/bayesweb/</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></p></li>
</ol>
