| Likelihood $p(x_i \mid \theta)$ | Model parameters $\theta$ | Conjugate prior (and posterior) distribution $p(\theta \mid \Theta),\ p(\theta \mid \mathbf{x}, \Theta) = p(\theta \mid \Theta')$ | Prior hyperparameters $\Theta$ | Posterior hyperparameters $\Theta'$ | Interpretation of hyperparameters | Posterior predictive $p(\tilde{x} \mid \mathbf{x}, \Theta) = p(\tilde{x} \mid \Theta')$ |
|---|---|---|---|---|---|---|
| Normal with known variance $\sigma^2$ | $\mu$ (mean) | Normal | $\mu_0,\ \sigma_0^2$ | $\dfrac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}} \left( \frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^n x_i}{\sigma^2} \right),\ \left( \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} \right)^{-1}$ | mean was estimated from observations with total precision (sum of all individual precisions) $1/\sigma_0^2$ and with sample mean $\mu_0$ | $\mathcal{N}(\tilde{x} \mid \mu_0',\ {\sigma_0^2}' + \sigma^2)$ |
| Normal with known precision $\tau$ | $\mu$ (mean) | Normal | $\mu_0,\ \tau_0^{-1}$ | $\dfrac{\tau_0 \mu_0 + \tau \sum_{i=1}^n x_i}{\tau_0 + n\tau},\ \left( \tau_0 + n\tau \right)^{-1}$ | mean was estimated from observations with total precision (sum of all individual precisions) $\tau_0$ and with sample mean $\mu_0$ | $\mathcal{N}\left( \tilde{x} \mid \mu_0',\ \frac{1}{\tau_0'} + \frac{1}{\tau} \right)$ |
| Normal with known mean $\mu$ | $\sigma^2$ (variance) | Inverse gamma | $\alpha,\ \beta$ | $\alpha + \frac{n}{2},\ \beta + \frac{\sum_{i=1}^n (x_i - \mu)^2}{2}$ | variance was estimated from $2\alpha$ observations with sample variance $\beta/\alpha$ (i.e. with sum of squared deviations $2\beta$, where deviations are from known mean $\mu$) | $t_{2\alpha'}(\tilde{x} \mid \mu,\ \sigma^2 = \beta'/\alpha')$ |
| Normal with known mean $\mu$ | $\sigma^2$ (variance) | Scaled inverse chi-squared | $\nu,\ \sigma_0^2$ | $\nu + n,\ \dfrac{\nu \sigma_0^2 + \sum_{i=1}^n (x_i - \mu)^2}{\nu + n}$ | variance was estimated from $\nu$ observations with sample variance $\sigma_0^2$ | $t_{\nu'}(\tilde{x} \mid \mu,\ {\sigma_0^2}')$ |
| Normal with known mean $\mu$ | $\tau$ (precision) | Gamma | $\alpha,\ \beta$ | $\alpha + \frac{n}{2},\ \beta + \frac{\sum_{i=1}^n (x_i - \mu)^2}{2}$ | precision was estimated from $2\alpha$ observations with sample variance $\beta/\alpha$ (i.e. with sum of squared deviations $2\beta$, where deviations are from known mean $\mu$) | $t_{2\alpha'}(\tilde{x} \mid \mu,\ \sigma^2 = \beta'/\alpha')$ |
| Normal | $\mu$ and $\sigma^2$ (assuming exchangeability) | Normal-inverse gamma | $\mu_0,\ \nu,\ \alpha,\ \beta$ | $\dfrac{\nu \mu_0 + n\bar{x}}{\nu + n},\ \nu + n,\ \alpha + \frac{n}{2},\ \beta + \tfrac{1}{2} \sum_{i=1}^n (x_i - \bar{x})^2 + \frac{n\nu}{\nu + n} \frac{(\bar{x} - \mu_0)^2}{2}$, where $\bar{x}$ is the sample mean | mean was estimated from $\nu$ observations with sample mean $\mu_0$; variance was estimated from $2\alpha$ observations with sample mean $\mu_0$ and sum of squared deviations $2\beta$ | $t_{2\alpha'}\left( \tilde{x} \mid \mu',\ \frac{\beta'(\nu' + 1)}{\nu' \alpha'} \right)$ |
| Normal | $\mu$ and $\tau$ (assuming exchangeability) | Normal-gamma | $\mu_0,\ \nu,\ \alpha,\ \beta$ | $\dfrac{\nu \mu_0 + n\bar{x}}{\nu + n},\ \nu + n,\ \alpha + \frac{n}{2},\ \beta + \tfrac{1}{2} \sum_{i=1}^n (x_i - \bar{x})^2 + \frac{n\nu}{\nu + n} \frac{(\bar{x} - \mu_0)^2}{2}$, where $\bar{x}$ is the sample mean | mean was estimated from $\nu$ observations with sample mean $\mu_0$, and precision was estimated from $2\alpha$ observations with sample mean $\mu_0$ and sum of squared deviations $2\beta$ | $t_{2\alpha'}\left( \tilde{x} \mid \mu',\ \frac{\beta'(\nu' + 1)}{\alpha' \nu'} \right)$ |
| Multivariate normal with known covariance matrix $\boldsymbol{\Sigma}$ | $\boldsymbol{\mu}$ (mean vector) | Multivariate normal | $\boldsymbol{\mu}_0,\ \boldsymbol{\Sigma}_0$ | $\left( \boldsymbol{\Sigma}_0^{-1} + n \boldsymbol{\Sigma}^{-1} \right)^{-1} \left( \boldsymbol{\Sigma}_0^{-1} \boldsymbol{\mu}_0 + n \boldsymbol{\Sigma}^{-1} \bar{\mathbf{x}} \right),\ \left( \boldsymbol{\Sigma}_0^{-1} + n \boldsymbol{\Sigma}^{-1} \right)^{-1}$, where $\bar{\mathbf{x}}$ is the sample mean | mean was estimated from observations with total precision (sum of all individual precisions) $\boldsymbol{\Sigma}_0^{-1}$ and with sample mean $\boldsymbol{\mu}_0$ | $\mathcal{N}(\tilde{\mathbf{x}} \mid \boldsymbol{\mu}_0',\ \boldsymbol{\Sigma}_0' + \boldsymbol{\Sigma})$ |
| Multivariate normal with known precision matrix $\boldsymbol{\Lambda}$ | $\boldsymbol{\mu}$ (mean vector) | Multivariate normal | $\boldsymbol{\mu}_0,\ \boldsymbol{\Lambda}_0$ | $\left( \boldsymbol{\Lambda}_0 + n \boldsymbol{\Lambda} \right)^{-1} \left( \boldsymbol{\Lambda}_0 \boldsymbol{\mu}_0 + n \boldsymbol{\Lambda} \bar{\mathbf{x}} \right),\ \left( \boldsymbol{\Lambda}_0 + n \boldsymbol{\Lambda} \right)$, where $\bar{\mathbf{x}}$ is the sample mean | mean was estimated from observations with total precision (sum of all individual precisions) $\boldsymbol{\Lambda}_0$ and with sample mean $\boldsymbol{\mu}_0$ | $\mathcal{N}\left( \tilde{\mathbf{x}} \mid \boldsymbol{\mu}_0',\ {\boldsymbol{\Lambda}_0'}^{-1} + \boldsymbol{\Lambda}^{-1} \right)$ |
| Multivariate normal with known mean $\boldsymbol{\mu}$ | $\boldsymbol{\Sigma}$ (covariance matrix) | Inverse-Wishart | $\nu,\ \boldsymbol{\Psi}$ | $n + \nu,\ \boldsymbol{\Psi} + \sum_{i=1}^n (\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^T$ | covariance matrix was estimated from $\nu$ observations with sum of pairwise deviation products $\boldsymbol{\Psi}$ | $t_{\nu' - p + 1}\left( \tilde{\mathbf{x}} \mid \boldsymbol{\mu},\ \frac{1}{\nu' - p + 1} \boldsymbol{\Psi}' \right)$ |
| Multivariate normal with known mean $\boldsymbol{\mu}$ | $\boldsymbol{\Lambda}$ (precision matrix) | Wishart | $\nu,\ \mathbf{V}$ | $n + \nu,\ \left( \mathbf{V}^{-1} + \sum_{i=1}^n (\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^T \right)^{-1}$ | covariance matrix was estimated from $\nu$ observations with sum of pairwise deviation products $\mathbf{V}^{-1}$ | $t_{\nu' - p + 1}\left( \tilde{\mathbf{x}} \mid \boldsymbol{\mu},\ \frac{1}{\nu' - p + 1} {\mathbf{V}'}^{-1} \right)$ |
| Multivariate normal | $\boldsymbol{\mu}$ (mean vector) and $\boldsymbol{\Sigma}$ (covariance matrix) | Normal-inverse-Wishart | $\boldsymbol{\mu}_0,\ \kappa_0,\ \nu_0,\ \boldsymbol{\Psi}$ | $\dfrac{\kappa_0 \boldsymbol{\mu}_0 + n \bar{\mathbf{x}}}{\kappa_0 + n},\ \kappa_0 + n,\ \nu_0 + n,\ \boldsymbol{\Psi} + \mathbf{C} + \frac{\kappa_0 n}{\kappa_0 + n} (\bar{\mathbf{x}} - \boldsymbol{\mu}_0)(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)^T$, where $\bar{\mathbf{x}}$ is the sample mean and $\mathbf{C} = \sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T$ | mean was estimated from $\kappa_0$ observations with sample mean $\boldsymbol{\mu}_0$; covariance matrix was estimated from $\nu_0$ observations with sample mean $\boldsymbol{\mu}_0$ and with sum of pairwise deviation products $\boldsymbol{\Psi} = \nu_0 \boldsymbol{\Sigma}_0$ | $t_{\nu_0' - p + 1}\left( \tilde{\mathbf{x}} \mid \boldsymbol{\mu}_0',\ \frac{\kappa_0' + 1}{\kappa_0'(\nu_0' - p + 1)} \boldsymbol{\Psi}' \right)$ |
| Multivariate normal | $\boldsymbol{\mu}$ (mean vector) and $\boldsymbol{\Lambda}$ (precision matrix) | Normal-Wishart | $\boldsymbol{\mu}_0,\ \kappa_0,\ \nu_0,\ \mathbf{V}$ | $\dfrac{\kappa_0 \boldsymbol{\mu}_0 + n \bar{\mathbf{x}}}{\kappa_0 + n},\ \kappa_0 + n,\ \nu_0 + n,\ \left( \mathbf{V}^{-1} + \mathbf{C} + \frac{\kappa_0 n}{\kappa_0 + n} (\bar{\mathbf{x}} - \boldsymbol{\mu}_0)(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)^T \right)^{-1}$, where $\bar{\mathbf{x}}$ is the sample mean and $\mathbf{C} = \sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T$ | mean was estimated from $\kappa_0$ observations with sample mean $\boldsymbol{\mu}_0$; covariance matrix was estimated from $\nu_0$ observations with sample mean $\boldsymbol{\mu}_0$ and with sum of pairwise deviation products $\mathbf{V}^{-1}$ | $t_{\nu_0' - p + 1}\left( \tilde{\mathbf{x}} \mid \boldsymbol{\mu}_0',\ \frac{\kappa_0' + 1}{\kappa_0'(\nu_0' - p + 1)} {\mathbf{V}'}^{-1} \right)$ |
| Uniform, $U(0, \theta)$ | $\theta$ | Pareto | $x_m,\ k$ | $\max\{x_1, \ldots, x_n, x_m\},\ k + n$ | $k$ observations with maximum value $x_m$ | |
| Pareto with known minimum $x_m$ | $k$ (shape) | Gamma | $\alpha,\ \beta$ | $\alpha + n,\ \beta + \sum_{i=1}^n \ln \frac{x_i}{x_m}$ | $\alpha$ observations with sum $\beta$ of the order of magnitude of each observation (i.e. the logarithm of the ratio of each observation to the minimum $x_m$) | |
| Weibull with known shape $\beta$ | $\theta$ (scale) | Inverse gamma | $a,\ b$ | $a + n,\ b + \sum_{i=1}^n x_i^\beta$ | $a$ observations with sum $b$ of the $\beta$-th power of each observation | |
| Log-normal | Same as for the normal distribution after applying the natural logarithm to the data for the posterior hyperparameters; see Fink (1997, pp. 21–22) for details. | | | | | |
| Exponential | $\lambda$ (rate) | Gamma | $\alpha,\ \beta$ | $\alpha + n,\ \beta + \sum_{i=1}^n x_i$ | $\alpha$ observations that sum to $\beta$ | $\operatorname{Lomax}(\tilde{x} \mid \beta', \alpha')$ (Lomax distribution) |
| Gamma with known shape $\alpha$ | $\beta$ (rate) | Gamma | $\alpha_0,\ \beta_0$ | $\alpha_0 + n\alpha,\ \beta_0 + \sum_{i=1}^n x_i$ | $\alpha_0 / \alpha$ observations with sum $\beta_0$ | $\operatorname{CG}(\tilde{x} \mid \alpha, \alpha_0', \beta_0') = \operatorname{\beta'}(\tilde{x} \mid \alpha, \alpha_0', 1, \beta_0')$ |
| Inverse gamma with known shape $\alpha$ | $\beta$ (inverse scale) | Gamma | $\alpha_0,\ \beta_0$ | $\alpha_0 + n\alpha,\ \beta_0 + \sum_{i=1}^n \frac{1}{x_i}$ | $\alpha_0 / \alpha$ observations with sum $\beta_0$ | |
| Gamma with known rate $\beta$ | $\alpha$ (shape) | $\propto \dfrac{a^{\alpha - 1} \beta^{\alpha c}}{\Gamma(\alpha)^b}$ | $a,\ b,\ c$ | $a \prod_{i=1}^n x_i,\ b + n,\ c + n$ | $b$ or $c$ observations ($b$ for estimating $\alpha$, $c$ for estimating $\beta$) with product $a$ | |
| Gamma | $\alpha$ (shape), $\beta$ (inverse scale) | $\propto \dfrac{p^{\alpha - 1} e^{-\beta q}}{\Gamma(\alpha)^r \beta^{-\alpha s}}$ | $p,\ q,\ r,\ s$ | $p \prod_{i=1}^n x_i,\ q + \sum_{i=1}^n x_i,\ r + n,\ s + n$ | $\alpha$ was estimated from $r$ observations with product $p$; $\beta$ was estimated from $s$ observations with sum $q$ | |
| Beta | $\alpha,\ \beta$ | $\propto \dfrac{\Gamma(\alpha + \beta)^k\, p^\alpha\, q^\beta}{\Gamma(\alpha)^k\, \Gamma(\beta)^k}$ | $p,\ q,\ k$ | $p \prod_{i=1}^n x_i,\ q \prod_{i=1}^n (1 - x_i),\ k + n$ | $\alpha$ and $\beta$ were estimated from $k$ observations with product $p$ and product of the complements $q$ | |
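The update rules in the table are simple closed-form operations on the hyperparameters, so they are easy to check numerically. The sketch below is a minimal illustration (not part of the table's source): it applies two rows above, the Normal likelihood with known variance $\sigma^2$ (Normal prior on $\mu$) and the Exponential likelihood (Gamma prior on the rate $\lambda$). The function names and the simulated data are illustrative only; they do not come from any particular library.

```python
# Minimal sketch: conjugate updates for two rows of the table above.
# Function names and simulated data are illustrative, not from any library.
import numpy as np

def normal_known_variance_update(x, sigma2, mu0, sigma02):
    """Normal likelihood with known variance sigma2, Normal prior on the mean.

    Returns the posterior hyperparameters (mu0', sigma0^2') and the parameters
    (mean, variance) of the posterior predictive N(mu0', sigma0^2' + sigma2).
    """
    n = len(x)
    post_var = 1.0 / (1.0 / sigma02 + n / sigma2)              # (1/sigma0^2 + n/sigma^2)^(-1)
    post_mean = post_var * (mu0 / sigma02 + np.sum(x) / sigma2)
    return (post_mean, post_var), (post_mean, post_var + sigma2)

def exponential_rate_update(x, alpha, beta):
    """Exponential likelihood, Gamma(alpha, beta) prior on the rate lambda.

    Returns the posterior hyperparameters (alpha', beta'); the posterior
    predictive is Lomax with shape alpha' and scale beta'.
    """
    return alpha + len(x), beta + np.sum(x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Normal data with known variance sigma^2 = 4 (true mean 3).
    x = rng.normal(loc=3.0, scale=2.0, size=50)
    (mu_post, var_post), (pred_mean, pred_var) = normal_known_variance_update(
        x, sigma2=4.0, mu0=0.0, sigma02=10.0)
    print(f"posterior for mu:     N({mu_post:.3f}, {var_post:.4f})")
    print(f"posterior predictive: N({pred_mean:.3f}, {pred_var:.3f})")

    # Exponential data (true rate 1.5), Gamma prior on the rate.
    y = rng.exponential(scale=1.0 / 1.5, size=50)
    a_post, b_post = exponential_rate_update(y, alpha=2.0, beta=1.0)
    print(f"posterior for lambda: Gamma(alpha'={a_post:.1f}, beta'={b_post:.3f})")
    print(f"posterior mean of lambda: {a_post / b_post:.3f}")
```

As $n$ grows, the data terms ($n$ and $\sum_i x_i$) dominate the prior hyperparameters, which is consistent with the pseudo-observation reading in the "Interpretation of hyperparameters" column.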