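The Dirichlet distribution is a conjugate prior for the negative multinomial distribution. This fact leads to an analytically tractable compound distribution. For a random vector of category counts $\mathbf{x} = (x_1, \dots, x_m)$ distributed according to a negative multinomial distribution $\mathrm{NM}(x_0, \mathbf{p})$, the compound distribution is obtained by integrating out $\mathbf{p}$, which is treated as a random vector following a Dirichlet distribution (with $p_0 = 1 - \sum_{i=1}^m p_i$):

$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = \int_{\mathbf{p}} \mathrm{NM}(\mathbf{x} \mid x_0, \mathbf{p}) \, \mathrm{Dir}(p_0, \mathbf{p} \mid \alpha_0, \boldsymbol{\alpha}) \, d\mathbf{p},$$

which results in the following formula:

$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = \frac{\Gamma\left(\sum_{i=0}^m x_i\right)}{\Gamma(x_0) \prod_{i=1}^m x_i!} \, \frac{\mathrm{B}(\mathbf{x}_+ + \boldsymbol{\alpha}_+)}{\mathrm{B}(\boldsymbol{\alpha}_+)},$$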
where $\mathbf{x}_+$ and $\boldsymbol{\alpha}_+$ are the $(m+1)$-dimensional vectors created by appending the scalars $x_0$ and $\alpha_0$ to the $m$-dimensional vectors $\mathbf{x}$ and $\boldsymbol{\alpha}$, respectively, and $\mathrm{B}$ is the multivariate beta function, $\mathrm{B}(\mathbf{a}) = \prod_i \Gamma(a_i) / \Gamma\left(\sum_i a_i\right)$. We can write this equation explicitly as

$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = \frac{\Gamma\left(\sum_{i=0}^m x_i\right)}{\Gamma(x_0) \prod_{i=1}^m x_i!} \, \frac{\Gamma\left(\sum_{i=0}^m \alpha_i\right)}{\prod_{i=0}^m \Gamma(\alpha_i)} \, \frac{\prod_{i=0}^m \Gamma(x_i + \alpha_i)}{\Gamma\left(\sum_{i=0}^m (x_i + \alpha_i)\right)}.$$
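As a numerical sanity check of the compound integral, the sketch below takes the simplest case $m = 1$, where the Dirichlet reduces to a beta density in $p_0$ and the negative multinomial to a negative binomial. It integrates their product over $p_0$ with a midpoint rule and compares the result with a closed-form evaluation of the compound pmf (coded in `compound_closed_form`). The function names, parameter values and step count are illustrative assumptions, not part of the source.

```python
import math


def nm_pmf_m1(x, x0, p0):
    """Negative multinomial pmf for m = 1: Gamma(x0+x)/(Gamma(x0) x!) * p0^x0 * p1^x, with p1 = 1 - p0."""
    log_coef = math.lgamma(x0 + x) - math.lgamma(x0) - math.lgamma(x + 1)
    return math.exp(log_coef) * p0 ** x0 * (1.0 - p0) ** x


def dirichlet_pdf_m1(p0, alpha0, alpha1):
    """Dirichlet density on (p0, 1 - p0), i.e. a Beta(alpha0, alpha1) density in p0."""
    log_norm = math.lgamma(alpha0) + math.lgamma(alpha1) - math.lgamma(alpha0 + alpha1)
    return math.exp((alpha0 - 1) * math.log(p0) + (alpha1 - 1) * math.log1p(-p0) - log_norm)


def compound_numeric(x, x0, alpha0, alpha1, steps=20_000):
    """Midpoint-rule approximation of the integral of NM * Dir over p0 in (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        p0 = (k + 0.5) * h
        total += nm_pmf_m1(x, x0, p0) * dirichlet_pdf_m1(p0, alpha0, alpha1)
    return total * h


def compound_closed_form(x, x0, alpha0, alpha1):
    """Explicit Gamma-function formula with x_+ = (x0, x) and alpha_+ = (alpha0, alpha1)."""
    lg = math.lgamma
    log_p = (lg(x0 + x) - lg(x0) - lg(x + 1)
             + lg(alpha0 + alpha1) - lg(alpha0) - lg(alpha1)
             + lg(x0 + alpha0) + lg(x + alpha1)
             - lg(x0 + x + alpha0 + alpha1))
    return math.exp(log_p)


# Both values should be close to 8/105 (about 0.0762) for these parameters.
print(compound_numeric(3, 2, 3.0, 2.0), compound_closed_form(3, 2, 3.0, 2.0))
```

The example parameters are chosen so that the integrand is a smooth polynomial in $p_0$, which keeps the midpoint rule very accurate.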
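Alternative formulations exist. One convenient representation[1] is

$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = \frac{\Gamma(x_\bullet)}{\Gamma(x_0) \prod_{i=1}^m \Gamma(x_i + 1)} \times \frac{\Gamma(\alpha_\bullet)}{\prod_{i=0}^m \Gamma(\alpha_i)} \times \frac{\prod_{i=0}^m \Gamma(x_i + \alpha_i)}{\Gamma(x_\bullet + \alpha_\bullet)},$$

where $x_\bullet = x_0 + x_1 + \cdots + x_m$ and $\alpha_\bullet = \alpha_0 + \alpha_1 + \cdots + \alpha_m$.

This can also be written as

$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = \frac{\mathrm{B}(x_\bullet, \alpha_\bullet)}{\mathrm{B}(x_0, \alpha_0)} \prod_{i=1}^m \frac{\Gamma(x_i + \alpha_i)}{x_i! \, \Gamma(\alpha_i)}.$$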
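For numerical work the pmf is best evaluated in log space, since the Gamma factors overflow quickly. The sketch below uses `math.lgamma` to evaluate the log-pmf in two algebraically equivalent ways (a two-argument beta-function form and the fully expanded Gamma-function form), then checks that the probabilities over a large grid sum to one. It assumes positive real parameters and non-negative integer counts; the helper names are hypothetical, and the choice $\alpha_0 = 10$ keeps the heavy tail light enough that truncating each count at 60 loses a negligible amount of mass.

```python
import math
from itertools import product


def log_beta(a, b):
    """log B(a, b) for the two-argument beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def dnm_logpmf(x, x0, alpha0, alpha):
    """log Pr(x | x0, alpha0, alpha) = log B(x_dot, a_dot) - log B(x0, alpha0)
    + sum_i [log Gamma(x_i + a_i) - log x_i! - log Gamma(a_i)]."""
    x_dot = x0 + sum(x)
    a_dot = alpha0 + sum(alpha)
    out = log_beta(x_dot, a_dot) - log_beta(x0, alpha0)
    for xi, ai in zip(x, alpha):
        out += math.lgamma(xi + ai) - math.lgamma(xi + 1) - math.lgamma(ai)
    return out


def dnm_logpmf_explicit(x, x0, alpha0, alpha):
    """Same quantity from the expanded formula over x_+ = (x0, x) and alpha_+ = (alpha0, alpha)."""
    lg = math.lgamma
    xp = [x0, *x]
    ap = [alpha0, *alpha]
    return (lg(sum(xp)) - lg(x0) - sum(lg(xi + 1) for xi in x)
            + lg(sum(ap)) - sum(lg(a) for a in ap)
            + sum(lg(xi + ai) for xi, ai in zip(xp, ap))
            - lg(sum(xp) + sum(ap)))


def dnm_pmf(x, x0, alpha0, alpha):
    return math.exp(dnm_logpmf(x, x0, alpha0, alpha))


# Total probability over the grid {0..60} x {0..60} for m = 2.
grid_total = sum(dnm_pmf(x, 2, 10.0, [1.0, 2.0]) for x in product(range(61), repeat=2))
print(grid_total)  # very close to 1
```

Working with `dnm_logpmf` rather than `dnm_pmf` is the safer default for likelihood computations, where products of many small probabilities would otherwise underflow.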
To obtain the marginal distribution over a subset of Dirichlet negative multinomial random variables, one only needs to drop the irrelevant $\alpha_i$'s (those of the variables to be marginalized out) from the $\boldsymbol{\alpha}$ vector. The joint distribution of the remaining random variates is $\mathrm{DNM}(x_0, \alpha_0, \boldsymbol{\alpha}_{(-)})$, where $\boldsymbol{\alpha}_{(-)}$ is the vector with the removed $\alpha_i$'s. In particular, each univariate marginal is beta negative binomially distributed.
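The marginalization rule can be checked numerically for $m = 2$: summing the joint probabilities over $x_2$ should reproduce $\mathrm{DNM}(x_0, \alpha_0, (\alpha_1))$, a beta negative binomial in $x_1$. In the sketch below, `dnm_pmf` is a hypothetical helper that evaluates the DNM pmf in log space with `math.lgamma`; the parameter values are illustrative, and the sum over $x_2$ is truncated at an assumed upper limit of 300, which is harmless here because the tail in $x_2$ decays like a high inverse power.

```python
import math


def dnm_pmf(x, x0, alpha0, alpha):
    """DNM probability mass function, evaluated in log space with lgamma."""
    lg = math.lgamma
    x_dot, a_dot = x0 + sum(x), alpha0 + sum(alpha)
    log_p = (lg(x_dot) + lg(a_dot) - lg(x_dot + a_dot)
             - lg(x0) - lg(alpha0) + lg(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        log_p += lg(xi + ai) - lg(xi + 1) - lg(ai)
    return math.exp(log_p)


def marginal_x1_by_summation(x1, x0, alpha0, alpha1, alpha2, upper=300):
    """Sum the m = 2 joint pmf over x2 = 0..upper (truncated; the omitted tail is negligible here)."""
    return sum(dnm_pmf([x1, x2], x0, alpha0, [alpha1, alpha2]) for x2 in range(upper + 1))


def marginal_x1_by_rule(x1, x0, alpha0, alpha1):
    """Marginalization rule: drop alpha2 and keep x0, alpha0 unchanged."""
    return dnm_pmf([x1], x0, alpha0, [alpha1])


for x1 in range(5):
    print(x1, marginal_x1_by_summation(x1, 2, 6.0, 1.5, 2.0), marginal_x1_by_rule(x1, 2, 6.0, 1.5))
```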
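If the $m$-dimensional vector $\mathbf{x}$ is partitioned as

$$\mathbf{x} = \begin{bmatrix} \mathbf{x}^{(1)} \\ \mathbf{x}^{(2)} \end{bmatrix} \quad \text{with sizes} \quad \begin{bmatrix} n \times 1 \\ (m-n) \times 1 \end{bmatrix}$$

and $\boldsymbol{\alpha}$ is partitioned accordingly,

$$\boldsymbol{\alpha} = \begin{bmatrix} \boldsymbol{\alpha}^{(1)} \\ \boldsymbol{\alpha}^{(2)} \end{bmatrix} \quad \text{with sizes} \quad \begin{bmatrix} n \times 1 \\ (m-n) \times 1 \end{bmatrix},$$

then the conditional distribution of $\mathbf{X}^{(1)}$ given $\mathbf{X}^{(2)} = \mathbf{x}^{(2)}$ is $\mathrm{DNM}(x_0', \alpha_0', \boldsymbol{\alpha}^{(1)})$, where

$$x_0' = x_0 + \sum_{i=1}^{m-n} x_i^{(2)}$$

and

$$\alpha_0' = \alpha_0 + \sum_{i=1}^{m-n} \alpha_i^{(2)}.$$

That is,

$$\Pr\left(\mathbf{x}^{(1)} \mid \mathbf{x}^{(2)}, x_0, \alpha_0, \boldsymbol{\alpha}\right) = \Pr\left(\mathbf{x}^{(1)} \mid x_0', \alpha_0', \boldsymbol{\alpha}^{(1)}\right).$$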
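Because the marginal of $\mathbf{X}^{(2)}$ is itself a DNM (drop $\boldsymbol{\alpha}^{(1)}$, as in the marginalization rule), the conditional rule can be checked exactly by dividing the joint probability by that marginal. The sketch below does this for $m = 3$ and $n = 1$, so $\mathbf{x}^{(1)} = (x_1)$ and $\mathbf{x}^{(2)} = (x_2, x_3)$. The helper names and parameter values are illustrative assumptions; `dnm_pmf` evaluates the DNM pmf with `math.lgamma`.

```python
import math


def dnm_pmf(x, x0, alpha0, alpha):
    """DNM probability mass function, evaluated in log space with lgamma."""
    lg = math.lgamma
    x_dot, a_dot = x0 + sum(x), alpha0 + sum(alpha)
    log_p = (lg(x_dot) + lg(a_dot) - lg(x_dot + a_dot)
             - lg(x0) - lg(alpha0) + lg(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        log_p += lg(xi + ai) - lg(xi + 1) - lg(ai)
    return math.exp(log_p)


def conditional_by_bayes(x1, x2, x0, alpha0, alpha1, alpha2):
    """Pr(x^(1) | x^(2)) = joint / marginal, where the marginal of X^(2) drops alpha^(1)."""
    joint = dnm_pmf([*x1, *x2], x0, alpha0, [*alpha1, *alpha2])
    return joint / dnm_pmf(x2, x0, alpha0, alpha2)


def conditional_by_rule(x1, x2, x0, alpha0, alpha1, alpha2):
    """DNM(x0', alpha0', alpha^(1)) with x0' = x0 + sum(x^(2)) and alpha0' = alpha0 + sum(alpha^(2))."""
    return dnm_pmf(x1, x0 + sum(x2), alpha0 + sum(alpha2), alpha1)


for x1 in range(4):
    print(x1,
          conditional_by_bayes([x1], [3, 1], 2, 1.5, [0.7], [2.0, 1.0]),
          conditional_by_rule([x1], [3, 1], 2, 1.5, [0.7], [2.0, 1.0]))
```

Note that the heavy-tail restrictions on $\alpha_0$ play no role here: the identity is exact for any positive parameters, so $\alpha_0 = 1.5$ is fine.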
Conditional on $\sum_{i=1}^m x_i = n$, a Dirichlet negative multinomial random vector follows a Dirichlet-multinomial distribution with parameters $n$ and $\boldsymbol{\alpha}$. That is,

$$\Pr\left(\mathbf{x} \,\middle|\, \sum_{i=1}^m x_i = n, x_0, \alpha_0, \boldsymbol{\alpha}\right) = \frac{n! \, \Gamma\left(\sum_{i=1}^m \alpha_i\right)}{\Gamma\left(n + \sum_{i=1}^m \alpha_i\right)} \prod_{i=1}^m \frac{\Gamma(x_i + \alpha_i)}{x_i! \, \Gamma(\alpha_i)}.$$

Notice that the expression does not depend on $x_0$ or $\alpha_0$.
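Since the set of count vectors with a fixed total $n$ is finite, this property can be checked exactly: renormalize the DNM probabilities over that set and compare with the Dirichlet-multinomial pmf. The sketch below repeats the check for several $(x_0, \alpha_0)$ pairs to illustrate that they drop out. Helper names and parameter values are illustrative assumptions; `dnm_pmf` evaluates the DNM pmf with `math.lgamma`.

```python
import math
from itertools import product


def dnm_pmf(x, x0, alpha0, alpha):
    """DNM probability mass function, evaluated in log space with lgamma."""
    lg = math.lgamma
    x_dot, a_dot = x0 + sum(x), alpha0 + sum(alpha)
    log_p = (lg(x_dot) + lg(a_dot) - lg(x_dot + a_dot)
             - lg(x0) - lg(alpha0) + lg(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        log_p += lg(xi + ai) - lg(xi + 1) - lg(ai)
    return math.exp(log_p)


def dirichlet_multinomial_pmf(x, alpha):
    """n! Gamma(A) / Gamma(n + A) * prod_i Gamma(x_i + a_i) / (x_i! Gamma(a_i)), with A = sum(alpha)."""
    lg = math.lgamma
    n, a_sum = sum(x), sum(alpha)
    log_p = lg(n + 1) + lg(a_sum) - lg(n + a_sum)
    for xi, ai in zip(x, alpha):
        log_p += lg(xi + ai) - lg(xi + 1) - lg(ai)
    return math.exp(log_p)


def dnm_given_total(n, x0, alpha0, alpha):
    """Condition the DNM on sum(x) = n by renormalizing over every count vector with that total."""
    support = [x for x in product(range(n + 1), repeat=len(alpha)) if sum(x) == n]
    weights = [dnm_pmf(x, x0, alpha0, alpha) for x in support]
    total = sum(weights)
    return {x: w / total for x, w in zip(support, weights)}


alpha = [0.5, 1.0, 2.5]
for x0, alpha0 in [(2, 3.0), (7, 0.8)]:  # different x0 and alpha0, same alpha
    cond = dnm_given_total(4, x0, alpha0, alpha)
    worst = max(abs(p - dirichlet_multinomial_pmf(x, alpha)) for x, p in cond.items())
    print(x0, alpha0, worst)  # rounding-level differences only
```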
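If

$$\mathbf{X} = (X_1, \ldots, X_m) \sim \mathrm{DNM}\left(x_0, \alpha_0, (\alpha_1, \ldots, \alpha_m)\right),$$

then, if the random variables with positive subscripts $i$ and $j$ are dropped from the vector and replaced by their sum,

$$\mathbf{X}' = (X_1, \ldots, X_i + X_j, \ldots, X_m) \sim \mathrm{DNM}\left(x_0, \alpha_0, (\alpha_1, \ldots, \alpha_i + \alpha_j, \ldots, \alpha_m)\right).$$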
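The aggregation property can be verified exactly by summing the joint pmf over every split of a total $s$ between the two merged components. The sketch below uses $m = 3$ and merges the first two components; placing the merged pair first is only a convenience, since the pmf is symmetric under permuting the pairs $(x_i, \alpha_i)$ for $i \geq 1$. Helper names and parameter values are illustrative assumptions; `dnm_pmf` evaluates the DNM pmf with `math.lgamma`.

```python
import math


def dnm_pmf(x, x0, alpha0, alpha):
    """DNM probability mass function, evaluated in log space with lgamma."""
    lg = math.lgamma
    x_dot, a_dot = x0 + sum(x), alpha0 + sum(alpha)
    log_p = (lg(x_dot) + lg(a_dot) - lg(x_dot + a_dot)
             - lg(x0) - lg(alpha0) + lg(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        log_p += lg(xi + ai) - lg(xi + 1) - lg(ai)
    return math.exp(log_p)


def merged_by_summation(s, rest, x0, alpha0, a_i, a_j, alpha_rest):
    """Pr(X_i + X_j = s, other counts = rest): sum the joint pmf over all splits k + (s - k)."""
    return sum(dnm_pmf([k, s - k, *rest], x0, alpha0, [a_i, a_j, *alpha_rest])
               for k in range(s + 1))


def merged_by_rule(s, rest, x0, alpha0, a_i, a_j, alpha_rest):
    """DNM with X_i, X_j replaced by their sum and alpha_i, alpha_j by alpha_i + alpha_j."""
    return dnm_pmf([s, *rest], x0, alpha0, [a_i + a_j, *alpha_rest])


for s in range(4):
    print(s, merged_by_summation(s, [1], 3, 2.5, 1.0, 0.5, [2.0]),
          merged_by_rule(s, [1], 3, 2.5, 1.0, 0.5, [2.0]))
```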
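For $\alpha_0 > 2$, the entries of the correlation matrix are

$$\rho(X_i, X_i) = 1, \qquad \rho(X_i, X_j) = \sqrt{\frac{\alpha_i \alpha_j}{(\alpha_0 + \alpha_i - 1)(\alpha_0 + \alpha_j - 1)}} \quad (i \neq j).$$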
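The sketch below codes the closed-form correlation entries (see `dnm_correlation`) and compares the off-diagonal entry with a Pearson correlation computed from moments summed over a truncated grid, for two different values of $x_0$ (the closed form does not involve $x_0$). The choice $\alpha_0 = 20$ is an assumption made so that the second moments converge fast enough for a cutoff of 60 per count; helper names are hypothetical, and `dnm_pmf` evaluates the DNM pmf with `math.lgamma`.

```python
import math


def dnm_pmf(x, x0, alpha0, alpha):
    """DNM probability mass function, evaluated in log space with lgamma."""
    lg = math.lgamma
    x_dot, a_dot = x0 + sum(x), alpha0 + sum(alpha)
    log_p = (lg(x_dot) + lg(a_dot) - lg(x_dot + a_dot)
             - lg(x0) - lg(alpha0) + lg(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        log_p += lg(xi + ai) - lg(xi + 1) - lg(ai)
    return math.exp(log_p)


def dnm_correlation(alpha0, alpha):
    """rho_ii = 1, rho_ij = sqrt(a_i a_j / ((alpha0 + a_i - 1)(alpha0 + a_j - 1))); needs alpha0 > 2."""
    if alpha0 <= 2:
        raise ValueError("the covariance matrix is infinite for alpha0 <= 2")
    m = len(alpha)
    return [[1.0 if i == j else
             math.sqrt(alpha[i] * alpha[j] / ((alpha0 + alpha[i] - 1) * (alpha0 + alpha[j] - 1)))
             for j in range(m)] for i in range(m)]


def correlation_by_summation(x0, alpha0, alpha, upper=60):
    """Pearson correlation of X_1 and X_2 (m = 2) from moments summed over {0..upper}^2."""
    s1 = s2 = s11 = s22 = s12 = 0.0
    for x1 in range(upper + 1):
        for x2 in range(upper + 1):
            p = dnm_pmf([x1, x2], x0, alpha0, alpha)
            s1 += x1 * p
            s2 += x2 * p
            s11 += x1 * x1 * p
            s22 += x2 * x2 * p
            s12 += x1 * x2 * p
    cov = s12 - s1 * s2
    return cov / math.sqrt((s11 - s1 ** 2) * (s22 - s2 ** 2))


rho = dnm_correlation(20.0, [1.0, 2.0])
print(rho[0][1], correlation_by_summation(3, 20.0, [1.0, 2.0]))  # the two values agree closely
```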
The Dirichlet negative multinomial is a heavy-tailed distribution. It does not have a finite mean for $\alpha_0 \leq 1$, and its covariance matrix is infinite for $\alpha_0 \leq 2$. More generally, moments of order $k$ are finite only when $k < \alpha_0$, so the moment generating function does not exist for any choice of parameters.
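The effect of $\alpha_0$ on the tail can be seen with $m = 1$ by truncating the mean at increasing cutoffs. In the sketch below, for $\alpha_0 = 3$ the truncated mean settles at $x_0 \alpha_1 / (\alpha_0 - 1) = 0.5$ (the standard closed-form mean, used here only as a reference value), whereas for $\alpha_0 = 0.5$ it keeps growing with the cutoff, roughly like its square root for these parameters. Helper names, parameters and cutoffs are illustrative assumptions; `dnm_pmf` evaluates the DNM pmf with `math.lgamma`.

```python
import math


def dnm_pmf(x, x0, alpha0, alpha):
    """DNM probability mass function, evaluated in log space with lgamma."""
    lg = math.lgamma
    x_dot, a_dot = x0 + sum(x), alpha0 + sum(alpha)
    log_p = (lg(x_dot) + lg(a_dot) - lg(x_dot + a_dot)
             - lg(x0) - lg(alpha0) + lg(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        log_p += lg(xi + ai) - lg(xi + 1) - lg(ai)
    return math.exp(log_p)


def truncated_mean(x0, alpha0, alpha1, upper):
    """E[X; X <= upper] for the univariate (m = 1) DNM, i.e. the beta negative binomial."""
    return sum(k * dnm_pmf([k], x0, alpha0, [alpha1]) for k in range(upper + 1))


for alpha0 in (3.0, 0.5):
    print(alpha0, [round(truncated_mean(1, alpha0, 1.0, n), 4) for n in (1000, 2000, 4000)])
```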
When the $m+2$ parameters $x_0$, $\alpha_0$ and $\boldsymbol{\alpha}$ are positive integers, the Dirichlet negative multinomial can also be motivated by an urn model, more specifically a basic Pólya urn model. Consider an urn initially containing $\sum_{i=0}^m \alpha_i$ balls of $m+1$ colors, including $\alpha_0$ red balls (the stopping color). The vector $\boldsymbol{\alpha}$ gives the respective counts of the balls of the $m$ non-red colors. At each step of the model, a ball is drawn at random from the urn and returned, along with one additional ball of the same color. The process is repeated until $x_0$ red balls have been drawn. The random vector $\mathbf{X}$ of observed draws of the $m$ non-red colors is distributed according to $\mathrm{DNM}(x_0, \alpha_0, \boldsymbol{\alpha})$. Note that at the end of the experiment, the urn always contains the fixed number $x_0 + \alpha_0$ of red balls and the random numbers $\mathbf{X} + \boldsymbol{\alpha}$ of balls of the other $m$ colors.
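The urn description translates directly into a simulation. The sketch below (hypothetical function name; illustrative parameters and seed) draws a color with probability proportional to the current ball counts, adds one ball of the drawn color each time, and stops after the $x_0$-th red draw. For $x_0 = 2$, $\alpha_0 = 4$ and $\boldsymbol{\alpha} = (1, 2)$, the outcome $\mathbf{X} = (0, 0)$ requires the first two draws to be red, with probability $\frac{4}{7} \cdot \frac{5}{8} = \frac{5}{14}$, and $\mathbf{X} = (1, 0)$ has probability $\frac{5}{63}$; empirical frequencies should land near these values.

```python
import random
from collections import Counter


def polya_urn_dnm(x0, alpha0, alpha, rng):
    """Run one Pólya urn experiment; return the non-red draw counts and the final urn contents."""
    urn = [alpha0, *alpha]  # urn[0] is the red (stopping) color
    draws = [0] * len(urn)
    while draws[0] < x0:
        color = rng.choices(range(len(urn)), weights=urn)[0]
        draws[color] += 1
        urn[color] += 1  # return the ball together with one more of the same color
    return tuple(draws[1:]), urn


rng = random.Random(12345)
samples = [polya_urn_dnm(2, 4, [1, 2], rng) for _ in range(20_000)]
freq = Counter(x for x, _ in samples)
print(freq[(0, 0)] / len(samples), 5 / 14)
print(freq[(1, 0)] / len(samples), 5 / 63)
```

The final-urn check in the model description (always $x_0 + \alpha_0$ red balls) holds by construction, since the loop exits immediately after the $x_0$-th red ball is added.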
[1] Farewell, Daniel & Farewell, Vernon (2012). Dirichlet negative multinomial regression for overdispersed correlated count data. Biostatistics (Oxford, England), 14. doi:10.1093/biostatistics/kxs050.