Multivariate probit model

<h2 id="example-bivariate-probit">Example: bivariate probit</h2>
In the ordinary probit model, there is only one binary dependent variable \(Y\), and so only one <a href="/facts/Latent_variable/ohg99BnW">latent variable</a> \(Y^{*}\) is used. In contrast, in the bivariate probit model there are two binary dependent variables \(Y_{1}\) and \(Y_{2}\), so there are two latent variables: \(Y_{1}^{*}\) and \(Y_{2}^{*}\).
It is assumed that each observed variable takes on the value 1 if and only if its underlying continuous latent variable takes on a positive value:

\[
Y_{1}={\begin{cases}1&{\text{if }}Y_{1}^{*}>0,\\0&{\text{otherwise}},\end{cases}}
\]

\[
Y_{2}={\begin{cases}1&{\text{if }}Y_{2}^{*}>0,\\0&{\text{otherwise}},\end{cases}}
\]

with

\[
{\begin{cases}Y_{1}^{*}=X_{1}\beta _{1}+\varepsilon _{1}\\Y_{2}^{*}=X_{2}\beta _{2}+\varepsilon _{2}\end{cases}}
\]

and

\[
{\begin{bmatrix}\varepsilon _{1}\\\varepsilon _{2}\end{bmatrix}}\mid X\sim {\mathcal {N}}\left({\begin{bmatrix}0\\0\end{bmatrix}},{\begin{bmatrix}1&\rho \\\rho &1\end{bmatrix}}\right)
\]
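The data-generating process above can be illustrated with a short simulation. This is a minimal sketch, not part of the model definition: the function name `simulate_bivariate_probit`, the sample size, and the coefficient values are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_bivariate_probit(X1, X2, beta1, beta2, rho, rng):
    """Draw (Y1, Y2) from the bivariate probit model.

    The errors are bivariate normal with zero mean, unit variances,
    and correlation rho, matching the distribution stated above.
    """
    n = X1.shape[0]
    cov = np.array([[1.0, rho], [rho, 1.0]])
    eps = rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=n)
    y1_star = X1 @ beta1 + eps[:, 0]  # latent variable Y1*
    y2_star = X2 @ beta2 + eps[:, 1]  # latent variable Y2*
    # Each observed outcome is 1 exactly when its latent variable is positive.
    return (y1_star > 0).astype(int), (y2_star > 0).astype(int)

# Illustrative (assumed) design: an intercept plus one standard normal regressor.
rng = np.random.default_rng(0)
n = 5000
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
y1, y2 = simulate_bivariate_probit(
    X1, X2, np.array([0.2, 1.0]), np.array([-0.3, 0.5]), 0.6, rng
)
```

With a positive correlation between the errors, the two simulated outcomes tend to be positively associated even though their regressors are drawn independently.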

Fitting the bivariate probit model involves estimating the values of \(\beta _{1}\), \(\beta _{2}\), and \(\rho \). To do so, the <a href="/facts/Maximum_likelihood/0Yq2dpQD">likelihood of the model has to be maximized</a>. This likelihood is

\[
{\begin{aligned}L(\beta _{1},\beta _{2})={\Big (}\prod &P(Y_{1}=1,Y_{2}=1\mid \beta _{1},\beta _{2})^{Y_{1}Y_{2}}P(Y_{1}=0,Y_{2}=1\mid \beta _{1},\beta _{2})^{(1-Y_{1})Y_{2}}\\[8pt]&{}\qquad P(Y_{1}=1,Y_{2}=0\mid \beta _{1},\beta _{2})^{Y_{1}(1-Y_{2})}P(Y_{1}=0,Y_{2}=0\mid \beta _{1},\beta _{2})^{(1-Y_{1})(1-Y_{2})}{\Big )}\end{aligned}}
\]

Substituting the latent variables \(Y_{1}^{*}\) and \(Y_{2}^{*}\) in the probability functions and taking logs gives

\[
{\begin{aligned}\sum &{\Big (}Y_{1}Y_{2}\ln P(\varepsilon _{1}>-X_{1}\beta _{1},\varepsilon _{2}>-X_{2}\beta _{2})\\[4pt]&{}\quad {}+(1-Y_{1})Y_{2}\ln P(\varepsilon _{1}<-X_{1}\beta _{1},\varepsilon _{2}>-X_{2}\beta _{2})\\[4pt]&{}\quad {}+Y_{1}(1-Y_{2})\ln P(\varepsilon _{1}>-X_{1}\beta _{1},\varepsilon _{2}<-X_{2}\beta _{2})\\[4pt]&{}\quad {}+(1-Y_{1})(1-Y_{2})\ln P(\varepsilon _{1}<-X_{1}\beta _{1},\varepsilon _{2}<-X_{2}\beta _{2}){\Big )}.\end{aligned}}
\]

After some rewriting, the log-likelihood function becomes:

\[
{\begin{aligned}\sum &{\Big (}Y_{1}Y_{2}\ln \Phi (X_{1}\beta _{1},X_{2}\beta _{2},\rho )\\[4pt]&{}\quad {}+(1-Y_{1})Y_{2}\ln \Phi (-X_{1}\beta _{1},X_{2}\beta _{2},-\rho )\\[4pt]&{}\quad {}+Y_{1}(1-Y_{2})\ln \Phi (X_{1}\beta _{1},-X_{2}\beta _{2},-\rho )\\[4pt]&{}\quad {}+(1-Y_{1})(1-Y_{2})\ln \Phi (-X_{1}\beta _{1},-X_{2}\beta _{2},\rho ){\Big )}.\end{aligned}}
\]
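The rewriting step rests on the symmetry of the normal distribution. As a brief sketch, write \(a=X_{1}\beta _{1}\) and \(b=X_{2}\beta _{2}\) (shorthand introduced here only for illustration) and let \(\Phi (\cdot ,\cdot ,\rho )\) denote the standard bivariate normal distribution function with correlation \(\rho \). Negating one error term flips the sign of the correlation, while negating both leaves it unchanged, so for the first two terms:

\[
{\begin{aligned}P(\varepsilon _{1}>-a,\ \varepsilon _{2}>-b)&=P(-\varepsilon _{1}<a,\ -\varepsilon _{2}<b)=\Phi (a,b,\rho ),\\[4pt]P(\varepsilon _{1}<-a,\ \varepsilon _{2}>-b)&=P(\varepsilon _{1}<-a,\ -\varepsilon _{2}<b)=\Phi (-a,b,-\rho ).\end{aligned}}
\]

The first line uses the fact that \((-\varepsilon _{1},-\varepsilon _{2})\) has correlation \(\rho \), and the second that \((\varepsilon _{1},-\varepsilon _{2})\) has correlation \(-\rho \). The remaining two terms follow in the same way.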

Note that \(\Phi \) is the <a href="/facts/Cumulative_distribution_function/WaKU8tp4">cumulative distribution function</a> of the <a href="/facts/Bivariate_normal_distribution/2Xfegqz2">bivariate normal distribution</a>. \(Y_{1}\) and \(Y_{2}\) in the log-likelihood function are the observed variables, each equal to one or zero.
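The log-likelihood above can be evaluated numerically. The following is a minimal sketch that assumes SciPy is available and uses `scipy.stats.multivariate_normal.cdf` for the bivariate normal distribution function. The helper names `bvn_cdf` and `bivariate_probit_loglik` are hypothetical, and the data at the bottom are random placeholders rather than real observations. The four terms are collapsed into one by defining signs \(q_{k}=2Y_{k}-1\), so that each observation contributes \(\ln \Phi (q_{1}X_{1}\beta _{1},\,q_{2}X_{2}\beta _{2},\,q_{1}q_{2}\rho )\).

```python
import numpy as np
from scipy.stats import multivariate_normal

def bvn_cdf(a, b, rho):
    """Phi(a, b, rho): P(Z1 <= a, Z2 <= b) for a standard bivariate
    normal pair with correlation rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    return multivariate_normal.cdf([a, b], mean=[0.0, 0.0], cov=cov)

def bivariate_probit_loglik(beta1, beta2, rho, X1, X2, y1, y2):
    """Sum over observations of ln Phi(q1*X1b1, q2*X2b2, q1*q2*rho)."""
    q1 = 2 * y1 - 1  # +1 when Y1 = 1, -1 when Y1 = 0
    q2 = 2 * y2 - 1  # +1 when Y2 = 1, -1 when Y2 = 0
    a = q1 * (X1 @ beta1)
    b = q2 * (X2 @ beta2)
    r = q1 * q2 * rho  # the correlation changes sign when exactly one outcome is 0
    total = 0.0
    for a_i, b_i, r_i in zip(a, b, r):
        total += np.log(bvn_cdf(a_i, b_i, r_i))
    return total

# Placeholder data (assumed for illustration only).
rng = np.random.default_rng(1)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
y1 = rng.integers(0, 2, size=n)
y2 = rng.integers(0, 2, size=n)
ll = bivariate_probit_loglik(
    np.array([0.1, 0.5]), np.array([-0.2, 0.3]), 0.4, X1, X2, y1, y2
)
```

In practice the negative of such a function would be handed to a numerical optimizer to obtain the estimates. The per-observation loop is kept for readability rather than speed.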

<h2 id="multivariate-probit">Multivariate Probit</h2>
For the general case, \(\mathbf {y_{i}} =(y_{1},...,y_{J}),\ (i=1,...,N)\), where \(j=1,...,J\) indexes choices and \(i\) indexes individuals or observations, the probability of observing the choice vector \(\mathbf {y_{i}} \) is

\[
{\begin{aligned}\Pr(\mathbf {y_{i}} |\mathbf {X_{i}\beta } ,\Sigma )=&\int _{A_{J}}\cdots \int _{A_{1}}f_{N}(\mathbf {y} _{i}^{*}|\mathbf {X_{i}\beta } ,\Sigma )dy_{1}^{*}\dots dy_{J}^{*}\\\Pr(\mathbf {y_{i}} |\mathbf {X_{i}\beta } ,\Sigma )=&\int \mathbb {1} _{y^{*}\in A}f_{N}(\mathbf {y} _{i}^{*}|\mathbf {X_{i}\beta } ,\Sigma )d\mathbf {y} _{i}^{*}\end{aligned}}
\]

where \(A=A_{1}\times \cdots \times A_{J}\) and

\[
A_{j}={\begin{cases}(-\infty ,0]&y_{j}=0\\(0,\infty )&y_{j}=1\end{cases}}
\]
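The indicator form of the integral suggests a crude frequency simulator: draw latent vectors from the normal density and count how often they land in \(A\). The sketch below is only an illustration of that idea under assumed inputs. The names `region_bounds` and `frequency_probability` are hypothetical, `mu` stands for the mean vector \(\mathbf {X_{i}\beta } \), and the example values are arbitrary.

```python
import numpy as np

def region_bounds(y):
    """Lower and upper bounds of A_j for each component of y:
    (-inf, 0] when y_j = 0 and (0, inf) when y_j = 1."""
    y = np.asarray(y)
    lower = np.where(y == 1, 0.0, -np.inf)
    upper = np.where(y == 1, np.inf, 0.0)
    return lower, upper

def frequency_probability(y, mu, sigma, n_draws=20000, rng=None):
    """Estimate Pr(y | mu, sigma) as the share of latent draws inside A."""
    rng = np.random.default_rng() if rng is None else rng
    lower, upper = region_bounds(y)
    draws = rng.multivariate_normal(mean=mu, cov=sigma, size=n_draws)
    # A draw counts only if every component lies in its own interval A_j.
    inside = np.all((draws > lower) & (draws <= upper), axis=1)
    return inside.mean()

# Two outcomes, zero means, correlation 0.5. The exact orthant
# probability for this case is 1/4 + arcsin(0.5) / (2*pi) = 1/3.
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
p_hat = frequency_probability([1, 1], [0.0, 0.0], sigma, rng=np.random.default_rng(0))
```

This estimator is simple but is a step function of the parameters and becomes noisy when the probability is small, which is one motivation for smoother simulators.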

The log-likelihood function in this case would be

\[
\sum _{i=1}^{N}\log \Pr(\mathbf {y_{i}} |\mathbf {X_{i}\beta } ,\Sigma )
\]

Except for \(J\leq 2\), there is typically no closed-form solution to the integrals in the log-likelihood equation. Instead, simulation methods can be used to simulate the choice probabilities. Methods using importance sampling include the <a href="/facts/GHK_algorithm/en1naAWt">GHK algorithm</a>,<a class="footnote-ref" id="fnref:3" href="#fn:3">3</a> AR (accept-reject), and Stern's method. There are also MCMC approaches to this problem, including CRB (Chib's method with <a href="/facts/Rao%E2%80%93Blackwell_theorem/fI5oGC0K">Rao–Blackwellization</a>), CRT (Chib, Ritter, Tanner), ARK (accept-reject kernel), and ASK (adaptive sampling kernel).<a class="footnote-ref" id="fnref:4" href="#fn:4">4</a> A variational approach that scales to large datasets is proposed in Probit-LMM.<a class="footnote-ref" id="fnref:5" href="#fn:5">5</a>
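To give a feel for how a simulator of this kind works, here is a minimal sketch of the GHK idea, written under stated assumptions rather than taken from the cited references. It factors the covariance with a Cholesky decomposition, draws each standard normal component from its truncated range in sequence, and averages the product of the one-dimensional interval probabilities. The function name `ghk_probability`, the argument `mu` (standing for \(\mathbf {X_{i}\beta } \)), and the number of draws are hypothetical choices.

```python
import numpy as np
from scipy.stats import norm

def ghk_probability(y, mu, sigma, n_draws=2000, rng=None):
    """GHK-style estimate of Pr(y | mu, sigma) for binary outcomes y."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y)
    mu = np.asarray(mu, dtype=float)
    J = y.shape[0]
    L = np.linalg.cholesky(sigma)  # y* = mu + L @ eta with eta ~ N(0, I)
    lower = np.where(y == 1, 0.0, -np.inf)
    upper = np.where(y == 1, np.inf, 0.0)
    eta = np.zeros((n_draws, J))
    weight = np.ones(n_draws)
    for j in range(J):
        # Mean of y_j* given the components already drawn.
        cond_mean = mu[j] + eta[:, :j] @ L[j, :j]
        p_lo = norm.cdf((lower[j] - cond_mean) / L[j, j])
        p_hi = norm.cdf((upper[j] - cond_mean) / L[j, j])
        weight *= p_hi - p_lo  # probability that component j falls in A_j
        # Inverse-CDF draw of eta_j from its truncated range. The clip
        # keeps norm.ppf finite in the rare case of a uniform draw of 0.
        u = rng.uniform(size=n_draws)
        p = np.clip(p_lo + u * (p_hi - p_lo), 1e-12, 1 - 1e-12)
        eta[:, j] = norm.ppf(p)
    return weight.mean()

# Two outcomes, zero means, correlation 0.5. The exact orthant
# probability for this case is 1/4 + arcsin(0.5) / (2*pi) = 1/3.
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
p_ghk = ghk_probability([1, 1], [0.0, 0.0], sigma, rng=np.random.default_rng(0))
```

Unlike a simple frequency count, the estimate is a smooth function of the parameters and is strictly positive, which makes it convenient inside a simulated log-likelihood.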
The multivariate probit model has been applied to analyze consumer choice of multiple brands simultaneously. It has been demonstrated that the multivariate probit model extends research possibilities in the demand area by relaxing the restrictive assumption of mutually exclusive alternatives, which characterizes multinomial discrete choice methods.<a class="footnote-ref" id="fnref:6" href="#fn:6">6</a>

<h2 id="further-reading">Further reading</h2>
<ul><li>Greene, William H. (2012). "Bivariate and Multivariate Probit Models". Econometric Analysis (Seventh ed.). Prentice-Hall. pp. 778–799. <a href="/facts/ISBN_(identifier)/15AdSPa9">ISBN</a> 978-0-13-139538-1.</li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1">Ashford, J.R.; Sowden, R.R. (September 1970). "Multivariate Probit Analysis". Biometrics. 26 (3): 535–546. doi:10.2307/2529107. JSTOR 2529107. PMID 5480663. <a href="https://www.jstor.org/stable/2529107" target="_blank">https://www.jstor.org/stable/2529107</a> <a href="#fnref:1" class="footnote-back-ref">↩</a></li>
<li id="fn:2">Chib, Siddhartha; Greenberg, Edward (June 1998). "Analysis of multivariate probit models". Biometrika. 85 (2): 347–361. CiteSeerX 10.1.1.198.8541. doi:10.1093/biomet/85.2.347 – via Oxford Academic. <a href="https://academic.oup.com/biomet/article-abstract/85/2/347/298820" target="_blank">https://academic.oup.com/biomet/article-abstract/85/2/347/298820</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></li>
<li id="fn:3">Hajivassiliou, Vassilis (1994). "Chapter 40 Classical estimation methods for LDV models using simulation". Handbook of Econometrics. 4: 2383–2441. doi:10.1016/S1573-4412(05)80009-1. ISBN 9780444887665. S2CID 13232902. <a href="#fnref:3" class="footnote-back-ref">↩</a></li>
<li id="fn:4">Jeliazkov, Ivan (2010). "MCMC perspectives on simulated likelihood estimation". Advances in Econometrics. 26: 3–39. doi:10.1108/S0731-9053(2010)0000026005. ISBN 978-0-85724-149-8. <a href="#fnref:4" class="footnote-back-ref">↩</a></li>
<li id="fn:5">Mandt, Stephan; Wenzel, Florian; Nakajima, Shinichi; Cunningham, John; Lippert, Christoph; Kloft, Marius (2017). "Sparse probit linear mixed model" (PDF). Machine Learning. 106 (9–10): 1–22. arXiv:1507.04777. doi:10.1007/s10994-017-5652-6. S2CID 11588006. <a href="https://link.springer.com/content/pdf/10.1007%2Fs10994-017-5652-6.pdf" target="_blank">https://link.springer.com/content/pdf/10.1007%2Fs10994-017-5652-6.pdf</a> <a href="#fnref:5" class="footnote-back-ref">↩</a></li>
<li id="fn:6">Baltas, George (2004-04-01). "A model for multiple brand choice". European Journal of Operational Research. 154 (1): 144–149. doi:10.1016/S0377-2217(02)00654-9. ISSN 0377-2217. <a href="https://linkinghub.elsevier.com/retrieve/pii/S0377221702006549" target="_blank">https://linkinghub.elsevier.com/retrieve/pii/S0377221702006549</a> <a href="#fnref:6" class="footnote-back-ref">↩</a></li>
</ol>
