Discrete choice models take many forms, including: Binary Logit, Binary Probit, Multinomial Logit, Conditional Logit, Multinomial Probit, Nested Logit, Generalized Extreme Value Models, Mixed Logit, and Exploded Logit. All of these models have the features described below in common.
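The choice set is the set of alternatives that are available to the person. For a discrete choice model, the choice set must meet three requirements: (1) the set of alternatives must be collectively exhaustive, meaning that it includes all possible alternatives, so that the person necessarily chooses an alternative from the set; (2) the alternatives must be mutually exclusive, meaning that choosing one alternative implies not choosing any other; and (3) the set must contain a finite number of alternatives.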
As an example, the choice set for a person deciding which mode of transport to take to work includes driving alone, carpooling, taking the bus, etc. The choice set is complicated by the fact that a person can use multiple modes for a given trip, such as driving a car to a train station and then taking the train to work. In this case, the choice set can include each possible combination of modes. Alternatively, the choice can be defined as the choice of "primary" mode, with the set consisting of car, bus, rail, and other (e.g., walking or bicycling). Note that the alternative "other" is included in order to make the choice set exhaustive.
Different people may have different choice sets, depending on their circumstances. For instance, the Scion automobile was not sold in Canada as of 2009, so new car buyers in Canada faced different choice sets from those of American consumers. Such considerations are taken into account in the formulation of discrete choice models.
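A discrete choice model specifies the probability that a person chooses a particular alternative, with the probability expressed as a function of observed variables that relate to the alternatives and the person. In its general form, the probability that person n chooses alternative i is expressed as:
$$P_{ni} \equiv \Pr(\text{person } n \text{ chooses alternative } i) = G(x_{ni},\; x_{nj}\ \forall j \neq i,\; s_n,\; \beta),$$
where $x_{ni}$ is a vector of attributes of alternative i faced by person n, $x_{nj}\ \forall j \neq i$ is a vector of attributes of the other alternatives faced by person n, $s_n$ is a vector of characteristics of person n, and $\beta$ is a set of parameters, estimated statistically, that give the effects of the variables on the probabilities.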
In the mode of transport example above, the attributes of the modes ($x_{ni}$), such as travel time and cost, and the characteristics of the consumer ($s_n$), such as annual income, age, and gender, can be used to calculate choice probabilities. The attributes of the alternatives can differ over people; e.g., cost and time for travel to work by car, bus, and rail are different for each person depending on the location of that person's home and work.
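Properties: $P_{ni}$ lies between 0 and 1; for every person n, the probabilities sum to one over the alternatives, $\sum_{j=1}^{J} P_{nj} = 1$, where J is the total number of alternatives; and the expected fraction of people choosing alternative i is $\frac{1}{N}\sum_{n=1}^{N} P_{ni}$, where N is the number of people making the choice.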
Different models (i.e., models using a different function G) have different properties. Prominent models are introduced below.
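Discrete choice models can be derived from utility theory. This derivation is useful for three reasons: (1) it gives a precise meaning to the probabilities $P_{ni}$; (2) it motivates and distinguishes alternative model specifications, e.g., the choice of a functional form for G; and (3) it provides the theoretical basis for calculating changes in consumer surplus (compensating variation) from changes in the attributes of the alternatives.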
$U_{ni}$ is the utility (or net benefit or well-being) that person n obtains from choosing alternative i. The behavior of the person is utility-maximizing: person n chooses the alternative that provides the highest utility. The choice of the person is designated by dummy variables, $y_{ni}$, for each alternative:
$$y_{ni} = \begin{cases} 1 & \text{if } U_{ni} > U_{nj}\ \forall j \neq i, \\ 0 & \text{otherwise.} \end{cases}$$
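Consider now the researcher who is examining the choice. The person's choice depends on many factors, some of which the researcher observes and some of which the researcher does not. The utility that the person obtains from choosing an alternative is decomposed into a part that depends on variables that the researcher observes and a part that depends on variables that the researcher does not observe. In a linear form, this decomposition is expressed as
$$U_{ni} = \beta z_{ni} + \varepsilon_{ni},$$
where $z_{ni}$ is a vector of observed variables relating to alternative i for person n (built from the attributes $x_{ni}$ of the alternative, possibly interacted with the characteristics $s_n$ of the person), $\beta$ is the corresponding vector of coefficients, and $\varepsilon_{ni}$ captures the effect of all unobserved factors on the person's choice.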
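The choice probability is then
$$P_{ni} = \Pr(y_{ni} = 1) = \Pr\Big(\bigcap_{j \neq i} \{U_{ni} > U_{nj}\}\Big) = \Pr\Big(\bigcap_{j \neq i} \{\beta z_{ni} + \varepsilon_{ni} > \beta z_{nj} + \varepsilon_{nj}\}\Big) = \Pr\Big(\bigcap_{j \neq i} \{\varepsilon_{nj} - \varepsilon_{ni} < \beta z_{ni} - \beta z_{nj}\}\Big).$$
Given β, the choice probability is the probability that the random terms $\varepsilon_{nj} - \varepsilon_{ni}$ (which are random from the researcher's perspective, since the researcher does not observe them) are below the respective quantities $\beta z_{ni} - \beta z_{nj}$ for all $j \neq i$. Different choice models (i.e., different specifications of G) arise from different distributions of $\varepsilon_{ni}$ for all i and different treatments of β.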
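The probability that a person chooses a particular alternative is determined by comparing the utility of choosing that alternative to the utility of choosing other alternatives:
$$P_{ni} = \Pr(y_{ni} = 1) = \Pr(U_{ni} > U_{nj}\ \forall j \neq i) = \Pr(U_{ni} - U_{nj} > 0\ \forall j \neq i).$$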
As the last term indicates, the choice probability depends only on the difference in utilities between alternatives, not on the absolute level of utilities. Equivalently, adding a constant to the utilities of all the alternatives does not change the choice probabilities.
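The invariance to a common shift in utilities can be checked numerically. The sketch below assumes the standard logit form for the choice probabilities purely for illustration (the statement above holds for any discrete choice model); the utility values and the function name `logit_probabilities` are hypothetical.

```python
import math

def logit_probabilities(utilities):
    """Standard logit choice probabilities for a list of representative utilities."""
    # Subtracting the maximum before exponentiating avoids overflow; it is
    # harmless precisely because only utility differences matter.
    largest = max(utilities)
    weights = [math.exp(u - largest) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical representative utilities for car, bus, and rail.
base = [1.2, 0.4, -0.3]
# Add the same constant to the utility of every alternative.
shifted = [u + 5.0 for u in base]

p_base = logit_probabilities(base)
p_shifted = logit_probabilities(shifted)
```

Comparing `p_base` and `p_shifted` element by element should show agreement up to floating-point rounding.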
Since utility has no units, it is necessary to normalize the scale of utilities. The scale of utility is often defined by the variance of the error term in discrete choice models. This variance may differ depending on the characteristics of the dataset, such as when or where the data are collected. Normalization of the variance therefore affects the interpretation of parameters estimated across diverse datasets.
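Discrete choice models can first be classified according to the number of available alternatives: binomial (dichotomous) choice models have two available alternatives, and multinomial (polytomous) choice models have three or more.
Multinomial choice models can further be classified according to the model specification: models, such as standard logit, that assume no correlation in unobserved factors over alternatives, and models that allow correlation in unobserved factors among alternatives.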
In addition, specific forms of the models are available for examining rankings of alternatives (i.e., first choice, second choice, third choice, etc.) and for ratings data.
Details for each model are provided in the following sections.
Further information: binary regression
Further information: Logistic regression
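Model A is the binary logit with attributes of the person but no attributes of the alternatives. $U_n$ is the utility (or net benefit) that person n obtains from taking an action (as opposed to not taking the action). The utility the person obtains from taking the action depends on the characteristics of the person, some of which are observed by the researcher and some of which are not. The person takes the action, $y_n = 1$, if $U_n > 0$. The unobserved term, $\varepsilon_n$, is assumed to have a logistic distribution. The specification is written succinctly as:
$$U_n = \beta s_n + \varepsilon_n, \qquad \varepsilon_n \sim \text{Logistic}, \qquad y_n = \begin{cases} 1 & \text{if } U_n > 0, \\ 0 & \text{if } U_n \le 0, \end{cases} \qquad\Rightarrow\qquad P_{n1} = \frac{1}{1 + \exp(-\beta s_n)}.$$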
Further information: Probit model
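Model B is the binary probit with attributes of the person but no attributes of the alternatives. The description of the model is the same as model A, except that the unobserved terms are distributed standard normal instead of logistic. The probability of taking the action is
$$P_{n1} = \Phi(\beta s_n),$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution.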
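As a minimal sketch of how the two binary models differ only in the distribution applied to the index $\beta s_n$, the following Python computes both probabilities. The coefficient values and the person's characteristics are hypothetical, and the helper names are illustrative rather than taken from any library.

```python
import math

def logistic_cdf(t):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-t))

def standard_normal_cdf(t):
    """Cumulative distribution function of the standard normal, via math.erf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def binary_choice_probability(beta, s, cdf):
    """Probability of taking the action: cdf applied to the index beta . s."""
    index = sum(b * x for b, x in zip(beta, s))
    return cdf(index)

# Hypothetical coefficients and characteristics (a constant plus one variable).
beta = [0.8, -0.05]
s_n = [1.0, 10.0]

p_logit = binary_choice_probability(beta, s_n, logistic_cdf)          # model A
p_probit = binary_choice_probability(beta, s_n, standard_normal_cdf)  # model B
```

The two probabilities differ for the same β because the logistic and standard normal distributions have different variances, which is one reason estimated logit and probit coefficients are not directly comparable in magnitude.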
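Model C is the binary logit with variables that vary over alternatives. $U_{ni}$ is the utility person n obtains from choosing alternative i. The utility of each alternative depends on the attributes of the alternatives, perhaps interacted with the attributes of the person. The unobserved terms are assumed to have an extreme value distribution.[20]
$$U_{n1} = \beta z_{n1} + \varepsilon_{n1}, \qquad U_{n2} = \beta z_{n2} + \varepsilon_{n2}, \qquad \varepsilon_{n1}, \varepsilon_{n2} \sim \text{iid extreme value} \qquad\Rightarrow\qquad P_{n1} = \frac{\exp(\beta z_{n1})}{\exp(\beta z_{n1}) + \exp(\beta z_{n2})}.$$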
We can relate this specification to model A above, which is also binary logit. In particular, $P_{n1}$ can also be expressed as
$$P_{n1} = \frac{1}{1 + \exp(-\beta(z_{n1} - z_{n2}))}.$$
Note that if two error terms are iid extreme value,[21] their difference is distributed logistic, which is the basis for the equivalence of the two specifications.
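The algebraic equivalence of the two binary logit expressions can be verified numerically. This is only a check of the two closed-form formulas, not a demonstration of the distributional result; the function names and test values are hypothetical.

```python
import math

def two_alternative_logit(v1, v2):
    """P(alternative 1) as a ratio of exponentiated utilities."""
    e1, e2 = math.exp(v1), math.exp(v2)
    return e1 / (e1 + e2)

def logistic_of_difference(v1, v2):
    """P(alternative 1) as the logistic function of the utility difference."""
    return 1.0 / (1.0 + math.exp(-(v1 - v2)))

# Hypothetical pairs of representative utilities (beta * z_n1, beta * z_n2).
pairs = [(0.0, 0.0), (1.5, -0.5), (-2.0, 0.3), (0.7, 0.7)]
max_gap = max(abs(two_alternative_logit(a, b) - logistic_of_difference(a, b))
              for a, b in pairs)
```

Here `max_gap` should be zero up to floating-point rounding.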
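Model D is the binary probit with variables that vary over alternatives. The description of the model is the same as model C, except that the difference of the two unobserved terms is distributed standard normal instead of logistic.
Then the probability of taking the action is
$$P_{n1} = \Phi(\beta(z_{n1} - z_{n2})),$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution.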
Further information: Multinomial logit
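Model E is the multinomial logit with attributes of the person but no attributes of the alternatives. The utility for all alternatives depends on the same variables, $s_n$, but the coefficients are different for different alternatives:
$$U_{ni} = \beta_i s_n + \varepsilon_{ni}, \qquad \varepsilon_{ni} \sim \text{iid extreme value}.$$
The choice probability takes the form
$$P_{ni} = \frac{\exp(\beta_i s_n)}{\sum_{j=1}^{J} \exp(\beta_j s_n)},$$
where J is the total number of alternatives.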
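A minimal sketch of the multinomial logit probability with alternative-specific coefficients follows. The coefficient values are hypothetical; setting the first alternative's coefficients to zero reflects the common normalization that follows from only utility differences mattering, and is an assumption of this example rather than something stated above.

```python
import math

def multinomial_logit(betas, s):
    """Choice probabilities when alternative j has its own coefficient vector.

    betas: one coefficient vector per alternative; s: the person's characteristics.
    """
    utilities = [sum(b * x for b, x in zip(beta_j, s)) for beta_j in betas]
    largest = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(v - largest) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical coefficients for three alternatives; the first is the base
# alternative with coefficients normalized to zero.
betas = [[0.0, 0.0], [0.5, -0.02], [-0.3, 0.01]]
s_n = [1.0, 30.0]  # a constant and one characteristic, e.g. age

probs = multinomial_logit(betas, s_n)
```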
Further information: Conditional logistic regression
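Model F is the multinomial logit with variables that vary over alternatives, also called conditional logit. The utility for each alternative depends on attributes of that alternative, interacted perhaps with attributes of the person:
$$U_{ni} = \beta z_{ni} + \varepsilon_{ni}, \qquad \varepsilon_{ni} \sim \text{iid extreme value} \qquad\Rightarrow\qquad P_{ni} = \frac{\exp(\beta z_{ni})}{\sum_{j=1}^{J} \exp(\beta z_{nj})},$$
where J is the total number of alternatives.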
Note that model E can be expressed in the same form as model F by appropriate respecification of variables. Define $w_{nj}^{k} = s_n \delta_{jk}$, where $\delta_{jk}$ is the Kronecker delta and $s_n$ are from model E. Then, model F is obtained by using $z_{nj} = \{w_{nj}^{1}, \ldots, w_{nj}^{J}\}$ and $\beta = \{\beta_1, \ldots, \beta_J\}$, where J is the total number of alternatives.
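The respecification can be illustrated with a small numerical check. The sketch assumes a scalar person characteristic $s_n$ to keep the stacked vectors short; the coefficient values and function names are hypothetical.

```python
import math

def softmax(utilities):
    """Logit probabilities from a list of representative utilities."""
    largest = max(utilities)
    weights = [math.exp(v - largest) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def model_e_probabilities(betas, s):
    """Model E: utility of alternative j is beta_j * s (scalar s assumed)."""
    return softmax([b * s for b in betas])

def kronecker_z(s, num_alternatives):
    """Build z_nj = (w^1_nj, ..., w^J_nj) with w^k_nj = s * delta_jk."""
    return [[s if j == k else 0.0 for k in range(num_alternatives)]
            for j in range(num_alternatives)]

def model_f_probabilities(beta, z):
    """Model F: utility of alternative j is the inner product beta . z_nj."""
    return softmax([sum(b * x for b, x in zip(beta, z_j)) for z_j in z])

betas = [0.0, 0.4, -0.2]  # hypothetical alternative-specific coefficients
s_n = 2.5                 # hypothetical person characteristic

p_e = model_e_probabilities(betas, s_n)
p_f = model_f_probabilities(betas, kronecker_z(s_n, len(betas)))
```

Because the Kronecker delta zeroes out every term except the one for alternative j, `p_e` and `p_f` should coincide.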
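A standard logit model is not always suitable, since it assumes that there is no correlation in unobserved factors over alternatives. This lack of correlation translates into a particular pattern of substitution among alternatives that might not always be realistic in a given situation. This pattern of substitution is often called the Independence of Irrelevant Alternatives (IIA) property of standard logit models.[23][24] A number of models have been proposed to allow correlation over alternatives and more general substitution patterns: nested logit (model G), which captures correlation between alternatives by partitioning the choice set into "nests", together with variants such as the cross-nested logit, in which an alternative may belong to more than one nest; generalized extreme value (GEV) models, a general class derived from the random utility model to which multinomial logit and nested logit belong; probit (model H), which allows full covariance among alternatives through a joint normal distribution; and mixed logit (model I), which allows any form of correlation and substitution pattern.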
The following sections describe Nested Logit, GEV, Probit, and Mixed Logit models in detail.
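The IIA property of standard logit mentioned above can be illustrated numerically: the ratio of the probabilities of two alternatives does not depend on which other alternatives are in the choice set. The utilities below are hypothetical and the helper name is illustrative.

```python
import math

def logit_probabilities(utilities):
    """Standard logit probabilities; utilities maps alternative name -> utility."""
    total = sum(math.exp(v) for v in utilities.values())
    return {name: math.exp(v) / total for name, v in utilities.items()}

# Hypothetical representative utilities, with and without the rail alternative.
full_set = {"car": 1.0, "bus": 0.2, "rail": 0.5}
reduced_set = {"car": 1.0, "bus": 0.2}

p_full = logit_probabilities(full_set)
p_reduced = logit_probabilities(reduced_set)

# Under IIA both ratios equal exp(V_car - V_bus), whatever else is available.
ratio_full = p_full["car"] / p_full["bus"]
ratio_reduced = p_reduced["car"] / p_reduced["bus"]
```

Removing rail raises the probabilities of car and bus in the same proportion, which is the substitution pattern that the models in the following sections relax.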
Model G covers nested logit and generalized extreme value (GEV) models. The model is the same as model F except that the unobserved component of utility is correlated over alternatives rather than being independent over alternatives.
Further information: Multinomial probit
Model H is the multinomial probit. The model is the same as model G except that the unobserved terms are distributed jointly normal, which allows any pattern of correlation and heteroscedasticity:
$$P_{ni} = \int I\left[\beta z_{ni} + \varepsilon_{ni} > \beta z_{nj} + \varepsilon_{nj}\ \forall j \neq i\right] \phi(\varepsilon_n \mid \Omega)\, d\varepsilon_n,$$
where $I[\cdot]$ is an indicator equal to 1 when the expression in brackets is true and 0 otherwise, and $\phi(\varepsilon_n \mid \Omega)$ is the joint normal density with mean zero and covariance $\Omega$.
The integral for this choice probability does not have a closed form, and so the probability is approximated by quadrature or simulation.
When $\Omega$ is the identity matrix (such that there is no correlation or heteroscedasticity), the model is called independent probit.
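One simple way to approximate a probit choice probability by simulation is a crude frequency simulator: draw the error vector many times and record how often each alternative has the highest utility. The sketch below is an illustration under stated assumptions, not the method used in any particular study. It takes a lower-triangular factor `chol` with $\Omega$ equal to `chol` times its transpose as given, and the utilities, number of draws, and seed are hypothetical. Smoother simulators are often preferred in practice.

```python
import random

def simulate_probit_probabilities(v, chol, draws=20000, seed=0):
    """Frequency-simulate probit choice probabilities.

    v: representative utilities (beta * z_nj), one per alternative.
    chol: lower-triangular matrix L (list of rows) with Omega = L L^T.
    """
    rng = random.Random(seed)
    num_alternatives = len(v)
    counts = [0] * num_alternatives
    for _ in range(draws):
        # Independent standard normal draws, then correlate them through L.
        eta = [rng.gauss(0.0, 1.0) for _ in range(num_alternatives)]
        eps = [sum(chol[i][k] * eta[k] for k in range(i + 1))
               for i in range(num_alternatives)]
        utilities = [v[i] + eps[i] for i in range(num_alternatives)]
        counts[utilities.index(max(utilities))] += 1
    return [c / draws for c in counts]

# Independent probit with two alternatives: Omega is the identity matrix.
identity = [[1.0, 0.0], [0.0, 1.0]]
p_independent = simulate_probit_probabilities([0.5, 0.0], identity)
```

In this two-alternative case the difference of the errors has variance 2, so the simulated probability of the first alternative should be close to $\Phi(0.5/\sqrt{2})$.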
Main article: Mixed logit
Mixed logit models have become increasingly popular in recent years for several reasons. First, the model allows $\beta$ to be random in addition to $\varepsilon$. The randomness in $\beta$ accommodates random taste variation over people and correlation across alternatives that generates flexible substitution patterns. Second, advances in simulation have made approximation of the model fairly easy. In addition, McFadden and Train have shown that any true choice model can be approximated, to any degree of accuracy, by a mixed logit with appropriate specification of explanatory variables and distribution of coefficients.[40]
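The choice probability is
$$P_{ni} = \int L_{ni}(\beta)\, f(\beta \mid \theta)\, d\beta,$$
where $f(\beta \mid \theta)$ is the density of the coefficients, with parameters $\theta$, and
$$L_{ni}(\beta) = \frac{\exp(\beta z_{ni})}{\sum_{j=1}^{J} \exp(\beta z_{nj})}$$
is the logit probability evaluated at $\beta$, with $J$ the total number of alternatives.
The integral for this choice probability does not have a closed form, so the probability is approximated by simulation.[42]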
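A minimal sketch of the simulation idea: draw coefficient vectors from their distribution, evaluate the logit formula at each draw, and average. The example assumes independent normal coefficients, which is only one possible mixing distribution; the means, standard deviations, attribute values, number of draws, and seed are all hypothetical.

```python
import math
import random

def logit_probability(beta, z, i):
    """Logit probability of alternative i given coefficients beta and attributes z."""
    utilities = [sum(b * x for b, x in zip(beta, z_j)) for z_j in z]
    largest = max(utilities)
    weights = [math.exp(v - largest) for v in utilities]
    return weights[i] / sum(weights)

def simulated_mixed_logit_probability(mean, sd, z, i, draws=2000, seed=0):
    """Average the logit probability over draws of beta ~ N(mean, diag(sd^2))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        beta_draw = [rng.gauss(m, s) for m, s in zip(mean, sd)]
        total += logit_probability(beta_draw, z, i)
    return total / draws

# Hypothetical attributes (cost, time) for three alternatives.
z_n = [[2.0, 30.0], [1.0, 45.0], [1.5, 40.0]]
mean = [-0.5, -0.04]  # assumed mean coefficients
sd = [0.3, 0.02]      # assumed standard deviations of the coefficients

p_mixed = [simulated_mixed_logit_probability(mean, sd, z_n, i) for i in range(3)]
```

Setting every standard deviation to zero collapses the simulated probability to the standard logit probability at the mean coefficients, which is a convenient sanity check.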
Discrete choice models are often estimated using maximum likelihood estimation. Logit models can be estimated by logistic regression, and probit models can be estimated by probit regression. Nonparametric methods, such as the maximum score estimator, have also been proposed.[43][44] Estimation is usually done via parametric, semi-parametric, or non-parametric maximum likelihood methods,[45] but can also be done with the partial least squares path modeling approach.[46]
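To make the maximum likelihood step concrete, the sketch below fits a one-coefficient binary logit to simulated data with Newton-Raphson iterations. The data-generating values, sample size, and seed are hypothetical, and Newton-Raphson is just one of several optimizers that could be used; statistical packages handle the general multi-coefficient case.

```python
import math
import random

def simulate_data(beta_true, n, seed=0):
    """Simulate (s, y) pairs from a binary logit with a single coefficient."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = rng.uniform(-2.0, 2.0)
        p = 1.0 / (1.0 + math.exp(-beta_true * s))
        y = 1 if rng.random() < p else 0
        data.append((s, y))
    return data

def score_and_hessian(beta, data):
    """First and second derivatives of the log-likelihood with respect to beta."""
    score, hessian = 0.0, 0.0
    for s, y in data:
        p = 1.0 / (1.0 + math.exp(-beta * s))
        score += (y - p) * s
        hessian -= p * (1.0 - p) * s * s
    return score, hessian

def fit_binary_logit(data, iterations=25):
    """Newton-Raphson iterations on the (concave) logit log-likelihood."""
    beta = 0.0
    for _ in range(iterations):
        score, hessian = score_and_hessian(beta, data)
        beta -= score / hessian
    return beta

data = simulate_data(beta_true=1.0, n=2000)
beta_hat = fit_binary_logit(data)
```

At the estimate the score is numerically zero, and with a sample of this size `beta_hat` should land near the value used to simulate the data.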
In many situations, a person's ranking of alternatives is observed, rather than just their chosen alternative. For example, a person who has bought a new car might be asked what they would have bought if that car had not been offered, which provides information on the person's second choice in addition to their first choice. Or, in a survey, a respondent might be asked to rank a set of alternatives from most preferred to least preferred.
The models described above can be adapted to account for rankings beyond the first choice. The most prominent model for rankings data is the exploded logit and its mixed version.
Under the same assumptions as for a standard logit (model F), the probability for a ranking of the alternatives is a product of standard logits. The model is called "exploded logit" because the choice situation that is usually represented as one logit formula for the chosen alternative is expanded ("exploded") to have a separate logit formula for each ranked alternative. The choice set shrinks at each stage: once an alternative has been ranked, it leaves the set of alternatives available in the subsequent choice.
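Without loss of generality, the alternatives can be relabeled to represent the person's ranking, such that alternative 1 is the first choice, 2 the second choice, etc. The choice probability of ranking J alternatives as 1, 2, ..., J is then
$$\Pr(\text{ranking } 1, 2, \ldots, J) = \frac{\exp(\beta z_{n1})}{\sum_{j=1}^{J} \exp(\beta z_{nj})} \cdot \frac{\exp(\beta z_{n2})}{\sum_{j=2}^{J} \exp(\beta z_{nj})} \cdots \frac{\exp(\beta z_{n,J-1})}{\sum_{j=J-1}^{J} \exp(\beta z_{nj})}.$$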
As with standard logit, the exploded logit model assumes no correlation in unobserved factors over alternatives. The exploded logit can be generalized, in the same way as the standard logit is generalized, to accommodate correlations among alternatives and random taste variation. The "mixed exploded logit" model is obtained by substituting the probability of the ranking, given above, for $L_{ni}$ in the mixed logit model (model I).
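The product-of-logits structure of the exploded logit can be sketched as follows. The utilities are hypothetical, and the function expects them already ordered from first choice to last, mirroring the relabeling described above.

```python
import itertools
import math

def exploded_logit_probability(v_ranked):
    """Probability of a complete ranking under the exploded logit.

    v_ranked: representative utilities ordered from first choice to last.
    """
    probability = 1.0
    # One logit term per ranked alternative; the last alternative is chosen
    # with probability one from a single-element set, so it is skipped.
    for i in range(len(v_ranked) - 1):
        remaining = v_ranked[i:]
        denominator = sum(math.exp(v) for v in remaining)
        probability *= math.exp(v_ranked[i]) / denominator
    return probability

# Hypothetical utilities for three alternatives.
v = [0.9, 0.1, -0.4]
p_observed_ranking = exploded_logit_probability(v)

# Probabilities over all possible rankings should sum to one.
total = sum(exploded_logit_probability(list(order))
            for order in itertools.permutations(v))
```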
This model is also known in econometrics as the rank ordered logit model and it was introduced in that field by Beggs, Cardell and Hausman in 1981.[47][48] One application is the Combes et al. paper explaining the ranking of candidates to become professor.[49] It is also known as the Plackett–Luce model in biomedical literature.[50][51][52]
Further information: ordinal regression
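In surveys, respondents are often asked to give ratings, for example to indicate on a scale from 1 (disagree completely) to 5 (agree completely) how much they agree with a statement such as "The government should do more to help people facing foreclosure on their homes."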
A multinomial discrete-choice model can examine the responses to these questions (model G, model H, model I). However, these models are derived under the concept that the respondent obtains some utility for each possible answer and gives the answer that provides the greatest utility. It might be more natural to think that the respondent has some latent measure or index associated with the question and answers in response to how high this measure is. Ordered logit and ordered probit models are derived under this concept.
Main article: Ordered logit
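Model K is the ordered logit. Let $U_n$ represent the strength of survey respondent n's feelings or opinion on the survey subject. Assume that there are cutoffs in the level of the opinion that determine which response is chosen. For instance, in the example of helping people facing foreclosure, the person chooses response 1 if $U_n < a$, response 2 if $a < U_n < b$, response 3 if $b < U_n < c$, response 4 if $c < U_n < d$, and response 5 if $U_n > d$, for some real numbers a, b, c, d.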
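Defining $U_n = \beta z_n + \varepsilon$ with $\varepsilon \sim$ Logistic, the probability of each possible response is:
$$\Pr(\text{choosing } 1) = \Pr(U_n < a) = \Pr(\varepsilon < a - \beta z_n) = \frac{1}{1 + \exp(-(a - \beta z_n))},$$
$$\Pr(\text{choosing } 2) = \Pr(a < U_n < b) = \frac{1}{1 + \exp(-(b - \beta z_n))} - \frac{1}{1 + \exp(-(a - \beta z_n))},$$
and analogously for responses 3 and 4 with the cutoff pairs (b, c) and (c, d), up to
$$\Pr(\text{choosing } 5) = \Pr(U_n > d) = 1 - \frac{1}{1 + \exp(-(d - \beta z_n))}.$$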
The parameters of the model are the coefficients β and the cut-off points a, b, c, and d, one of which must be normalized for identification. When there are only two possible responses, the ordered logit is the same as a binary logit (model A), with one cut-off point normalized to zero.
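The response probabilities of an ordered model are differences of a cumulative distribution function evaluated at successive cut-off points. The sketch below passes the distribution function in as a parameter, an illustrative design choice: the logistic function gives the ordered logit, and a standard normal distribution function gives the probit counterpart. The index value and cut-off points are hypothetical.

```python
import math

def logistic_cdf(t):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-t))

def standard_normal_cdf(t):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def ordered_response_probabilities(index, cutoffs, cdf):
    """Probabilities of each ordered response.

    index: the value of beta * z_n; cutoffs: increasing cut-off points;
    cdf: distribution function of the unobserved term.
    Returns len(cutoffs) + 1 probabilities, one per response category.
    """
    cumulative = [cdf(c - index) for c in cutoffs]
    bounds = [0.0] + cumulative + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]

# Hypothetical index and cut-off points a, b, c, d for a five-point scale.
cutoffs = [-1.0, 0.0, 1.0, 2.0]
p_ordered_logit = ordered_response_probabilities(0.7, cutoffs, logistic_cdf)
p_ordered_probit = ordered_response_probabilities(0.7, cutoffs, standard_normal_cdf)
```

With a single cut-off point at zero, the probability of the higher response reduces to the binary logit probability, consistent with the statement above about the two-response case.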
Main article: Ordered probit
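Model L is the ordered probit. The description of the model is the same as model K, except that the unobserved terms have a standard normal distribution instead of a logistic distribution.
The choice probabilities are as follows, where $\Phi$ is the cumulative distribution function of the standard normal distribution:
$$\Pr(\text{choosing } 1) = \Phi(a - \beta z_n), \qquad \Pr(\text{choosing } 2) = \Phi(b - \beta z_n) - \Phi(a - \beta z_n), \qquad \ldots, \qquad \Pr(\text{choosing } 5) = 1 - \Phi(d - \beta z_n).$$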
Train, K. (1986). Qualitative Choice Analysis: Theory, Econometrics, and an Application to Automobile Demand. MIT Press. ISBN 9780262200554. Chapter 8.
Train, K.; McFadden, D.; Ben-Akiva, M. (1987). "The Demand for Local Telephone Service: A Fully Discrete Model of Residential Call Patterns and Service Choice". RAND Journal of Economics. 18 (1): 109–123. doi:10.2307/2555538. JSTOR 2555538.
Train, K.; Winston, C. (2007). "Vehicle Choice Behavior and the Declining Market Share of US Automakers". International Economic Review. 48 (4): 1469–1496. doi:10.1111/j.1468-2354.2007.00471.x. S2CID 13085087.
Fuller, W. C.; Manski, C.; Wise, D. (1982). "New Evidence on the Economic Determinants of Post-secondary Schooling Choices". Journal of Human Resources. 17 (4): 477–498. doi:10.2307/145612. JSTOR 145612.
Train, K. (1978). "A Validation Test of a Disaggregate Mode Choice Model" (PDF). Transportation Research. 12 (3): 167–174. doi:10.1016/0041-1647(78)90120-x. Archived from the original (PDF) on 2010-06-22. Retrieved 2009-02-16.
Baltas, George; Doyle, Peter (2001). "Random utility models in marketing research: a survey". Journal of Business Research. 51 (2): 115–125. doi:10.1016/S0148-2963(99)00058-2.
Ben-Akiva, Moshe; McFadden, Daniel; Train, Kenneth; Walker, Joan; Bhat, Chandra; Bierlaire, Michel; Bolduc, Denis; Boersch-Supan, Axel; Brownstone, David; Bunch, David S.; Daly, Andrew; De Palma, Andre; Gopinath, Dinesh; Karlstrom, Anders; Munizaga, Marcela A. (2002-08-01). "Hybrid Choice Models: Progress and Challenges". Marketing Letters. 13 (3): 163–175. doi:10.1023/A:1020254301302. ISSN 1573-059X.
Ramming, M. S. (2001). Network Knowledge and Route Choice (Ph.D. thesis). Massachusetts Institute of Technology. hdl:1721.1/49797.
Mesa-Arango, Rodrigo; Hasan, Samiul; Ukkusuri, Satish V.; Murray-Tuite, Pamela (February 2013). "Household-Level Model for Hurricane Evacuation Destination Type Choice Using Hurricane Ivan Data". Natural Hazards Review. 14 (1): 11–20. doi:10.1061/(ASCE)NH.1527-6996.0000083. ISSN 1527-6988. https://ascelibrary.org/doi/10.1061/%28ASCE%29NH.1527-6996.0000083
Wibbenmeyer, Matthew J.; Hand, Michael S.; Calkin, David E.; Venn, Tyron J.; Thompson, Matthew P. (June 2013). "Risk Preferences in Strategic Wildfire Decision Making: A Choice Experiment with U.S. Wildfire Managers". Risk Analysis. 33 (6): 1021–1037. doi:10.1111/j.1539-6924.2012.01894.x. ISSN 0272-4332. https://onlinelibrary.wiley.com/doi/10.1111/j.1539-6924.2012.01894.x
Lovreglio, Ruggiero; Borri, Dino; dell’Olio, Luigi; Ibeas, Angel (2014-02-01). "A discrete choice model based on random utilities for exit choice in emergency evacuations". Safety Science. 62: 418–426. doi:10.1016/j.ssci.2013.10.004. ISSN 0925-7535. https://www.sciencedirect.com/science/article/pii/S0925753513002294
Goett, Andrew; Hudson, Kathleen; Train, Kenneth E. (2002). "Customer Choice Among Retail Energy Suppliers". Energy Journal. 21 (4): 1–28.
Revelt, David; Train, Kenneth E. (1998). "Mixed Logit with Repeated Choices: Households' Choices of Appliance Efficiency Level". Review of Economics and Statistics. 80 (4): 647–657. doi:10.1162/003465398557735. JSTOR 2646846. S2CID 10423121.
Train, Kenneth E. (1998). "Recreation Demand Models with Taste Variation". Land Economics. 74 (2): 230–239. CiteSeerX 10.1.1.27.4879. doi:10.2307/3147053. JSTOR 3147053.
Cooper, A. B.; Millspaugh, J. J. (1999). "The application of discrete choice models to wildlife resource selection studies". Ecology. 80 (2): 566–575. doi:10.1890/0012-9658(1999)080[0566:TAODCM]2.0.CO;2.
The density and cumulative distribution function of the extreme value distribution are given by $f(\varepsilon_{nj}) = \exp(-\varepsilon_{nj})\exp(-\exp(-\varepsilon_{nj}))$ and $F(\varepsilon_{nj}) = \exp(-\exp(-\varepsilon_{nj}))$. This distribution is also called the Gumbel or type I extreme value distribution, a special type of generalized extreme value distribution.
Ben-Akiva, M.; Lerman, S. (1985). Discrete Choice Analysis: Theory and Application to Travel Demand. Transportation Studies. Massachusetts: MIT Press.
Ben-Akiva, M.; Bierlaire, M. (1999). "Discrete Choice Methods and Their Applications to Short Term Travel Decisions" (PDF). In Hall, R. W. (ed.). Handbook of Transportation Science. http://roso.epfl.ch/mbi/handbook-final.pdf
Vovsha, P. (1997). "Application of Cross-Nested Logit Model to Mode Choice in Tel Aviv, Israel, Metropolitan Area". Transportation Research Record. 1607: 6–15. doi:10.3141/1607-02. S2CID 110401901. Archived from the original on 2013-01-29. https://archive.today/20130129010708/http://trb.metapress.com/content/l341607q38j850j7/
Cascetta, E.; Nuzzolo, A.; Russo, F.; Vitetta, A. (1996). "A Modified Logit Route Choice Model Overcoming Path Overlapping Problems: Specification and Some Calibration Results for Interurban Networks" (PDF). In Lesort, J. B. (ed.). Transportation and Traffic Theory. Proceedings from the Thirteenth International Symposium on Transportation and Traffic Theory. Lyon, France: Pergamon. pp. 697–711. http://www2.informatik.hu-berlin.de/alkox/lehre/lvws0809/verkehr/logit.pdf
Chu, C. (1989). "A Paired Combinatorial Logit Model for Travel Demand Analysis". Proceedings of the 5th World Conference on Transportation Research. Vol. 4. Ventura, CA. pp. 295–309.
McFadden, D. (1978). "Modeling the Choice of Residential Location" (PDF). In Karlqvist, A.; et al. (eds.). Spatial Interaction Theory and Residential Location. Amsterdam: North Holland. pp. 75–96.
Hausman, J.; Wise, D. (1978). "A Conditional Probit Model for Qualitative Choice: Discrete Decisions Recognizing Interdependence and Heterogeneous Preferences". Econometrica. 46 (2): 403–426. doi:10.2307/1913909. JSTOR 1913909.
Train, K. (2003). Discrete Choice Methods with Simulation. Cambridge University Press.
McFadden, D.; Train, K. (2000). "Mixed MNL Models for Discrete Response" (PDF). Journal of Applied Econometrics. 15 (5): 447–470. CiteSeerX 10.1.1.68.2871. doi:10.1002/1099-1255(200009/10)15:5<447::AID-JAE570>3.0.CO;2-1.
Ben-Akiva, M.; Bolduc, D. (1996). "Multinomial Probit with a Logit Kernel and a General Parametric Specification of the Covariance Structure" (PDF). Working Paper. http://elsa.berkeley.edu/reprints/misc/multinomial.pdf
Bekhor, S.; Ben-Akiva, M.; Ramming, M. S. (2002). "Adaptation of Logit Kernel to Route Choice Situation". Transportation Research Record. 1805: 78–85. doi:10.3141/1805-10. S2CID 110895210. Archived from the original on 2012-07-17. https://archive.today/20120717185534/http://trb.metapress.com/content/126847136p81w0p3/
See http://elsa.berkeley.edu/choice2/ch6.pdf. Also see Mixed logit for further details.
Manski, Charles F. (1975). "Maximum score estimation of the stochastic utility model of choice". Journal of Econometrics. 3 (3): 205–228. doi:10.1016/0304-4076(75)90032-9. ISSN 0304-4076.
Horowitz, Joel L. (1992). "A Smoothed Maximum Score Estimator for the Binary Response Model". Econometrica. 60 (3): 505–531. doi:10.2307/2951582. ISSN 0012-9682. JSTOR 2951582.
Park, Byeong U.; Simar, Léopold; Zelenyuk, Valentin (2017). "Nonparametric estimation of dynamic discrete choice models for time series data" (PDF). Computational Statistics & Data Analysis. 108: 97–120. doi:10.1016/j.csda.2016.10.024. https://espace.library.uq.edu.au/view/UQ:415620/UQ415620_OA.pdf
Hair, J.F.; Ringle, C.M.; Gudergan, S.P.; Fischer, A.; Nitzl, C.; Menictas, C. (2019). "Partial least squares structural equation modeling-based discrete choice modeling: an illustration in modeling retailer choice" (PDF). Business Research. 12: 115–142. doi:10.1007/s40685-018-0072-4. https://link.springer.com/content/pdf/10.1007/s40685-018-0072-4.pdf
Beggs, S.; Cardell, S.; Hausman, J. (1981). "Assessing the Potential Demand for Electric Cars". Journal of Econometrics. 17 (1): 1–19. doi:10.1016/0304-4076(81)90056-7.
Combes, Pierre-Philippe; Linnemer, Laurent; Visser, Michael (2008). "Publish or Peer-Rich? The Role of Skills and Networks in Hiring Economics Professors". Labour Economics. 15 (3): 423–441. doi:10.1016/j.labeco.2007.04.003.
Plackett, R. L. (1975). "The Analysis of Permutations". Journal of the Royal Statistical Society, Series C. 24 (2): 193–202. doi:10.2307/2346567. JSTOR 2346567.
Luce, R. D. (1959). Individual Choice Behavior: A Theoretical Analysis. Wiley.