Multinomial Models for Discrete Outcomes

 

/* This file compares procedures for multinomial logit and ordered probit for data that are naturally ordered. A common mistake is to estimate naturally ordered data with MNL or MNP. First, we estimate a multinomial logit (MNL) for the Spector and Mazzeo data with a new dependent variable, LETTERS. LETTERS is coded 0=C, 1=B, 2=A.

Then we reestimate the model as an ordered probit to compare the results.

 

Note of caution: Strictly speaking, choice models assume that n individuals make the choices and that the choices are independent. In this example, however, it is the instructor who makes the choices, and they are obviously not independent. To treat this example as a "choice" model, we need to think of the instructor as making n independent decisions based on each student's past grades, achievement test score, and whether the student received the treatment, PSI. This may not be substantively correct, since the instructor should evaluate performance on the course materials alone rather than pay attention to the student's attributes. However, the objective is to illustrate how to estimate these models, not to specify an appropriate substantive model.*/

 

 

/*Using this file requires installation of dmlogit2 and the Hausman and Small-Hsiao IIA modules.  It also uses SPOST which you should have already installed.  Run a net search from within STATA on the following commands and install the proper files:

dmlogit2

iia

smhsiao

*/

 

/*Let's load the data*/

set more off

use "c:\users\B. Dan Wood\My Documents\My Teaching\Maximum Likelihood\Data\letters.dta", clear

summarize

 

/*Multinomial Logit */

 

mlogit letters gpa tuce psi, base(0)

 

 /*The option "base(0)" sets outcome 0 (a grade of C) as the base (comparison) category */

 

/* Note that MNL produces J-1 sets of coefficient estimates for J choices. Typically, these are then used to compute probabilities and marginal effects. CAUTION: the coefficients from MNL models can be misleading. Their signs can actually be in the opposite direction from the marginal effects, because the coefficients from all J-1 equations enter into the calculations of both the marginal effects and probabilities.*/

 

/*Stata does not automatically report the predicted probabilities.  However, we can retrieve them in the following manner*/

 

predict p0 p1 p2, p                 /*Predicted probabilities p(y=j) for outcomes j = 0, 1, 2 */

predict z0, xb outcome(0)                 /*Linear predictor for outcome 0 (the base outcome, so z0 = 0) */

predict z1, xb outcome(1)                 /*Linear predictor for outcome 1 */

predict z2, xb outcome(2)                 /*Linear predictor for outcome 2 */

list letters z0 p0 z1 p1 z2 p2            /*List y, the linear predictors, and the predicted probabilities */
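
/* A minimal sketch, assuming z0, z1, z2 and p0, p1, p2 from the predict commands above are still in memory. It rebuilds p(y=1) by hand from the linear predictors to show that every equation enters each probability: p(y=j) = exp(zj)/[exp(z0)+exp(z1)+exp(z2)]. The names denom and p1_check are hypothetical helpers, not part of the original file. */

generate double denom = exp(z0) + exp(z1) + exp(z2)     /*z0 is 0 for the base outcome, so exp(z0) = 1 */

generate double p1_check = exp(z1)/denom                /*Hand-computed p(y=1) */

summarize p1 p1_check                                   /*The two should agree up to rounding */

drop denom p1_check                                     /*Remove the helper variables */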

 

/*We can also obtain other helpful statistics*/

fitstat                                         /*Obtain various fit statistics on the logit */

listcoef, help                                  /*List the coefficients and standardized coefficients*/

prvalue, x(psi=0) rest(mean)                    /*Predicted probability of each letter grade when psi=0, and rest at mean */

prvalue, x(psi=1) rest(mean)                    /*Predicted probability of each letter grade when psi=1, and rest at mean */

prchange, help                                  /*Compute various first differences */

dmlogit2 letters gpa tuce psi, base(0)          /*The model in derivatives with all variables at means*/

 

/* Reestimate the model for an alternative procedure to calculate derivatives */

 

mlogit letters gpa tuce psi, base(0)

mfx compute, predict(outcome(1))          /*Marginal effects on p(letters=1) at the means; for the dummy psi, the discrete change from zero to one*/

mfx compute, predict(outcome(2))          /*The same for p(letters=2)*/
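
/* In Stata 11 and later, mfx has been superseded by margins. The following is a hedged sketch of a roughly equivalent call, assuming the mlogit results above are still the active estimates. Note one difference: because psi is entered as a plain variable rather than as i.psi, margins reports a derivative for psi instead of the discrete change from zero to one. */

margins, dydx(*) atmeans predict(outcome(1))    /*Marginal effects on p(letters=1) at the means */

margins, dydx(*) atmeans predict(outcome(2))    /*Marginal effects on p(letters=2) at the means */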

 

/*Test for Independence of Irrelevant Alternatives in MNL model with SPOST and mlogtest.   By the way, this procedure does a variety of tests, including whether categories can be combined or not.  You can ask for the tests individually, but the option “all” gives you everything.  Use "search mlogtest" to get a listing of the tests.*/

 

mlogit letters gpa tuce psi, base(0)

mlogtest, all

 

/*Alternatively, there are other user-written procedures for doing the Hausman and Small-Hsiao tests.  */

 

mlogit letters gpa tuce psi, base(0)

iia

 

smhsiao letters gpa tuce psi, elim(0)

 

/*Now let's drop the predicted probabilities*/

 

drop z0 p0 z1 p1 z2 p2

 

/* STATA also offers a version of multinomial probit. However, it assumes no correlation between errors and seems to give short shrift to the required restrictions. The following iterates for a long time and should probably be stopped after sufficient iterations.*/

 

mprobit letters gpa tuce psi

 

/* MNL is inefficient when the dependent variable is naturally ordered, because it ignores the information in the ordering. Grade assignment is a classic example of ordered data. So, let's reestimate the relationship with ordered probit. */

 

oprobit letters gpa tuce psi

 

/* Note that with ordered probit there is only one set of coefficient estimates. This is more appealing, but CAUTION is still required in interpreting the coefficients. As with MNL above, ordered probit coefficients can have the opposite sign from the marginal effects. With a positive coefficient, increasing X while holding the coefficient and threshold estimates constant shifts the distribution to the right. This raises the probability of the highest outcome and lowers the probability of the lowest outcome, but the probability of a middle outcome may rise or fall. See Greene, pp. 736-40 for an example. */
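
/* For reference, a sketch of the ordered probit marginal effects for the three outcomes used here, written in LaTeX notation. The symbols are assumptions for illustration: \kappa_1 < \kappa_2 are the estimated cutpoints, \phi is the standard normal density, and x'\beta is the index.

\frac{\partial P(y=0)}{\partial x_k} = -\phi(\kappa_1 - x'\beta)\,\beta_k

\frac{\partial P(y=1)}{\partial x_k} = [\phi(\kappa_1 - x'\beta) - \phi(\kappa_2 - x'\beta)]\,\beta_k

\frac{\partial P(y=2)}{\partial x_k} = \phi(\kappa_2 - x'\beta)\,\beta_k

The three effects sum to zero because the probabilities sum to one; only the middle term has a sign that depends on where x'\beta falls relative to the cutpoints. */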

 

/*We can also obtain predicted probabilities similar to the MNL above*/

 

predict p0 p1 p2, p                             /*Predicted probabilities p(y=j) for outcomes j = 0, 1, 2 */

predict z, xb                                   /*Linear predictor z = xb (a single index for all outcomes) */

list letters z p0 p1 p2                         /*List y, the linear predictor, and the predicted probabilities */

fitstat                                         /*Obtain various fit statistics on the ordered probit */

listcoef, help                                  /*List the coefficients and standardized coefficients*/

prvalue, x(psi=0) rest(mean)                    /*Predicted probability of each letter grade when psi=0, and rest at mean */

prvalue, x(psi=1) rest(mean)                    /*Predicted probability of each letter grade when psi=1, and rest at mean */

prchange, help                                  /*Compute various first differences */

 

 

mfx compute, predict(outcome(1))    /*Marginal effects on p(letters=1) at the means; for the dummy psi, the discrete change from zero to one*/

mfx compute, predict(outcome(2))    /*The same for p(letters=2)*/

 

/*Again, let's drop the predicted probabilities*/

 

drop z p0 p1 p2

 

/* It is also possible to obtain ordered logit estimates. These are rarely seen in the literature. However, the following illustrates the ordered logit procedure. */

 

ologit letters gpa tuce psi

predict p0 p1 p2, p                             /*Predicted probabilities p(y=j) for outcomes j = 0, 1, 2 */

predict z, xb                                   /*Linear predictor z = xb (a single index for all outcomes) */

list letters z p0 p1 p2                         /*List y, the linear predictor, and the predicted probabilities */

fitstat                                         /*Obtain various fit statistics on the ordered logit */

listcoef, help                                  /*List the coefficients and standardized coefficients*/

prvalue, x(psi=0) rest(mean)                    /*Predicted probability of each letter grade when psi=0, and rest at mean */

prvalue, x(psi=1) rest(mean)                    /*Predicted probability of each letter grade when psi=1, and rest at mean */

prchange, help                                  /*Compute various first differences */

mfx compute, predict(outcome(1))                /*Marginal effects on p(letters=1) at the means; for the dummy psi, the discrete change from zero to one*/

mfx compute, predict(outcome(2))                /*The same for p(letters=2)*/

 

/*Again, let's drop the predicted probabilities*/

 

drop z p0 p1 p2

 

/*Now let's take a look at conditional logit.  To do this, we will need to use a new data set*/

 

use "c:\users\B. Dan Wood\My Documents\My Teaching\Maximum Likelihood\Data\clogit.dta", clear

set more off

 

/* This file illustrates an application of STATA's DISCRETE CHOICE procedure, conditional logit. The model is one of choice of mode for transportation, for a sample of individuals who travel between Sydney and Melbourne, Australia. The four choices are Air, Train, Bus, and Car.

 

Note the special set-up for the data for DISCRETE CHOICE models. The data consist of observations on choices, rather than on individuals. For each individual there are j possible choices, so that there are j*n observations in the data set. Basically, the data setup is similar to that for panel data. See the STATA help system or manual for more on setting up the data for DISCRETE CHOICE models.

 

Conditional Logit is commonly used when the decisionmaker chooses primarily on the basis of attributes of the choices, rather than attributes of the individual. This approach is illustrated in the program below.*/
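
/* A quick look at the layout described above, assuming the rows for each individual are stored together: with four choices per individual, the first eight rows cover the first two individuals, listed with a separator between ids. */

list id mode gc ttme in 1/8, sepby(id)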

 

/* Below we estimate a conditional logit model for choice of mode of transport using choice specific constants and a generalized measure of perceived cost of the method of transportation, household income, and terminal waiting time. Note that unlike LIMDEP we must specify the group ourselves, in this case "group(id)"*/

 

clogit mode aasc tasc basc gc ttme hinca, group(id)

predict prob, pc1                         /*Probability that each alternative is chosen, conditional on one choice per id */

predict zi, xb                            /*Linear predictor z = xb for each alternative */

list mode prob zi                               /*List y, the predicted probabilities, and z */
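
/* A minimal check, assuming prob from the predict command above is still in memory: the conditional logit probabilities should sum to one over the alternatives within each individual. The name probsum is a hypothetical helper, not part of the original file. */

egen double probsum = total(prob), by(id)       /*Sum of predicted probabilities within each id */

summarize probsum                               /*Mean, min, and max should all be 1 up to rounding */

drop probsum                                    /*Remove the helper variable */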

fitstat                                   /*Obtain various fit statistics on the logit */

listcoef, help                            /*List the coefficients and standardized coefficients*/

 

 

/* With conditional logit as opposed to MNL, we get a single set of coefficient estimates. This is an appealing attribute of the model. The preceding estimates say simply that people choose their transportation mode based on perceived cost and a set of unmeasured attributes associated with the choices. Note that perceived cost is an attribute of the choice, rather than an attribute of the individual.

 

However, it might be reasonable to assert that the decisionmaker's income is also relevant to the choice. The decisionmaker's income does not vary across the alternatives. As a result, the discrete choice probabilities are homogeneous of degree zero in the parameters. That bit of jargon means that if there are any attributes that are the same for all outcomes for every individual, they drop out of the probability model.

 

Thus, for example, any individual characteristic such as age, or income, will cause this problem. The only way such variables can be brought into the conditional logit model is by interacting them with the choice specific constants. This is similar to analysis of covariance in regression.*/
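
/* A sketch of why such variables drop out, in LaTeX notation. The symbols are assumptions for illustration: x_{ij} holds the attributes of choice j for individual i, w_i holds characteristics of individual i that do not vary over choices, and \beta and \gamma are their coefficients.

P_{ij} = \frac{\exp(x_{ij}'\beta + w_i'\gamma)}{\sum_{k} \exp(x_{ik}'\beta + w_i'\gamma)}

       = \frac{\exp(w_i'\gamma)\,\exp(x_{ij}'\beta)}{\exp(w_i'\gamma)\,\sum_{k} \exp(x_{ik}'\beta)}

       = \frac{\exp(x_{ij}'\beta)}{\sum_{k} \exp(x_{ik}'\beta)}

The common factor \exp(w_i'\gamma) cancels, so \gamma cannot be estimated. Interacting w_i with a choice-specific constant d_{ij} gives a term d_{ij} w_i'\gamma_j that does vary over choices and therefore stays in the model; hinca in the clogit command above presumably plays this role for household income. */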

 

/*Test for IIA using conditional logit setup */

 

clogit mode aasc tasc basc gc ttme hinca, group(id)

est store all

clogit mode aasc tasc basc gc ttme hinca if choice ~=1, group(id)

est store partial

hausman partial all, alleqs constant

 

/*Now let's estimate a nested logit, which relaxes the IIA assumption across nests and allows a "tree-like" probability model in which choices are conditioned on prior choices.*/

 

/*Read in the data again to rid ourselves of the preceding definitions. */

 

use "c:\users\B. Dan Wood\My Documents\My Teaching\Maximum Likelihood\Data\clogit.dta", clear

set more off

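sort id, stable                           /*stable keeps the stored order of the alternatives within each id */

by id: gen travel=_n-1                    /*Alternative index 0, 1, 2, 3, in the stored order of the rows within each id */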

nlogitgen type=travel(fly: 0, land: 1|2|3)

nlogittree travel type

nlogit mode (travel=aasc tasc basc gc ttme) (type=hinca), group(id)     /*This is the older nlogit syntax; later Stata releases may need it run under version control */