Multinomial Models for Discrete Outcomes

 

The purpose of this session is to show you how to use LIMDEP's procedures for estimating Multinomial Logit (MNL) and Multinomial Probit (MNP) models. Additionally, we look at Conditional Logit, testing for IIA, Nested Logit, the Heteroskedastic Extreme Value model, Random Parameter Logit, and Ordered Logit and Probit.

 

First, MNL and Ordered Probit

 

/* This file compares procedures for multinomial logit and ordered probit for data that are naturally ordered. A common mistake is to estimate models for naturally ordered data with MNL or MNP. First, we estimate a multinomial logit (MNL) for the Spector and Mazzeo data with a new dependent variable, LETTERS, coded 0=C, 1=B, 2=A. Then we re-estimate the model as an ordered probit to compare the results.

 

Note of caution: Strictly speaking, choice models assume that n individuals make the choices and that the choices are independent. In this example, however, the instructor makes the choices, and they are obviously not independent. To treat this example as a "choice" model, we need to think of the instructor as making n independent decisions based on the students' past grades, achievement test scores, and whether each student received the treatment, PSI. This may not be substantively correct, since the instructor should evaluate performance based on course work rather than on the students' attributes. However, the objective is to illustrate how to estimate these models, not to specify an appropriate substantive model.*/

 

Reset $

 

/* Read the data. */

 

Read ; Nobs=32 ; Nvar=6 ; Names=OBS,GPA,TUCE,PSI,GRADE,LETTERS $

 

1 2.66 20 0 0 0

2 2.89 22 0 0 1

3 3.28 24 0 0 1

4 2.92 12 0 0 1

5 4 21 0 1 2

6 2.86 17 0 0 1

7 2.76 17 0 0 1

8 2.87 21 0 0 1

9 3.03 25 0 0 2

10 3.92 29 0 1 2

11 2.63 20 0 0 0

12 3.32 23 0 0 1

13 3.57 23 0 0 1

14 3.26 25 0 1 2

15 3.53 26 0 0 1

16 2.74 19 0 0 1

17 2.75 25 0 0 0

18 2.83 19 0 0 0

19 3.12 23 1 0 1

20 3.16 25 1 1 2

21 2.06 22 1 0 2

22 3.62 28 1 1 2

23 2.89 14 1 0 2

24 3.51 26 1 0 1

25 3.54 24 1 1 2

26 2.83 27 1 1 2

27 3.39 17 1 1 2

28 2.67 24 1 0 1

29 3.65 21 1 1 2

30 4 23 1 1 2

31 3.1 21 1 0 2

32 2.39 19 1 1 2

 

/* Create a namelist that contains the right-hand-side variables */

 

NAMELIST ; X = one,gpa,tuce,psi $

DSTATS ; RHS=* $

 

/* Estimate a multinomial logit with 3 categories. LIMDEP automatically recognizes that you want MNL from the coding of the dependent variable. With J categories, the coding must always be 0,1,2,...,J-1, beginning with 0. Note that we are also calling for marginal effects based on the stratification variable, psi. */

 

LOGIT ; LHS = letters ; Rhs = X ; Margin=psi ; List $

 

/* Note that MNL produces J-1 sets of coefficient estimates for J choices. Typically, these are then used to compute probabilities and marginal effects. CAUTION: the coefficients from MNL models can be misleading. Their signs can actually be in the opposite direction from the marginal effects, because the coefficients from all J-1 equations enter into the calculations of both the marginal effects and probabilities.*/
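To see how such a sign reversal can happen, here is a small Python sketch of the MNL marginal effect, dP_j/dx = P_j(b_j - sum_k P_k b_k), with outcome 0 as the base (b_0 = 0). The coefficients are hypothetical, not the LETTERS estimates. Both slopes are positive, yet the marginal effect for outcome 1 comes out negative.

```python
import math


def mnl_probs(x, betas):
    """Probabilities for an MNL with one regressor x.

    betas[j-1] = (constant_j, slope_j) for outcomes j = 1..J-1; outcome 0
    is the base category, so its coefficients are normalized to zero.
    """
    utilities = [0.0] + [a + b * x for a, b in betas]
    denom = sum(math.exp(u) for u in utilities)
    return [math.exp(u) / denom for u in utilities]


def mnl_marginal_effects(x, betas):
    """dP_j/dx = P_j * (b_j - sum_k P_k * b_k), with b_0 = 0."""
    probs = mnl_probs(x, betas)
    slopes = [0.0] + [b for _, b in betas]
    b_bar = sum(p * b for p, b in zip(probs, slopes))
    return [p * (b - b_bar) for p, b in zip(probs, slopes)]


# Hypothetical coefficients: both slopes are positive.
betas = [(0.0, 1.0), (0.0, 3.0)]
effects = mnl_marginal_effects(0.0, betas)
print([round(e, 4) for e in effects])  # prints [-0.4444, -0.1111, 0.5556]
```

The effect for outcome 1 is negative because its slope (1) is below the probability-weighted average slope (4/3). The marginal effects always sum to zero across outcomes.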

 

/* Using the LIST option, LIMDEP only gives the probability for one category.  However, it is easy to calculate probabilities for all three categories.  Below is a short program that illustrates how to calculate the probabilities from the 3 category MNL model estimated above. You would need to modify this when there are more categories. See LIMDEP's help system for an example with 4 categories.*/

 

CALC ; list ; K = Col(X) ; K1 = K+1

            ; TwoK = 2*K ; K21 = TwoK+1 $

MATRIX ; list ; B1 = Part(B,1,K) ; B2 = Part(B,K1,TwoK) $

CREATE ; p0=1/(1+exp(B1'X)+exp(B2'X))

       ; p1=exp(B1'X)*P0

       ; p2=exp(B2'X)*P0 $

List  ; p0,p1,p2 $
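The same calculation can be written as a Python sketch for a single observation, which may make the indexing easier to follow. It assumes, as the MATRIX command does, that the stacked vector B holds the K outcome-1 coefficients followed by the K outcome-2 coefficients, and that x includes the constant. The function name and the coefficient values are hypothetical.

```python
import math


def mnl3_probabilities(b, x):
    """Return (p0, p1, p2) for one observation of a 3-outcome MNL.

    b : stacked coefficients, length 2*K (outcome 1 first, then outcome 2)
    x : regressors for this observation, length K, constant included
    """
    k = len(x)
    if len(b) != 2 * k:
        raise ValueError("b must contain 2*K coefficients")
    b1, b2 = b[:k], b[k:2 * k]  # like Part(B,1,K) and Part(B,K1,TwoK)
    xb1 = sum(bi * xi for bi, xi in zip(b1, x))
    xb2 = sum(bi * xi for bi, xi in zip(b2, x))
    p0 = 1.0 / (1.0 + math.exp(xb1) + math.exp(xb2))
    return p0, math.exp(xb1) * p0, math.exp(xb2) * p0


# Observation 1 from the data: one, gpa, tuce, psi.
x_obs1 = [1.0, 2.66, 20.0, 0.0]
# Hypothetical coefficients, for illustration only.
b = [-8.0, 2.0, 0.1, 1.0, -12.0, 3.0, 0.15, 2.0]
print(mnl3_probabilities(b, x_obs1))
```

For more categories, the same pattern extends with one slice of K coefficients per non-base outcome.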

 

/* MNL is inefficient when the dependent variable is naturally ordered, because it ignores the ordering information. Grade assignments are a classic example of ordered (ranked) data. So, let's re-estimate the relationship with ordered probit.*/

 

ORDERED ; LHS=letters ; Rhs=X ; Margin=psi ; List $

 

/* Note that with ordered probit there is only one set of coefficient estimates. This is more appealing, but CAUTION is still required in interpreting the coefficients. As with MNL above, ordered probit coefficients can have signs opposite to those of the marginal effects.

This is because increasing X, while holding the coefficient and threshold estimates constant, shifts the distribution of the latent variable (to the right for a positive coefficient). This always moves probability out of the lowest category and into the highest, but it may decrease the probability of a middle category. See Greene, pp. 736-40, for an example. */
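A small Python sketch of the three-category ordered probit makes this concrete. It assumes the first threshold is normalized to zero and the second threshold is mu > 0. The index value, coefficient, and threshold below are hypothetical. With a positive coefficient, the effect on P(C) is negative and the effect on P(A) is positive. The effect on P(B) is (phi(-x'b) - phi(mu - x'b)) * b, and its sign depends on where x'b sits relative to the thresholds.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def ordered_probit(xb, beta, mu):
    """Probabilities and marginal effects for a 3-category ordered probit.

    Thresholds are 0 and mu (mu > 0); xb is the index x'b, and beta is the
    coefficient of the regressor whose marginal effect we want.
    """
    probs = [norm_cdf(-xb),
             norm_cdf(mu - xb) - norm_cdf(-xb),
             1.0 - norm_cdf(mu - xb)]
    effects = [-norm_pdf(-xb) * beta,
               (norm_pdf(-xb) - norm_pdf(mu - xb)) * beta,
               norm_pdf(mu - xb) * beta]
    return probs, effects


# Hypothetical values: positive coefficient, index above both thresholds.
probs, effects = ordered_probit(xb=1.5, beta=0.5, mu=1.0)
print([round(p, 3) for p in probs], [round(e, 3) for e in effects])
```

Here the coefficient is positive, but the middle category's probability falls as X rises, because more mass moves from B into A than moves from C into B.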

 

/* It is also possible to obtain ordered logit estimates. These are rarely seen in the literature. However, the following illustrates the ordered logit procedure. */

 

ORDERED ; Logit ; LHS=letters ; Rhs=X ; Margin=psi ; List $

 

/* Compare the ordered probability models to MNL above. Note especially that the ordered models correctly classified 22 of 32 outcomes, while MNL got only 17 of the 32. MNL is especially bad, given that we could have correctly classified 15 simply by predicting the modal category, 2 (an A), for every student. Also note the large number of MNL errors in classifying the B's. */
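The baseline of 15 can be checked directly from the LETTERS column of the data listed above. The Python sketch below tabulates that column and compares the three hit rates quoted in the comment (22, 17, and 15 of 32).

```python
from collections import Counter

# LETTERS (0=C, 1=B, 2=A) for observations 1-32, copied from the data above.
letters = [0, 1, 1, 1, 2, 1, 1, 1, 2, 2, 0, 1, 1, 2, 1, 1,
           0, 0, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2]

counts = Counter(letters)
modal, n_modal = counts.most_common(1)[0]
print(modal, n_modal, len(letters))  # prints 2 15 32

# Hit rates: the two model counts are those reported in the comment above.
for label, hits in [("ordered", 22), ("MNL", 17), ("always predict A", n_modal)]:
    print(f"{label}: {hits}/{len(letters)} = {hits / len(letters):.3f}")
```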

 

delete ; * $

 

 

 

Conditional Logit is commonly used when the decisionmaker chooses primarily on the basis of attributes of the choices, rather than attributes of the individual. This approach is illustrated in the program below.

 

/* This file illustrates applications of LIMDEP's DISCRETE CHOICE procedures, including multinomial logit, conditional logit, nested logit, heteroskedastic extreme value, random parameters logit, and multinomial probit. The model is one of transportation mode choice for a sample of individuals who travel between Sydney and Melbourne, Australia. The four choices are Air, Train, Bus, and Car.

 

Note the special set-up of the data for DISCRETE CHOICE models. The data consist of observations on choices rather than on individuals. With J possible choices for each individual, there are J*n observations in the data set (here, 840 = 4 x 210). Basically, the data setup is similar to that for panel data. See the LIMDEP help system or manual for more on setting up the data for DISCRETE CHOICE estimation.

 

The data set contains the following:

 

Mode = 0/1 indicator of whether the alternative was chosen; each individual has four rows, for the alternatives 1=Air, 2=Train, 3=Bus, 4=Car

Ttme = terminal waiting time

Invc = Invehicle cost for all stages

Invt = Invehicle time for all stages

Gc = Generalized cost measure = Invc + Invt × value of time

Chair = Dummy variable for chosen mode is air

Hinc = Household income in thousands

Psize = Travelling party size.

Transformed variables include

Indj = Indicator to select mode given not air

Indi = Indicator to select mode Air/Not air

Aasc = Choice specific dummy for Air

Tasc = Choice specific dummy for Train

Basc = Choice specific dummy for Bus

Casc = Choice specific dummy for Car

Psizea = Psize × Aasc

Z = Tasc + Basc + Casc = Dummy variable for Not Air

Nij = number of choices in the branch: 1 if Aasc=1, 3 if Aasc=0

Ni = 2 = number of branches in tree.

 

*/
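The one-row-per-alternative layout can be pictured with a short Python sketch that expands person-level records into J rows each. The two travellers and their generalized costs are hypothetical. The tuple layout is our own choice, not LIMDEP's file format. The point is only that Mode is 1 on exactly one row per person, and that 840 rows with four alternatives means 210 individuals.

```python
ALTERNATIVES = ["Air", "Train", "Bus", "Car"]


def to_long(individuals):
    """Expand person-level records into one row per alternative.

    Each record is (chosen_alternative, {alternative: gc}); the output rows
    are (person_id, alternative, mode, gc), where mode is the 0/1 indicator.
    """
    rows = []
    for pid, (chosen, gc) in enumerate(individuals, start=1):
        for alt in ALTERNATIVES:
            rows.append((pid, alt, 1 if alt == chosen else 0, gc[alt]))
    return rows


# Two hypothetical travellers.
people = [
    ("Car", {"Air": 70, "Train": 71, "Bus": 70, "Car": 30}),
    ("Air", {"Air": 68, "Train": 84, "Bus": 85, "Car": 50}),
]
rows = to_long(people)
for row in rows:
    print(row)

print(840 // len(ALTERNATIVES))  # prints 210
```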

 

reset $

Read ; File="C:\Documents and Settings\B. Dan Wood\My Documents\My Teaching\Maximum Likelihood\Data\clogit.dat"

; nobs=840 ; nvar = 19 ; names=Mode, Ttme,Invc, Invt, Gc, ChAir, Hinc, Psize, Indj, Indi, Aasc, Tasc, Basc, Casc, Hinca, Psizea, Nalt, Nij, Ni $

 

Sample ; 1 - 840 $

create hinca=Hinc*Indi $

 

/* Now, let's estimate a conditional logit model for choice of transport mode. The model uses choice-specific constants, a generalized measure of the perceived cost of each mode, terminal waiting time, and household income interacted with air versus ground transportation. See p. 730 of Greene (2003) for discussion. Note that we replicate here the results in Table 21.11. LIMDEP creates the choice-specific dummy variables automatically when we include ONE in the RHS= list. */

 

Nlogit ; lhs=mode

; choices=air,train,bus,car

; rhs=one,gc,Ttme,Hinca

; show

; describe

; crosstab

; effects:gc(*)

; list

$

 

/* We can also specify the conditional logit model through a set of utility functions, one for each of the J choices. For example, the following produces the same results as the preceding command. */

 

Nlogit ; lhs=mode

; choices=Air,Train,Bus,Car

; model:

U(Air) = BA + Bg*gc+Btt*Ttme+Bh*Hinca/

U(Train) = BT + Bg*gc+Btt*Ttme+Bh*Hinca/

U(Bus) = BB + Bg*gc+Btt*Ttme+Bh*Hinca/

U(Car) = Bg*gc+Btt*Ttme+Bh*Hinca $

 

/* LIMDEP also allows weighting to take account of choice-based sampling. The values after the slash in the CHOICES list are the population shares of the four modes. For example, the following replicates the results on the right side of Table 21.11 in Greene (2003). */

 

Nlogit ; lhs=mode

; choices=Air,Train,Bus,Car/.14,.13,.09,.64

; model:

U(Air) = BA + Bg*gc+Btt*Ttme+Bh*Hinca/

U(Train) = BT + Bg*gc+Btt*Ttme+Bh*Hinca/

U(Bus) = BB + Bg*gc+Btt*Ttme+Bh*Hinca/

U(Car) = Bg*gc+Btt*Ttme+Bh*Hinca $
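The weights behind choice-based sampling can be sketched in Python. The sketch assumes the usual Manski-Lerman form, weight = population share / sample share. We take this to be what the shares in the CHOICES list feed into, but check the LIMDEP manual for the exact scaling it applies. The population shares come from the command above. The sample counts are hypothetical placeholders, so substitute the tabulated choices from your own data.

```python
def choice_based_weights(population_shares, sample_counts):
    """Weight for each alternative: population share / sample share."""
    n = sum(sample_counts.values())
    return {alt: population_shares[alt] / (sample_counts[alt] / n)
            for alt in population_shares}


population = {"Air": 0.14, "Train": 0.13, "Bus": 0.09, "Car": 0.64}
# Hypothetical sample counts (210 travellers); replace with your own tabulation.
sample = {"Air": 60, "Train": 60, "Bus": 30, "Car": 60}

weights = choice_based_weights(population, sample)
for alt, w in weights.items():
    print(f"{alt}: {w:.3f}")
```

With these counts, Air is over-sampled relative to its population share, so its observations get a weight below one. Car is under-sampled, so its observations get a weight above one.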

 

/* The advantage of the utility function approach is that you can specify different utility functions for each choice. */

 

/* Under the discrete choice framework, we get a single set of coefficient estimates. This is an appealing attribute of the framework. The preceding estimates say simply that people choose their transportation mode based on perceived cost, terminal waiting time, and a set of unmeasured attributes associated with the choices. Note that perceived cost and waiting time are attributes of the choices rather than of the individual. In contrast, income is an attribute of the individual that weights the decision toward air rather than ground transportation. Because the decision maker's income does not vary across the alternatives, we must interact it with the relevant mode choice.

 

We need to be especially cautious about interpreting the coefficients. To interpret them, we can look at probabilities, partial derivatives, and elasticities.*/
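Here is a small Python sketch of how the probabilities and elasticities follow from a single set of coefficients. It models a conditional logit with choice-specific constants (Car as the base), generic coefficients on gc and Ttme, and income entering only the Air utility, as Hinca does. All numbers are hypothetical. The elasticity of P_k with respect to the gc of alternative j is b_gc * gc_j * (delta_kj - P_j). As a result, every cross-elasticity with respect to alternative j is the same, which is the IIA property discussed next.

```python
import math


def conditional_logit(attrs, b_gc, b_ttme, asc, hinc, b_hinc):
    """Choice probabilities for one traveller.

    attrs : {alt: (gc, ttme)} attributes of each alternative
    asc   : {alt: constant}; alternatives missing from asc (Car) get 0
    hinc  : household income, entering only the Air utility (like Hinca)
    """
    v = {}
    for alt, (gc, ttme) in attrs.items():
        v[alt] = asc.get(alt, 0.0) + b_gc * gc + b_ttme * ttme
        if alt == "Air":
            v[alt] += b_hinc * hinc
    denom = sum(math.exp(u) for u in v.values())
    return {alt: math.exp(u) / denom for alt, u in v.items()}


def gc_elasticities(probs, attrs, b_gc, j):
    """Elasticity of each P_k with respect to the gc of alternative j."""
    gc_j = attrs[j][0]
    return {k: b_gc * gc_j * ((1.0 if k == j else 0.0) - probs[j])
            for k in probs}


# Hypothetical attributes and coefficients for one traveller.
attrs = {"Air": (70.0, 69.0), "Train": (71.0, 34.0),
         "Bus": (70.0, 35.0), "Car": (30.0, 0.0)}
asc = {"Air": 5.0, "Train": 4.0, "Bus": 3.0}
probs = conditional_logit(attrs, b_gc=-0.015, b_ttme=-0.1, asc=asc,
                          hinc=35.0, b_hinc=0.02)
elast = gc_elasticities(probs, attrs, -0.015, "Air")
print({k: round(p, 3) for k, p in probs.items()})
print({k: round(e, 3) for k, e in elast.items()})
```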

 

 

/* We can also test for violations of the IIA assumption more readily under the DISCRETE CHOICE framework. In practice, the test often fails because the difference between the two covariance matrices is not positive definite. Note: When you omit a choice using the conditional logit procedure in LIMDEP, its constant term is left behind and becomes a column of zeros. Therefore, you cannot include ONE in the RHS= list when performing the test this way. However, if you create the choice-specific dummy variables yourself, you can include them in the model and drop the one for the omitted choice. */

 

Nlogit ; lhs=mode

; choices=air,train,bus,car

; rhs=gc,Ttme

$

Nlogit ; lhs=mode

; choices=air,train,bus,car

; rhs=gc,Ttme

; ias= car $

 

/* The test may also break down because LIMDEP cannot pick off the correct elements of the coefficient vectors and covariance matrices when the two models differ. For example, Greene's model reported on p. 731 omits a variable (Hinca) from the second specification, which makes it impossible for a canned procedure to do the test. However, we can perform the test manually, as in the example below, which replicates the result in Greene on p. 731. */

 

Nlogit ; lhs=mode

; choices=air,train,bus,car

; rhs=one,gc,Ttme,Hinca

$

MATRIX ; list; Bu = B ; Vu = VARB$

MATRIX ; list ; Bu=[b(1)/b(2)/b(5)/b(6)] $

MATRIX ; list;Vu=[varb(1,1)/

                  varb(2,1),varb(2,2)/

                  varb(5,1),varb(5,2),varb(5,5)/

                  varb(6,1),varb(6,2),varb(6,5),varb(6,6)]$

REJECT ; aasc=1 $

REJECT ; chair=1 $

Nlogit ; lhs=mode

; choices=train,bus,car

; rhs=one,gc,Ttme

$

MATRIX ; list ; Br=b ; Vr=varb $

MATRIX ; list ; d=br-bu ; vd=vr-vu $

MATRIX ; list ; Hausman=d'<vd>d $
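The last MATRIX command computes H = d'[Vr - Vu]^(-1) d, with d = br - bu and `<vd>` denoting the inverse. This statistic can be sketched in Python as follows. Under IIA, H is asymptotically chi-squared with as many degrees of freedom as compared coefficients (four in the LIMDEP example). The two-coefficient inputs below are made up purely to exercise the code. A negative H signals that Vr - Vu is not positive definite, which is the failure mentioned above.

```python
import math


def solve(a, rhs):
    """Solve a z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [[float(x) for x in row] + [float(r)] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("difference matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def hausman_iia(b_r, b_u, v_r, v_u):
    """H = d' (V_r - V_u)^(-1) d, with d = b_r - b_u."""
    d = [r - u for r, u in zip(b_r, b_u)]
    vd = [[r - u for r, u in zip(row_r, row_u)]
          for row_r, row_u in zip(v_r, v_u)]
    z = solve(vd, d)
    return sum(di * zi for di, zi in zip(d, z))


def chi2_sf_even(x, df):
    """Upper-tail chi-squared probability; closed form valid for even df only."""
    if df % 2 or x < 0:
        raise ValueError("need an even df and a non-negative statistic")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))


# Made-up two-coefficient example, only to exercise the functions.
b_r, b_u = [-0.03, -0.10], [-0.02, -0.09]
v_r = [[0.0001, 0.0], [0.0, 0.0004]]
v_u = [[0.00005, 0.0], [0.0, 0.0002]]
h = hausman_iia(b_r, b_u, v_r, v_u)
print(round(h, 4), round(chi2_sf_even(h, 2), 4))
```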

 

delete ; * $

 

 

 

 

/* Next, we will illustrate LIMDEP's Nested Logit procedure and the Heteroskedastic Extreme Value model, both of which relax the IIA assumption. For the Nested Logit procedure, we will fit a model with 2 levels (LIMDEP allows up to 4 levels). Here is the tree: the top-level choice, travel, splits into a fly branch containing Air and a land branch containing Train, Bus, and Car.

 

 

 

The data are the same as above for the conditional logit procedure. The example replicates the results in Table 21.14 of Greene (2003).

 

See Greene pp. 731-732 for discussion.

 

The data for this model are:

 

Mode = binary indicator of choice,

Gc = the generalized travel cost measure

Ttme = travel time

Hinca = household income times FLY dummy variable

 

Each LIMDEP command below is preceded by an explanatory line beginning with ?.

 

? Model is discrete choice

 

Discrete Choice

 

? Dependent variable, 4 choices

; lhs=mode ; choices=Air,Train,Bus,Car

? Tree structure

; tree = travel[fly(Air),land(Train,Bus,Car)]

? Utility functions.

; model:

U(Air) = BA + BZ*gc + Btt*Ttme/

U(Train) = BT + BZ*gc + Btt*Ttme/

U(Bus) = BB + BZ*gc + Btt*Ttme/

U(Car) = BZ*gc + Btt*Ttme/

? Choice between Air or Land mode.

U(fly,land)= A1*hinca

? Obtain starting values by fitting the model ignoring the nested logit structure

; start=logit

? Display descriptive statistics for variables in utility functions

; describe

? Compute elasticities of probabilities for the choices with

? respect to the cost variable.

; effects:gc(air,train,bus,car) $ */
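To see how the tree changes the probabilities, here is a Python sketch of two-level nested logit probabilities for this tree. It uses one common normalization: utilities within a branch are divided by that branch's inclusive value parameter tau, and the inclusive value enters the branch choice multiplied by tau. LIMDEP's default normalization may differ, so treat this as illustrative. The utilities and tau values are hypothetical. When every tau equals 1, the model collapses to the conditional logit. Within the land branch the ratio P(Train)/P(Bus) still does not depend on the other alternatives, but ratios across branches now do.

```python
import math

TREE = {"fly": ["Air"], "land": ["Train", "Bus", "Car"]}


def nested_logit(v, w, tau, tree=TREE):
    """Two-level nested logit probabilities (illustrative normalization).

    v   : {alternative: twig-level utility}
    w   : {branch: branch-level utility, e.g. A1*Hinca for fly, 0 for land}
    tau : {branch: inclusive value parameter}
    """
    # Inclusive value of each branch: log-sum of scaled twig utilities.
    iv = {b: math.log(sum(math.exp(v[j] / tau[b]) for j in alts))
          for b, alts in tree.items()}
    top = {b: math.exp(w[b] + tau[b] * iv[b]) for b in tree}
    denom = sum(top.values())
    probs = {}
    for b, alts in tree.items():
        p_branch = top[b] / denom
        for j in alts:
            # P(j) = P(branch) * P(j | branch)
            probs[j] = p_branch * math.exp(v[j] / tau[b] - iv[b])
    return probs


v = {"Air": -1.0, "Train": -0.5, "Bus": -1.5, "Car": -0.4}  # hypothetical
w = {"fly": 0.3, "land": 0.0}
print(nested_logit(v, w, {"fly": 1.0, "land": 0.5}))
```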

 

 

Now, let's estimate the model.

 

Reset $

 

/* Read in the data again. */

 

Read ; file="c:\Documents and Settings\B. Dan Wood\My Documents\My Teaching\Maximum Likelihood\Data\clogit.dat" ; nobs=840 ; nvar = 19

; names = Mode, Ttme, Invc, Invt, Gc, ChAir, Hinc, Psize, Indj, Indi, Aasc, Tasc, Basc, Casc, Hinca, Psizea, Nalt, Nij, Ni$

 

Sample ; 1 - 840 $

 

Nlogit ; lhs=mode

; describe

; start=logit

; effects:gc(air,train,bus,car)

; choices=Air,Train,Bus,Car

; tree = travel[fly(Air),land(Train,Bus,Car)]

; model:

U(Air) = BA + BZ*gc + Btt*Ttme/

U(Train) = BT + BZ*gc + Btt*Ttme/

U(Bus) = BB + BZ*gc + Btt*Ttme/

U(Car) = BZ*gc + Btt*Ttme/

U(fly,land) = A1*Hinca

; tlf=.0001 ; maxit=150 $

 

 

/* Now let's estimate the Heteroskedastic Extreme Value model. This replicates the results in Greene, Table 21.15.*/

 

Nlogit ; lhs=mode

; choices=Air,Train,Bus,Car

; model:

U(Air) = BA + Bg*gc+Btt*Ttme+Bh*Hinca/

U(Train) = BT + Bg*gc+Btt*Ttme+Bh*Hinca/

U(Bus) = BB + Bg*gc+Btt*Ttme+Bh*Hinca/

U(Car) = Bg*gc+Btt*Ttme+Bh*Hinca

; het

$
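The HEV model lets each alternative's extreme value disturbance have its own scale. This removes the IIA restriction, but it also means the probabilities no longer have the simple logit form. The Python sketch below approximates them with a crude frequency simulator. It uses one common parameterization, U_j = V_j + theta_j * e_j with e_j standard Gumbel, where one theta would be normalized to 1 for identification. The sketch is for intuition only and is not meant to reproduce LIMDEP's estimation method. The utilities and scales are hypothetical. With all scales equal to 1, it reproduces the conditional logit probabilities up to simulation error.

```python
import math
import random


def gumbel(rng):
    """One draw from the standard type I extreme value (Gumbel) distribution."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def hev_probs_sim(v, theta, draws=100_000, seed=12345):
    """Simulated choice frequencies for U_j = v_j + theta_j * e_j."""
    rng = random.Random(seed)
    alts = list(v)
    counts = dict.fromkeys(alts, 0)
    for _ in range(draws):
        utilities = {a: v[a] + theta[a] * gumbel(rng) for a in alts}
        counts[max(utilities, key=utilities.get)] += 1
    return {a: c / draws for a, c in counts.items()}


v = {"Air": -1.0, "Train": -0.5, "Bus": -1.5, "Car": -0.4}  # hypothetical
equal = hev_probs_sim(v, dict.fromkeys(v, 1.0))
unequal = hev_probs_sim(v, {"Air": 2.0, "Train": 1.0, "Bus": 1.0, "Car": 1.0})
print(equal)
print(unequal)
```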

 

/* Now let's estimate the Random Parameter Logit model. This replicates the results in Greene, Table 21.17, last column. You might want to put off estimating this model, because it is computationally intensive: the probabilities must be simulated, which takes some time. */

 

 

Nlogit ; lhs=mode

; choices=air,train,bus,car

; rhs=gc,Ttme,hinca

; rh2=one

; describe

; list

; effects: gc(air)

; rpl

; fcn= gc(N),ttme(N),hinca(N)

$
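Simulation is needed because the logit probability must be averaged over the distribution of the random coefficients. The Python sketch below does this for one traveller, with normally distributed coefficients as in the (N) specification of the fcn= list. It uses pseudo-random draws and hypothetical means, standard deviations, and attributes. LIMDEP's own draws and settings may differ. Setting every standard deviation to zero recovers the ordinary conditional logit.

```python
import math
import random


def rpl_probs_sim(attrs, mean, sd, asc, draws=2000, seed=2024):
    """Simulated random parameters logit probabilities for one person.

    attrs    : {alt: [x_1, ..., x_K]} variables with random coefficients
    mean, sd : coefficient k is drawn as Normal(mean[k], sd[k])
    asc      : {alt: fixed constant}; missing alternatives get 0
    """
    rng = random.Random(seed)
    alts = list(attrs)
    totals = dict.fromkeys(alts, 0.0)
    for _ in range(draws):
        beta = [rng.gauss(m, s) for m, s in zip(mean, sd)]
        v = {a: asc.get(a, 0.0) + sum(b * x for b, x in zip(beta, attrs[a]))
             for a in alts}
        v_max = max(v.values())  # subtract the max for numerical stability
        expv = {a: math.exp(u - v_max) for a, u in v.items()}
        denom = sum(expv.values())
        for a in alts:
            totals[a] += expv[a] / denom  # logit probability at this draw
    return {a: t / draws for a, t in totals.items()}


# Hypothetical attributes: gc, ttme, hinca (hinca is zero except for Air).
attrs = {"Air": [70.0, 69.0, 35.0], "Train": [71.0, 34.0, 0.0],
         "Bus": [70.0, 35.0, 0.0], "Car": [30.0, 0.0, 0.0]}
asc = {"Air": 5.0, "Train": 4.0, "Bus": 3.0}
mean = [-0.015, -0.1, 0.02]
print(rpl_probs_sim(attrs, mean, [0.01, 0.05, 0.01], asc))
```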

 

 

 

/* Now, let's illustrate LIMDEP's procedure for doing Multinomial Probit. Make sure that you run this program last, since it is very slow.*/

 

 

Nlogit ; lhs=mode

; choices=air,train,bus,car

; rhs=gc,Ttme,Hinca,one

; describe

; list

; Pts=20

; effects: gc(air)

; mnp

$

 

/* You could also impose restrictions on the covariance structure. The preceding program calls for no restrictions. However, the following would impose homoskedasticity and no correlation between alternatives. The corr= statement gives a set of restrictions on the correlations among the utility function disturbances required for identification; corr=0 sets them all to zero. The sdv statement sets all the standard deviations, and hence the diagonal elements of the covariance matrix, to one. */

 

Nlogit ; lhs=mode

; choices=air,train,bus,car

; rhs=one,gc,Ttme,Hinca

; describe

; list

; effects: gc(air)

; Pts=20

; sdv=1

; corr=0

; mnp

$
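Under the restrictions just described (sdv=1, corr=0), the disturbances are independent standard normals. Relaxing them means correlated normal errors. The Python sketch below approximates MNP choice probabilities for one observation with a crude frequency simulator. It draws correlated normal errors through a Cholesky factor of an assumed covariance matrix and counts how often each alternative has the highest utility. This approach is chosen for transparency only; practical MNP estimation generally uses smooth simulators such as GHK. The utilities and covariance here are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L' = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def mnp_probs_sim(v, cov, draws=50_000, seed=7):
    """Frequency-simulated MNP probabilities; cov is ordered like v's keys."""
    alts = list(v)
    low = cholesky(cov)
    rng = random.Random(seed)
    counts = [0] * len(alts)
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in alts]
        u = [v[a] + sum(low[i][k] * z[k] for k in range(i + 1))
             for i, a in enumerate(alts)]
        counts[u.index(max(u))] += 1
    return {a: c / draws for a, c in zip(alts, counts)}


v = {"Air": 0.0, "Train": 0.0, "Bus": 0.0, "Car": 0.0}  # hypothetical
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(mnp_probs_sim(v, identity))  # roughly 0.25 each (sdv=1, corr=0 case)
```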

 

/* Another example of imposing restrictions:

 

The lower triangle of the R (correlation) matrix is:

 

R(train,air)

R(bus,air) R(bus,train)

R(car,air) R(car,train) R(car,bus)

 

Then the statement below would impose an equality restriction on car-air, car-train, and car-bus (all equal to Rc) and fix the bus-train correlation at 0.5.

 

; corr=Rta,Rba,0.5,Rc,Rc,Rc

 

You can also restrict the model to a variance arrangement other than the homoskedasticity imposed above, with the statement:

 

; sdv=sigair,sigtrain,sigbus,sigcar

 

where sigair...sigcar are values for the standard deviations of the four utility functions' disturbances. Note that sigcar will always be 1 in this case, so if there are equality constraints, this should be taken into account. */
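To check the ordering of the corr= list, here is a Python sketch that fills the lower triangle row by row (train-air; bus-air, bus-train; car-air, car-train, car-bus) and mirrors it into a symmetric correlation matrix. The numeric values for Rta, Rba, and Rc are hypothetical stand-ins for parameters LIMDEP would estimate. Only the 0.5 is fixed.

```python
ALTS = ["air", "train", "bus", "car"]


def corr_from_lower(values, n=len(ALTS)):
    """Build a symmetric correlation matrix from its strict lower triangle,
    listed row by row in the same order as the corr= statement."""
    if len(values) != n * (n - 1) // 2:
        raise ValueError("wrong number of lower-triangle elements")
    r = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    it = iter(values)
    for i in range(1, n):
        for j in range(i):
            r[i][j] = r[j][i] = next(it)
    return r


rta, rba, rc = 0.2, 0.1, 0.3  # hypothetical values for the free parameters
R = corr_from_lower([rta, rba, 0.5, rc, rc, rc])
for name, row in zip(ALTS, R):
    print(name, row)
```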

 

 

delete ; * $