Dichotomous Logit and Probit

 

The purpose of this session is to show you how to use LIMDEP's "canned" procedures for doing dichotomous Logit and Probit analysis. This includes obtaining predicted probabilities, predictions of the dependent variable, coefficients and marginal effects for the variables, model diagnostics, hypothesis tests, and the heteroskedastic Probit model. In addition, we provide programs that obtain each of these outputs the "hard" way for illustrative purposes only.

/* This program illustrates some of the "canned" procedures in LIMDEP for doing Logit and Probit. The data are a sample of 32 observations from a study by Spector and Mazzeo on pre- and post-tests for economic literacy. The variables are:

GPA = grade point average
TUCE = score on the Test of Understanding in College Economics
PSI = program participation variable: 1 if the student took part in the Personalized System of Instruction, 0 otherwise
GRADE = response: 1 if grade on post is higher than on pre, 0 otherwise. */

/* First read the data. Notice that this time we are putting the data directly into the file. */

Reset $

Read; Nobs=32; Nvar=4; Names=GPA,TUCE,PSI,GRADE $

2.66 20 0 0
2.89 22 0 0
3.28 24 0 0
2.92 12 0 0
4.00 21 0 1
2.86 17 0 0
2.76 17 0 0
2.87 21 0 0
3.03 25 0 0
3.92 29 0 1
2.63 20 0 0
3.32 23 0 0
3.57 23 0 0
3.26 25 0 1
3.53 26 0 0
2.74 19 0 0
2.75 25 0 0
2.83 19 0 0
3.12 23 1 0
3.16 25 1 1
2.06 22 1 0
3.62 28 1 1
2.89 14 1 0
3.51 26 1 0
3.54 24 1 1
2.83 27 1 1
3.39 17 1 1
2.67 24 1 0
3.65 21 1 1
4.00 23 1 1
3.10 21 1 0
2.39 19 1 1

Sample ; 1 - 32 $

/* Binary choice models: Since the interest is in the PSI variable (its coefficient captures the treatment effect of the program), we will test the hypothesis that this coefficient is equal to 0.0. These commands illustrate several ways to test hypotheses using LIMDEP. */

Probit ; Lhs = Grade ; Rhs = One,GPA,TUCE,PSI $

/* The t-ratio on the coefficient reported for PSI provides a direct test. We'll repeat the test a couple of different ways. First, we can extract the t-statistic directly from the model results if we wish. */

Calc ; List ; TPSI = b(4) / Sqr(Varb(4,4)) $

/* We can also ask LIMDEP to carry out a Wald test. For a single coefficient, the chi-squared statistic equals the square of the t-ratio. This command uses the Last Model setup, which allows specification of Wald tests using a predefined set of parameter names and takes the coefficients and covariance matrix from the previous model. */

Wald ; Fn1 = b_psi - 0 $

/* LIMDEP will also carry out a test of a set of linear restrictions as part of the estimation procedure */

Probit ; Lhs = Grade ; Rhs = One,GPA,TUCE,PSI ; wald: b(4) = 0 $

/* For more involved problems, a likelihood ratio test or an LM test might be preferable. To carry out the likelihood ratio test, we require

chi-squared = 2 * (LOGL_with_PSI - LOGL_without_PSI)

We have the first one from the last estimate. To get the second, we just leave PSI out of the model and retrieve LogL. If the chi-squared value exceeds 3.84, the 5% critical value for one degree of freedom, we reject the hypothesis. Equivalently, if the probability to the right of our statistic (Cprob below) is less than, say, .05, we reject the hypothesis. */

Calc ; YESPSI = LogL $
Probit; Lhs = Grade ; Rhs = One,GPA,TUCE $
Calc ; NOPSI = LogL $
Calc ; List ; Chisq = 2*(YesPSI - NoPSI)
; Cprob = 1 - Chi(chisq,1) $
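
The decision rule for the likelihood ratio test can be checked outside LIMDEP. The following is a minimal Python sketch, not part of the LIMDEP program; the function names `chi2_upper_tail_1df`, `lr_statistic`, and `reject` are hypothetical. It assumes only that a chi-squared variate with one degree of freedom is the square of a standard normal, so its upper-tail probability is erfc(sqrt(x/2)).

```python
import math

def chi2_upper_tail_1df(x):
    """P(chi-squared with 1 d.f. > x): the area to the RIGHT of x."""
    if x <= 0.0:
        return 1.0
    # chi2(1) is the square of a standard normal, hence the erfc form.
    return math.erfc(math.sqrt(x / 2.0))

def lr_statistic(logl_unrestricted, logl_restricted):
    """Likelihood ratio statistic 2 * (lnL_unrestricted - lnL_restricted)."""
    return 2.0 * (logl_unrestricted - logl_restricted)

def reject(stat, alpha=0.05):
    """Reject the null when the upper-tail probability falls below alpha."""
    return chi2_upper_tail_1df(stat) < alpha

if __name__ == "__main__":
    # At the 5% critical value the tail probability is close to .05.
    print(chi2_upper_tail_1df(3.84))
```

Comparing the statistic with 3.84 and comparing the upper-tail probability with .05 are the same test, which is why the Calc command reports both.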

/* Lastly, to compute an LM statistic, we need the chi-squared statistic

LM = g0' V g0

where g0 is the gradient of the log-likelihood function for the unrestricted probit model, evaluated at the maximum likelihood estimates obtained with the hypothesis imposed. These are the estimates from the last model above, which computes the MLE with the coefficient on PSI fixed at 0 (by dropping PSI). V is the estimated covariance matrix (the inverse of the negative Hessian) computed at this estimate. LIMDEP automates this; you just provide the restricted estimates as starting values and specify zero iterations. */

Probit; Lhs = Grade ; Rhs = One,GPA,TUCE,PSI
; Start = B,0 ; Maxit = 0 $
Calc ; List ; LMSTAT $
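
For reference, the three statistics used above to test the PSI coefficient can be written compactly. The notation is introduced here for illustration and is not LIMDEP's: \hat\theta_U and \hat\theta_R denote the unrestricted and restricted estimates, g the gradient of the unrestricted log-likelihood, and V an estimate of the asymptotic covariance matrix.

```latex
\begin{align*}
W  &= \left(\frac{\hat\beta_{PSI}}{\operatorname{se}(\hat\beta_{PSI})}\right)^{2}, \\
LR &= 2\left[\ln L(\hat\theta_U) - \ln L(\hat\theta_R)\right], \\
LM &= g(\hat\theta_R)'\, V(\hat\theta_R)\, g(\hat\theta_R).
\end{align*}
```

Under the null hypothesis that the PSI coefficient is zero, each statistic is asymptotically chi-squared with one degree of freedom, so all three are compared with the same critical value.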

/* Now let's look at the available options on the Logit/Probit procedures:

; List - gives fitted values.
; Keep= - creates a variable containing the predictions.
; Res= - creates a variable containing the residuals.
; Alg= - specifies the estimation algorithm.
; Tlf= ; Tlb= ; Tlg= - override the convergence criteria.
; Maxit= - gives the maximum number of iterations. If 0, then a Lagrange Multiplier statistic is given.
; Var= - list of variances to extract from the variance matrix of b.
; Start= - gives a vector of starting values. This is useful for complex optimizations.
; Margin - computes marginal effects (partial derivatives) for each variable at the means.
; Margin=variable - computes marginal effects (partial derivatives) with all variables at their means except a stratification variable.
; Het - estimates a heteroskedastic version of probit; used together with ; Rh2=.
; Rh2= - gives the list of variables producing the heteroskedasticity.
; Rst= - imposes restrictions, fixing the listed coefficients at the values given in the vector of starting values. */

/* Below we estimate the homoskedastic model for both logit and probit and request all output, including marginal effects for each variable. */

Probit ; Lhs=grade ; Rhs=One,GPA,TUCE,PSI
; List ; Margin=PSI $
Logit ; Lhs=Grade ; Rhs=One,GPA,TUCE,PSI
; List ; Margin=PSI $

/* Now let's do the heteroskedastic probit with heteroskedasticity a function of logged GPA and compute a Wald statistic. */

Create ; LOGGPA=log(GPA) $
Probit ; Lhs=grade ; Rhs=One,GPA,TUCE,PSI
; Rh2=LOGGPA ; Het ; Wald: b(5)=0 $

/* The t statistic on LOGGPA is the square root of the Wald statistic for no heteroskedasticity. In more complex specifications we could use the Wald option for joint tests of multiple coefficients in the variance vector, as was done above. We could also compute a Lagrange Multiplier test for no heteroskedasticity, as is done below. */

Probit ; Lhs=grade ; Rhs=One,GPA,TUCE,PSI $
Probit ; Lhs=grade ; Rhs=One,GPA,TUCE,PSI
; Rh2=LOGGPA ; Het ; Start=B,0 ; Maxit=0 $
Calc ; List ; LMSTAT $

/* Finally, we could do a likelihood ratio test for no heteroskedasticity. */

Probit ; Lhs=grade ; Rhs=One,GPA,TUCE,PSI $
Calc ; LNL_Homo=LogL $
Probit ; Lhs=grade ; Rhs=One,GPA,TUCE,PSI
; Rh2=LOGGPA ; Het $
Calc ; LNL_Het=LogL $
Calc ; List ; LRHet=2*(LNL_Het - LNL_Homo) $

/* Suppose for some reason you want to do restricted estimation. Let's impose the restriction that the coefficient on GPA equals 3. */

Probit ; Lhs=grade ; Rhs=One,GPA,TUCE,PSI
; Start=-7,3,.05,1.43 ; Rst=b1,(),b3,b4 $

/* Here the restriction is contained in the coefficient start values, and the coefficient to be restricted is marked by (). */


/* Delete all variables, and get ready for another Program */

Delete ; * $

 

The preceding makes use of LIMDEP's built-in functions. However, the curious among you may not be satisfied with this, given your earlier experience with the Maximize command. Here is a program that illustrates how to do Logit/Probit using your own log-likelihood function. I also show how to input analytical derivatives to the Maximize procedure.

 

/* This file shows how to estimate Logit and Probit models using the Maximize procedure. There is no need, practically, to do this, because LIMDEP has a very nice set of "canned" procedures for Logit/Probit. However, it is still a useful exercise for learning. Also, the last procedure shows how to supply analytical derivatives to the Maximize procedure. */

Reset $

/* First, let's create some simulated data to use in this demonstration. x is a standard normal random variable (mean zero, standard deviation one). From x we create the binary outcome y and the sign variable q = 2y - 1 used in the log-likelihood function. */

Sample ; 1-200 $
Create ; x=rnn(0,1) ; y=x+rnn(0,1) ; y=y>0 ; q=2*y-1 $

/* Now, let's compute some descriptive statistics on the simulated data. Note the normality plots in particular in relation to the dependent variable */

Dstats ; Rhs=x,y ; Quantiles ; Plot $

/* Now let's use the Maximize procedure to maximize the log-likelihood with respect to the parameters. See Greene, p. 882, note 6 for the parameterization of the log-likelihood functions used below. */

/* Let's do Logit First */

Maximize ; Labels=b0,b1
; Start=0,1
; Fcn=log(lgp(q*(b0+b1*x)))
$

/* Now do Probit */

Maximize ; Labels=b0,b1
; Start=0,1
; Fcn=log(phi(q*(b0+b1*x)))
$

/* Now let's use the Maximize procedure with subfunctions and analytical derivatives. Supplying derivatives can speed up estimation on large data sets. Note that | indicates a subfunction and |_ indicates an analytical first partial derivative. We'll use probit for this example. */

Maximize ; Labels=b0,b1
; Start=0,1
; Fcn=
bx=q*(b0+b1*x)
|d=q*n01(bx)/phi(bx)
|_b0=d
|_b1=d*x
|log(phi(bx)) $

/* Delete the variables and prepare for another run */

Delete ; * $
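
The analytical derivatives supplied to Maximize above can be sanity-checked numerically. Below is a minimal Python sketch, not LIMDEP code, that mirrors the subfunctions `bx` and `d` and compares the analytic gradient with central finite differences. It assumes that LIMDEP's `phi` is the standard normal CDF and `n01` the standard normal density, as their use in the program suggests. The five data points and all Python function names are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density (the role played by n01 in the program)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF (the role played by phi in the program)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def probit_loglik(b0, b1, xs, ys):
    """Sum of log(Phi(q * (b0 + b1 * x))) with q = 2y - 1."""
    return sum(math.log(norm_cdf((2 * y - 1) * (b0 + b1 * x)))
               for x, y in zip(xs, ys))

def probit_gradient(b0, b1, xs, ys):
    """Analytic gradient: d = q * pdf(bx) / cdf(bx); dL/db0 = d, dL/db1 = d * x."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        q = 2 * y - 1
        bx = q * (b0 + b1 * x)
        d = q * norm_pdf(bx) / norm_cdf(bx)
        g0 += d
        g1 += d * x
    return g0, g1

def numeric_gradient(b0, b1, xs, ys, h=1e-6):
    """Central finite-difference approximation to the same gradient."""
    g0 = (probit_loglik(b0 + h, b1, xs, ys) - probit_loglik(b0 - h, b1, xs, ys)) / (2 * h)
    g1 = (probit_loglik(b0, b1 + h, xs, ys) - probit_loglik(b0, b1 - h, xs, ys)) / (2 * h)
    return g0, g1

# Hypothetical toy data, used only to exercise the functions.
XS = [-1.5, -0.3, 0.2, 0.8, 1.7]
YS = [0, 1, 0, 1, 1]

if __name__ == "__main__":
    print(probit_gradient(0.0, 1.0, XS, YS))
    print(numeric_gradient(0.0, 1.0, XS, YS))
```

If the two gradients disagree by more than rounding error, the derivative expressions are the first place to look for a mistake.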

 

Some of you may also be curious about how LIMDEP's "canned" procedure obtains probabilities, predictions, and marginal effects. You may also want to see more details on the heteroskedastic probit model. Here is a program that does both.

 /* This program illustrates some of the matrix and calculation procedures of LIMDEP in the context of comparing dichotomous Logit and Probit. In particular, we show how to obtain the marginal effects, probabilities, and predicted values the hard way. Also, the heteroskedastic probit model is estimated using the Maximize procedure for illustrative purposes. */

/* Let's use the Spector and Mazzeo data again for this example. */

Reset $

Read; Nobs=32 ; Nvar=4 ; Names=GPA,TUCE,PSI,GRADE $

2.66 20 0 0
2.89 22 0 0
3.28 24 0 0
2.92 12 0 0
4.00 21 0 1
2.86 17 0 0
2.76 17 0 0
2.87 21 0 0
3.03 25 0 0
3.92 29 0 1
2.63 20 0 0
3.32 23 0 0
3.57 23 0 0
3.26 25 0 1
3.53 26 0 0
2.74 19 0 0
2.75 25 0 0
2.83 19 0 0
3.12 23 1 0
3.16 25 1 1
2.06 22 1 0
3.62 28 1 1
2.89 14 1 0
3.51 26 1 0
3.54 24 1 1
2.83 27 1 1
3.39 17 1 1
2.67 24 1 0
3.65 21 1 1
4.00 23 1 1
3.10 21 1 0
2.39 19 1 1

Sample ; 1 - 32 $

/* Are the probit and logit models really different? The coefficients are, but are the slopes and the predicted probabilities? Let's find out below. */

? 1. List of regressors

Namelist ; X = One,GPA,TUCE,PSI $

? 2. Vector of means

Matrix ; Xbar = Mean(X) $

? 3. Probit model. Keep predictions of Grade (0s and 1s)

Probit ; Lhs = Grade ; Rhs = X ; keep = GFP ; list $

? 4. Derivatives are a scale times the coefficients

Calc ; Scale = N01(b'Xbar) $

? 5. Predicted probit probabilities

Create ; PP = Phi(b'x) $

? 6. Probit coefficients and derivatives of conditional mean

Matrix ; BP = B ; DP = scale * B $

? Now, repeat for the logit model

Logit ; Lhs = Grade ; Rhs = X ; keep = GFL $

? 1. Derivatives for logit are also a scale times coefficients

Calc ; Scale = lgd(b'Xbar) $

? 2. Predicted logit probabilities

Create ; PL = lgp(b'x) $

? Logit coefficients and derivatives of the conditional mean

Matrix ; BL = B ; DL = scale * B $

? Compare the two models. First, coefficients, side by side

Matrix ; List ; [BP,BL] ; Pause $

? Now, derivatives, side by side

Matrix ; List ; [DP,DL] ; Pause $

? Predictions and fitted probabilities

List ; PP,PL,GFP,GFL $
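
The probit half of the comparison above can be reproduced without LIMDEP. The following is a minimal Python sketch, standard library only, that fits the probit by Newton's method on the same 32 observations and then scales the coefficients by the normal density at the means, as in step 4. Newton's method started from zero is an assumption of this sketch, not a statement about LIMDEP's algorithm, and all function names are hypothetical. The logit case is not covered.

```python
import math

# Spector and Mazzeo data from the listing above: (GPA, TUCE, PSI, GRADE).
DATA = [
    (2.66, 20, 0, 0), (2.89, 22, 0, 0), (3.28, 24, 0, 0), (2.92, 12, 0, 0),
    (4.00, 21, 0, 1), (2.86, 17, 0, 0), (2.76, 17, 0, 0), (2.87, 21, 0, 0),
    (3.03, 25, 0, 0), (3.92, 29, 0, 1), (2.63, 20, 0, 0), (3.32, 23, 0, 0),
    (3.57, 23, 0, 0), (3.26, 25, 0, 1), (3.53, 26, 0, 0), (2.74, 19, 0, 0),
    (2.75, 25, 0, 0), (2.83, 19, 0, 0), (3.12, 23, 1, 0), (3.16, 25, 1, 1),
    (2.06, 22, 1, 0), (3.62, 28, 1, 1), (2.89, 14, 1, 0), (3.51, 26, 1, 0),
    (3.54, 24, 1, 1), (2.83, 27, 1, 1), (3.39, 17, 1, 1), (2.67, 24, 1, 0),
    (3.65, 21, 1, 1), (4.00, 23, 1, 1), (3.10, 21, 1, 0), (2.39, 19, 1, 1),
]
X = [[1.0, g, t, p] for g, t, p, _ in DATA]  # One, GPA, TUCE, PSI
Y = [y for _, _, _, y in DATA]               # GRADE

def pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def dot(b, x):
    return sum(bi * xi for bi, xi in zip(b, x))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def loglik(b, X, Y):
    return sum(math.log(cdf((2 * y - 1) * dot(b, x))) for x, y in zip(X, Y))

def fit_probit(X, Y, iters=25):
    """Newton's method: gradient sum(lam * x), negative Hessian sum(lam * (lam + xb) * x x')."""
    k = len(X[0])
    b = [0.0] * k
    for _ in range(iters):
        g = [0.0] * k
        neg_h = [[0.0] * k for _ in range(k)]
        for x, y in zip(X, Y):
            q = 2 * y - 1
            xb = dot(b, x)
            lam = q * pdf(q * xb) / cdf(q * xb)
            w = lam * (lam + xb)
            for i in range(k):
                g[i] += lam * x[i]
                for j in range(k):
                    neg_h[i][j] += w * x[i] * x[j]
        step = solve(neg_h, g)
        b = [bi + si for bi, si in zip(b, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return b

def marginal_effects_at_means(b, X):
    """Scale factor pdf(b'xbar) times each coefficient, as in step 4 above."""
    xbar = [sum(row[j] for row in X) / len(X) for j in range(len(b))]
    scale = pdf(dot(b, xbar))
    return [scale * bi for bi in b]

if __name__ == "__main__":
    b = fit_probit(X, Y)
    print("coefficients:", b)
    print("log-likelihood:", loglik(b, X, Y))
    print("marginal effects at means:", marginal_effects_at_means(b, X))
```

Because PSI is a dummy variable, the derivative reported for it is only an approximation to the effect of switching PSI from 0 to 1.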

 

/* Now let's do the heteroskedastic probit using the Maximize procedure. In the example using the "canned" procedure, we used logged GPA as the possible source of heteroskedasticity. This time let's see if there are group effects causing heteroskedasticity by using PSI. We'll obtain starting values by estimating the "canned" homoskedastic probit first. Note that we could also have gotten these from the "hard" procedure above. */

Create ; q=2*Grade-1 $
Probit ; Lhs=Grade ; Rhs=One,GPA,Tuce,PSI $
Maximize ; Labels=b0,b1,b2,b3,gamma
; Start=B,1
; Fcn=
Xb=b0+b1*GPA+b2*Tuce+b3*PSI
|Sig=exp(gamma*PSI)
|Xbh=Xb/Sig
|log(phi(q*(Xbh)))
$

/* The t statistic on gamma in the preceding is the square root of a Wald statistic for the null of no heteroskedasticity. We could also do Lagrange Multiplier and Likelihood Ratio tests. I leave it to you to replicate the analysis on p. 891 of Greene. (By the way, note the math error in Greene's Wald test on p. 891.) */

Delete ; * $
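
The log-likelihood passed to Maximize for the heteroskedastic probit can also be written as an ordinary function. The following minimal Python sketch mirrors the subfunctions `Xb`, `Sig`, and `Xbh` from the program above. The function names and the parameter values in the usage lines are hypothetical and are not estimates; the four sample rows are taken from the data listing.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def het_probit_loglik(params, rows):
    """Log-likelihood of a probit with standard deviation exp(gamma * PSI).

    params: (b0, b1, b2, b3, gamma); rows: (GPA, TUCE, PSI, GRADE) tuples.
    """
    b0, b1, b2, b3, gamma = params
    total = 0.0
    for gpa, tuce, psi, grade in rows:
        q = 2 * grade - 1                      # +1 if GRADE = 1, -1 otherwise
        xb = b0 + b1 * gpa + b2 * tuce + b3 * psi
        sig = math.exp(gamma * psi)            # equals 1 whenever PSI = 0
        total += math.log(norm_cdf(q * xb / sig))
    return total

# Four rows copied from the data listing above.
SAMPLE_ROWS = [(2.66, 20, 0, 0), (4.00, 21, 0, 1), (3.16, 25, 1, 1), (2.67, 24, 1, 0)]

if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the function.
    print(het_probit_loglik((-7.0, 1.5, 0.05, 1.4, 0.0), SAMPLE_ROWS))
    print(het_probit_loglik((-7.0, 1.5, 0.05, 1.4, 0.5), SAMPLE_ROWS))
```

Setting gamma to zero makes the standard deviation equal to one for every observation, which recovers the homoskedastic probit. That is why no heteroskedasticity corresponds to the null hypothesis gamma = 0.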