Poisson and Negative Binomial Regression

 

The purpose of this session is to show you how to use LIMDEP's procedures for doing Poisson and Negative Binomial regression. We also show how to do various tests for overdispersion for discriminating between the two models.

 

/*This program estimates Poisson and Negative Binomial Regression models using the McCullagh and Nelder data on ship accidents.  The variables are:

 

Type = Ship type

TA, TB, TC, TD, TE = Ship Type indicators

T6064, T6569, T7074, T7579 = Year constructed indicators

O6064, O7579 = Years operated indicators

Months = Measure of service amount

Acc = Accidents.

 

*/

 

Reset $

 

/*Now let's read in the data */

 

Read ; File="C:\Documents and Settings\B. Dan Wood\My Documents\My Teaching\Maximum Likelihood\Data\ShipAccidents.dat"

; nobs=40 ; nvar = 14 ; names=type,TA,TB,TC,TD,TE,T6064,T6569,T7074,T7579,O6074,O7579,Months,Acc $

 

/* First, let's do some descriptive statistics on the data just to confirm what is obvious from looking at the data. */

 

Reject ; Acc=-999 $

 

Dstats ; Rhs=* $

Dstats ; Rhs=Acc ; Quantiles ; plot ; all $

 

/* Note that the data are non-normal. They are both skewed and kurtotic. This implies the need for some model other than a normal distribution. Which model? is the next question. We'll attempt to answer this question later. */

 

/* Let's begin by doing the estimation using the Poisson model. */

 

Create ; LOGMTH=log(MONTHS) $

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Margin $

 

/* Note that the LIMDEP output for Poisson contains OLS as a first set of estimates. This can be compared to the Poisson results. Note that in this case they are very different./*

 

/* McCullagh and Nelder assert that the coefficient on Log(Months) is known to be 1. As with all of the models we have estimated it is possible to restrict coefficients. Do this for the Poisson model as follows. We also save the predicted values and residuals for later testing.*/

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8, 1.0

; Margin

; List

; Keep=Lambda

; Res=err

$

 

/* Now let's test the hypothesis that the period of manufacture has no effect on accidents. First with a Wald statistic. We'll also save the likelihood from this for later use. */

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8, 1.0

; Test: b(5)=0,b(6)=0,b(7)=0 $

 

Calc ; List ; LNLU=LogL $

 

/* Now test the same hypothesis with a Likelihood Ratio test. */

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,1.0 $

 

Calc ; List ; LNLR=LogL $

Calc ; List ; LNLRatio=2*(LNLU-LNLR) $

 

/* Now test the same hypothesis with a LaGrange Multiplier test */

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,1.0 $

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,O7579,Logmth,T6569,T7074,T7579

; Start=b,0,0,0

; Rst=b0,b1,b2,b3,b4,b5,1.0,b7,b8,b9

; Maxit=0 $

 

/* McCullagh and Nelder also assert that the Poisson model exhibits overdispersion. Now let's test for overdispersion using a regression based approach suggested by Cameron and Trivedi*/

 

Create ; Z=((Acc-Lambda)^2-Acc)/(Sqr(2)*Lambda)$

Regress ; Lhs=Z ; Rhs=one $

Regress ; Lhs=Z ; Rhs=Lambda $

 

/* Since neither t-statistic is significant, there is no evidence of overdispersion in the data using a regression based approach. We can also test for overdispersion using the more general approach suggested on pp. 743-44 of Greene.*/

 

Create  ; ei2 = err*err$

Create  ; vi = ei2 - Lambda ; vi2 = vi*vi ; eivi = err*vi $

Namelist; Zi  = TB,TC,TD,TE,T6569,T7074,T7579,O7579,LogMth

        ; X  = One,Zi $

Matrix  ; MM = Zi'[vi2]Zi ; DD = X'[ei2]X ; MD = Zi'[eivi]X

        ; Q  = MM - MD*<DD>*MD'

        ; r  = Zi'vi

        ; List ; CM = r'<Q>r $

Calc    ; List ; Ctb(.95,8) $

 

/*CM is a chi square statistic with k degrees of freedom, where k is the number of regressors. Here we reject the null that the variance is unrelated to the the regressors in a way that is not completely accounted for by the expected value of y. In other words, we have evidence of heterogeneity.

 

Now let's test for overdispersion using a LaGrange Multiplier test as given by Green on p. 744.This is a test for poisson versus negative binomial.*/

 

Calc ; list ; LMtest=(err'err-n*ybar)^2/(2*lambda'lambda) $

 

/* This is a test of the null of the Poisson.  It is Chi Squared with 1 degree of freedom.  The test is non-significant, so no evidence of overdispersion. */

 

/* Now let's do the estimation using the Negative Binomial model. Built-in to this model is yet another test for overdispersion. The coefficient on the Gamma distribution coefficient provides this test. If it is significantly different than zero, then this is evidence for overdispersion. */

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10

; Margin

; Model=Negbin

$

 

/* Not a very good model. Now let's see if the heterogeneity is due to the period of manufacture. */

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10,b11,b12,b13,b14

; Margin

; Hfn=TB,TC,TD,TE

; Model=Negbin $

 

/* The Negative Binomial model turns out not to be estimable.  It appears that the likelihood surface is too flat to find a sharply defined maximum. */

 

/* Now let's further explore the source of heterogeneity suggested by the tests above. Assume random heterogeneity. Note that this is an alternative to the negative binomial. */

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10

; Margin

; Het

$

 

/*We can also use the negative binomial with random heterogeneity. */

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10,b11

; Margin

; Model=Negbin

; Het

$

 

 

/* Looks like the poisson with random heterogenity is a better model. Now let's estimate a model for underdispersion, the Gamma model.  */

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10

; Margin

; List

; Model=Gamma $

 

/* This is not estimable for the same reasons as the negative binomial. */

 

/* Now let's estimate a zero inflated poisson model.  First, with a logistic splitting distribution. */

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10

; Margin

; List

; ZIP $

 

/* Now with a normal splitting distribution */

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10

; Margin

; List

; ZIP=normal $

 

/* Now with a splitting distribution functional on a set of variables. */

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10,b11

; Margin

; List

; ZIP

; Rh2=one,O7579 $

 

/* You can also estimate the zero inflated models as negative binomial or gamma.*/

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10

; Margin

; List

; ZIP

; Model=Negbin $

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10

; Margin

; List

; ZIP

; Model=gamma $

 

/* Again, these models do not converge for the same reasons as stated above.

 

/* You can also estimate the hurdle models discussed in lecture. The following assumes a logistic for the binary part. */

 

Poisson ; Lhs=Acc

; Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth

; Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10,B11,B12,B13,B14,B15,B16,B17,B18,B19

; Margin

; List

; HURDLE $

 

/* That's all for this lesson. */

 

delete ; * $