Poisson and Negative
Binomial Regression
The
purpose of this session is to show you how to use LIMDEP's
procedures for doing Poisson and Negative Binomial regression. We also show how
to do various tests for overdispersion for
discriminating between the two models.
/*This program estimates Poisson and Negative Binomial
Regression models using the McCullagh and Nelder data on ship accidents. The variables are:
Type
= Ship type
TA,
TB, TC, TD, TE = Ship Type indicators
T6064,
T6569, T7074, T7579 = Year constructed indicators
O6064,
O7579 = Years operated indicators
Months
= Measure of service amount
Acc
= Accidents.
*/
Reset
$
/*Now
let's read in the data */
Read ; File="C:\Documents and Settings\B. Dan
Wood\My Documents\My Teaching\Maximum Likelihood\Data\ShipAccidents.dat"
;
nobs=40 ; nvar = 14 ;
names=type,TA,TB,TC,TD,TE,T6064,T6569,T7074,T7579,O6074,O7579,Months,Acc $
/*
First, let's do some descriptive statistics on the data just to confirm what is
obvious from looking at the data. */
Reject ; Acc=-999 $
Dstats ; Rhs=*
$
Dstats ; Rhs=Acc
; Quantiles ; plot ; all $
/*
Note that the data are non-normal. They are both skewed and kurtotic.
This implies the need for some model other than a normal distribution. Which
model? is the next question. We'll attempt to answer
this question later. */
/*
Let's begin by doing the estimation using the Poisson
model. */
Create ; LOGMTH=log(MONTHS) $
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Margin $
/*
Note that the LIMDEP output for Poisson contains OLS as a first set of
estimates. This can be compared to the Poisson results. Note that in this case
they are very different./*
/*
McCullagh and Nelder assert
that the coefficient on Log(Months) is known to be 1.
As with all of the models we have estimated it is possible to restrict
coefficients. Do this for the Poisson model as
follows. We also save the predicted values and residuals for later testing.*/
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,
1.0
;
Margin
;
List
;
Keep=Lambda
;
Res=err
$
/*
Now let's test the hypothesis that the period of manufacture has no effect on
accidents. First with a Wald
statistic. We'll also save the likelihood from this for later use. */
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,
1.0
;
Test: b(5)=0,b(6)=0,b(7)=0 $
Calc ; List ; LNLU=LogL $
/*
Now test the same hypothesis with a Likelihood Ratio test. */
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,1.0 $
Calc ; List ; LNLR=LogL $
Calc ; List ; LNLRatio=2*(LNLU-LNLR)
$
/*
Now test the same hypothesis with a LaGrange Multiplier test */
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,1.0 $
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,O7579,Logmth,T6569,T7074,T7579
;
Start=b,0,0,0
;
Rst=b0,b1,b2,b3,b4,b5,1.0,b7,b8,b9
;
Maxit=0 $
/*
McCullagh and Nelder also
assert that the Poisson model exhibits overdispersion.
Now let's test for overdispersion using a regression
based approach suggested by Cameron and Trivedi*/
Create ; Z=((Acc-Lambda)^2-Acc)/(Sqr(2)*Lambda)$
Regress ; Lhs=Z ; Rhs=one
$
Regress ; Lhs=Z ; Rhs=Lambda
$
/*
Since neither t-statistic is significant, there is no
evidence of overdispersion in the data using a
regression based approach. We can also test for overdispersion
using the more general approach suggested on pp. 743-44 of Greene.*/
Create ; ei2 = err*err$
Create ; vi = ei2 - Lambda ; vi2
= vi*vi ; eivi = err*vi $
Namelist; Zi =
TB,TC,TD,TE,T6569,T7074,T7579,O7579,LogMth
; X = One,Zi $
Matrix ; MM = Zi'[vi2]Zi ; DD = X'[ei2]X ; MD =
Zi'[eivi]X
; Q = MM - MD*<DD>*MD'
; r = Zi'vi
; List ; CM =
r'<Q>r $
Calc ; List ; Ctb(.95,8) $
/*CM
is a chi square statistic with k degrees of freedom, where k is the number of regressors. Here we reject the null that the variance is
unrelated to the the regressors
in a way that is not completely accounted for by the expected value of y. In
other words, we have evidence of heterogeneity.
Now
let's test for overdispersion using a LaGrange
Multiplier test as given by Green on p. 744.This is a test for poisson versus negative binomial.*/
Calc ; list ; LMtest=(err'err-n*ybar)^2/(2*lambda'lambda) $
/*
This is a test of the null of the Poisson. It is Chi Squared with 1 degree of
freedom. The test is non-significant, so
no evidence of overdispersion. */
/*
Now let's do the estimation using the Negative Binomial model. Built-in to this
model is yet another test for overdispersion. The
coefficient on the Gamma distribution coefficient provides this test. If it is
significantly different than zero, then this is evidence for overdispersion. */
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10
;
Margin
;
Model=Negbin
$
/*
Not a very good model. Now let's see if the heterogeneity is due to the period
of manufacture. */
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10,b11,b12,b13,b14
;
Margin
;
Hfn=TB,TC,TD,TE
;
Model=Negbin $
/*
The Negative Binomial model turns out not to be estimable. It appears that the likelihood surface is too
flat to find a sharply defined maximum. */
/*
Now let's further explore the source of heterogeneity suggested by the tests
above. Assume random heterogeneity. Note that this is an alternative to the
negative binomial. */
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10
;
Margin
;
Het
$
/*We can also use the negative binomial with random
heterogeneity. */
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10,b11
;
Margin
;
Model=Negbin
;
Het
$
/*
Looks like the poisson with random heterogenity is a better model. Now let's estimate a model
for underdispersion, the Gamma model. */
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10
;
Margin
;
List
;
Model=Gamma $
/*
This is not estimable for the same reasons as the
negative binomial. */
/*
Now let's estimate a zero inflated poisson
model. First, with a
logistic splitting distribution. */
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10
;
Margin
;
List
;
ZIP $
/*
Now with a normal splitting distribution */
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10
;
Margin
;
List
;
ZIP=normal $
/*
Now with a splitting distribution functional on a set of variables. */
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10,b11
;
Margin
;
List
;
ZIP
;
Rh2=one,O7579 $
/*
You can also estimate the zero inflated models as
negative binomial or gamma.*/
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10
;
Margin
;
List
;
ZIP
;
Model=Negbin $
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10
;
Margin
;
List
;
ZIP
;
Model=gamma $
/*
Again, these models do not converge for the same
reasons as stated above.
/*
You can also estimate the hurdle models discussed in
lecture. The following assumes a logistic for the binary part. */
Poisson ; Lhs=Acc
;
Rhs=One,TB,TC,TD,TE,T6569,T7074,T7579,O7579,Logmth
;
Rst=b0,b1,b2,b3,b4,b5,b6,b7,b8,1,b10,B11,B12,B13,B14,B15,B16,B17,B18,B19
;
Margin
;
List
;
HURDLE $
/*
That's all for this lesson. */
delete ; * $