Poisson and Negative
Binomial Regression
The
purpose of this session is to show you how to use STATA's procedures for count
models including Poisson, Negative Binomial zero inflated Poisson, and zero inflated
Negative Binomial Regression. We also show how to do various tests for
overdispersion and for discriminating between models.
/*This
program estimates Poisson and Negative Binomial Regression models using the
McCullagh and Nelder data on ship accidents.
The variables are:
Type
= Ship type
TA,
TB, TC, TD, TE = Ship Type indicators
T6064,
T6569, T7074, T7579 = Year constructed indicators
O6064,
O7579 = Years operated indicators
Months
= Measure of service amount
Acc
= Accidents. */
/*Now
let's read in the data */
use
"C:\Documents and Settings\B. Dan Wood\My Documents\My Teaching\Maximum
Likelihood\Data\shipaccidents.dta", clear
/*
First, let's do some descriptive statistics on the data just to confirm what is
obvious from looking at the data. */
summarize
summarize
acc, detail
drop
if acc==.
/*
Note that the data are non-normal. They are both skewed and kurtotic. This
implies the need for some model other than a normal distribution. Which model?
is the next question. We'll attempt to answer this question later. */
/*
Let's begin by doing the estimation using the Poisson model. */
gen
logmth=log(months)
poisson
acc tb tc td te t6569 t7074 t7579 o7579 logmth
fitstat /*Obtain
various fit statistics on the poisson regression*/
listcoef,
help /*List
the coefficients and standardized coefficients*/
prvalue,
x(o7579=0) rest(mean) /*Compute
probabilities when o7579=0 and rest at mean */
prchange,
help /*Compute
various first differences */
mfx
compute, at(mean) /*Compute
marginal effects at the means */
poisgof /*Goodness
of fit for poisson model */
/*
McCullagh and Nelder assert that the coefficient on Log(Months) is known to be
1. As with all of the models we have estimated it is possible to restrict
coefficients. Do this for the Poisson model as follows. We also save the
predicted values and residuals for later testing.*/
constraint
define 1 logmth=1
poisson
acc tb tc td te t6569 t7074 t7579 o7579 logmth, constraints (1)
mfx
compute, at(mean)
/*
Generate lambda and errors for later use */
predict
lambda, ir
gen
err=acc-lambda
/*
Now let's test the hypothesis that the period of manufacture has no effect on
accidents. First with a Wald statistic. We'll also save the likelihood from
this for later use. */
poisson
acc tb tc td te t6569 t7074 t7579 o7579 logmth,constraints(1)
lrtest,
saving(0)
/*A
Wald test that the period of manufacture has no effect on accidents.*/
test
t6569 t7074 t7579
/*
Now test the same hypothesis with a Likelihood Ratio test. */
poisson
acc tb tc td te o7579 logmth,constraints(1)
lrtest
/*
McCullagh and Nelder also assert that the Poisson model exhibits
overdispersion. Let's test for overdispersion using a regression based approach
as suggested by Cameron and Trivedi*/
gen
z=((acc-lambda)^2-acc)/(sqrt(2)*lambda)
regress
z
regress
z lambda, noconstant
/*
Since neither t-statistic is significant, there is no evidence of
overdispersion in the data using a regression based approach.
We
can also test for overdispersion using the more general approach suggested on
pp. 743-44 of Greene.*/
gen
i=1
gen
ei2 = err*err
mkmat
ei2
gen
vi = ei2 - lambda
mkmat
vi
gen
vi2 = vi*vi
mkmat
vi2
gen
eivi = err*vi
mkmat
eivi
mkmat
tb tc td te t6569 t7074 t7579 o7579 logmth,matrix(zi)
mkmat
i tb tc td te t6569 t7074 t7579 o7579 logmth,matrix(x)
matrix
mm = zi'*diag(vi2)*zi
matrix
dd = x'*diag(ei2)*x
matrix
md=zi'*diag(eivi)*x
matrix
q = mm - md*inv(dd)*md'
matrix
r = zi'*vi
matrix
cm = r'*inv(q)*r
matrix
list cm
/*
CM is a chi square statistic with k degrees of freedom, where k is the number
of regressors. Here we reject the null that the variance is unrelated to the
the regressors in a way that is not completely accounted for by the expected
value of y. In other words, we have evidence of heterogeneity due to the
regressors. We also reject the null that the mean equals the variance.
Now
let's test for overdispersion using a LaGrange Multiplier test as given by
Green on p. 744.This is a test for poisson versus negative binomial.*/
mkmat
i
egen
ybar=mean(acc)
mkmat
ybar
matrix
nybar=i'*ybar
mkmat
err, matrix(err)
mkmat
lambda,matrix(lambda)
matrix
lmtest=((err'*err-nybar)*(err'*err-nybar))*inv(2*lambda'*lambda)
matrix
list lmtest
/*
The LaGrange Multiplier statistic is Chi Squared with 1 degree of freedom. The test is non-significant, so no evidence
of overdispersion. */
/*
We can also test for overdispersion in the context of the Negative Binomial
model. If the heterogeneity parameter is significant then this is evidence for
overdispersion.*/
nbreg
acc tb tc td te t6569 t7074 t7579 o7579 logmth,constraints(1)
/*The
parameter alpha is not significant, so no evidence for overdispersion. We would
not want to use the negative binomial for these data.*/
/*
A way to deal with heterogeneity and the overdispersion problem is to estimate
a so-called zero inflated model. However, this should only be used when there
is reason to suspect multiple regimes in the data. We can estimate a zero
inflated Poisson model, obtain the Vuong test, and obtain measures of fit, and
effects. The default binary distribution is the logit. Specify probit option to
get probit. */
zip
acc tb tc td te t6569 t7074 t7579 o7579 logmth, offset(logmth) inflate(o7579)
vuong
fitstat /*Obtain
various fit statistics on the zip regression*/
listcoef,
help /*List
the coefficients and standardized coefficients*/
prvalue,
x(o7579=0) rest(mean) /*Compute
probabilities when o7579=0 and rest at mean */
prchange,
help /*Compute
various first differences */
mfx
compute, at(mean) /*Compute
marginal effects at the means */
/*
We can also estimate a zero inflated Negative Binomial model, obtain the Vuong
test, and obtain measures of fit, and effects. The default binary distribution
is the logit. Specify probit to get probit. The zip option for the zinb
procedure gives a test for zip vs. zinb. */
zinb
acc tb tc td te t6569 t7074 t7579 o7579 logmth, offset(logmth) inflate(o7579)
vuong zip
fitstat /*Obtain
various fit statistics on the zip regression*/
listcoef,
help /*List
the coefficients and standardized coefficients*/
prvalue,
x(o7579=0) rest(mean) /*Compute
probabilities when o7579=0 and rest at mean */
prchange,
help /*Compute
various first differences */
mfx
compute, at(mean) /*Compute
marginal effects at the means */
/*
Note that there is a problem with the zero inflated Negative Binomial model
probabilities, confirming what we found above concerning the inappropriateness
of the negative binomial. */