MODELS WITH NON-NORMAL
DISTURBANCES
The
purpose of this session is to show you how to estimate and test for models with
non-normal disturbances. In regression
applications where the dependent variable is strictly positive, the assumption
of normally distributed errors may be inappropriate. Other applications may
involve truncation or censoring at the high end, or on both ends. The MAXIMIZE
command and various "canned" procedures in LIMDEP provide the ability
to estimate linear regression models with a range of distributional
assumptions.
Of
course, the starting point in moving to an alternative estimator is theory. We
should always define the nature of the statistical experiment that produced the
sample. Ideally, there should be a close match between the data attributes and
the attributes of the probability distribution chosen for estimation. For
example, if the data are always positive but unbounded on the upper end, then
one of the distributions derived from the Generalized Gamma Function might be
appropriate. If the data are a logged form of a Cobb-Douglas type function,
then we might suspect a LogNormal
distribution of disturbances. The Beta distribution offers the widest range of
possible forms, but this might be a limitation if we want to justify the use of
a distribution based on substantive theory. Ideally, we would like to be able
to derive the appropriate distribution based on substantive theory. However,
this may not always be possible. In this exercise, we show how to implement an
empirical strategy for choosing an appropriate non-normal distribution.
/* This file
demonstrates how to test and estimate regression models that have non-normal
disturbances. We consider a range of models including the Generalized Gamma,
Gamma, Exponential, Weibull, Beta, and LogNormal distributions. */
Reset $
/* Below, let's create
some simulated data that we know has a normal distribution. Then compute descriptives as above on the simulated data. Compare the skewness, kurtosis, and quantile
plots to the results above */
Sample ; 1-1000 $
Create ; normdata = rnn(0,1) $
Dstats ; rhs=normdata ; quantiles ; plot ; all $
/* Note that the skew and kurtosis statistics,
along with the quantile plots provide evidence
concerning the distribution of the data, relative to the normal distribution.
The skewness statistic measures the symmetry of the
distribution, while the kurtosis statistic measures the relative peakedness of the distibution
compared to a normal distribution. A perfectly symmetrical distribution would
have skewness=0; a non-kurtotic
distribution would have kurtosis=3 (Note: some versions of kurtosis are
centered on 0, rather than 3, by subtracting out the 3. LIMDEP reports the
version centered on 3.
The quantile plots compare
quantiles of the actual data to quantiles
of the same range drawn from a normal distribution. This provides yet another
indicator of the normality of the data. Normal data should follow the line on
the quantile plot. A sigmoid shape indicates kurtosis,
while the slope of the plot relative to the line implies a particular direction
of skew. Take care, however, of relying too much on these plots and statistics.
They pertain to the sample, not the population from which the sample is drawn.
Theory is the best guide, particularly with small N. */
Reset $
/* The next line will
read in some real data. Change the path to find data */
Read ; format=xls;
file="c:\users\B. Dan Wood\My Documents\My Teaching\Maximum
Likelihood\Data\prestige.xls"
; Names $
List ; educatio,income,women,prestige,census
$
Sample; 1-102 $
/* Now let's print the data and compute
descriptive statistics */
Dstat ; Rhs=income,educatio,prestige
; Quantiles ; Plot ; Output=3 ; All $
/* Now, let's estimate the relationship between
consume, income, and price using the standard regression model, save the
residuals, and test for normality */
Regress ; Lhs=income ; Rhs=One,educatio,prestige ; Res=e $
Dstats ; Rhs=e ; quantiles
; plot ; all $
/* The residuals from
this regression appear skewed and kurtotic so we will
assume some of the other distributions. */
/* First, let's assume a Generalized Gamma
Distribution. This distribution encompasses a Gamma, Exponential, and Weibull distribution with some simple parameter
restrictions, as shown below. It also
encompasses a lognormal distribution with some mathematical manipulation. In
this version of the Generalized Gamma there are two distributional parameters
that we are labeling c and k. Note that
this is a slightly different parameterization relative to that given in the
lecture. Relative to the lecture
discussion of a Generalized Gamma with 4 parameters, we are restricting the
location parameter to zero. Note also
that following estimation we save the log likelihood for later hypothesis
testing. */
/* Because of the complexity of the remaining
function, I have used LIMDEP's ability to use subfunctions. The | symbol AFTER a subfunction
identifies each subfunction. The last function in the
list gives the function to be optimized. */
Maximize ; labels=a0,a1,a2,c,k ; Start=-714,-169,199,1,1
; alg=bfgs ; tlf=.0001
; output=3
; Fcn= bx = a0+a1*educatio+a2*prestige |
cbx = income/bx |
cpk = c+(1/k) |
log(k) - lgm(c) + c*k*(lgm(cpk) - lgm(c) )
- log(income) +c*k*log(cbx)
-( exp(lgm(cpk)-lgm(c))*(cbx)
)^k$
CALC; lnLGG=Logl $
/* Now let's assume a Gamma Distribution. This
distribution is true when k=1 in the preceding Generalized Gamma Disribution. We use the fix option to set the parameter to
1.*/
Maximize ; labels=a0,a1,a2,c,k ; Start=-714,-169,199,1,1
; alg=bfgs ; tlf=.0001
; output=3 ; fix=k
; Fcn= bx = a0+a1*educatio+a2*prestige |
cbx = income/bx |
cpk = c+(1/k) |
log(k) - lgm(c) + c*k*(lgm(cpk) - lgm(c) )
- log(income) +c*k*log(cbx)
-( exp(lgm(cpk)-lgm(c))*(cbx)
)^k$
Calc; lnLG=Logl $
/* Now let's assume a Weibull
Distribution. This distribution is true when c=1 in the preceding Generalized
Gamma Distribution. Again, we use the fix option to set the parameter to this
value.*/
Maximize ; labels=a0,a1,a2,c,k ; Start=-714,-169,199,1,1
; alg=bfgs ; tlf=.0001
; output=3 ; fix=c
; Fcn= bx = a0+a1*educatio+a2*prestige |
cbx = income/bx |
cpk = c+(1/k) |
log(k) - lgm(c) + c*k*(lgm(cpk) - lgm(c) )
- log(income) +c*k*log(cbx)
-( exp(lgm(cpk)-lgm(c))*(cbx)
)^k$
Calc ; lnLW=Logl $
/* Now let's assume an Exponential Distribution.
This distribution is true when c=k=1 in the preceding Generalized Gamma
Distribution. */
Maximize ; labels=a0,a1,a2,c,k ; Start=-714,-169,199,1,1
; alg=bfgs ; tlf=.0001
; output=3 ; fixe=c,k
; Fcn= bx = a0+a1*educatio+a2*prestige |
cbx = income/bx |
cpk = c+(1/k) |
log(k) - lgm(c) + c*k*(lgm(cpk) - lgm(c) )
- log(income) +c*k*log(cbx)
-( exp(lgm(cpk)-lgm(c))*(cbx)
)^k$
Calc ; lnLE=Logl $
/* Model Discrimination: Choosing the correct
distribution is more of an art than a science. Of course, as we noted above, it
would be best if there were a basis in substantive theory for the chosen model.
Lacking this alternative, we can construct hypothesis tests based on Likelihood
Ratio, Wald, or LaGrange Multiplier principles.
For example, using a Likelihood Ratio test we
could test each of the submodels against the Generalized
Gamma model. The likelihood ratio statistic is chi squared with degrees of
freedom equal to the number of parameter restrictions (1 for Generalized Gamma
to Gamma and Weibull; 2 for Generalized Gamma to
Exponential) */
Calc ; List ; LRGGG=2*(lnLGG-lnLG)
$ ? Generalized Gamma to Gamma
Calc ; List ; LRGGW=2*(lnLGG-lnLW)
$ ? Generalized Gamma to Weibull
Calc ; List ; LRGGE=2*(lnLGG-lnLE)
$ ? Generalized Gamma to Exponential
/* We could also test
the Exponential Model against the Gamma and Weibull
models. These likelihood ratio tests are chi squared with one degree of
freedom. */
Calc ; List ; LRWE=2*(lnLW-lnLE)
$ ? Weibull to Exponential
Calc ; List ; LRGE=2*(lnLG-lnLE)
$ ? Gamma to Exponential
/* On this basis we choose the Gamma model above,
since it is not statistically different than the Generalized Gamma model and is
more parsimonious (1 less parameter). However, the Gamma model is statistically
different than the Exponential which has one less parameter. This suggests that
the parameter restriction for the Exponential model is inappropriate.
Note that we would reach similar conclusions by
simply looking at the t statistics associated with the probability distribution
parameters of the four estimated models. */
/* Another distribution
that is related to the Generalized Gamma is the LogNormal.
However, it is not obtained by a simple change in parameterization as above. If
a variable,y, is lognormal,
then its variance is proportional to the square of its mean. The LogNormal is given. */
Maximize ; labels=a0,a1,a2,Sig
; Start=-714,-169,199,1
; Fcn=
-log(sqr(2*pi)*Sig)-log(income)-(1/(2*Sig^2))
*(log(income/(a0+a1*educatio+a2*prestige))+(Sig^2)/2)^2
; alg=bfgs ; tlf=.0001
; output=3$
/* The LogNormal
Distribution is a "canned" procedure in LIMDEP. Call the LogNormal MLE as follows. */
Lognormal ; Lhs=income ; Rhs=One,educatio,prestige $
/*
When the dependent data are in proportions (i.e.,0<y<1 ) then a Beta Distribution is often used. If
the data are not in proportions then it is possible to put them into
proportions simply by dividing the dependent variable
by its maximum. Of course, if you transform the data just to use the
Beta, then the scaling and interpretation of the coefficients will be
different, so it is probably better to just use the Beta with variables that
are naturally proportions unless you have a good reason to want the Beta. The
Beta Distribution is perhaps the most flexible distribution. Look again at the
Excel spreadsheet for the Beta to get a sense of just how flexible it is. Of
course, this flexibility may be its major drawback since it is often difficult
to justify on the basis of substantive theory. (The above distributions may be
also.) Here is the Beta distribution applied to cross-national data on infant
mortality per 1000 as a function of national income and whether the nation is
oil exporting.
*/
reset $
Read ; format=xls ; File =
"c:\users\B. Dan Wood\My Documents\My Teaching\Maximum
Likelihood\Data\Leinhardt.xls"
; Names $
List ; infant,income,oilyes $
Sample; 1-101 $
/* First, create the rate of infant mortality
from the data. This can be thought of as the probability of one infant death
per 1000 draws. This probability naturally ranges from 0 to 1.*/
create ; infantmr = infant /
1000 $
create ; infantod = exp(infantmr)/(1+exp(infantmr)) $
/* First run a linear regression to obtain
starting values for reasons to be made clear below. */
Regress ; lhs=infantod ; rhs=one,income,oilyes $
Maximize ; labels=a0,a1,a2,p
; Start=B,3
; Fcn=
Xb = a0+a1*income+a2*oilyes |
lgm(p)
-lgm(Xb*p)
-lgm((1-xB)*P)
+(Xb*p-1)*log(infantod)
+log(1-infantod)*((1-Xb)*p-1)
; alg=bfgs
; tlf=.00001
$
/* Note that obtaining a good set of starting
values is crucial to convergence or even initial iteration. B above in the
vector of starting values is the vector of coefficients from the prior
regression. In general this is a good way to obtain starting values for MLE
procedures. See also the article by Ferrari and Cribari-Neto
(2004) */
/*Now that I've put you through the pain of
programming your own log likelihood functions for these continuous non-normal distributions
in LIMDEP, I want to point out that many of them are available as canned
procedures in LIMDEP using the LOGLINEAR models. The parameterization is different under these
models. Specifically, the relationship
between the two models is xb=1/ln(xb).
Here are some examples of the canned procedures.
*/
reset $
Read ; format=xls;
file="c:\users\B. Dan Wood\My Documents\My Teaching\Maximum
Likelihood\Data\prestige.xls"
; Names $
List ; educatio,income,women,prestige,census
$
Sample; 1-102 $
loglinear ; lhs=income ; rhs=one,educatio,prestige
; model=gamma $
loglinear ; lhs=income ; rhs=one,educatio,prestige
; model=weibull $
loglinear ; lhs=income ; rhs=one,educatio,prestige
; model=exponential $
/* There is also a procedure for the beta
distribution. However, it seems to require a long time to converge with these
data, if ever. So you might not want to run this one. */
reset $
/* There is also a procedure for the beta
distribution. However, it is different parameterization than the one above. It
is a two parameter distribution which assumes each parameter is a function of
the conditional mean. Also, this
procedure seems to take a long time to converge with these
data, if ever. So you might not want to run this one. */
reset $
Read ; format=xls ; File =
"c:\users\B. Dan Wood\My Documents\My Teaching\Maximum
Likelihood\Data\Leinhardt.xls"
; Names $
Sample; 1-101 $
create ; infantmr = infant /
1000 $
create ; infantod = exp(infantmr)/(1+exp(infantmr)) $
loglinear ; lhs=infantod ; rhs=one,income,oilyes ; model = beta $
Delete ; * $
Stop $