MODELS WITH NON-NORMAL
DISTURBANCES
The
purpose of this session is to show you how to estimate and test for models with
non-normal disturbances. In regression
applications where the dependent variable is strictly positive, the assumption
of normally distributed errors may be inappropriate. Other applications may
involve truncation or censoring at the high end, or on both ends. STATA enables
you to test normality assumptions, as well as to estimate models with
non-normal disturbances.
Of
course, the starting point in moving to an alternative estimator is theory. We
should always define the nature of the statistical experiment that produced the
sample. Ideally, there should be a close match between the data attributes and
the attributes of the probability distribution chosen for estimation. For
example, if the data are always positive but unbounded on the upper end, then
one of the distributions derived from the Generalized Gamma Function might be
appropriate. If the data are a logged form of a Cobb-Douglas type function,
then we might suspect a LogNormal distribution of
disturbances. The Beta distribution offers the widest range of possible forms,
but this might be a limitation if we want to justify the use of a distribution
based on substantive theory. Ideally, we would like to be able to derive the
appropriate distribution based on substantive theory. However, this may not
always be possible. In this exercise, we show how to implement an empirical
strategy for choosing an appropriate non-normal distribution.
/* This file
demonstrates how to test and estimate regression models that have non-normal
disturbances. We consider a range of models including the Generalized Gamma,
Gamma, Exponential, Weibull, Beta, and LogNormal distributions. */
clear
set more off
/* Below, let's
create some simulated data that we know has a normal distribution. Then compute
descriptives as above on the simulated data. Compare
the skewness, kurtosis, and quantile
plots.*/
drawnorm
x, n(5000)
sum x, detail
sktest
x
qnorm
x
/* Note that the skew
and kurtosis statistics, along with the quantile
plots provide evidence concerning the distribution of the data, relative to the
normal distribution. The skewness statistic measures
the symmetry of the distribution, while the kurtosis statistic measures the
relative peakedness of the distibution
compared to a normal distribution. A perfectly symmetrical distribution would
have skewness=0; a non-kurtotic
distribution would have kurtosis=3. The quantile plot compares quantiles
of the actual data to quantiles of the same range
drawn from a normal distribution. This provides yet another indicator of
the normality of the
data. Normal data should follow the line on the quantile
plot. A sigmoid shape indicates kurtosis, while the slope of the plot relative
to the line implies a particular direction of skew. Take care, however, of
relying too much on these plots and statistics. They pertain to the sample, not
the population from which the sample is drawn. Theory is the best guide,
particularly with small N. Another helpful test in STATA is the “sktest” command.
In this example, we
are unable to reject the null hypothesis that the variable x is normally
distributed.*/
clear
/* The next line will
read in some real data. Change the path to find data */
use "C:\users\B.
Dan Wood\My Documents\My Teaching\Maximum Likelihood\Data\prestige.dta",
clear
/* Now let's print
the data and compute descriptive statistics. */
sum income education
prestige, detail
sktest
income education prestige
qnorm
income
qnorm
education
qnorm
prestige
/* Now, let's
estimate the relationship between income, education, and prestige using the
standard regression model, save the residuals, and test for normality.*/
reg
income education prestige
predict e, res
sum e, detail
sktest
e
qnorm
e
/* The residuals from
this regression appear skewed and kurtotic. However,
N is fairly small. Nevertheless, based on theory and the truncation of income
at zero we will assume some of the other distributions.*/
/* First, let's
assume a Generalized Gamma Distribution. This distribution encompasses a Gamma,
Exponential, and Weibull distribution with some
simple parameter restrictions, as shown below.
It also encompasses a lognormal distribution with some mathematical
manipulation. In the Generalized Gamma there are two distributional parameters
that we are labeling c and k.
Note that following
estimation we save the log likelihood for later hypothesis testing. */
set more off
program define gengam
version 7
args
lnf theta1 k c
# delimit ;
quietly replace `lnf'=
ln(`k')
-lngamma(`c')
+`c'*`k'*ln(exp(lngamma(`c'+(1/`k')))/exp(lngamma(`c')))
-ln($ML_y1)
+`c'*`k'*ln($ML_y1/`theta1')
-(exp(lngamma(`c'+(1/`k')))/exp(lngamma(`c'))
*($ML_y1/`theta1'))
^`k' ;
#delimit cr
end
ml model lf gengam (theta:income=education
prestige) /k /c
ml init 130.7 1.06
-1.38 1 1, copy
ml maximize
scalar llgg=e(ll)
display llgg
program drop gengam
/* Now let's assume a
Gamma Distribution. This distribution is true when k=1 in the preceding
Generalized Gamma Disribution.*/
set more off
program define gamma
version 7
args
lnf theta1 c
# delimit ;
quietly replace `lnf'=-lngamma(`c')-ln($ML_y1)+`c'*ln($ML_y1/`theta1')-($ML_y1/`theta1')
;
#delimit cr
end
ml model lf
gamma (theta:income=education
prestige) /c
ml init 130.7 1.06
-1.38 1, copy
ml maximize
scalar llg=e(ll)
display llg
program drop gamma
/* Now let's assume a
Weibull Distribution. This distribution is true when
c=1 in the *preceding Generalized Gamma Distribution.*/
set more off
program define weibull
version 7
args
lnf theta1 k
# delimit ;
quietly replace `lnf'=
ln(`k')
+`k'*ln(exp(lngamma(1+(1/`k'))))
-ln($ML_y1)
+`k'*ln($ML_y1/`theta1')
-((exp(lngamma(1+(1/`k'))))
*($ML_y1/`theta1'))
^`k' ;
#delimit cr
end
ml model lf weibull (theta:income=education prestige) /k
ml init 130.7 1.06
-1.38 1, copy
ml maximize
scalar llw=e(ll)
display llw
program drop weibull
* Now let's assume an
Exponential Distribution. This distribution is true when c=k=1 in *the preceding
Generalized Gamma Distribution.
set more off
program define expon
version 7
args
lnf theta1
quietly replace `lnf'=-ln(`theta1')-($ML_y1/`theta1')
end
ml model lf expon (theta:income=education
prestige)
ml init 130.7 1.06
-1.38, copy
ml maximize
scalar lle=e(ll)
display lle
program drop expon
/* Model
Discrimination: Choosing the correct distribution is more of an art than a
science. Of course, as we noted above, it would be best if there were a basis in
substantive theory for the chosen model. Lacking this alternative, we can
construct hypothesis tests based on Likelihood Ratio, Wald, or LaGrange
Multiplier principles. For example, using a Likelihood Ratio test we could test
each of the submodels against the Generalized Gamma
model. The likelihood ratio statistic is chi squared with degrees of freedom
equal to the number of parameter restrictions (1 for Generalized Gamma to Gamma
and Weibull; 2 for Generalized Gamma to Exponential).
*/
*Generalized Gamma to
Gamma
scalar lrggg=2*(llgg-llg)
display lrggg
*Generalized Gamma to
Weibull
scalar lrggw=2*(llgg-llw)
display lrggw
*Generalized Gamma to
Exponential
scalar lrgge=2*(llgg-lle)
display lrgge
/* We could also test
the Exponential Model against the Gamma and Weibull
models.
These likelihood
ratio tests are chi squared with one degree of freedom. */
* Weibull
to Exponential
scalar lrwe=2*(llw-lle)
display lrwe
* Gamma to
Exponential
scalar lrge=2*(llg-lle)
display lrge
/* On this basis we
choose the Gamma model above, since it is not statistically different than the
Generalized Gamma model and is more parsimonious (1 less parameter). However,
the Gamma model is statistically different than the Exponential which has one
less parameter. This suggests that the parameter restriction for the
Exponential model is inappropriate. Note that we would reach similar
conclusions by simply looking at the t statistics associated with the
probability distribution parameters of the four estimated models. */
/* Another
distribution that is related to the Generalized Gamma is the LogNormal. However, it is not obtained by a simple change
in parameterization as above. If a variable,y, is
lognormal, then its variance is proportional to the square of its mean. The LogNormal procedure is given below. */
set more off
program define lognorm
version 7
args
lnf theta1 sigma
# delimit ;
quietly replace `lnf'=-log(sqrt(2*_pi)*`sigma')-log($ML_y1)-(1/(2*`sigma'^2))*(log($ML_y1/(`theta1'))
+(`sigma'^2)/2)^2 ;
#delimit cr
end
ml model lf lognorm (theta:income=education
prestige) (sigma:)
ml search
ml maximize
program drop lognorm
/*
When the dependent
data are in proportions (i.e.,0<y<1 ) then a Beta Distribution is often
used. If the data are not in proportions then it is possible to put them into
proportions simply by dividing the dependent variable
by its maximum. Of
course, if you transform the data just to use the Beta, then the scaling and
interpretation of the coefficients will be different, so it is probably better to
just use the Beta with variables that are naturally proportions unless you have
a good reason to want the Beta. The Beta istribution
is perhaps the most flexible distribution. Look again at the Excel spreadsheet
for the Beta to get a sense of just how flexible it is. Of course, this
flexibility may be its major drawback since it is often difficult to justify on
the basis of substantive theory. (The above distributions may be also.) Here is
the Beta distribution applied to cross-national data on infant mortality per
1000 as a function of national income and whether the nation is oil exporting.
*/
clear
/* The next line will
read in some real data. Change the path to find data */
use "C:\users\B.
Dan Wood\My Documents\My Teaching\Maximum Likelihood\Data\Leinhardt.dta",
clear
/* First, create the
rate of infant mortality from the data. This can be thought of as the
probability of one infant death per 10000draws. This probability naturally
ranges from 0 to 1.*/
gen
infantmort = infant / 1000
gen
infantodds = exp(infantmort)/(1+exp(infantmort))
/* Now let's print
the data and compute descriptive statistics. */
sum infant infantodds income oilyes, detail
sktest
infantodds
qnorm
infantodds
/* Run a regression
to get some starting values */
reg
infantodds income oilyes
/*Write down the
function to be optimized
set more off
program define beta
version 7
args
lnf theta1 p
# delimit ;
quietly replace `lnf'=lngamma(`p')
-lngamma(`p'*`theta1')
-lngamma((1-`theta1')*`p')
+(`theta1'*`p'-1)*ln($ML_y1)
+((1-`theta1')*`p'-1)*ln(1-$ML_y1) ;
#delimit cr
end
ml model lf beta (theta:infantodds=income oilyes) (p:)
ml init -5e-06 .02
.53 3, copy
ml maximize
program drop beta
/* Note that obtaining
a good set of starting values is crucial to convergence or even initial
iteration. The OLS start values are recommended by Ferrari and Cribari-Neto (1994). They seem to work here.*/
/*Now that I've put
you through the pain of programming your own log likelihood functions for these
continuous non-normal distributions, I want to point out that many of these
distributions are available as canned procedures, and as variants of the
Generalized Linear Model features of STATA.
Generalized Linear Models are a framework that encompasses many models
that can be estimated with maximum likelihood.
It is a very flexible framework presenting a unified way to view many
families of distributions and diverse link functions. However, GLM is beyond the scope of this
course Here are examples of STATA's canned non-normal
procedures. */
/* This first one is
beta regression and requires the installation of a package called betafit */
betafit
infantmort, muvar(income oilyes)
clear
/* The next line will
read in the earlier data. Change the path to find data */
use "C:\users\B.
Dan Wood\My Documents\My Teaching\Maximum Likelihood\Data\prestige.dta",
clear
gamma income
education prestige
weibull
income education prestige
ereg
income education prestige
/*As an example of
GLM, here is the syntax for estimating the exponential regression model we did
earlier.
Note that standard
errors are radically different from above because of the different methods of
estimating the covariance matrix of estimates.
However, the log likelihood and coefficient estimates are quite
close. */
glm
income education prestige, fam(gamma) link(id)
exit