MODELS WITH NON-NORMAL
DISTURBANCES
The
purpose of this session is to show you how to estimate and test for models with
non-normal disturbances. In regression applications
where the dependent variable is strictly positive, the assumption of normally
distributed errors may be inappropriate. Other applications may involve
truncation or censoring at the high end, or on both ends. The “optim” procedure and various "canned" procedures
in R provide the ability to estimate linear regression models with a range of
distributional assumptions.
Of
course, the starting point in moving to an alternative estimator is theory. We
should always define the nature of the statistical experiment that produced the
sample. Ideally, there should be a close match between the data attributes and
the attributes of the probability distribution chosen for estimation. For
example, if the data are always positive but unbounded on the upper end, then
one of the distributions derived from the Generalized Gamma Function might be
appropriate. If the data are a logged form of a Cobb-Douglas type function,
then we might suspect a LogNormal
distribution of disturbances. The Beta distribution offers the widest range of
possible forms, but this might be a limitation if we want to justify the use of
a distribution based on substantive theory. Ideally, we would like to be able
to derive the appropriate distribution based on substantive theory. However, this
may not always be possible. In this exercise, we show how to implement an
empirical strategy for choosing an appropriate non-normal distribution.
# This
file demonstrates how to test and estimate regression models that have
non-normal disturbances. We
# consider a range of
models including the Generalized Gamma, Gamma, Exponential, Weibull,
Beta, and
# LogNormal distributions.
# Below,
let's create some simulated data that we know has a normal distribution. Then
compute
# descriptives on the simulated data, including skewness, kurtosis, and quantile
plots.
normdata <- rnorm(1000)
mean(normdata)
var(normdata)
sd(normdata)
median(normdata)
min(normdata)
max(normdata)
library(e1071)
skewness(normdata)
kurtosis(normdata)
# Note that the skew and
kurtosis statisticsprovide evidence concerning the
distribution
# of
the data, relative to the normal distribution. The skewness
statistic measures the
# symmetry
of the distribution, while the kurtosis statistic measures the relative peakedness
# of the distibution compared to a normal distribution. A perfectly
symmetrical distribution
# would have skewness=0; a non-kurtotic
distribution would have kurtosis=3 (However, this library
# in R centers the
kurtosis statistic on 0 by subtracting out the 3.
# Now
let's do a quantile plot of the simulated data.
library(car)
qq.plot(normdata,
dist= "norm", labels=FALSE)
# Quantile
plots compare quantiles of the actual data to quantiles of the same range drawn from
# a
normal distribution. This provides yet another indicator of the normality of
the data.
# data
should follow the line on the quantile plot. A
sigmoid shape indicates kurtosis, while the
# slope
of the plot relative to the line implies a particular direction of skew. Take
care, however,
# of
relying too much on these plots and statistics. They pertain to the sample,
# not
the population from which the sample is drawn. Theory is the best guide,
# particularly
with small N where such plots may not be reliable.
# The
next set of commands will read in some real data. Change the path to find the
data.
data(Prestige, package="car")
attach(Prestige)
summary(Prestige)
# Now
let's list the data, compute descriptive statistics, and do a qq plot for each variable.
list(Prestige)
mean(income)
var(income)
sd(income)
median(income)
min(income)
max(income)
skewness(income)
kurtosis(income)
qq.plot(income,
dist= "norm", labels=FALSE)
mean(education)
var(education)
sd(education)
median(education)
min(education)
max(education)
skewness(education)
kurtosis(education)
qq.plot(education,
dist= "norm", labels=FALSE)
mean(prestige)
var(prestige)
sd(prestige)
median(prestige)
min(prestige)
max(prestige)
skewness(prestige)
kurtosis(prestige)
qq.plot(prestige,
dist= "norm", labels=FALSE)
# Now,
let's estimate the relationship between income, education, and prestige using
the standard
# regression
model and test for normality
ols.model <- lm(income ~ education
+ prestige)
summary(ols.model)
skewness(residuals(ols.model))
kurtosis(residuals(ols.model))
qq.plot(residuals(ols.model), dist= "norm", labels=FALSE)
# histogram
and density plots
hist(residuals(ols.model), nclass=n.bins(residuals(ols.model)),
probability=T,
ylab='Density')
lines(density(residuals(ols.model)),
lwd=2)
points(residuals(ols.model), rep(0,
length(residuals(ols.model))), pch="|")
box()
lines(density(residuals(ols.model),
adjust=.5), lwd=1)
# The
residuals from this regression appear somewhat skewed and kurtotic.
Based on theory and the
# truncation
of income at zero we will assume some of the other distributions.
# First,
let's assume a Generalized Gamma Distribution. This distribution encompasses a Gamma,
#
Exponential, and Weibull distribution with some simple parameter
restrictions, as shown below.
# It
also encompasses a lognormal distribution with some mathematical manipulation,
# In
this version of the Generalized Gamma there are two distributional parameters
that we are
# labeling
c and k. Note that this is a slightly
different parameterization relative to that
# given
in the lecture. relative to the
lecture discussion of a Generalized Gamma with 4 parameters,
# we
are restricting the location parameter to zero.
Note also that following
# estimation
we save the log likelihood for later hypothesis testing. */
# Because of the
complexity of the function, I have used R's ability to use subfunctions.
# First,
put the data into matrices for the MLE procedure
x <- cbind(1,education,prestige)
y <- as.matrix(income)
ones <- as.vector(x[,1])
# Calculate
K (columns in x), define K1 and
K <- ncol(x)
K1 <- K+1
n <- nrow(x)
# Define the function to
be optimized
llik.ggamma <- function(par,X,Y) {
X <- as.matrix(x)
Y <- as.vector(y)
xbeta
<- X%*%par[1:K]
c <- par[K1:K1]
k <- par[K2:
cbx
<- y/xbeta
cpk
<- c+(1/k)
sum( log(k) - lgamma(c) + c*k*(lgamma(cpk) - lgamma(c) )
- log(Y) +c*k*log(cbx)
-( exp(lgamma(cpk)-lgamma(c))*(cbx)
)^k )
}
llik.ggamma
# Now
let's use the above function to estimate the model.
# Let's
use as starting values the estimates from OLS augmented by potential gamma
parameters.
startvalues <- c(coefficients(ols.model),1,1)
startvalues
mod.ggamma <- optim(startvalues,llik.ggamma, method =
"BFGS", control = list(trace,maxit=1000,fnscale
= -1),
hessian
= TRUE)
mod.ggamma
# Save the log likelihood
for later use below
llggamma <- mod.ggamma$value
llggamma
# Calculate the variance
matrix from the Hessian matrix.
v <- -solve(mod.ggamma$hessian)
# Calculate the standard
errors from the variance matrix.
se <- sqrt( diag(v))
# Calculate the z
statistics from the coefficients and standard errors
b <- mod.ggamma$par
zstat
<-b/se
# Calculate
p values from the z statistics
pz
<- 2*pnorm( -abs(zstat))
# Put
the results together in a table.
table <- cbind(b,se,zstat,pz)
table
# Now
let's assume a Gamma Distribution. This distribution is true when k=1 in the
preceding
# Generalized Gamma Disribution. We plug this into the preceding function for
the Gamma
# Define the function to
be optimized
llik.gamma <- function(par,X,Y) {
X <- as.matrix(x)
Y <- as.vector(y)
xbeta
<- X%*%par[1:K]
c <- par[K1:K1]
k <- 1
cbx
<- y/xbeta
cpk
<- c+(1/k)
sum( log(k) - lgamma(c) + c*k*(lgamma(cpk) - lgamma(c) )
- log(Y) +c*k*log(cbx)
-( exp(lgamma(cpk)-lgamma(c))*(cbx)
)^k )
}
llik.gamma
# Now
let's use the above function to estimate the model.
startvalues <- c(coefficients(ols.model),1)
startvalues
mod.gamma <- optim(startvalues,llik.gamma, method =
"BFGS", control = list(trace,maxit=1000,fnscale
= -1),
hessian
= TRUE)
mod.gamma
# Save the log likelihood
for later use below
llgamma <- mod.gamma$value
llgamma
# Calculate the variance
matrix from the Hessian matrix.
v <- -solve(mod.gamma$hessian)
# Calculate the standard
errors from the variance matrix.
se <- sqrt( diag(v))
# Calculate the z
statistics from the coefficients and standard errors
b <- mod.gamma$par
zstat
<-b/se
# Calculate
p values from the z statistics
pz
<- 2*pnorm( -abs(zstat))
# Put
the results together in a table.
table <- cbind(b,se,zstat,pz)
table
# Now
let's assume a Weibull Distribution. This
distribution is true when c=1 in the preceding
# Generalized Gamma
Distribution. Again,plug
this into the function for the generalized gamma.
# Define the function to
be optimized
llik.weibull <- function(par,X,Y) {
X <- as.matrix(x)
Y <- as.vector(y)
xbeta
<- X%*%par[1:K]
c <- 1
k <- par[K1:K1]
cbx
<- y/xbeta
cpk
<- c+(1/k)
sum( log(k) - lgamma(c) + c*k*(lgamma(cpk) - lgamma(c) )
- log(Y) +c*k*log(cbx)
-( exp(lgamma(cpk)-lgamma(c))*(cbx)
)^k )
}
llik.weibull
# Now
let's use the above function to estimate the model.
startvalues <- c(coefficients(ols.model),1)
startvalues
mod.weibull <- optim(c(130.7,1.06,-1.38,10),llik.weibull,
method = "BFGS", control = list(trace,maxit=1000,fnscale
= -1),
hessian
= TRUE)
mod.weibull
# Save the log likelihood
for later use below
llweibull <- mod.weibull$value
llweibull
# Calculate the variance
matrix from the Hessian matrix.
v <- -solve(mod.weibull$hessian)
# Calculate the standard
errors from the variance matrix.
se <- sqrt( diag(v))
# Calculate the z
statistics from the coefficients and standard errors
b <- mod.weibull$par
zstat
<-b/se
# Calculate
p values from the z statistics
pz
<- 2*pnorm( -abs(zstat))
# Put
the results together in a table.
table <- cbind(b,se,zstat,pz)
table
# Now
let's assume an Exponential Distribution. This distribution is true when c=k=1
# in
the preceding Generalized Gamma Distribution.
# Define the function to
be optimized
llik.exponential <- function(par,X,Y) {
X <- as.matrix(x)
Y <- as.vector(y)
xbeta
<- X%*%par[1:K]
c <- 1
k <- 1
cbx
<- y/xbeta
cpk
<- c+(1/k)
sum( log(k) - lgamma(c) + c*k*(lgamma(cpk) - lgamma(c) )
- log(Y) +c*k*log(cbx)
-( exp(lgamma(cpk)-lgamma(c))*(cbx)
)^k )
}
llik.exponential
# Now
let's use the above function to estimate the model.
startvalues <- c(coefficients(ols.model))
startvalues
mod.exponential <- optim(c(130.7,1.06,-1.38),llik.exponential,
method = "BFGS", control = list(trace,maxit=1000,fnscale
= -1),
hessian
= TRUE)
mod.exponential
# Save the log likelihood
for later use below
llexponential <- mod.exponential$value
llexponential
# Calculate the variance
matrix from the Hessian matrix.
v <- -solve(mod.exponential$hessian)
# Calculate the standard
errors from the variance matrix.
se <- sqrt( diag(v))
# Calculate the z
statistics from the coefficients and standard errors
b <- mod.exponential$par
zstat
<-b/se
# Calculate
p values from the z statistics
pz
<- 2*pnorm( -abs(zstat))
# Put
the results together in a table.
table <- cbind(b,se,zstat,pz)
table
# Model Discrimination:
Choosing the correct distribution is more of an
# art
than a science. Of course, as we noted above, it would be
# best
if there were a basis in substantive theory for the chosen model.
# Lacking
this alternative, we can construct hypothesis tests based
# on
Likelihood Ratio, Wald, or LaGrange Multiplier principles.
# For
example, using a Likelihood Ratio test we could test each of the
# submodels
against the Generalized Gamma model. The likelihood ratio
# statistic
is chi squared with degrees of freedom equal to the number
# of
parameter restrictions (1 for Generalized Gamma to Gamma and Weibull;
# 2 for Generalized Gamma
to Exponential)
# Likelihood Ratio,
Generalized Gamma to Gamma
llggamma
llgamma
llweibull
lrggg
<- 2*(llggamma-llgamma)
lrggg
lrgggpvalue <- 1-pchisq(lrggg, df=1)
lrgggpvalue
# Likelihood Ratio,
Generalized Gamma to Weibull
lrggw
<- 2*(llggamma-llweibull)
lrggw
lrggwpvalue <- 1-pchisq(lrggw, df=1)
lrggwpvalue
# Likelihood Ratio,
Generalized Gamma to Exponential
lrgge
<- 2*(llggamma-llexponential)
lrgge
lrggepvalue <- 1-pchisq(lrgge, df=2)
lrggepvalue
# We
can also test the Exponential Model against the Gamma and Weibull
models.
# These
likelihood ratio tests are chi squared with one degree of freedom.
# Likelihood Ratio, Weibull to Exponential
lrwe
<- 2*(llweibull-llexponential)
lrwe
lrwepvalue <- 1-pchisq(lrwe, df=1)
lrwepvalue
# Likelihood Ratio, Gamma
to Exponential
lrge
<- 2*(llgamma-llexponential)
lrge
lrgepvalue <- 1-pchisq(lrge, df=1)
lrgepvalue
# On
this basis we choose the Gamma model, since it is not statistically different
# than
the Generalized Gamma model and more parsimonious (1 less parameter). The Weibull model
# is different, which
suggests the need for the additional parameter in generalized gamma.
# The
exponential model breaks down in this case, not allowing further
discrimination,
# but
also suggesting the inappropriateness of the exponential.
# Another
distribution that is related to the Generalized Gamma is the LogNormal.
# However,
it is not obtained by a simple change in parameterization as above. If a
# variable,y, is lognormal, then its variance is
proportional to the square of its mean.
# Here
is the setup for estimating the lognormal distribution from the preceding data.
llik.lognormal <- function(par,X,Y) {
Y <- as.vector(y)
X <- as.matrix(x)
xbeta
<- X%*%par[1:K]
Sig <- par[K1:K1]
sum(-log(sqrt(2*pi)*Sig)-log(Y)-(1/(2*Sig^2))*(log(Y/xbeta)+(Sig^2)/2)^2)
}
llik.lognormal
# Now
let's use the above function to estimate the model.
startvalues <- c(coefficients(ols.model))
startvalues
mod.lognormal <- optim(c(130.7,1.06,-1.38,1),llik.lognormal,
method = "BFGS", control = list(trace,maxit=100,fnscale
= -1),
hessian
= TRUE)
mod.lognormal
# Calculate the variance
matrix from the Hessian matrix.
v <- -solve(mod.lognormal$hessian)
# Calculate the standard
errors from the variance matrix.
se <- sqrt( diag(v))
# Calculate the z
statistics from the coefficients and standard errors
b <- mod.lognormal$par
zstat
<-b/se
# Calculate
p-values for the z statistics
pzstat <- 2* (1 - pnorm(abs(zstat)))
# Put
the results together in a table.
table <- cbind(b,se,zstat,pzstat)
table
# When the dependent data
are in proportions (i.e.,0<y<1 ) then a
# Beta Distribution is
often used. If the data are not in proportions
# then
it is possible to put them into proportions simply by dividing
# the
dependent variable by its maximum. Of course, if you transform
# the
data just to use the Beta, then the scaling and interpretation of
# the
coefficients will be different, so it is probably better to just
# use the Beta with
variables that are naturally proportions unless you have
# a
good reason to want the Beta. The Beta Distribution is perhaps the most
# flexible
distribution. Look again at the Excel spreadsheet for the Beta to
# get
a sense of just how flexible it is. Of course, this flexibility may be
# its
major drawback since it is often difficult to justify on the basis of
# substantive
theory. (The above distributions may be also.) Here is the Beta
# distribution
applied to cross-national data on infant mortality per 1000 as
# a
function of national income and whether the nation is oil exporting.
Leinhardt <- read.table("c:/users/B. Dan Wood/My Documents/My Teaching/Maximum
Likelihood/Data/Leinhardt.txt", header=TRUE)
attach(Leinhardt)
summary(Leinhardt)
# First,
create the rate of infant mortality from the data. This can be thought of
# as
the probability of one infant death per 1000 draws. This probability naturally
# ranges from 0 to 1.
infantmort <- infant/1000
# We
need this next function to get starting values for the MLE. The link function
is
# usually specified as an
odds
infantodds <- exp(infantmort)/(1+exp(infantmort))
# Create a vector of ones
for inclusion on the ml procedure
ones <- matrix(1:1,101,1)
# Now lets just do an ols to get start values and for comparison.
ols.model <- lm(infantodds
~ income + oilyes)
summary(ols.model)
library(e1071)
skewness(residuals(ols.model))
kurtosis(residuals(ols.model))
qq.plot(residuals(ols.model), dist= "norm", labels=FALSE)
hist(residuals(ols.model), nclass=n.bins(residuals(ols.model)),
probability=T,
ylab='Density')
lines(density(residuals(ols.model)),
lwd=2)
points(residuals(ols.model), rep(0,
length(residuals(ols.model))), pch="|")
box()
lines(density(residuals(ols.model),
adjust=.5), lwd=1)
# Define the function to
be optimized
llik.beta <- function(par,ONES,X1,X2,YP)
{
ONES <- as.vector(ones)
X1 <- as.vector(income)
X2 <- as.vector(oilyes)
YP <- as.vector(infantmort)
a0 <- par[1:1]
a1 <- par[2:2]
a2 <- par[3:3]
xbeta
<- a0*ONES + a1*X1 + a2*X2
p <- par[4:4]
mu
<- exp(xbeta)/(1+exp(xbeta))
sum(lgamma(p)-lgamma(mu*p)-lgamma((1-mu)*p)+(mu*p-1)*log(YP)+((1-mu)*p-1)*log(1-YP))
}
llik.beta
# Now
let's use the above function to estimate the model. Use coefficients from the
original
# regression
as starting values, augmented by a 3. Why 3?
startvalues <- c(coefficients(ols.model),3)
startvalues
mod.beta <- optim(c(startvalues),llik.beta,
method = "BFGS", control = list(trace,maxit=1000,fnscale
= -1),
hessian
= TRUE)
mod.beta
# Calculate the variance
matrix from the Hessian matrix.
v <- -solve(mod.beta$hessian)
# Calculate the standard
errors from the variance matrix.
se <- sqrt( diag(v))
# Calculate the z
statistics from the coefficients and standard errors
b <- mod.beta$par
zstat
<-b/se
# Calculate
p values from the z statistics
pz
<- 2*pnorm( -abs(zstat))
# Put
the results together in a table.
table <- cbind(b,se,zstat,pz)
table
# Note that obtaining a
good set of starting values is crucial to convergence or even
# initial
iteration. B above in the vector of starting values is the vector of
coefficients
# from
the prior regression. Ferrari and Cribari-Neto (2004)
recommend this as a good way
# to specify start values
in beta regression. Note that the coefficients from this regression
# are interpreted as the
log of the odds ratio between the new level and old level associated
# with
the change in x.
#You
might also want to try the user defined canned procedure called betareg.
library(betareg)
betareg.out <- betareg(infantmort ~ income + oilyes, link = "logit")
summary(betareg.out)