PROBABILITY
DISTRIBUTIONS
AND
ESTIMATING A MEAN AND VARIANCE USING MLE
The purpose of this session is to
familiarize you with some of the more important probability distributions and
to take an initial step in understanding how probability distributions relate
to likelihood and log likelihood functions.
PROBABILITY DISTRIBUTIONS: Recall
that a probability distribution is a mapping from a random variable (call it Y)
to the probability of having observed that random variable (P[Y]). Maximum
Likelihood Estimation (MLE) involves specifying a probability distribution that
describes the social science experiment that generated the random variable
under investigation. Of course, the data generation process may have been
consistent with more than one probability distribution since it is possible to
derive some distributions from others.
The first part of this assignment is to go
to Microsoft Excel and call up various spreadsheets that plot some of the more
interesting univariate probability distributions. You
should have the following spreadsheets for this exercise.
Normal Distribution
Chi-Square Distribution
Bernoulli Distribution
Binomial Distribution
Negative Binomial Distribution
Poisson Distribution
Weibull Distribution
Exponential Distribution
Gamma Distribution
LogNormal Distribution
Beta Distribution
Note that there are two spreadsheets for the Gamma and Beta distributions to
illustrate the different ways in which these may be parameterized. In general,
there are many possible parameterizations of most probability distributions.
For each of these spreadsheets, the cells
marked green are areas where you may change the parameters and observe the
behavior of the resulting distributions. (Note: You may need to resize or
reposition the graphs because of the different settings and different versions
of Excel that are around.) In any case, play with each of these spreadsheets,
noting the essential features of each probability distribution, including the
range of the random variable, the range of the parameters, and the effect of
the parameters on means, modes, skew, kurtosis, and shape. The Evans, Hastings,
and Peacock text contains many more details on each of these distributions, so
you might use this reference in combination with the spreadsheets to learn more
about the probability functions. As an exercise, you might also try on your own
to graph one or more of the other probability distributions in the Evans,
The probability distributions in these spreadsheets are univariate
distributions, meaning that there is a single random variable in the domain of
the probability function. However, social scientists are also sometimes
interested in the joint probability associated with multiple random variables.
For example, we sometimes assume a bivariate normal
distribution when there are two dependent variables, both normal, to be modeled
simultaneously. Another example, the multinomial distribution is a discrete
joint distribution with dimensions equal to the number of categories in the multinomial
variable. Excel doesn't do so well in plotting multivariate distributions,
particularly when they are continuous. However, Maple can do the job. Use Maple
to look at the bivariate normal distribution
contained in the file called Bivariate Normal.mws. Change r (the correlation between the two
random variables), s1, s2, m1, and m2
to observe the effect on the distribution.
ESTIMATING A MEAN AND VARIANCE OF A
DISTRIBUTION USING MLE: Maximum likelihood is purely and simply an estimation
technique. In practice, we specify a probability distribution that could have
generated the data, put that probability distribution into a likelihood and
log-likelihood function, and then estimate the parameters of that distribution
using MLE. One approach to implementing MLE would be through trial and error.
As an example of how this might work, go to the Excel spreadsheet entitled MLNormal Mu.xls. In this spreadsheet I have entered the
data on page 9 of Eliason in the second column. In
the first column I have a vector of initial guesses for the mean of the
distribution. In the third column is the log-likelihood associated with each
initial guess, assuming independent draws from a normal distribution. (Note: We
could have used any of the distributions above in computing the
log-likelihoods). The MLE estimate of the mean is just the guess that produces
the largest number in column 3. The graph plots the values of the
log-likelihood function in column 3 against the vector of guesses in column 1.
We can also look at the graph to find the maximum. If we want to increase the
precision of estimation, then we can change the vector of guesses to range from
say 1.81 through 1.9. Do this to get a sense of what happens to the
log-likelihoods and graph.
Of course, trial
and error methods are very inefficient and may also be quite cumbersome when
there are multiple parameters. Thus, a better way to do estimation is using the
methods of calculus or iterative techniques. In class we show how to optimize a
function using both analytical and numerical methods. Computers are very useful
in implementing the latter. Below is a short R program for finding the mean and
variance of a normally distributed variable using MLE.
# This file calculates
a mean using maximum likelihood.
# Read in the Ostrom data
Ostrom <- read.table("C:/Data/ostrom.dat",
header=TRUE)
attach(Ostrom)
#list the data
Ostrom
#Compute summary
statistics
summary(Ostrom)
# Put
the data into matrices for use in the MLE procedure
x <- cbind(1,as.matrix(
y <- as.matrix(US)
ones <- x[,1]
# Calculate
n for later use
n <- sum(ones)
n
# Define the function to
be optimized
llik.mean <- function(par,X,Y) {
Y <- as.vector(y)
X <- as.matrix(ones)
mu
<- X%*%par[1:1]
Sig <- par[2:2]
sum(-(1/2)*log(2*pi)-(1/2)*log(Sig^2)-(1/(2*Sig^2))*(y-mu)^2)
}
llik.mean
# Optimize the function
model <- optim(c(140,75),llik.mean, method = "CG", control = list(fnscale = -1),
hessian
= TRUE)
model
# Pick off the separate
coefficients from the model vector
# of
coefficients for later use
mu
<- model$par[1]
mu
Sig <- model$par[2]
Sig
# Calculate the
variance-covariance matrix of coefficients
# from
the Hessian matrix.
v <- -solve(model$hessian)
v
# Calculate the standard
errors from the variance-
# covariance
matrix of coefficients.
se <- sqrt( diag(v))
se
# Calculate the z
statistics from the coefficients and
# standard
errors
b <- model$par
b
z <-b/se
z
# Note that the estimate
of the error variance is biased by
# n/n-1
# we
can correct the preceding estimate of the standard
# deviation
as follows
Sig
Sigunb <- sqrt(n/(n-1)*Sig^2)
Sigunb