MODELS WITH NON-NORMAL DISTURBANCES

 

The purpose of this session is to show you how to estimate and test for models with non-normal disturbances.  In regression applications where the dependent variable is strictly positive, the assumption of normally distributed errors may be inappropriate. Other applications may involve truncation or censoring at the high end, or on both ends. The MAXIMIZE command and various "canned" procedures in LIMDEP provide the ability to estimate linear regression models with a range of distributional assumptions.

 

Of course, the starting point in moving to an alternative estimator is theory. We should always define the nature of the statistical experiment that produced the sample. Ideally, there should be a close match between the data attributes and the attributes of the probability distribution chosen for estimation. For example, if the data are always positive but unbounded on the upper end, then one of the distributions derived from the Generalized Gamma Function might be appropriate. If the data are a logged form of a Cobb-Douglas type function, then we might suspect a LogNormal distribution of disturbances. The Beta distribution offers the widest range of possible forms, but this might be a limitation if we want to justify the use of a distribution based on substantive theory. Ideally, we would like to be able to derive the appropriate distribution based on substantive theory. However, this may not always be possible. In this exercise, we show how to implement an empirical strategy for choosing an appropriate non-normal distribution.

 

/* This file demonstrates how to test and estimate regression models that have non-normal disturbances. We consider a range of models including the Generalized Gamma, Gamma, Exponential, Weibull, Beta, and LogNormal distributions. */

 

Reset $

 

/* Below, let's create some simulated data that we know has a normal distribution. Then compute descriptives as above on the simulated data. Compare the skewness, kurtosis, and quantile plots to the results above */

 

Sample ; 1-1000 $

Create ; normdata = rnn(0,1) $

Dstats ; rhs=normdata ; quantiles ; plot ; all $

 

/* Note that the skew and kurtosis statistics, along with the quantile plots provide evidence concerning the distribution of the data, relative to the normal distribution. The skewness statistic measures the symmetry of the distribution, while the kurtosis statistic measures the relative peakedness of the distibution compared to a normal distribution. A perfectly symmetrical distribution would have skewness=0; a non-kurtotic distribution would have kurtosis=3 (Note: some versions of kurtosis are centered on 0, rather than 3, by subtracting out the 3. LIMDEP reports the version centered on 3.

 

The quantile plots compare quantiles of the actual data to quantiles of the same range drawn from a normal distribution. This provides yet another indicator of the normality of the data. Normal data should follow the line on the quantile plot. A sigmoid shape indicates kurtosis, while the slope of the plot relative to the line implies a particular direction of skew. Take care, however, of relying too much on these plots and statistics. They pertain to the sample, not the population from which the sample is drawn. Theory is the best guide, particularly with small N. */

 

Reset $

 

/* The next line will read in some real data. Change the path to find data */

 

Read ; format=xls; file="c:\users\B. Dan Wood\My Documents\My Teaching\Maximum Likelihood\Data\prestige.xls"

; Names $

List ; educatio,income,women,prestige,census $

Sample; 1-102 $

 

/* Now let's print the data and compute descriptive statistics */

 

Dstat ; Rhs=income,educatio,prestige ; Quantiles ; Plot ; Output=3 ; All $

 

/* Now, let's estimate the relationship between consume, income, and price using the standard regression model, save the residuals, and test for normality */

 

Regress ; Lhs=income ; Rhs=One,educatio,prestige ; Res=e $

Dstats ; Rhs=e ; quantiles ; plot ; all $

 

/* The residuals from this regression appear skewed and kurtotic so we will assume some of the other distributions. */

 

/* First, let's assume a Generalized Gamma Distribution. This distribution encompasses a Gamma, Exponential, and Weibull distribution with some simple parameter restrictions, as shown below.  It also encompasses a lognormal distribution with some mathematical manipulation. In this version of the Generalized Gamma there are two distributional parameters that we are labeling c and k.  Note that this is a slightly different parameterization relative to that given in the lecture.  Relative to the lecture discussion of a Generalized Gamma with 4 parameters, we are restricting the location parameter to zero.  Note also that following estimation we save the log likelihood for later hypothesis testing.  */

 

/* Because of the complexity of the remaining function, I have used LIMDEP's ability to use subfunctions. The | symbol AFTER a subfunction identifies each subfunction. The last function in the list gives the function to be optimized. */

 

Maximize ; labels=a0,a1,a2,c,k ; Start=-714,-169,199,1,1

; alg=bfgs ; tlf=.0001 ; output=3 

; Fcn= bx = a0+a1*educatio+a2*prestige |

       cbx = income/bx           |

       cpk = c+(1/k)              |

       log(k) - lgm(c) + c*k*(lgm(cpk) -  lgm(c)  )

              - log(income) +c*k*log(cbx)

            -( exp(lgm(cpk)-lgm(c))*(cbx) 

              )^k$

CALC; lnLGG=Logl $

 

 

/* Now let's assume a Gamma Distribution. This distribution is true when k=1 in the preceding Generalized Gamma Disribution. We use the fix option to set the parameter to 1.*/

 

Maximize ; labels=a0,a1,a2,c,k ; Start=-714,-169,199,1,1

; alg=bfgs ; tlf=.0001 ; output=3 ; fix=k

; Fcn= bx = a0+a1*educatio+a2*prestige |

       cbx = income/bx           |

       cpk = c+(1/k)              |

       log(k) - lgm(c) + c*k*(lgm(cpk) -  lgm(c)  )

              - log(income) +c*k*log(cbx)

            -( exp(lgm(cpk)-lgm(c))*(cbx) 

              )^k$

 

Calc; lnLG=Logl $

 

/* Now let's assume a Weibull Distribution. This distribution is true when c=1 in the preceding Generalized Gamma Distribution. Again, we use the fix option to set the parameter to this value.*/

 

Maximize ; labels=a0,a1,a2,c,k ; Start=-714,-169,199,1,1

; alg=bfgs ; tlf=.0001 ; output=3 ; fix=c

; Fcn= bx = a0+a1*educatio+a2*prestige |

       cbx = income/bx           |

       cpk = c+(1/k)              |

       log(k) - lgm(c) + c*k*(lgm(cpk) -  lgm(c)  )

              - log(income) +c*k*log(cbx)

            -( exp(lgm(cpk)-lgm(c))*(cbx) 

              )^k$

Calc ; lnLW=Logl $

 

/* Now let's assume an Exponential Distribution. This distribution is true when c=k=1 in the preceding Generalized Gamma Distribution. */

 

Maximize ; labels=a0,a1,a2,c,k ; Start=-714,-169,199,1,1

; alg=bfgs ; tlf=.0001 ; output=3  ; fixe=c,k

; Fcn= bx = a0+a1*educatio+a2*prestige |

       cbx = income/bx           |

       cpk = c+(1/k)              |

       log(k) - lgm(c) + c*k*(lgm(cpk) -  lgm(c)  )

              - log(income) +c*k*log(cbx)

            -( exp(lgm(cpk)-lgm(c))*(cbx) 

              )^k$

Calc ; lnLE=Logl $

 

/* Model Discrimination: Choosing the correct distribution is more of an art than a science. Of course, as we noted above, it would be best if there were a basis in substantive theory for the chosen model. Lacking this alternative, we can construct hypothesis tests based on Likelihood Ratio, Wald, or LaGrange Multiplier principles.

 

For example, using a Likelihood Ratio test we could test each of the submodels against the Generalized Gamma model. The likelihood ratio statistic is chi squared with degrees of freedom equal to the number of parameter restrictions (1 for Generalized Gamma to Gamma and Weibull; 2 for Generalized Gamma to Exponential) */

 

Calc ; List ; LRGGG=2*(lnLGG-lnLG) $ ? Generalized Gamma to Gamma

Calc ; List ; LRGGW=2*(lnLGG-lnLW) $ ? Generalized Gamma to Weibull

Calc ; List ; LRGGE=2*(lnLGG-lnLE) $ ? Generalized Gamma to Exponential

 

/* We could also test the Exponential Model against the Gamma and Weibull models. These likelihood ratio tests are chi squared with one degree of freedom. */

 

Calc ; List ; LRWE=2*(lnLW-lnLE) $ ? Weibull to Exponential

Calc ; List ; LRGE=2*(lnLG-lnLE) $ ? Gamma to Exponential

 

/* On this basis we choose the Gamma model above, since it is not statistically different than the Generalized Gamma model and is more parsimonious (1 less parameter). However, the Gamma model is statistically different than the Exponential which has one less parameter. This suggests that the parameter restriction for the Exponential model is inappropriate.

 

Note that we would reach similar conclusions by simply looking at the t statistics associated with the probability distribution parameters of the four estimated models. */

 

/* Another distribution that is related to the Generalized Gamma is the LogNormal. However, it is not obtained by a simple change in parameterization as above. If a variable,y, is lognormal, then its variance is proportional to the square of its mean. The LogNormal is given. */

 

Maximize ; labels=a0,a1,a2,Sig

; Start=-714,-169,199,1

; Fcn=

-log(sqr(2*pi)*Sig)-log(income)-(1/(2*Sig^2))

*(log(income/(a0+a1*educatio+a2*prestige))+(Sig^2)/2)^2

; alg=bfgs ; tlf=.0001 ; output=3$

 

/* The LogNormal Distribution is a "canned" procedure in LIMDEP. Call the LogNormal MLE as follows. */

 

Lognormal ; Lhs=income ; Rhs=One,educatio,prestige $

 

/*

When the dependent data are in proportions (i.e.,0<y<1 ) then a Beta Distribution is often used. If the data are not in proportions then it is possible to put them into proportions simply by dividing the dependent variable

by its maximum. Of course, if you transform the data just to use the Beta, then the scaling and interpretation of the coefficients will be different, so it is probably better to just use the Beta with variables that are naturally proportions unless you have a good reason to want the Beta. The Beta Distribution is perhaps the most flexible distribution. Look again at the Excel spreadsheet for the Beta to get a sense of just how flexible it is. Of course, this flexibility may be its major drawback since it is often difficult to justify on the basis of substantive theory. (The above distributions may be also.) Here is the Beta distribution applied to cross-national data on infant mortality per 1000 as a function of national income and whether the nation is oil exporting.

*/

 

reset $

Read ; format=xls ; File = "c:\users\B. Dan Wood\My Documents\My Teaching\Maximum Likelihood\Data\Leinhardt.xls"

; Names $

List ; infant,income,oilyes $

Sample; 1-101 $ 

 

/* First, create the rate of infant mortality from the data. This can be thought of as the probability of one infant death per 1000 draws. This probability naturally ranges from 0 to 1.*/

 

create ; infantmr = infant / 1000 $

create ; infantod = exp(infantmr)/(1+exp(infantmr)) $

 

/* First run a linear regression to obtain starting values for reasons to be made clear below. */

 

Regress ; lhs=infantod ; rhs=one,income,oilyes $

Maximize ; labels=a0,a1,a2,p

; Start=B,3

; Fcn=

Xb = a0+a1*income+a2*oilyes     |

 lgm(p)

-lgm(Xb*p)

-lgm((1-xB)*P)

+(Xb*p-1)*log(infantod)

+log(1-infantod)*((1-Xb)*p-1)

; alg=bfgs

; tlf=.00001 $

 

 

/* Note that obtaining a good set of starting values is crucial to convergence or even initial iteration. B above in the vector of starting values is the vector of coefficients from the prior regression. In general this is a good way to obtain starting values for MLE procedures. See also the article by Ferrari and Cribari-Neto (2004) */

 

/*Now that I've put you through the pain of programming your own log likelihood functions for these continuous non-normal distributions in LIMDEP, I want to point out that many of them are available as canned procedures in LIMDEP using the LOGLINEAR models.  The parameterization is different under these models.  Specifically, the relationship between the two models is xb=1/ln(xb).

Here are some examples of the canned procedures.

*/

 

 

reset $

 

Read ; format=xls; file="c:\users\B. Dan Wood\My Documents\My Teaching\Maximum Likelihood\Data\prestige.xls"

; Names $

List ; educatio,income,women,prestige,census $

Sample; 1-102 $

 

loglinear ; lhs=income ; rhs=one,educatio,prestige ; model=gamma $

 

loglinear ; lhs=income ; rhs=one,educatio,prestige ; model=weibull $

 

loglinear ; lhs=income ; rhs=one,educatio,prestige ; model=exponential $

 

 

/* There is also a procedure for the beta distribution. However, it seems to require a long time to converge with these data, if ever. So you might not want to run this one. */

 

reset $

 

 

/* There is also a procedure for the beta distribution. However, it is different parameterization than the one above. It is a two parameter distribution which assumes each parameter is a function of the conditional mean. Also, this

procedure seems to take a long time to converge with these data, if ever. So you might not want to run this one. */

 

reset $

Read ; format=xls ; File = "c:\users\B. Dan Wood\My Documents\My Teaching\Maximum Likelihood\Data\Leinhardt.xls"

; Names $

Sample; 1-101 $   

create ; infantmr = infant / 1000 $

create ; infantod = exp(infantmr)/(1+exp(infantmr)) $

 

 

loglinear ; lhs=infantod ; rhs=one,income,oilyes ; model = beta $

 

 

Delete ; * $

 

Stop $