Censoring and Truncation
The purpose of this session is to show you how to use STATA's procedures for
censored and truncated regression. We also estimate Heckman's two-stage
procedure for samples with selection bias, which is a form of incidental
truncation.
/*
This file demonstrates some of STATA's procedures for doing censored and
truncated regression. In particular, we estimate a lower limit censored regression
(i.e., Tobit), Cragg's model that assumes a heterogeneous censoring process,
Heckman's incidental truncation model for dealing with sample selection bias,
and truncated regression.*/
/*
We will use some of the Mroz data on female labor force participation and income
for these examples. The first 428 observations of the Mroz data are women
who worked in 1975. The remaining 325 observations are women who did not
work. We will use only the first 50 observations from each of these subsets of
the data.*/
use "c:\documents and settings\b. dan wood\my documents\my teaching\maximum likelihood\data\tobit.dta", clear
summarize
/*
Now let's estimate a Tobit model and also save the log likelihood for later
testing. The dependent variable (WHRS) is the wife's hours worked in 1975. The
independent variables are a constant, number of children less than 6 years old
(KL6), number of children between 6 and 18 (K618), wife's age (WA), and wife's
education (WE). */
tobit whrs kl6 k618 wa we, ll(0)
scalar ltobit=e(ll)
display ltobit
/*
When using STATA, you must specify what is being censored. The option
"ll(0)" tells STATA that the lower censoring limit is zero.*/
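/*
As a quick check before estimating, it can help to see how many observations
sit at the censoring limit. The sketch below assumes that nonworking women are
coded with whrs equal to zero, which is what ll(0) implies.*/

```stata
/* Number of observations at the lower censoring limit */
count if whrs==0
/* Number of uncensored observations (missing values excluded) */
count if whrs>0 & !missing(whrs)
```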
tobit whrs kl6 k618 wa we, ll(0)
fitstat /*Obtain various fit statistics on the Tobit regression*/
listcoef, help /*List the coefficients and standardized coefficients*/
prvalue, x(kl6=2) rest(mean) /*Compute predicted values when kl6=2 with the other variables at their means*/
/*
McDonald and Moffitt suggest a useful decomposition of the marginal effects
associated with the censored regression model. They show that a change in the
conditional mean due to a right side variable derives from two sources:
1) It affects the conditional mean in the uncensored part of the distribution.
2) It affects the probability that an observation will lie in the uncensored
part of the distribution.
Below we utilize the downloadable procedure "dtobit2", which takes into
account this possible difference in marginal effects. Note that the marginal
effect you should report depends on the use you intend to make of the study. If
your primary interest is in the uncensored part of the distribution, then
report the uncensored marginal effect. Otherwise, report the entire marginal effect.*/
dtobit2 whrs kl6 k618 wa we, ll(0) /*Marginal effects for latent variable, unconditional expected value, and marginal effect conditional on being uncensored.*/
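/*
If dtobit2 is not installed, the same three quantities can be approximated
with the built-in margins command. This is a sketch that assumes STATA 11 or
later; the Tobit model is re-estimated quietly so that its results are the
active estimates, and effects are evaluated at the means of the regressors.*/

```stata
quietly tobit whrs kl6 k618 wa we, ll(0)
/* Marginal effects on the probability of being uncensored, Pr(whrs > 0) */
margins, dydx(*) predict(pr(0,.)) atmeans
/* Marginal effects on expected hours conditional on being uncensored */
margins, dydx(*) predict(e(0,.)) atmeans
/* Marginal effects on the unconditional expected value of censored hours */
margins, dydx(*) predict(ystar(0,.)) atmeans
```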
/*
Cragg has suggested that it is often incorrect to assume that the process
determining whether an observation is censored is the same as the process
generating the uncensored observations. He suggests a two equation system in
which the first equation estimates the probability of being above the
censoring limit and the second is a truncated regression on the uncensored
observations. Below we estimate Cragg's model using Probit and STATA's
truncated regression procedure. We also do a likelihood ratio test of
whether Cragg's model is significantly different from the Tobit model.
*/
probit lfp kl6 k618 wa we
scalar lprobit=e(ll)
display lprobit
keep if lfp==1
truncreg whrs kl6 k618 wa we, ll(0)
scalar ltrunc=e(ll)
display ltrunc
scalar lrtest=2*((lprobit+ltrunc)-ltobit)
display lrtest
/*
The restricted model is Tobit. The unrestricted model is the probit and the
truncated regression estimated separately. The test statistic is 14.4, which is
distributed chi-squared with 5 degrees of freedom, the number of additional
parameters being estimated. The critical value at the 0.05 level is 11.07, so we
reject the null that the restricted model is true. The two equation
approach is therefore more appropriate than Tobit. */
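/*
The critical value and the p-value for this test can be computed directly
rather than looked up in a table. The sketch below assumes the scalar lrtest
created above is still in memory.*/

```stata
/* Critical value at the 0.05 level for chi-squared with 5 degrees of freedom */
display invchi2tail(5, 0.05)
/* p-value of the likelihood ratio statistic */
display chi2tail(5, scalar(lrtest))
```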
/*
STATA also allows estimating models where the data are censored within specific
intervals (intreg). The censoring point can also vary across observations
instead of being restricted to a single value (cnreg). We will not illustrate
those procedures here. */
/*
Now let's turn to estimating a model with sample selection bias. In these cases
the truncation is incidental, due to sample selection on another variable that
is correlated with the truncation in the dependent variable. As discussed in
class, the standard model is Heckman's two-stage procedure. Here is an example.
First, read in the data again. */
use "c:\documents and settings\b. dan wood\my documents\my teaching\maximum likelihood\data\tobit.dta", clear
/*
Now estimate the two stage sample selection model using Heckman's procedure.*/
heckman whrs kl6 k618 wa we, select(lfp=cit kl6) twostep
/*
The probit (selection) equation estimates an index from which the inverse Mills
ratio is computed. The inverse Mills ratio attempts to measure the omitted
variable in the equation for the incidentally truncated variable. It is inserted
as an additional regressor in the second (outcome) equation. */
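/*
To make the two steps concrete, they can be carried out by hand. This is only
a sketch: the variable names zhat and mills are made up for illustration, and
the standard errors from the second step are not corrected for the estimated
regressor, so rely on heckman for inference. The point estimates should be
close to those reported by heckman with the twostep option.*/

```stata
/* Step 1: probit for selection into the labor force */
probit lfp cit kl6
/* Linear index from the probit equation */
predict double zhat, xb
/* Inverse Mills ratio: normal density over normal CDF at the index */
generate double mills = normalden(zhat)/normal(zhat)
/* Step 2: OLS on the selected sample with the inverse Mills ratio added */
regress whrs kl6 k618 wa we mills if lfp==1
```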