Censoring and Truncation

 

The purpose of this session is to show you how to use STATA's procedures for censored and truncated regression. We also estimate Heckman's two-stage procedure for samples with selection bias, which is a form of incidental truncation.

 

/* This file demonstrates some of STATA's procedures for censored and truncated regression. In particular, we estimate a lower-limit censored regression (i.e., Tobit), Cragg's model that assumes a heterogeneous censoring process, Heckman's incidental truncation model for dealing with sample selection bias, and truncated regression. */

 

/* We will use some of the Mroz data on female labor force participation and income for these examples. The first 428 observations of the Mroz data are women who worked in 1975. The remaining 325 observations are women who did not work. We will use only the first 50 observations from each of these subsets of the data. */

 

use "c:\documents and settings\b. dan wood\my documents\my teaching\maximum likelihood\data\tobit.dta", clear

summarize
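
/* A quick check of the censoring in the sample, as a minimal sketch. It assumes that whrs is recorded as zero for women who did not work and that lfp is the 0/1 participation indicator used later in this file. */

tabulate lfp

count if whrs==0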

 

/* Now let's estimate a Tobit model and save the log likelihood for later testing. The dependent variable (WHRS) is the wife's hours worked in 1975. The independent variables are a constant, the number of children less than 6 years old (KL6), the number of children between 6 and 18 (K618), the wife's age (WA), and the wife's education (WE). */

 

tobit whrs kl6 k618 wa we, ll(0)

scalar ltobit=e(ll)

display ltobit

 

/* When using STATA, you must specify the censoring limit. The option ll(0) tells STATA that the dependent variable is censored from below at a lower limit of zero. */
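
/* For comparison, a sketch of two-sided censoring. The upper limit of 2000 hours is purely hypothetical and is not a feature of these data; ul() sets an upper censoring limit in the same way that ll() sets a lower one. */

tobit whrs kl6 k618 wa we, ll(0) ul(2000)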

 

tobit whrs kl6 k618 wa we, ll(0)

fitstat                              /* Obtain various fit statistics for the Tobit regression */

listcoef, help                                                                 /*List the coefficients and standardized coefficients*/

prvalue, x(kl6=2) rest(mean)         /* Compute predicted values when kl6=2 and the other variables are at their means */

 

/* McDonald and Moffitt suggest a useful decomposition of the marginal effects associated with the censored regression model. They show that a change in the conditional mean due to a change in a right-hand-side variable derives from two sources:

 

1) It affects the conditional mean in the uncensored part of the distribution

 

2) It affects the conditional mean by also affecting the probability that an observation will lie in the uncensored part of the distribution.

 

Below we use the downloadable procedure "dtobit2", which takes this possible difference in marginal effects into account. Note that the marginal effect you should report depends on the use you intend to make of the study. If your primary interest is in the uncensored part of the distribution, then report the uncensored marginal effect. Otherwise, report the entire marginal effect. */

 

dtobit2 whrs kl6 k618 wa we, ll(0)       /*Marginal effects for latent variable, unconditional expected value, and marginal effect conditional on being uncensored.*/
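
/* In symbols, the McDonald and Moffitt decomposition for a variable x is

dE[y]/dx = Pr(y>0) * dE[y|y>0]/dx + E[y|y>0] * dPr(y>0)/dx

As a hedged alternative to dtobit2, the pieces can be sketched with the built-in margins command. This assumes Stata 11 or later. The Tobit model is re-estimated quietly so that the sketch does not depend on which results are currently in memory. */

quietly tobit whrs kl6 k618 wa we, ll(0)

margins, dydx(*) predict(ystar(0,.))     /* effect on the unconditional expected value E[y] */

margins, dydx(*) predict(e(0,.))         /* effect on E[y|y>0], the uncensored part */

margins, dydx(*) predict(pr(0,.))        /* effect on Pr(y>0), the probability of being uncensored */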

 

/* Cragg has suggested that assuming a censoring limit that depends on the same distribution as the uncensored observations is often incorrect. He suggests a two-equation system in which the first equation estimates the probability of being above the censoring limit and the second is a truncated regression on the uncensored observations. Below we estimate Cragg's model using probit and STATA's truncated regression procedure. We also do a likelihood ratio test of whether Cragg's model is significantly different from the Tobit model.

*/

 

probit lfp kl6 k618 wa we

scalar lprobit=e(ll)

display lprobit

keep if lfp==1

truncreg whrs kl6 k618 wa we, ll(0)

scalar ltrunc=e(ll)

display ltrunc

scalar lrtest=2*((lprobit+ltrunc)-ltobit)

display lrtest

 

/* The restricted model is Tobit. The unrestricted model is the probit and the truncated regression estimated separately. The test statistic is 14.4, which is distributed chi-squared with 5 degrees of freedom, one for each additional parameter being estimated. The 5% critical value is 11.07, so we reject the null that the restricted model is true. The two-equation approach is therefore more appropriate than Tobit. */
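
/* The critical value and a p-value for the test above can be computed directly rather than looked up. This sketch assumes the scalar lrtest computed above is still in memory. */

display invchi2tail(5, 0.05)     /* 5% critical value with 5 degrees of freedom */

display chi2tail(5, lrtest)      /* p-value of the likelihood ratio statistic */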

 

/* STATA also allows estimating models where the data are censored within specific intervals (intreg). The censoring point can also vary across observations instead of being restricted to a single value (cnreg). We will not illustrate those procedures here. */

 

/* Now let's turn to estimating a model with sample selection bias. In these cases the truncation is incidental, due to sample selection on another variable that is correlated with the truncation in the dependent variable. As discussed in class, the standard model is Heckman's two-stage procedure. Here is an example. First, read in the data again. */

 

use "c:\documents and settings\b. dan wood\my documents\my teaching\maximum likelihood\data\tobit.dta", clear

 

/* Now estimate the two-stage sample selection model using Heckman's procedure. */

 

heckman whrs kl6 k618 wa we, select(lfp=cit kl6) twostep

 

/* The first-stage probit equation estimates an index from which the inverse Mills ratio is computed. The inverse Mills ratio attempts to measure the omitted variable in the equation for the incidentally truncated variable, so it is inserted as an additional regressor in the second equation. */
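
/* To make the role of the inverse Mills ratio concrete, here is a minimal by-hand sketch of the two steps. The variable names zg and imr are made up for this illustration. The coefficients should match the heckman twostep output, but the OLS standard errors in the second step are not corrected for the estimated regressor, so use heckman for inference. */

probit lfp cit kl6

predict zg, xb                              /* first-stage probit index */

generate imr = normalden(zg)/normal(zg)     /* inverse Mills ratio */

regress whrs kl6 k618 wa we imr if lfp==1   /* second stage on the selected sample */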