Introduction to Parametric Duration Models

The purpose of this session is to show you how to use LIMDEP's procedures for
estimating parametric duration models. Note that, apart from a brief look at the
Cox proportional hazards model, we do not cover the non-parametric or
semi-parametric duration models, which are an important part of this
literature.
/*
This file demonstrates the LIMDEP procedures for
evaluating duration data. First, let's read in the data. These are data on the
duration of labor strikes taken from Greene, p. 800. The variable T is the
duration of the strike. The variable PROD is the residual from a linear
regression of the log of industrial production on time, time squared, and a set
of monthly dummy variables. It measures aggregate industrial production less
trend and seasonal components. */
Reset $
read ; nrec=62 ; nvar=2 ; names=T,PROD $
7.00000 .113800E-01
9.00000 .113800E-01
13.0000 .113800E-01
14.0000 .113800E-01
26.0000 .113800E-01
29.0000 .113800E-01
52.0000 .113800E-01
130.000 .113800E-01
9.00000 .229900E-01
37.0000 .229900E-01
41.0000 .229900E-01
49.0000 .229900E-01
52.0000 .229900E-01
119.000 .229900E-01
3.00000 -.395700E-01
17.0000 -.395700E-01
19.0000 -.395700E-01
28.0000 -.395700E-01
72.0000 -.395700E-01
99.0000 -.395700E-01
104.000 -.395700E-01
114.000 -.395700E-01
152.000 -.395700E-01
153.000 -.395700E-01
216.000 -.395700E-01
15.0000 -.546700E-01
61.0000 -.546700E-01
98.0000 -.546700E-01
2.00000 .535000E-02
25.0000 .535000E-02
85.0000 .535000E-02
3.00000 .742700E-01
10.0000 .742700E-01
1.00000 .645000E-01
2.00000 .645000E-01
3.00000 .645000E-01
3.00000 .645000E-01
3.00000 .645000E-01
4.00000 .645000E-01
8.00000 .645000E-01
11.0000 .645000E-01
22.0000 .645000E-01
23.0000 .645000E-01
27.0000 .645000E-01
32.0000 .645000E-01
33.0000 .645000E-01
35.0000 .645000E-01
43.0000 .645000E-01
43.0000 .645000E-01
44.0000 .645000E-01
100.000 .645000E-01
5.00000 -.104430
49.0000 -.104430
2.00000 -.700000E-02
12.0000 -.700000E-02
12.0000 -.700000E-02
21.0000 -.700000E-02
21.0000 -.700000E-02
27.0000 -.700000E-02
38.0000 -.700000E-02
42.0000 -.700000E-02
117.000 -.700000E-02
/*
LIMDEP expects the dependent variable to be logged for all but the Gompertz
distribution and the Cox model below. You will also want a censoring variable to
indicate whether or not a spell is incomplete. Below we create the censoring
variable, STATUS, which equals 1 for durations of 80 or less and 0 for durations
above 80, which we treat as censored. Note that the censoring variable can be
omitted from the specifications below if there is no censoring. */
Create ; status = T < 81 ; logt=log(T)$
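/* For readers who want to check what the Create command produces, here is a
minimal Python sketch of the same two transformations. It is an illustration
only, not LIMDEP code. The five durations are copied from the data above, and
the Python variable names are assumptions made for this example.

```python
import math

# A few durations (in days) copied from the strike data above.
durations = [7.0, 52.0, 130.0, 85.0, 3.0]

# status = 1 when T < 81 (completed spell), 0 when the spell is treated as censored.
status = [1 if t < 81 else 0 for t in durations]

# The log-linear models take the log of the duration as the dependent variable.
logt = [math.log(t) for t in durations]

for t, s, lt in zip(durations, status, logt):
    print(f"T={t:6.1f}  status={s}  logT={lt:.4f}")
```

The two durations above 80 (130 and 85) receive status = 0. The others receive
status = 1. */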
/*
Now let's just estimate some duration models with no covariates. This would be
useful for exploring the duration dependence of a process. First the
Exponential model */
Survival ; Lhs=logT,status ; Rhs=One ; Plot ;
Model=Exponential $
/*Note
the distinctive hazard and integrated hazard functions for the Exponential
model. The hazard is constant over time, so the integrated hazard is a straight
line through the origin. This reflects the memoryless property of this
distribution: the exponential allows no duration dependence at all.
*/
/*Now
let's estimate the same relationship with the Weibull model. The Weibull is
probably the most popular parametric model. The reasons are its
flexibility and the fact that it encompasses the exponential, which is the
special case p = 1. You therefore have a built-in test for the exponential in
whether the estimate of p differs significantly from 1. Also, the
Weibull's hazard function can rise or fall monotonically with duration, which is
an advantage if you think that the hazard is not constant. */
Survival ; Lhs=logT,status ; Rhs=One ; Plot ;
Model=Weibull $
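/* The way the Weibull nests the exponential can be seen with a short Python
sketch. It assumes the parameterization hazard = lambda*p*(lambda*t)^(p-1),
with integrated hazard (lambda*t)^p. That parameterization is an assumption of
this example, so check it against the LIMDEP manual before comparing numbers.
The value of lam is a made-up illustration, not an estimate from the strike
data.

```python
def weibull_hazard(t, lam, p):
    """Weibull hazard lam * p * (lam * t) ** (p - 1) at duration t > 0."""
    return lam * p * (lam * t) ** (p - 1)


def weibull_integrated_hazard(t, lam, p):
    """Weibull integrated hazard (lam * t) ** p, which equals -log S(t)."""
    return (lam * t) ** p


lam = 0.02  # hypothetical scale value, not an estimate from the strike data
for p in (0.5, 1.0, 2.0):
    hazards = [weibull_hazard(t, lam, p) for t in (10.0, 50.0, 100.0)]
    print(p, [round(h, 5) for h in hazards])
```

With p = 1 the hazard is the same at every duration (the exponential case).
With p below 1 it falls with duration, and with p above 1 it rises. */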
/*LIMDEP
will also estimate the Gamma model. However, it is treated differently from the
other models in that it is not easy to estimate the second parameter of the
Gamma, Theta, simultaneously with the other parameters. Therefore, you supply a
value for Theta. A search is still conducted over Theta, but the search ends up
at the fixed value. The asymptotic covariance matrix then treats Theta as an
estimated value rather than as a fixed parameter. If you happen to know
Theta, you can remove this uncertainty with the option Fix. You
would only want to do this if you really do know Theta. The Gamma model allows
non-monotonicity in the hazard function.*/
Survival ; Lhs=logT,status ; Rhs=One ; Plot ;
Model=Gamma; Theta=0.75 $
Survival ; Lhs=logT,status ; Rhs=One ; Plot ;
Model=Gamma; Theta=0.75 ; Fix $
/*Now
let's do the same thing, but with a different value of Theta. Note that the
parameter P is no longer significant after the change. */
Survival ; Lhs=logT,status ; Rhs=One ; Plot ;
Model=Gamma; Theta=1.25 $
/*Generally, note the distinctively different shape of the
hazard functions for the Gamma model versus the Weibull or exponential. This
demonstrates that it does make a difference which model is selected. */
/*LIMDEP
will also estimate a normal, logistic, and loglogistic model. */
Survival ; Lhs=logT,status ; Rhs=One ; Plot ;
Model=
Survival ; Lhs=logT,status ; Rhs=One ; Plot ;
Model=Logistic $
Survival ; Lhs=logT,status ; Rhs=One ; Plot ;
Model=LogLogistic $
/*LIMDEP
allows you to include covariates in the duration model. First, let's estimate
the Cox proportional hazards model. This model would be appropriate when you
don't care about the form for the duration dependence and you can safely make
the proportional hazards assumption. Note that we don't use the logged form
when using the Cox model.*/
Survival ; Lhs=T,status ; Rhs=prod $
/*We
can test the proportional hazards assumption by interacting the right side
variables with time (or some function of time such as log(time))
and then testing the significance of the interaction coefficients. */
Create ; prodt=prod*T $
Survival ; Lhs=T,status ; Rhs=prod,prodT $
/*There
is no evidence of a violation of the proportional hazards assumption. Still, we
may be interested in the underlying duration dependence as a theoretical matter,
so it may be preferable to use one of the parametric models. Now let's estimate
a parametric survival model. As observed in class, the parameters on covariates
for these models must be interpreted with great caution. The interpretation is
relatively straightforward with an exponential, or with some values of the P
parameter for the other distributions. Otherwise, exercise caution. Below we
include the production variable as a covariate. */
Survival ; Lhs=logT,status ; Rhs=One,prod ; Plot ;
Model=Weibull $
/*The integrated hazard function can be used as a diagnostic
tool for determining whether you have specified the correct model. Generally, in
a well-specified model the integrated hazard function begins at zero
and rises as a straight line. For example, here is a
comparison of the Weibull and loglogistic models.*/
Survival ; Lhs=logT,status ; Rhs=One,prod ; Plot ;
Model=Weibull $
Survival ; Lhs=logT,status ; Rhs=One,prod ; Plot ;
Model=LogLogistic $
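/* One way to see why the integrated hazard works as a diagnostic: if the model
is correct, the integrated hazard evaluated at each completed duration behaves
like a draw from a unit exponential (mean 1, variance 1). The Python sketch
below illustrates this on simulated data. The durations are artificial Weibull
draws, not the strike data, and the shape and scale values are hypothetical. It
does not reproduce LIMDEP's plot.

```python
import random
import statistics

random.seed(12345)  # fixed seed so the sketch is reproducible

shape = 2.0   # hypothetical Weibull shape parameter p
scale = 40.0  # hypothetical Weibull scale, i.e. lambda = 1 / scale
durations = [random.weibullvariate(scale, shape) for _ in range(5000)]

# Integrated hazard under the correct Weibull model: H(t) = (t / scale) ** shape.
resid_weibull = [(t / scale) ** shape for t in durations]

# Integrated hazard under a misspecified exponential model: H(t) = lam_hat * t,
# where lam_hat = 1 / mean(t) is the exponential MLE when nothing is censored.
lam_hat = 1.0 / statistics.fmean(durations)
resid_expon = [lam_hat * t for t in durations]

for label, resid in (("Weibull", resid_weibull), ("exponential", resid_expon)):
    print(label, round(statistics.fmean(resid), 3),
          round(statistics.pvariance(resid), 3))
```

Under the correct model both the mean and the variance of the residuals are
close to 1. Under the misspecified exponential the mean is 1 by construction,
but the variance is far below 1, which is the kind of departure the plot is
meant to reveal. */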
/*LIMDEP
will also estimate a split population model, as suggested by Schmidt and Witte.
All of the preceding models assume that the censored observations will
eventually fail. However, the split population model allows for the possibility
that some observations will never fail. Thus, we have three types of
observations in the log-likelihood: uncensored, censored that will eventually
fail, and censored that will never fail. LIMDEP can model the probability that
an observation belongs to the third type using either logit or probit. */
Survival ; Lhs=logT,status ; Rhs=One ; Plot ;
Model=Weibull ; Rh2=One ; Logit $
Survival ; Lhs=logT,status ; Rhs=One ; Plot ;
Model=Weibull ; Rh2=One ; Probit $
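/* The log-likelihood contributions in a split population model can be
sketched in Python as follows. This is a minimal illustration under stated
assumptions, not LIMDEP's implementation: it uses a Weibull with hazard
lambda*p*(lambda*t)^(p-1), and it takes delta, the probability of eventually
failing, as a given number rather than deriving it from a logit or probit
index. All function and argument names are made up for this example.

```python
import math


def weibull_log_density(t, lam, p):
    """log f(t) for a Weibull with survival function exp(-(lam * t) ** p)."""
    return math.log(lam * p) + (p - 1) * math.log(lam * t) - (lam * t) ** p


def weibull_log_survival(t, lam, p):
    """log S(t) = -(lam * t) ** p."""
    return -((lam * t) ** p)


def split_loglik_contribution(t, completed, lam, p, delta):
    """Contribution of one observation; delta = P(eventually fails)."""
    if completed:
        # Observed failure: the unit is an eventual failer and failed at t.
        return math.log(delta) + weibull_log_density(t, lam, p)
    # Censored: either never fails, or will fail but has survived past t.
    survival = math.exp(weibull_log_survival(t, lam, p))
    return math.log((1.0 - delta) + delta * survival)


print(split_loglik_contribution(100.0, False, 0.02, 1.2, 0.7))
print(split_loglik_contribution(100.0, False, 0.02, 1.2, 1.0))
```

Setting delta = 1 recovers the ordinary censored Weibull likelihood, in which
every censored observation is assumed to fail eventually. */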
/*All
of the preceding models assume that the hazard rate is constant across
individuals. If it is not, but you assume that it is, then parameter estimates
can be inconsistent or standard errors can be inappropriate. Covariates can be
included to account for heterogeneity. However, if heterogeneity is still
suspected, then LIMDEP will estimate a model with heterogeneous hazard rates.
This option is available for the Weibull and exponential models, but not the
others. */
Survival ; Lhs=logT,status ; Rhs=One,prod ; Plot ;
Model=Weibull ; Het $
/*The preceding model posited that the hazard function differed
across individuals. However, you can also specify that the scale factors
(variances) differ across individuals. Here is the same model with this option.
*/
Survival ; Lhs=logT,status ; Rhs=One,prod ; Plot ;
Model=Weibull ; Het ; Hfn=prod $
/*So how do you choose among all of these distributions and
model specifications? You might use a combination of likelihood ratio tests
among the competing models, as we have in earlier exercises. You might use
information criteria such as AIC, as in the Stata version of this assignment.
The integrated hazard function may be helpful, as discussed above. Additionally,
you could estimate a more general distribution and then choose on the basis of
the estimated parameters. The Generalized F distribution is more general and
encompasses all of the above. The Generalized F has four parameters. The
lognormal, loglogistic, Weibull, and Gompertz (to be discussed below) have two;
the exponential has one; the Gamma and the Weibull with heterogeneity have
three. The Generalized F has parameters beta, sigma, M1, and M2. Optionally,
output may also include alpha and gamma.
Here are the output definitions.
beta   - intercept and slope parameters in the index function
sigma  - scale parameter for the distribution of durations
M1, M2 - degrees of freedom for the F distribution
alpha  - slope parameter in the permanently censored splitting model
gamma  - parameters in the variance heterogeneity model
Here are some facts about the Generalized F that will help you choose the
appropriate distribution:
loglogistic---> M1=M2=1
lognormal-----> M1=M2=infinite
Weibull-------> M1=1; M2=infinite
Exponential---> M1=1; M2=infinite; sigma=1
Gamma---------> M2=infinite
Here is the generalized F estimation applied to the same data without censoring. */
Survival ; Lhs=logT ; Rhs=One,Prod ; Model=F ;
Tlf=.0001 $
/*These results strongly suggest that the Weibull model is the
most appropriate. M1 is close to 1 and M2 wanders off to a very large value.
Also, the Weibull has the highest value of the likelihood function among the
models estimated above. Were you to carry out likelihood ratio tests, you would
find that they reject the loglogistic, but not the lognormal or exponential.*/
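/* The arithmetic behind a likelihood ratio test and AIC is simple enough to
sketch in Python. The example below compares an exponential against a Weibull,
which imposes the single restriction p = 1. The two log-likelihood values are
placeholders invented for illustration. Replace them with the values reported
in your own LIMDEP output. The function names are assumptions of this sketch.

```python
import math


def lr_statistic(loglik_restricted, loglik_unrestricted):
    """Likelihood ratio statistic 2 * (lnL_unrestricted - lnL_restricted)."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def aic(loglik, n_params):
    """Akaike information criterion, 2k - 2 lnL; smaller is better."""
    return 2.0 * n_params - 2.0 * loglik


# Placeholder log-likelihoods, NOT results from the strike data.
loglik_exponential = -100.0  # one parameter
loglik_weibull = -97.5       # two parameters

lr = lr_statistic(loglik_exponential, loglik_weibull)
print("LR statistic:", lr)
print("p-value (1 df):", chi2_1df_pvalue(lr))
print("AIC exponential:", aic(loglik_exponential, 1))
print("AIC Weibull:", aic(loglik_weibull, 2))
```

The p-value helper covers only the one-restriction case. A test with more
restrictions needs the chi-squared distribution with the matching degrees of
freedom. */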
/*The preceding models are all log-linear. However, an
alternative model, the Gompertz, is not log-linear. We now use T, rather than
logT, as the duration variable. Also, there is no intercept. This model has
trouble with starting values, so one approach is to feed in starting values
obtained from one of the other distributions. Also, it is very slow, which
is why I saved it until last.*/
Survival ; Lhs=logT,status ; Rhs=one,prod ;
Model=Weibull ; Parameters $
Survival ; Lhs=T,status ; Rhs=one,prod ;
Model=Gompertz ; start=b $
/*This lesson has provided a summary of parametric models of
survival. As you can see, considerable expertise is required to use and
interpret these models. We've just scratched the surface, and have not discussed
the non-parametric models or, beyond the brief example above, the
semi-parametric models, including the widely used Cox proportional hazards
model. */
Delete ; * $