Introduction to Parametric Duration Models

 

The purpose of this session is to show you how to use LIMDEP's procedures for estimating parametric duration models. Note that we do not cover non-parametric or semi-parametric duration models, which are an important part of this literature, beyond a brief look at the Cox model.

 

/* This file demonstrates the LIMDEP procedures for evaluating duration data. First, let's read in the data. These are data on the duration of labor strikes taken from Greene, p. 800. The variable T is the duration of the strike. The variable PROD is the residual from a linear regression of the log of industrial production on time, time squared, and a set of monthly dummy variables. It measures aggregate industrial production less trend and seasonal components. */

 

Reset $

 

read ; nrec=62 ; nvar=2 ; names=T,PROD $

7.00000 .113800E-01

9.00000 .113800E-01

13.0000 .113800E-01

14.0000 .113800E-01

26.0000 .113800E-01

29.0000 .113800E-01

52.0000 .113800E-01

130.000 .113800E-01

9.00000 .229900E-01

37.0000 .229900E-01

41.0000 .229900E-01

49.0000 .229900E-01

52.0000 .229900E-01

119.000 .229900E-01

3.00000 -.395700E-01

17.0000 -.395700E-01

19.0000 -.395700E-01

28.0000 -.395700E-01

72.0000 -.395700E-01

99.0000 -.395700E-01

104.000 -.395700E-01

114.000 -.395700E-01

152.000 -.395700E-01

153.000 -.395700E-01

216.000 -.395700E-01

15.0000 -.546700E-01

61.0000 -.546700E-01

98.0000 -.546700E-01

2.00000 .535000E-02

25.0000 .535000E-02

85.0000 .535000E-02

3.00000 .742700E-01

10.0000 .742700E-01

1.00000 .645000E-01

2.00000 .645000E-01

3.00000 .645000E-01

3.00000 .645000E-01

3.00000 .645000E-01

4.00000 .645000E-01

8.00000 .645000E-01

11.0000 .645000E-01

22.0000 .645000E-01

23.0000 .645000E-01

27.0000 .645000E-01

32.0000 .645000E-01

33.0000 .645000E-01

35.0000 .645000E-01

43.0000 .645000E-01

43.0000 .645000E-01

44.0000 .645000E-01

100.000 .645000E-01

5.00000 -.104430

49.0000 -.104430

2.00000 -.700000E-02

12.0000 -.700000E-02

12.0000 -.700000E-02

21.0000 -.700000E-02

21.0000 -.700000E-02

27.0000 -.700000E-02

38.0000 -.700000E-02

42.0000 -.700000E-02

117.000 -.700000E-02

 

/* LIMDEP expects the dependent variable to be logged for every model below except the Gompertz distribution and the Cox model. You will also want a censoring variable that indicates whether each spell is complete. Below we create the censoring variable, STATUS, which equals 1 when T is less than 81 and 0 otherwise, so that all durations above 80 are treated as censored. The censoring variable can be omitted from the specifications below if there is no censoring. */

 

Create ; status = T < 81 ; logt=log(T)$
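As a rough illustration of what the Create command above computes, here is a minimal Python sketch. The function name `make_status_and_log` and the four sample durations (taken from the strike data) are chosen only for illustration; this is not LIMDEP code.

```python
import math

def make_status_and_log(durations, cutoff=81.0):
    """Return (status, log_t); status is 1 when T < cutoff, else 0."""
    status = [1 if t < cutoff else 0 for t in durations]
    log_t = [math.log(t) for t in durations]
    return status, log_t

# Four durations from the strike data above.
sample = [7.0, 52.0, 130.0, 216.0]
status, log_t = make_status_and_log(sample)
print(status)
```

The two strikes longer than 80 days receive status 0, which mirrors the comparison `T < 81` in the Create command.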

 

 

/* Now let's just estimate some duration models with no covariates. This would be useful for exploring the duration dependence of a process. First the Exponential model */

 

Survival ; Lhs=logT,status ; Rhs=One ; Plot ; Model=Exponential $

 

/*Note the distinctive hazard and integrated hazard functions for the Exponential model. They reflect the memoryless property of this distribution: the hazard is constant over time, so the integrated hazard rises linearly. */
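The memoryless property can be checked numerically. The sketch below is plain Python, not LIMDEP, and the rate `lam = 0.02` is a made-up value rather than an estimate from the strike data. It computes the hazard as density over survival and confirms that it does not depend on t.

```python
import math

def exp_density(t, lam):
    return lam * math.exp(-lam * t)

def exp_survival(t, lam):
    return math.exp(-lam * t)

def exp_hazard(t, lam):
    # Hazard = f(t) / S(t); for the exponential this is lam at every t.
    return exp_density(t, lam) / exp_survival(t, lam)

def exp_integrated_hazard(t, lam):
    # Integrated hazard = -log S(t) = lam * t, a straight line through zero.
    return -math.log(exp_survival(t, lam))

lam = 0.02  # hypothetical rate, for illustration only
hazards = [exp_hazard(t, lam) for t in (1.0, 10.0, 100.0)]

# Memoryless: P(T > 30 | T > 10) equals P(T > 20).
conditional = exp_survival(30.0, lam) / exp_survival(10.0, lam)
```

Here `conditional` matches `exp_survival(20.0, lam)` up to rounding, which is the memoryless property in numeric form.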

 

/*Now let's estimate the same relationship with the Weibull model, which is probably the most popular parametric choice. The reasons are its flexibility and the fact that it encompasses the exponential, which is the special case p = 1. You therefore have a built-in test for the exponential in the size and significance of p (that is, whether p differs from 1). The Weibull's hazard function can also rise or fall monotonically over time, which is an advantage if you think that the hazard is not constant. */
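To see how p governs duration dependence, here is a small Python sketch of the Weibull hazard. It assumes the parameterization h(t) = lam * p * (lam * t)^(p - 1) used in Greene's textbook; the parameter values are hypothetical and the code is not part of LIMDEP.

```python
def weibull_hazard(t, lam, p):
    """Weibull hazard, assuming h(t) = lam * p * (lam * t) ** (p - 1)."""
    return lam * p * (lam * t) ** (p - 1.0)

lam = 0.02                    # hypothetical scale value
times = (5.0, 20.0, 80.0)

flat = [weibull_hazard(t, lam, 1.0) for t in times]     # p = 1: exponential case
rising = [weibull_hazard(t, lam, 1.5) for t in times]   # p > 1: positive duration dependence
falling = [weibull_hazard(t, lam, 0.7) for t in times]  # p < 1: negative duration dependence
```

With p = 1 the hazard collapses to the constant `lam`, which is why a test of p = 1 is a test of the exponential against the Weibull.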

 

Survival ; Lhs=logT,status ; Rhs=One ; Plot ; Model=Weibull $

 

/*LIMDEP will also estimate the Gamma model. However, it is treated differently from the other models because it is not easy to estimate the Gamma's second parameter, Theta, simultaneously with the other parameters. Therefore, you supply a value for Theta. A search is still conducted over Theta, but the search ends up at the fixed value. The asymptotic covariance matrix then treats Theta as an estimated value rather than a fixed parameter. If you happen to know Theta, you can remove this uncertainty with the option Fix; use it only if you really do know Theta. The Gamma model allows non-monotonicity in the hazard function.*/

 

Survival ; Lhs=logT,status ; Rhs=One ; Plot ; Model=Gamma; Theta=0.75 $

Survival ; Lhs=logT,status ; Rhs=One ; Plot ; Model=Gamma; Theta=0.75 ; Fix $

 

/*Now let's do the same thing, but with a different value of Theta. Note the lack of significance now of the parameter P due to the change. */

 

Survival ; Lhs=logT,status ; Rhs=One ; Plot ; Model=Gamma; Theta=1.25 $

 

/*Generally, note the distinctively different shape of the hazard functions for the Gamma model versus the Weibull or exponential. This demonstrates that it does make a difference which model is selected. */

 

/*LIMDEP will also estimate a normal, logistic, and loglogistic model. */

 

Survival ; Lhs=logT,status ; Rhs=One ; Plot ; Model=Normal $

 

Survival ; Lhs=logT,status ; Rhs=One ; Plot ; Model=Logistic $

 

Survival ; Lhs=logT,status ; Rhs=One ; Plot ; Model=LogLogistic $

 

/*LIMDEP allows you to include covariates in the duration model. First, let's estimate the Cox proportional hazards model. This model would be appropriate when you don't care about the form for the duration dependence and you can safely make the proportional hazards assumption. Note that we don't use the logged form when using the Cox model.*/

 

Survival ; Lhs=T,status ; Rhs=prod $

 

/*We can test the proportional hazards assumption by interacting the right side variables with time (or some function of time such as log(time)) and then testing the significance of the interaction coefficients. */

 

Create ; prodt=prod*T $

 

Survival ; Lhs=T,status ; Rhs=prod,prodT $

 

 

/*There is no evidence that the proportional hazards assumption is violated. Still, we may be interested in the underlying duration dependence as a theoretical matter, so it may be preferable to use one of the parametric models. Now let's estimate a parametric survival model. As observed in class, the parameters on covariates for these models must be interpreted with great caution. The interpretation is relatively straightforward with an exponential, or with some values of the P parameter for other distributions. Otherwise, exercise caution. Below we include the production variable as a covariate. */

 

 

Survival ; Lhs=logT,status ; Rhs=One,prod ; Plot ; Model=Weibull $

 

/*The integrated hazard function can be used as a diagnostic tool for determining whether you have specified the correct model. Generally, a well-specified model is one where the plotted integrated hazard function begins at zero and increases as a straight line. For example, here is a comparison of the Weibull and Log Logistic models.*/

 

Survival ; Lhs=logT,status ; Rhs=One,prod ; Plot ; Model=Weibull $

Survival ; Lhs=logT,status ; Rhs=One,prod ; Plot ; Model=LogLogistic $

 

/*LIMDEP will also estimate a split population model, as suggested by Schmidt and Witte. All of the preceding models assume that the censored observations will eventually fail. However, the split population model allows for the possibility that some observations will never fail. Thus, we have three types of observations in the log-likelihood: uncensored, censored that will eventually fail, and censored that will never fail. LIMDEP can model the third type of observation using either logit or probit. */
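The structure of the split population log-likelihood can be sketched as follows. This is an illustrative Python sketch, not LIMDEP's implementation. It assumes a Weibull duration model with the parameterization S(t) = exp(-(lam * t)^p), and it treats `delta`, the probability of eventually failing, as a fixed number rather than a logit or probit function. All parameter values are made up.

```python
import math

def weibull_logf(t, lam, p):
    # Log density under the assumed parameterization S(t) = exp(-(lam * t) ** p).
    return math.log(lam * p) + (p - 1.0) * math.log(lam * t) - (lam * t) ** p

def weibull_surv(t, lam, p):
    return math.exp(-((lam * t) ** p))

def split_loglik(durations, status, lam, p, delta):
    """delta = probability that a unit eventually fails (hypothetical constant)."""
    total = 0.0
    for t, d in zip(durations, status):
        if d == 1:
            # Uncensored: the unit is a future failure and failed at t.
            total += math.log(delta) + weibull_logf(t, lam, p)
        else:
            # Censored: either never fails, or will fail but has survived to t.
            total += math.log(1.0 - delta + delta * weibull_surv(t, lam, p))
    return total

durations = [7.0, 52.0, 130.0]
status = [1, 1, 0]
ll_split = split_loglik(durations, status, lam=0.02, p=1.2, delta=0.8)
ll_standard = split_loglik(durations, status, lam=0.02, p=1.2, delta=1.0)
```

Setting `delta = 1.0` recovers the ordinary censored-data log-likelihood, in which every censored observation is assumed to fail eventually.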

 

 

Survival ; Lhs=logT,status ; Rhs=One ; Plot ; Model=Weibull ; Rh2=One ; Logit $

 

Survival ; Lhs=logT,status ; Rhs=One ; Plot ; Model=Weibull ; Rh2=One ; Probit $

 

 

/*All of the preceding models assume that the hazard rate is constant across individuals. If it is not, but you assume that it is, then parameter estimates can be inconsistent or standard errors can be inappropriate. Covariates can be included to account for heterogeneity. However, if heterogeneity is still suspected, then LIMDEP will estimate a model with heterogeneous hazard rates. This option is available for the Weibull and exponential models, but not the others. */
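For intuition about what a heterogeneous hazard looks like, the Python sketch below assumes gamma-distributed heterogeneity with variance `theta`, as in Greene's textbook treatment, which gives the survival function S(t) = [1 + theta * (lam * t)^p]^(-1/theta). Whether LIMDEP's Het option uses exactly this form is an assumption here, and the parameter values are made up.

```python
import math

def weibull_survival(t, lam, p):
    return math.exp(-((lam * t) ** p))

def weibull_hazard(t, lam, p):
    return lam * p * (lam * t) ** (p - 1.0)

def het_survival(t, lam, p, theta):
    # Weibull survival mixed over gamma heterogeneity with variance theta (assumed form).
    return (1.0 + theta * (lam * t) ** p) ** (-1.0 / theta)

def het_hazard(t, lam, p, theta):
    # The Weibull hazard is scaled down by S(t) ** theta as weaker units exit first.
    return weibull_hazard(t, lam, p) * het_survival(t, lam, p, theta) ** theta

lam, p = 0.02, 1.2  # hypothetical values
nearly_homogeneous = het_survival(30.0, lam, p, theta=1e-6)
plain_weibull = weibull_survival(30.0, lam, p)
```

As `theta` approaches zero the heterogeneous model collapses to the plain Weibull, so the size of the estimated heterogeneity parameter indicates how much the extension matters.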

 

 

Survival ; Lhs=logT,status ; Rhs=One,prod ; Plot ; Model=Weibull ; Het $

 

/*The preceding posited that the hazard function differed across individuals. However, you can also specify that the scale factors (variances) differ across individuals. Here is the same model with this option. */

 

Survival ; Lhs=logT,status ; Rhs=One,prod ; Plot ; Model=Weibull ; Het ; Hfn=prod $

 

/*So how do you choose among all of these distributions and model specifications? You might use a combination of likelihood ratio tests among the competing models, as we have in earlier exercises. You might use information criteria, such as AIC, as in the Stata version of this assignment. The integrated hazard function may be helpful, as discussed above. Additionally, you could estimate a more general distribution and then choose on the basis of the estimated parameters. The Generalized F distribution encompasses all of the above. The Generalized F has four parameters. The lognormal, loglogistic, Weibull, and Gompertz (to be discussed below) have two; the exponential has one; the Gamma and the Weibull with heterogeneity have three. The Generalized F has parameters beta, sigma, M1, and M2. Optionally, output may also include alpha and gamma.

 

Here are the output definitions.

 

beta:   intercept and slope parameters in the index function

sigma:  scale parameter for the distribution of durations

M1, M2: degrees of freedom for the F distribution

alpha:  slope parameter in the permanently censored splitting model

gamma:  parameters in the variance heterogeneity model

 

Here are some facts about the Generalized F that will help you choose the appropriate distribution:

 

loglogistic---> M1=M2=1

lognormal-----> M1=M2=infinite

Weibull-------> M1=1; M2=infinite

Exponential---> M1=1; M2=infinite ; sigma=1

Gamma---------> M2=infinite

 

Here is the generalized F estimation applied to the same data without censoring. */

 

Survival ; Lhs=logT ; Rhs=One,Prod ; Model=F ; Tlf=.0001 $

 

/*These results strongly suggest that the Weibull model is the most appropriate. M1 is close to 1 and M2 wanders off to a very large value. Also, the Weibull has the highest log-likelihood of the models estimated above. If you carried out likelihood ratio tests, you would find that they reject the loglogistic, but not the lognormal or exponential.*/
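The likelihood ratio and AIC comparisons mentioned above involve only simple arithmetic on the reported log-likelihoods. Here is a minimal Python sketch. The log-likelihood values are hypothetical placeholders, not results from the strike data, and the p-value helper covers only the one-restriction case (for example, exponential versus Weibull, where the restriction is p = 1).

```python
import math

def lr_statistic(loglik_restricted, loglik_unrestricted):
    return 2.0 * (loglik_unrestricted - loglik_restricted)

def chi2_pvalue_1df(stat):
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))

def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik

# Hypothetical log-likelihoods, for illustration only.
ll_exponential = -100.0   # 2 parameters: constant, prod
ll_weibull = -98.5        # 3 parameters: constant, prod, p

stat = lr_statistic(ll_exponential, ll_weibull)
p_value = chi2_pvalue_1df(stat)
aic_exponential = aic(ll_exponential, 2)
aic_weibull = aic(ll_weibull, 3)
```

A lower AIC is preferred, and the restricted model is rejected when the p-value falls below the chosen significance level. Substitute the log-likelihoods that LIMDEP reports for each model.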

 

/*The preceding models are all log-linear. However, an alternative model, the Gompertz, is not log-linear. We now use T, rather than logT, as the duration variable. Also, there is no intercept. This model has trouble with starting values, so one approach is to estimate one of the other distributions first and feed its estimates in as starting values. Also, it is very slow, which is why I saved it until last.*/

 

Survival ; Lhs=logT,status ; Rhs=one,prod ; Model=Weibull ; Parameters $

Survival ; Lhs=T,status ; Rhs=one,prod ; Model=Gompertz

; start=b $

 

/*This lesson has provided a summary of parametric models of survival. As you can see, considerable expertise is required to use and interpret these models. We've just scratched the surface, and have not discussed the non-parametric or semi-parametric models in any depth, including the widely used Cox proportional hazards model that we estimated only briefly above. */

 

Delete ; * $