The Linear Regression Model
The purpose of this week's computer assignment is to introduce you to the
matrix treatment of the linear regression model. We will first use LIMDEP's
canned regression procedures and then replicate all of the results using matrix
algebra. Again, this is not because you would ever use the matrix approach in
practice, but because working through it deepens your understanding of what the
canned procedure computes.
RESET $
/* Read the data. The data are from Kmenta (1971, p. 456) and relate to
corporate investment behavior. The data are for General Motors and include
1) It = investment at time t, 2) Ftlag1 = outstanding shares lagged one period,
and 3) Ktlag1 = capital stock lagged one period. */
READ ; FORMAT=wks ; NVAR=5 ; FILE=H:\teaching\POLS603\KMENT102.wks ; Names $
/* Set the sample length and use dstats to take a look at the data. */
SAMPLE ; 1-19 $
dstats ; rhs=* ; output=3 ; quantiles ; plot ; all ; $
/* Now let's do a regression the easy way. The following performs a regression
of It on a constant and Ft and Kt, each lagged one period. We also plot the
residuals against the observation number. The res and keep options store the
residuals and predictions as variables (ehat and ypred) for later use. */
Regress ; lhs=It ; rhs=one,FTLAG1,KTLAG1 ; list ; plot ; res=ehat ; keep=ypred $
/* You can also plot the residuals against any other variable, or against the
predictions (ypred) created above, for diagnostic purposes. You can also plot
the standardized residuals, as we shall see later, which helps in checking for
outliers. */
PLOT ; LHS=YPRED ; RHS=EHAT $
PLOT ; LHS=FTLAG1 ; RHS=EHAT $
PLOT ; LHS=KTLAG1 ; RHS=EHAT $
/* Now, let's illustrate how to do all of this the hard way using matrices.
Define a matrix of independent variables, X (the constant term one plus the two
lagged regressors), and the dependent variable, y. */
namelist ; X=one,FTLAG1,KTLAG1 ; y=It $
/* Create a constant K for the number of regressors. Note that N, the number of
observations, is a system constant that is already defined. However, we could
also get it by taking i'i, where i is a column of ones. */
CALC ; K=COL(X) $
/* Now let's create some matrices and results that will be useful for later
calculations. Create M0, the matrix that converts a variable to deviations from
its mean. */
MATRIX ; LIST ; I=IDEN(N) $
MATRIX ; LIST ; ONES=INIT(N,1,1) $
MATRIX ; LIST ; ONEOVERN={1/N}*ONES*ONES' $
MATRIX ; LIST ; M0=I-ONEOVERN $
/* Create the projection matrix P and the orthogonal projection matrix M. */
MATRIX ; LIST ; P=X*<X'X>*X' $
MATRIX ; LIST ; M=I-P $
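/* A quick check, offered as a sketch rather than as part of the original
assignment: projection matrices are idempotent, so P*P-P and M*M-M should both
be zero matrices up to rounding error, and M*X should be zero because M
annihilates the columns of X. The names PCHECK, MCHECK, and MX are introduced
here for illustration only. */
MATRIX ; LIST ; PCHECK=P*P-P $
MATRIX ; LIST ; MCHECK=M*M-M $
MATRIX ; LIST ; MX=M*X $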
/* Calculate the mean of Y. */
MATRIX ; list ; Ymean=1/N*y'1 $
/* Now, let's calculate the vector of regression coefficients, the inverse of
X'X times X'Y. */
MATRIX ; list ; Beta=<X'X>*X'Y $
/* Now, let's get the vector of predicted values using two different methods:
directly as X*Beta, and by applying the projection matrix P to Y. */
MATRIX ; list ; YHAT=X*Beta $
MATRIX ; LIST ; YHAT=P*Y $
/* Now, let's get the vector of residuals using two different methods: as
Y-YHAT, and by applying the orthogonal projection matrix M to Y. */
MATRIX ; list ; E=Y-YHAT $
MATRIX ; LIST ; E=M*Y $
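/* As an optional check (a sketch, not part of the original assignment), the
least squares residuals should be orthogonal to every column of X. Because X
contains a constant, this also means the residuals sum to zero. XE should
therefore be a K x 1 vector of zeros up to rounding error. The name XE is
introduced here for illustration only. */
MATRIX ; LIST ; XE=X'E $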
/* Now, let's calculate the total sum of squares for Y using two different
methods: as Y'Y minus N times the squared mean of Y, and as the quadratic form
Y'M0*Y. */
MATRIX ; list ; SST=Y'Y-YMEAN^2*N $
MATRIX ; LIST ; SST=Y'M0*Y $
/* Now, let's calculate the sum of squared errors using two different methods:
as Y'Y minus Beta'X'Y, and as E'E. */
MATRIX ; list ; SSE=Y'Y-Beta'*X'Y $
MATRIX ; LIST ; SSE=E'E $
/* Now, let's calculate the sum of squares due to regression using two
different methods: as Beta'X'Y minus N times the squared mean of Y, and as the
quadratic form Beta'X'M0*X*Beta. */
MATRIX ; list ; SSR=Beta'*X'Y-YMEAN^2*N $
MATRIX ; LIST ; SSR=Beta'*X'M0*X*Beta $
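/* With a constant in the regression, the total sum of squares splits exactly
into the regression and error pieces, SST = SSR + SSE. The following line is a
sketch of that check and is not part of the original assignment. SSCHECK, a
name introduced here for illustration only, should be zero up to rounding
error. */
CALC ; LIST ; SSCHECK=SST-SSR-SSE $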
/* Now, let's get the three mean squares implied by an analysis of variance
table (total, error, and regression), each a sum of squares divided by its
degrees of freedom. */
CALC ; list ; MST=SST/(N-1) $
CALC ; list ; MSE=SSE/(N-K) $
CALC ; list ; MSR=SSR/(K-1)$
/* Now, let's calculate the estimated variance of the regression disturbance,
which is the same quantity as MSE above. */
CALC ; list ; SIGMA2=SSE/(N-K) $
/* Now, let's calculate the standard error of the estimate. */
CALC ; list ; SEE=SQR(SIGMA2) $
/* Now, let's get the variance-covariance matrix of the coefficients using the
variance of the regression calculated above. */
MATRIX ; list ; VARCOVB=SIGMA2*<X'X> $
/* Now, let's pick off the variances of the individual coefficients from the
diagonal of the variance-covariance matrix above. */
MATRIX ; list ; VARBeta=VECD(VARCOVB) $
/* The standard errors of the coefficients are just the square roots of the
diagonal elements of the variance-covariance matrix of the coefficients. */
MATRIX ; list ; STDERRS=ESQR(VARBeta) $
/* Using the coefficients and their standard errors, we can compute t
statistics for the null hypothesis that each coefficient equals zero just by
dividing each coefficient by its standard error. */
MATRIX ; LIST ; ONEOVSB=./STDERRS $
MATRIX ; list ; TSTAT=DIRP(BETA,ONEOVSB) $
/* Now, let's convert the variance-covariance matrix of the coefficients into a
correlation matrix by pre- and post-multiplying it by a diagonal matrix of
reciprocal standard errors. */
MATRIX ; LIST ; DIAGSB=DIAG(ONEOVSB) $
MATRIX ; list ; CORRMAT=DIAGSB*VARCOVB*DIAGSB' $
/* Now let's calculate R squared in two different ways: as 1-SSE/SST, and as
SSR/SST. */
CALC ; list ; RSQR=1-SSE/SST $
CALC ; LIST ; RSQR=SSR/SST $
/* Now, let's calculate adjusted R squared in two different ways: from the mean
squares, and from R squared with a degrees of freedom correction. */
CALC ; list ; RSQRADJ=1-MSE/MST $
CALC ; LIST ; RSQRADJ=1-((N-1)/(N-K))*(1-RSQR) $
/* Now, let's calculate an F statistic for the overall significance of the
regression in two different ways: from the mean squares, and from R squared. */
CALC ; LIST ; FSTAT=MSR/MSE $
CALC ; LIST ; FSTAT=(RSQR/(K-1))/((1-RSQR)/(N-K)) $
/* Now, let's calculate the standardized residuals discussed on pp. 60-61 of
Greene using both the regression procedure and a matrix approach. A
standardized (sometimes called studentized) residual larger than 2 in absolute
value is suggestive of an outlier according to Belsley, Kuh, and Welsch. The
regression procedure includes the standardized residuals in the list or plot
output when the word standard is also given as an option. */
Regress ; lhs=It ; rhs=one,FTLAG1,KTLAG1 ; list ; standard $
MATRIX ; HAT=P $
MATRIX ; Hii=VECD(HAT) $
MATRIX ; ONEMNHii=1-Hii $
MATRIX ; SQRTHii=ESQR(ONEMNHii) $
MATRIX ; DENOM=SEE.*SQRTHii $
MATRIX ; LIST ; UI=DIRP(E,./DENOM) $
/* Belsley, Kuh, and Welsch also suggest looking at the leverage each
observation exerts on the regression. Leverage is just the corresponding
diagonal element of the hat matrix. A leverage value larger than 2K/N flags an
observation that deserves a closer look. The critical value would then be: */
CALC ; list ; HatCrit=2*K/N $
MATRIX ; XXI=<X'X> $
CREATE ; Hatvalue=QFR(X,XXI) $
CREATE ; IF(Hatvalue > HatCrit)Lookatme=1 $
LIST ; Hatvalue,Lookatme $
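/* To see where the 2K/N cutoff comes from, note that the diagonal elements of
the hat matrix sum to K (the trace of P), so the average leverage is K/N and
the cutoff is twice that average. The lines below are a sketch of this check
and are not part of the original assignment. They reuse ONES and Hii from
above, and the names SUMH and MEANH are introduced here for illustration only.
SUMH should equal K up to rounding error. */
MATRIX ; LIST ; SUMH=ONES'*Hii $
CALC ; LIST ; MEANH=K/N $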
/* That's all, folks. Delete all variables to be ready for a new analysis. */
Delete ; * $