Multicollinearity, Specification, and Other Topics
This session covers a smattering of topics. The purpose is to (1) introduce
methods of testing for multicollinearity using LIMDEP, (2) introduce a test
for omitted variables called the RESET test, and (3) introduce the Hausman
test for endogeneity, measurement error, or various other forms of
misspecification.
/* Let's consider multicollinearity first. Here are the Longley data, which
are famous for exhibiting multicollinearity. */
Read ; Nobs = 16 ; Nvar = 5 ; names =
Employ, Price, GNP, Armed, Year $
60323 83.0 234289 1590 1947
61122 88.5 259426 1456 1948
60171 88.2 258054 1616 1949
61187 89.5 284599 1650 1950
63221 96.2 328975 3099 1951
63639 98.1 346999 3594 1952
64989 99.0 365385 3547 1953
63761 100.0 363112 3350 1954
66019 101.2 397469 3048 1955
67857 104.6 419180 2857 1956
68169 108.4 442769 2798 1957
66513 110.8 444546 2637 1958
68655 112.6 482704 2552 1959
69564 114.2 502601 2514 1960
69331 115.7 518173 2572 1961
70551 116.9 554894 2827 1962
Namelist ; X = One,Year,Price,GNP,Armed $
/* Note the effect of omitting a single observation on the coefficient
estimates. */
Sample ; 1 - 16 $
Regress ; Lhs = Employ ; Rhs = X $
Sample ; 1 - 15 $
Regress ; Lhs = Employ ; Rhs = X $
/* Note also the effect of omitting one of the independent variables on the
coefficient estimates. */
Sample ; 1 - 16 $
Regress ; Lhs = Employ ; Rhs = X $
Regress ; Lhs = Employ ; Rhs = One,Year,Price,Armed $
/* Now, let's do some formal tests for multicollinearity in X by performing
some auxiliary regressions. First, save R squared from our primary regression
for comparison purposes. */
Regress ; Lhs = Employ ; Rhs = X $
CALC ; LIST ; R2MODEL=RSQRD $
/* Now, let's calculate RsquaredJ, the Tolerance, and the Variance Inflation
Factor for each independent variable, where RsquaredJ is the R squared from
regressing variable J on the other independent variables. Note that an
RsquaredJ greater than about 0.8 or 0.9, a tolerance less than 0.1 or 0.2, or
a variance inflation factor greater than about 5 or 10 suggests a problem.
See the lecture notes for more specific guidelines. */
REGRESS ; LHS=YEAR ; RHS=one,PRICE,GNP,ARMED $
CALC ; LIST ; R2YEAR=RSQRD ; R2MODEL ; TOLYEAR=1-R2YEAR ; VIFYEAR=1/TOLYEAR $
REGRESS ; LHS=PRICE ; RHS=one,YEAR,GNP,ARMED $
CALC ; LIST ; R2PRICE=RSQRD ; R2MODEL ; TOLPRICE=1-R2PRICE ;
VIFPRICE=1/TOLPRICE $
REGRESS ; LHS=GNP ; RHS=one,YEAR,PRICE,ARMED $
CALC ; LIST ; R2GNP=RSQRD ; R2MODEL ; TOLGNP=1-R2GNP ; VIFGNP=1/TOLGNP $
REGRESS ; LHS=ARMED ; RHS=one,YEAR,PRICE,GNP $
CALC ; LIST ; R2ARMED=RSQRD ; R2MODEL ; TOLARMED=1-R2ARMED ;
VIFARMED=1/TOLARMED $
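
The three rules of thumb quoted above are different views of one quantity, since the tolerance is 1 - RsquaredJ and the variance inflation factor is its reciprocal. A minimal Python sketch of that arithmetic follows. The function names and the sample RsquaredJ values are illustrative assumptions, not LIMDEP output for the Longley data.

```python
def tolerance(r2_j):
    """Tolerance: share of variable J's variance not explained by the other regressors."""
    return 1.0 - r2_j


def vif(r2_j):
    """Variance inflation factor: the reciprocal of the tolerance."""
    return 1.0 / tolerance(r2_j)


# Hypothetical auxiliary R-squared values, chosen only to illustrate the thresholds.
for r2_j in (0.50, 0.80, 0.90, 0.99):
    print(f"R2_j={r2_j:.2f}  tolerance={tolerance(r2_j):.2f}  VIF={vif(r2_j):.1f}")
```

An RsquaredJ of 0.8 corresponds to a tolerance of 0.2 and a VIF of 5, and an RsquaredJ of 0.9 to a tolerance of 0.1 and a VIF of 10, which is why the cutoffs come in those pairs.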
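/* Now, let's calculate the condition number and condition index for the X
matrix. The columns of X are first scaled to unit length. The condition
number is the ratio of the largest to the smallest eigenvalue of the scaled
X'X matrix, and the condition index is its square root. A condition number
larger than 1000 or a condition index larger than about 30 suggests serious
problems. */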
matrix ; XtX=X'X $
matrix ; D=diag(XtX) $
matrix ; D=sqrt(D) $
matrix ; V=<D>*X'*X*<D> $
MATRIX ; LIST ; eigvals=cxrt(V)
; MAXEIG=PART(EIGVALS,1,1,1,1)
; MINEIG=PART(EIGVALS,5,5,1,1) $
MATRIX ; LIST ; CONDNUMR=MAXEIG*DIRI(MINEIG) $
MATRIX ; LIST ; CONDINDX=SQRT(CONDNUMR) $
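
To see how the two cutoffs relate, consider the simplest case of only two columns scaled to unit length. The scaled cross-product matrix is then [[1, r], [r, 1]], where r is the cosine of the angle between the columns, and its eigenvalues are 1 + |r| and 1 - |r|. The Python sketch below works through that special case. It is an illustration under this two-column assumption, with hypothetical values of r, and is not a replacement for the five-column calculation above.

```python
import math


def condition_measures(r):
    """Condition number and condition index of the 2x2 matrix [[1, r], [r, 1]]."""
    max_eig = 1.0 + abs(r)   # largest eigenvalue
    min_eig = 1.0 - abs(r)   # smallest eigenvalue
    cond_number = max_eig / min_eig
    cond_index = math.sqrt(cond_number)
    return cond_number, cond_index


# Hypothetical values of r, chosen only for illustration.
for r in (0.5, 0.9, 0.998):
    number, index = condition_measures(r)
    print(f"r={r:.3f}  condition number={number:.1f}  condition index={index:.1f}")
```

Because the index is the square root of the number, a condition index of about 30 corresponds to a condition number of about 900, which is roughly the 1000 cutoff quoted above.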
/* Now let's look at some model specification tests. */
/* FIRST LOAD IN SOME NEW DATA THAT DOESN'T HAVE REAMS OF MULTICOLLINEARITY.
WE'LL USE THE SAME DATA WE USED EARLIER. */
DELETE ; * $
RESET $
READ ; NVAR=6 ; NOBS=15 ; NAMES=1 $
INVEST,CONSTANT,TREND,GNP,INTEREST,INFLATE
0.161 1 1 1.058 5.16 4.4
0.172 1 2 1.088 5.87 5.15
0.158 1 3 1.086 5.95 5.37
0.173 1 4 1.122 4.88 4.99
0.195 1 5 1.186 4.5 4.16
0.217 1 6 1.254 6.44 5.75
0.199 1 7 1.246 7.83 8.82
0.163 1 8 1.232 6.25 9.31
0.195 1 9 1.298 5.5 5.21
0.231 1 10 1.37 5.46 5.83
0.257 1 11 1.439 7.46 7.4
0.259 1 12 1.479 10.28 8.64
0.225 1 13 1.474 11.77 9.31
0.241 1 14 1.503 13.42 9.44
0.204 1 15 1.475 11.02 5.99
/* NOW LET'S DO RAMSEY'S RESET TEST. NOTICE THAT WE ARE LEAVING GNP OUT OF THE
EQUATION IN CHAPTER 6 IN ORDER TO GENERATE A SPECIFICATION ERROR AND SHOW THAT
THE TEST WORKS. THE NULL FOR THE TEST IS NO SPECIFICATION ERROR. SIGNIFICANT
COEFFICIENTS ON THE SQUARED, CUBED, OR FOURTH-POWER YHATS IN ANY OF THE
AUXILIARY REGRESSIONS INDICATE A SPECIFICATION ERROR. */
NAMELIST ; X=CONSTANT,TREND,INTEREST,INFLATE $
REGRESS ; LHS=INVEST ; RHS=X ; KEEP=YHAT $
CREATE ; Y2=YHAT^2 ; Y3=YHAT^3 ; Y4=YHAT^4 $
REGRESS ; LHS=INVEST ; RHS=X,Y2 ; RLS:B(5)=0 $
REGRESS ; LHS=INVEST ; RHS=X,Y2,Y3 ; RLS:B(5)=0,B(6)=0 $
REGRESS ; LHS=INVEST ; RHS=X,Y2,Y3,Y4 ; RLS:B(5)=0,B(6)=0,B(7)=0 $
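
One common way to judge whether the added powers of YHAT are jointly significant is the F statistic that compares the restricted and unrestricted sums of squared residuals. The Python sketch below shows that formula. The function name and the residual sums of squares are hypothetical placeholders, not results from the investment data. Only the sample size (15) and the parameter count (4 original regressors plus Y2) match the first auxiliary regression.

```python
def restriction_f_stat(ssr_restricted, ssr_unrestricted,
                       n_restrictions, n_obs, n_params_unrestricted):
    """F statistic for testing linear restrictions in a least squares regression."""
    numerator = (ssr_restricted - ssr_unrestricted) / n_restrictions
    denominator = ssr_unrestricted / (n_obs - n_params_unrestricted)
    return numerator / denominator


# Hypothetical sums of squared residuals, for illustration only.
f_stat = restriction_f_stat(ssr_restricted=0.006, ssr_unrestricted=0.004,
                            n_restrictions=1, n_obs=15, n_params_unrestricted=5)
print(f"F(1, 10) = {f_stat:.2f}")
```

The statistic is compared with an F distribution whose degrees of freedom are the number of restrictions and the number of observations minus the number of parameters in the unrestricted regression.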
/* Now let's do Hausman's test for endogeneity using the consumption function.
Note that this test is very general and may also pertain to measurement error
in an independent variable, as well as any other factor that would require
modeling with an instrumental variables approach. */
DELETE ; * $
RESET $
Read ; Nobs=36 ; Nvar = 3 ; Names = 1 $
Year Y C
1950 791.8 733.2
1951 819.0 748.7
1952 844.3 771.4
1953 880.0 802.5
1954 894.0 822.7
1955 944.5 873.8
1956 989.4 899.8
1957 1012.1 919.7
1958 1028.8 932.9
1959 1067.2 979.4
1960 1091.1 1005.1
1961 1123.2 1025.2
1962 1170.2 1069.0
1963 1207.3 1108.4
1964 1291.0 1170.6
1965 1365.7 1236.4
1966 1431.3 1298.9
1967 1493.2 1337.7
1968 1551.3 1405.9
1969 1599.8 1456.7
1970 1688.1 1492.0
1971 1728.4 1538.8
1972 1797.4 1621.9
1973 1916.3 1689.6
1974 1896.6 1674.0
1975 1931.7 1711.9
1976 2001.0 1803.9
1977 2066.6 1883.8
1978 2167.4 1961.0
1979 2216.2 2004.4
1980 2214.3 2000.4
1981 2248.6 2024.2
1982 2261.5 2050.7
1983 2334.6 2145.9
1984 2468.4 2239.9
1985 2509.0 2312.6
?
? Create lagged values, then set sample for complete data
?
Create ; If(_Obsno > 1) | y1 = y[-1] ; c1 = c[-1] $
Sample ; 2 - 36 $
?
? Define data matrices
?
Namelist ; X = One,y ; Z = One,y1,c1 $
?
? X-hat - by regressions on Z
?
Matrix ; Xh = Z*<Z'Z>*Z'X $
?
? Variance estimator. Only consistent under null hypothesis
?
Calc ; s2 = Ess(X,c)/(n-col(X)) $
?
? Variance matrix. V has rank 1, so invert by Moore-Penrose
?
Matrix ; V = s2*<XH'XH> - s2*<X'X>
; d = <Xh'Xh>*Xh'c - <X'X>*X'c
; List ; H = d' * Mpnv(V) * d $
?
? H, listed above, is the Hausman statistic
? Wu statistic, based on regressions
?
Regress; Lhs = y ; Rhs = z ; Keep = ys $
Regress;Lhs = c ; Rhs = Ys,X ; Cls:b(1)=0 $
/* Note that a significant Hausman or Wu statistic indicates that you can
reject the null hypothesis that least squares is a consistent estimator. */
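
For reference, the matrix commands above can be written out as follows. This is a restatement of the script in standard notation, where c is the consumption vector, e is the vector of least squares residuals from regressing c on X, K is the number of columns of X, and the superscript + denotes the Moore-Penrose inverse. The symbols are chosen here for illustration.

```latex
\[
\hat{X} = Z (Z'Z)^{-1} Z' X, \qquad
d = (\hat{X}'\hat{X})^{-1}\hat{X}'c \;-\; (X'X)^{-1}X'c
\]
\[
s^2 = \frac{e'e}{n-K}, \qquad
V = s^2\left[(\hat{X}'\hat{X})^{-1} - (X'X)^{-1}\right], \qquad
H = d'\,V^{+}\,d
\]
```

Here d is the difference between the instrumental variables and least squares coefficient estimates. Under the null hypothesis, H is compared with a chi-squared distribution whose degrees of freedom equal the rank of V, which the script's comment gives as 1.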
/* That's all for this lesson. */