FUNDAMENTALS OF R
The purpose of this session
is to introduce you to R and to give you some familiarity with a few R commands
that will be used in future sessions.
SOME BASICS
R is a high-level programming
language built around objects. Everything you work with in R, including a
dataset, is stored as a named object. To get data into R, you define a dataset
as an object and then attach it. For example, to define an object “Example” from
an external dataset EXAMPLE.txt, type the following.
Example <- read.table("C:/Rstuff/EXAMPLE.txt", header=TRUE)
attach(Example)
This two-command sequence
assumes you are reading a text file containing variable names in the first
line. R will read text files, files created in R, and data files written
in other formats such as Stata and SPSS using the “foreign” package. Note that
R is case sensitive. Most built-in commands are written in lower case, but every
name must be typed exactly as it was defined: TRUE must be upper case, and
“Example” is not the same object as “example”.
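As a rough, self-contained illustration of the reading step, the sketch below writes a tiny made-up table to a temporary file and reads it back with read.table. The file contents and the name tmp are assumptions for illustration only; your own call would point at EXAMPLE.txt instead.

```r
# Hypothetical stand-in for EXAMPLE.txt: variable names on the first line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Y X1 X2 X3",
             "10 1 4 2",
             "12 2 3 5",
             "15 3 5 1"), tmp)

# header = TRUE tells R that the first line holds the variable names.
Example <- read.table(tmp, header = TRUE)

names(Example)   # variable names as read from the file
str(Example)     # one row per observation, one column per variable

# R is case sensitive: the column is X1, and there is no column x1.
"X1" %in% names(Example)
"x1" %in% names(Example)
```

Stata and SPSS files would go through read.dta and read.spss from the foreign package in much the same way, after library(foreign).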
Once you have your data in
R, you can work with it in several ways. To view an object, type its
name at the command prompt. For example, type the following.
Example
To obtain summary statistics
on the object “Example” type:
summary(Example)
The object “Example” that we
have defined contains variables Y, X1, X2, and X3. To get the mean, variance,
and standard deviation of X1 type the following.
mean(X1)
var(X1)
sd(X1)
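A small sketch of what these three commands return, using a made-up vector in place of the X1 column (the values are an assumption for illustration). It also shows that var divides by n - 1 and that sd is the square root of var.

```r
# Hypothetical values standing in for the X1 column.
X1 <- c(1, 2, 3, 4, 5)

mean(X1)   # 3
var(X1)    # 2.5, the sample variance (divides by n - 1)
sd(X1)     # square root of the sample variance

# The same variance computed by hand.
n <- length(X1)
sum((X1 - mean(X1))^2) / (n - 1)

# sd() and sqrt(var()) agree.
all.equal(sd(X1), sqrt(var(X1)))
```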
It is also easy to create new
variables (which are also objects) in R. Assuming
you have attached the dataset “Example”, type the following short list of
commands.
Example$NEWVAR <- Y+X1+X2+X3
detach(Example)
attach(Example)
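This
creates a new variable called NEWVAR, which is the sum of the other variables.
The new variable is stored in the dataset Example, but the attached copy was
made before NEWVAR existed. The subsequent detach and attach lines refresh the
attached copy so that NEWVAR can be referred to by name.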
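The sketch below illustrates why the detach and attach lines are needed. It uses a small made-up data frame in place of the real dataset (the values are an assumption), and ends with with(), one common way to avoid attach altogether.

```r
# Hypothetical data frame standing in for Example.
Example <- data.frame(Y  = c(10, 12, 15), X1 = c(1, 2, 3),
                      X2 = c(4, 3, 5),    X3 = c(2, 5, 1))

attach(Example)
Example$NEWVAR <- Y + X1 + X2 + X3   # added to the data frame itself

# The attached copy was taken before NEWVAR existed, so the bare name
# is not visible yet.
exists("NEWVAR")

detach(Example)
attach(Example)   # attach again to pick up the new column
exists("NEWVAR")
NEWVAR

detach(Example)

# An alternative that avoids attach: with() evaluates an expression
# using the columns of the data frame.
with(Example, mean(NEWVAR))
```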
To
obtain a correlation matrix on the variables in “Example”, type the following.
cor(Example[,c("X1","X2","X3","Y")],
use="complete.obs")
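To see what use="complete.obs" does, here is a minimal sketch with a made-up data frame that has one missing value (the data are an assumption for illustration). Without the argument, every correlation involving the incomplete variable comes back as NA.

```r
# Hypothetical data with one missing value in X2.
Example <- data.frame(Y  = c(10, 12, 15, 11, 18),
                      X1 = c(1, 2, 3, 2, 5),
                      X2 = c(4, NA, 5, 3, 6),
                      X3 = c(2, 5, 1, 4, 3))

# Default handling: any pair involving X2 comes back as NA.
cor(Example[, c("X1", "X2", "X3", "Y")])

# complete.obs drops every row that has a missing value in any of the
# selected columns (here row 2) before computing the correlations.
r <- cor(Example[, c("X1", "X2", "X3", "Y")], use = "complete.obs")
r

# The same result from removing incomplete rows by hand.
all.equal(r, cor(na.omit(Example[, c("X1", "X2", "X3", "Y")])))
```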
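To
obtain a scatterplot of two variables with a least-squares
regression line, axis labels, marginal boxplots, and a nonparametric
smooth line, type the following. The scatterplot command is in the ‘car’
package, which must be loaded first with library(car). The argument names
shown here are those of older versions of car; recent versions appear to use
regLine and smooth=list(span=0.5) instead, so check help(scatterplot) if the
command fails.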
scatterplot(Y~X1, reg.line=lm,
boxplots='xy', smooth=TRUE,
span =0.5, data=Example)
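If the car package is not available, a roughly similar picture can be sketched with base graphics. This is an approximation under stated assumptions, not a reproduction of scatterplot: the data are simulated, the marginal boxplots are omitted, and lowess is used for the smooth line, which is related to but not identical to the smoother that car uses.

```r
# Hypothetical data; the seed only makes the sketch reproducible.
set.seed(1)
Example <- data.frame(X1 = runif(50, 0, 10))
Example$Y <- 2 + 0.5 * Example$X1 + rnorm(50)

plot(Y ~ X1, data = Example, xlab = "X1", ylab = "Y")

# Parametric fit: least-squares regression line (dashed).
fit <- lm(Y ~ X1, data = Example)
abline(fit, lty = 2)

# Nonparametric fit: lowess smooth of Y on X1 with span f = 0.5 (solid).
lines(lowess(Example$X1, Example$Y, f = 0.5))
```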
You can enter R commands
directly in the R command window by typing the command and pressing the
Enter key. Alternatively, you can type commands
into a batch file created in any text editor and then paste the lines into
the R command window.
I like WinEdit
as a text editor with R because it has an interface specifically designed
for R. Rcmdr (R Commander) is a commonly used alternative; it is an R package
that provides a point-and-click interface for many R commands. You can also use
Windows Notepad, Microsoft Word, or WordPerfect,
provided you save the file as plain text.
All of the preceding commands can be
run as a batch by typing the lines into a plain text file and
then copying and pasting the selected text into the R command window. Make sure
that the path on the first line points to the file called “EXAMPLE.txt”.
If you wish to run a subset
of the file, highlight the lines you wish to run and copy and paste those lines
into the R command window.
Note that a typical R
command contains the assignment arrow (<-), which stores the result on its
right in the object named on its left.
To print results from your
output, select FILE, PRINT from the main R menu.
Make sure that your cursor is in the window you want to print. To save
output from an R session, select FILE, SAVE TO FILE.
R can save the objects you
create in a session. This can be good or bad. If you do not wish to carry over
objects from a “messy” session, you might want to start each file with the
following command, which removes all objects from the workspace.
rm(list=ls(all=TRUE))
Note that R has an extensive
collection of add-on packages that can be installed. To see many of the
available packages for R, go to the following web link.
http://www.maths.lth.se/help/R/.R/doc/html/packages.html
Click on the package to get a
description.
To install the package in R
go to the PACKAGES menu within R, click ‘install package’, and select the
package to install.
In order to use the installed
package it must be loaded. Load a package by typing the following.
library(packagename)
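The menu steps can also be done from the command line. The sketch below is one possible way to do it: use_package is a hypothetical helper written for this handout, not a built-in R command, while requireNamespace, install.packages, and library are standard R functions.

```r
# Hypothetical helper (not part of R): install a package only if it is
# missing, then load it.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call loads it without installing anything.
use_package("stats")

# Loaded packages appear on the search path.
"package:stats" %in% search()
```

For the course assignments, the corresponding call would be use_package("car").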
After installation, to see
what is in an installed package, type the following.
help(package=packagename)
To get help with a command
within a package that is already installed and loaded, type the following.
help(commandname)
or
example(commandname)
All of the assignments on the
syllabus require the ‘car’ package to be installed and loaded. You can do this
by putting the following command at the top of your file.
library(car)
Alternatively, you might want
to put this command in the ‘Rprofile’ file so that it
is loaded automatically each time you start R.
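One way to do this, sketched here under the assumption that you edit a user-level .Rprofile or the site-wide Rprofile.site, is to add the package to R's list of default packages rather than calling library directly. The pattern follows the example in R's own help page on startup (help(Startup)); the choice of ‘car’ comes from this course.

```r
# Lines for the profile file: add car to the packages that R loads at startup.
local({
  old <- getOption("defaultPackages")
  options(defaultPackages = c(old, "car"))
})
```

The change takes effect the next time R is started, and the package must already be installed.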
ASSIGNMENT: As an exercise
for this first session, read the data from the data file called “EXAMPLE.txt.”
Run all of the preceding commands. Then create two new variables, X1*X2 and X3
squared. Compute descriptive statistics on these new variables. Use the “lm”
command to do a regression of Y on a constant, X1, X2, X1*X2, and X3 squared.
Following your regression, save the residuals and predicted values to new
variables on the dataset, list and plot the residuals, and obtain CUSUM plots
for parameter stability on the model. Finally, use the “anova”
command to view the analysis of variance table for the variables in the model.
Try this first independently.
However, if you get into trouble, here is what the command file would look
like.
# This file is intended to get you started with R.
# First read in the data and examine it.
Example <- read.table("C:/Rstuff/EXAMPLE.txt", header=TRUE)
attach(Example)
Example
# Now summarize the data for all variables.
summary(Example)
# Now compute the mean, variance, standard deviation, etc. for a single variable.
mean(X1)
var(X1)
sd(X1)
median(X1)
max(X1)
min(X1)
# Now compute the skewness and kurtosis using the e1071 library
library(e1071)
skewness(X1)
kurtosis(X1)
# Now create a new variable and add it to the active data file.
Example$NEWVAR <- Y+X1+X2+X3
detach(Example)
attach(Example)
# Now do a correlation matrix among the original variables.
cor(Example[,c("X1","X2","X3","Y")], use="complete.obs")
# Now do a scatterplot between two variables with a superimposed regression line,
# a nonparametric smooth line, and box plots for each variable. This is in the car library.
library(car)
scatterplot(Y~X1, reg.line=lm, boxplots='xy', smooth=TRUE, span =0.5, data=Example)
# Now create two variables X1*X2 and X3 squared and add to the active data.
Example$X1X2 <- X1 * X2
Example$X3sqr <- X3^2
detach(Example)
attach(Example)
# Get descriptive statistics on the new variables
summary(X1X2)
var(X1X2)
sd(X1X2)
summary(X3sqr)
var(X3sqr)
sd(X3sqr)
# Look at the entire dataset.
Example
# Now regress Y on X1, X2, X1X2, and X3sqr and look at the output object.
regress.model <- lm(Y ~ X1 + X2 + X1X2 + X3sqr)
summary(regress.model)
# Now add the residuals and predicted values to the dataset.
Example$residuals <- residuals(regress.model)
Example$fitted <- fitted(regress.model)
detach(Example)
attach(Example)
# Now list and plot the residuals and predicted values
residuals
plot(residuals)
fitted
plot(fitted)
# Now get the analysis of variance table for the regression.
anova(regress.model)
# Now construct a CUSUM plot for model stability.
library(strucchange)
plot(efp(Y ~ X1 + X2 + X1X2 + X3sqr, type = "Rec-CUSUM"))
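The regression part of the command file can also be written without attach and detach by passing the dataset through the data argument. The sketch below is one such variant, not the course's official solution: the data are simulated, and the column names resid and yhat are chosen here so that they do not clash with the residuals and fitted functions. The CUSUM step is left out because it needs the strucchange package.

```r
# Hypothetical simulated data in place of EXAMPLE.txt.
set.seed(42)
n <- 40
Example <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
Example$Y <- 1 + 2 * Example$X1 - Example$X2 + 0.5 * Example$X3^2 + rnorm(n)

# New variables are created inside the data frame in one step.
Example <- transform(Example, X1X2 = X1 * X2, X3sqr = X3^2)

# data = Example tells lm where to find the variables, so no attach is needed.
regress.model <- lm(Y ~ X1 + X2 + X1X2 + X3sqr, data = Example)
summary(regress.model)

# Store residuals and fitted values under names that do not clash with
# the residuals() and fitted() functions.
Example$resid <- residuals(regress.model)
Example$yhat  <- fitted(regress.model)

# Residuals against fitted values, with a reference line at zero.
plot(Example$yhat, Example$resid, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

anova(regress.model)
```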