FUNDAMENTALS OF R

 

The purpose of this session is to introduce you to R and to give you some familiarity with a few R commands that will be used in future sessions.

 

SOME BASICS

 

R is a high-level programming language based on objects. Everything you work with in R, whether a dataset, a variable, or the result of an analysis, is stored as a named object. To get data into R, for example, you define a dataset as an object and then attach it. To define an object “Example” from an external dataset EXAMPLE.txt, type the following.

 

Example <- read.table("C:/Rstuff/EXAMPLE.txt", header=TRUE)

attach(Example)

 

This two-command sequence assumes you are reading a text file whose first line contains the variable names. R will read text files and files created in R, as well as data files written in other formats such as Stata and SPSS using the “foreign” library. Note that R is case sensitive: “Example” and “example” are different objects. Most built-in R commands are written in lower case.
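The read step can be tried without the course data. The sketch below writes a small made-up table to a temporary file and reads it back. The file contents and the object name Demo are illustrative assumptions, not the contents of EXAMPLE.txt.

```r
# Write a tiny, made-up whitespace-delimited table to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Y X1 X2",
             "1.0 2 3",
             "2.5 4 1",
             "3.0 6 5"), tmp)

# header = TRUE tells read.table that the first line holds variable names.
Demo <- read.table(tmp, header = TRUE)

str(Demo)     # shows the structure: 3 observations of 3 variables
names(Demo)   # the variable names taken from the first line

# R is case sensitive: the object Demo exists, DEMO does not.
exists("Demo")
exists("DEMO")
```

Checking names() and str() right after reading is a quick way to confirm that the header line was interpreted as intended.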

 

Once you have your data in R, you can work with it in a number of ways. To view an object, type its name at the command prompt. For example, type the following.

 

Example

 

To obtain summary statistics on the object “Example” type:

 

summary(Example)

 

The object “Example” that we have defined contains variables Y, X1, X2, and X3. To get the mean, variance, and standard deviation of X1 type the following.

 

mean(X1)

var(X1)

sd(X1)
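It may help to see what these three commands compute. The sketch below uses a short made-up vector (not the course data) to show that var() returns the sample variance, with n - 1 in the denominator, and that sd() is its square root.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values for illustration
n <- length(x)

mean(x)                          # 5

# Sample variance computed by hand: squared deviations divided by n - 1.
manual_var <- sum((x - mean(x))^2) / (n - 1)

all.equal(var(x), manual_var)    # TRUE
all.equal(sd(x), sqrt(var(x)))   # TRUE
```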

 

It is also easy to create new variables (which are also objects) in R. Assuming you have attached the dataset “Example”, type in the following short list of commands.

 

Example$NEWVAR <- Y+X1+X2+X3

detach(Example)

attach(Example)

 

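This creates a new variable called NEWVAR which is the sum of the other variables. The new variable is stored on the dataset Example as soon as it is assigned, but the attached copy is not updated automatically. The subsequent detach and attach lines refresh the attached copy so that NEWVAR can be referred to by name.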
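The reason for the detach and attach pair can be seen in a small experiment. The sketch below uses a made-up data frame named Demo (an assumption for illustration) and shows that a column added after attach() is not visible by name until the data frame is re-attached. It also shows with(), which is one way to avoid attach() altogether.

```r
Demo <- data.frame(Y = c(1, 2, 3), X1 = c(4, 5, 6))   # made-up data

attach(Demo)
Demo$NEWVAR <- Y + X1             # Y and X1 are found in the attached copy
before <- exists("NEWVAR")        # FALSE: the attached copy predates NEWVAR

detach(Demo)
attach(Demo)
after <- exists("NEWVAR")         # TRUE: the refreshed copy includes NEWVAR
detach(Demo)

# Alternative without attach(): evaluate the expression inside the data frame.
Demo$NEWVAR2 <- with(Demo, Y + X1)
Demo
```

Referring to variables through with() or Demo$X1 takes a little more typing, but it removes any doubt about which copy of the data is being used.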

 

To obtain a correlation matrix on the variables in “Example”, type the following.

 

cor(Example[,c("X1","X2","X3","Y")], use="complete.obs")
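The use="complete.obs" argument matters only when the data contain missing values. The sketch below, built on a made-up data frame with one missing value, shows the difference: by default a missing value makes the affected correlations NA, while "complete.obs" first drops every row that has a missing value in any of the selected columns.

```r
# Made-up data: column a has one missing value.
d <- data.frame(a = c(1, 2, 3, 4, NA),
                b = c(2, 4, 6, 8, 10))

cor(d)                         # correlation between a and b is NA
cor(d, use = "complete.obs")   # computed from the four complete rows only
```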

 

To obtain a scatterplot between two variables with a least-squares regression line, boxplots in the margins for each variable, and a nonparametric smoothed line, type the following.

 

scatterplot(Y~X1, reg.line=lm, boxplots='xy', smooth=TRUE, span =0.5, data=Example)
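The scatterplot command comes from the car package, and its argument names have varied across releases of that package. As a rough base-R sketch of the same two lines (without the marginal boxplots), the example below uses simulated data; the data, the object names, and the choice of lowess() as the smoother are assumptions for illustration.

```r
# Simulated data for illustration only.
set.seed(1)
X1 <- runif(50, min = 0, max = 10)
Y  <- 2 + 0.5 * X1 + rnorm(50)
Demo <- data.frame(Y, X1)

# Basic scatterplot of Y against X1.
plot(Y ~ X1, data = Demo, main = "Y against X1")

# Solid line: least-squares regression line.
fit <- lm(Y ~ X1, data = Demo)
abline(fit, lty = 1)

# Dashed line: nonparametric lowess smooth; f is the smoother span.
lines(lowess(Demo$X1, Demo$Y, f = 0.5), lty = 2)
```

If the two lines track each other closely, a straight-line fit is probably adequate over the observed range.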

 

You can enter R commands directly in the R command window by typing the command and pressing the Enter key. Alternatively, you can type commands into a batch file created in any text editor and then paste the lines into the R command window.

 

I like WinEdit as a text editor with R because it has an interface specifically designed for R. Rcmdr is a commonly used alternative: it is a graphical front end to R with a point-and-click interface for many R commands. You can also use the Windows text editor, Microsoft Word, or WordPerfect, as long as you save the file as plain text.

 

All of the preceding can be executed in a batch file format by typing the lines into a plain text file and then copying and pasting the selected text into the R command window. Make sure that the path to “EXAMPLE.txt” on the first line is correct.

 

If you wish to run a subset of the file, highlight the lines you wish to run and copy and paste those lines into the R command window.

 

Note that the typical R command contains the assignment arrow (<-), which stores the result of the expression on its right in the object named on its left.
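A three-line sketch makes the role of the arrow concrete. The object name total is an arbitrary choice for illustration.

```r
3 + 4            # no arrow: the result is printed but not stored

total <- 3 + 4   # arrow: the result is stored in "total" and nothing is printed

total            # typing the object's name prints the stored value, 7
```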

 

To print results from your output, select FILE, PRINT from the main R menu. Make sure that your cursor is in the window you want to print. To save output from an R session, select FILE, SAVE TO FILE.

 

R can save the objects you create in a session. This can be good or bad. If you do not wish to carry objects over from a “messy” session, you might want to start each command file with the following command, which removes all objects from the workspace.

 

rm(list=ls(all=TRUE))
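Removing everything is not always what you want. As a sketch of a gentler alternative, the commands below list the workspace with ls() and remove a single object by name; the object names scratch and results are made up for illustration.

```r
scratch <- 1:10             # a temporary object
results <- mean(scratch)    # an object worth keeping

ls()                        # lists the objects currently in the workspace

rm(scratch)                 # remove only the named object

exists("scratch")           # FALSE
exists("results")           # TRUE
```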

 

Note that R has an extensive library of add-on packages that can be installed. To see many of the available packages for R, go to the following web link.

 

http://www.maths.lth.se/help/R/.R/doc/html/packages.html

 

Click on a package name to get a description.

 

To install the package in R go to the PACKAGES menu within R, click ‘install package’, and select the package to install.

 

In order to use the installed package it must be loaded. Load a package by typing the following.

 

library(packagename)
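Calling library() on a package that has not been installed stops with an error. One way to guard against this in a command file is sketched below, using requireNamespace() to check for the package first. The package name "car" is used only as an example, and the message wording is an assumption.

```r
pkg <- "car"   # example package name

if (requireNamespace(pkg, quietly = TRUE)) {
  # The package is installed, so load it. character.only = TRUE is needed
  # because the name is held in a character variable.
  library(pkg, character.only = TRUE)
} else {
  # The package is missing: report it rather than stopping with an error.
  message("Package '", pkg, "' is not installed. ",
          "Install it from the PACKAGES menu or with install.packages().")
}
```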

 

To see what an installed package contains, type the following.

 

help(package=packagename)

 

To get help with a command within a package that is already installed and loaded, type the following.

 

help(commandname)

 

or

 

example(commandname)

 

All of the assignments on the syllabus require the ‘car’ package to be installed and loaded. You can do this by putting the following command at the top of your file.

 

library(car)

 

Alternatively, you might want to put this command in the ‘Rprofile’ file so that it is loaded automatically each time you start R.

 

 

ASSIGNMENT: As an exercise for this first session, read the data from the data file called “EXAMPLE.txt.” Run all of the preceding commands. Then create two new variables, X1*X2 and X3 squared, and compute descriptive statistics on them. Use the “lm” command to do a regression of Y on a constant, X1, X2, X1*X2, and X3 squared. Following your regression, save the residuals and predicted values to new variables on the dataset, list and plot the residuals, and obtain CUSUM plots for parameter stability on the model. Finally, use the “anova” command to view the analysis of variance table for the variables in the model.

 

Try this first independently. However, if you get into trouble, here is what the command file would look like.

 

# This file is intended to get you started with R.

# First read in the data and examine it


Example <- read.table("C:/Rstuff/EXAMPLE.txt", header=TRUE)

attach(Example)

Example

 

# Now summarize the data for all variables.

 

summary(Example)

 

# Now compute the mean, variance, standard deviation, etc. for a single variable.

 

mean(X1)

var(X1)

sd(X1)

median(X1)

max(X1)

min(X1)

 

# Now compute the skewness and kurtosis using the e1071 library

 

library(e1071)

skewness(X1)

kurtosis(X1)

 

# Now create a new variable and add it to the active data file.

 

Example$NEWVAR <- Y+X1+X2+X3

detach(Example)

attach(Example)

 

# Now do a correlation matrix among the original variables.

 

cor(Example[,c("X1","X2","X3","Y")], use="complete.obs")

 

# Now do a scatterplot between two variables with a superimposed regression line,

# a nonparametric smoothed line, and box plots for each variable. This is in the car library.

 

library(car)

scatterplot(Y~X1, reg.line=lm, boxplots='xy', smooth=TRUE, span =0.5, data=Example)

 

# Now create two variables X1*X2 and X3 squared and add to the active data.

 

Example$X1X2 <- X1 * X2

Example$X3sqr <- X3^2

detach(Example)

attach(Example)

 

# Get descriptive statistics on the new variables

 

summary(X1X2)

var(X1X2)

sd(X1X2)

summary(X3sqr)

var(X3sqr)

sd(X3sqr)

 

# Look at the entire dataset.

 

Example

 

# Now regress Y on X1, X2, X1X2, and X3sqr and look at the output object.

 

regress.model <- lm(Y ~ X1 + X2 + X1X2 + X3sqr)

summary(regress.model)

 

# Now add the residuals and predicted values to the dataset.

 

Example$residuals <- residuals(regress.model)

Example$fitted <- fitted(regress.model)

detach(Example)

attach(Example)

 

# Now list and plot the residuals and predicted values

 

residuals

plot(residuals)

fitted

plot(fitted)

 

# Now get the analysis of variance table for the regression.

 

anova(regress.model)

 

# Now construct a CUSUM plot for model stability.

 

library(strucchange)

plot(efp(Y ~ X1 + X2 + X1X2 + X3sqr, type = "Rec-CUSUM"))
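As an alternative to the regression steps in the command file above, the product and squared terms can be written directly in the model formula, and the dataset can be passed through the data argument instead of being attached. The sketch below uses simulated data in place of EXAMPLE.txt; the data, the coefficients used to generate it, and the object names Demo and fit are assumptions for illustration. It leaves out the CUSUM plot, which needs the strucchange package.

```r
# Simulated stand-in for the course data.
set.seed(42)
n <- 40
Demo <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
Demo$Y <- 1 + 2 * Demo$X1 - Demo$X2 + 0.5 * Demo$X1 * Demo$X2 +
  Demo$X3^2 + rnorm(n)

# In a formula, X1 * X2 expands to X1 + X2 + X1:X2 (both variables plus
# their product), and I(X3^2) protects the square from formula syntax.
fit <- lm(Y ~ X1 * X2 + I(X3^2), data = Demo)
summary(fit)

# Save residuals and predicted values back to the dataset.
Demo$residuals <- residuals(fit)
Demo$fitted    <- fitted(fit)

# Plot residuals against predicted values with a reference line at zero.
plot(Demo$fitted, Demo$residuals,
     xlab = "Predicted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Analysis of variance table for the fitted model.
anova(fit)
```

Writing the terms in the formula keeps the dataset free of derived columns and makes the model specification readable in one place.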