First-r-lab

From Earlham CS Department
Revision as of 07:53, 16 October 2012 by Charliepeck (talk | contribs)

From An Introduction to R Graphics, Paul Murrell (http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html)

The most useful aspect of R graphics is that one can add "layers" onto any plot. Creating a good visualization is therefore a matter of many small steps, with room to experiment with which features provide benefit and which settings of those features are most desirable.

We are going to look at just a few capabilities of basic R graphics. I'll mention a few packages that build on the basics, but have neither the time nor the expertise to get into them today.

page 1: commands work as advertised 
> plot(pressure) 
('pressure' provides Vapor Pressure of Mercury as a Function of Temperature and is included in the R package 'datasets', part of the basic installation. We'll look at the list of included data sets.)
> text(150,600, 'Pressure(mm Hg) \n versus \n Temperature (Celsius)')
(The 'text' command will overlay the graph's description at the specified location.)

page 85: Some nice examples of how layers can be added to an existing plot.
> x <- 1:10
> y <- matrix(sort(rnorm(30)),ncol=3)
(The "sort" command will sort the 30 numbers in ascending order, by default. See help(sort) for more information.)
> plot(x,y[,1], ylim=range(y),ann=FALSE,axes=FALSE,type='l', col='grey')
(The "plot" command opens a plot window and then constructs the requested plot. Here, R is told to plot the lines connecting the 10 (x,y) points in order. See help(plot) for more information.)
> box(col='grey')
('box' constructs a bounding box. The following commands add the points and two more lines to the plot, one for each column of data.)
> lines(x,y[,2],col='green')
> points(x,y[,1])
> points(x,y[,2], pch=2)
> lines(x,y[,3],col='red')
> points(x,y[,3], pch=3)

The following commands declare and plot 4 points. Then a fifth point and a bounding box are added.
> x <- c(4,5,2,1)
> y <- x
> plot(x,y, ann=FALSE, axes=FALSE, col='grey', pch=16)
> points(3,3, col='green', pch=15)
> box(col='grey')

Then text is added to each point in a slightly different way.
> text(x,y, c('bottom', 'left', 'top', 'right'), pos=1:4)
> text(3,3, 'overlay')


The following commands randomly choose 1000 x and y values from a standard normal distribution and plot the points. I'd suggest that you ask for the help page for 'rnorm':
> help(rnorm)
One key point is that almost all values in the standard normal distribution lie between -4 and 4, so that should be the range of both the x and y values.
> x <- rnorm(1000)
> y <- rnorm(1000)
> plot(x,y, ann=FALSE, axes=FALSE, col='grey')
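
The claim that almost all standard normal values lie between -4 and 4 can be checked numerically. The following is a minimal sketch, not part of the original lab; the variable names p_outside and expected_outside and the seed value are arbitrary choices made for illustration.

```r
# Probability that a single standard normal draw falls outside [-4, 4].
# pnorm() is the standard normal cumulative distribution function, and
# the distribution is symmetric, so both tails have the same area.
p_outside <- 2 * pnorm(-4)
print(p_outside)

# Expected number of the 1000 x values (or y values) outside [-4, 4].
expected_outside <- 1000 * p_outside
print(expected_outside)

# Empirical check on one simulated sample; the seed is arbitrary and
# only makes the run repeatable.
set.seed(1)
x <- rnorm(1000)
print(range(x))
inside_fraction <- mean(abs(x) <= 4)
print(inside_fraction)
```

Because the expected count outside the interval is well below one, a plot of 1000 points will usually fit inside the square from -4 to 4 on both axes.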

Finally, construct a bounding rectangle about the data and the convex hull of the data. I've added a simple grid in the background.
> rect(min(x),min(y),max(x),max(y), lty='dashed')
> hull <- chull(x,y)
> polygon(x[hull],y[hull])
> grid()
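
It may help to see what 'chull' actually returns before passing its result to 'polygon'. The sketch below uses a small made-up point set (the corners of the unit square plus its center) rather than the random data above, so the result is easy to check by eye.

```r
# Five illustrative points: the corners of the unit square plus its center.
x <- c(0, 1, 1, 0, 0.5)
y <- c(0, 0, 1, 1, 0.5)

# chull() returns the indices of the points that lie on the convex hull,
# not the coordinates themselves.
hull <- chull(x, y)
print(hull)

# The center point (index 5) is interior, so it is not part of the hull.
print(5 %in% hull)

# Indexing with 'hull' recovers the hull vertices in drawing order,
# which is why polygon(x[hull], y[hull]) traces the outline.
print(cbind(x = x[hull], y = y[hull]))
```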

R is able to include not only text on a plot, but mathematical formulae, as well. If you want to learn more about mathematical annotation in R, try help(plotmath) and example(plotmath).
> plot(1:10,1:10, ann=FALSE, axes=FALSE, col='grey')
> text(5,5, expression(paste("Temperature (", degree, "C) in 2003")))
> text(5,8, expression(bar(x)==sum(frac(x[i],n),i==1,n)))
> text(5,6.5, expression(hat(beta)==(X^t*X)^{-1}*X^t*y))
> text(5,3.5, expression(z[i]==sqrt(x[i]^2+y[i]^2)))

R often uses a data structure called a "data frame", which organizes information by columns. The data set "CO2" provides results from an experiment on the cold tolerance of a grass species, which measured uptake of carbon dioxide by chilled and nonchilled plants at several ambient carbon dioxide concentrations.

> help(data.frame)
> is.data.frame(CO2)
[1] TRUE
> coplot(uptake~conc | Plant, data=CO2,type='b', show.given=FALSE)
> coplot(uptake~conc | Plant*Treatment, data=CO2,type='b', show.given=FALSE)
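
Before conditioning a plot on a column, it can be useful to look at how the data frame is laid out. The following is a short sketch using only base R functions; the variable name n_plants is introduced here for illustration.

```r
# CO2 ships with the 'datasets' package, so it is available by default.
str(CO2)                      # column names and types
print(dim(CO2))               # number of rows and columns

# The conditioning variables used in the coplot() calls are factors.
print(levels(CO2$Treatment))
n_plants <- nlevels(CO2$Plant)
print(n_plants)

# Observations per plant, and the concentrations at which uptake
# was measured.
print(table(CO2$Plant))
print(sort(unique(CO2$conc)))
```

Each level of the conditioning factor gets its own panel, so the number of levels of Plant indicates how many panels the first coplot draws.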


Assignment: Find the list of data sets available in the basic R installation:
> data()

Construct an R visualization (perhaps more than one, but on the same page) that tells the story of either ChickWeight (4 different diets) OR co2 (a time-series data set).

Be sure to document the process that led you to your visualization, including the final code used. I'm particularly interested in how many layers the visualization required.


Most would say that the most difficult thing about learning R stems from its being open source software: there is a great deal of documentation, but it varies in usefulness. The key is to make regular use of the "help()" command and to search the web for relevant phrases. For example, googling "ChickWeight" yields a few R-related pages, including some interesting plot suggestions. The R command help(ChickWeight) brings up the page describing that data set.


> help(plot)
> methods(plot)

 [1] plot.acf*           plot.data.frame*    plot.decomposed.ts*
 [4] plot.default        plot.dendrogram*    plot.density
 [7] plot.ecdf           plot.factor*        plot.formula*
[10] plot.function       plot.hclust*        plot.histogram*
[13] plot.HoltWinters*   plot.isoreg*        plot.lm
[16] plot.medpolish*     plot.mlm            plot.ppr*
[19] plot.prcomp*        plot.princomp*      plot.profile.nls*
[22] plot.spec           plot.stepfun        plot.stl*
[25] plot.table*         plot.ts             plot.tskernel*
[28] plot.TukeyHSD

   Non-visible functions are asterisked

> help(plot.ts)
> help(ts)
> is.ts(co2)
[1] TRUE
> is.data.frame(ChickWeight)
[1] TRUE
> help(coplot)
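
The methods listing matters because 'plot' is a generic function: it picks a method based on the class of its first argument. The sketch below shows one way to see which method applies to each of the two assignment data sets; the variable names ts_method and df_method are illustrative only.

```r
# plot() looks at the class of its first argument and calls the
# matching plot.<class> method.
print(class(co2))          # co2 is a time series object
print(class(ChickWeight))  # a data frame with additional classes

# getS3method() retrieves the method registered for a given class,
# whether or not it is visible (asterisked in the methods() listing).
ts_method <- getS3method("plot", "ts")
df_method <- getS3method("plot", "data.frame")
print(is.function(ts_method))
print(is.function(df_method))
```

So plot(co2) is handled by the time-series method, which is why help(plot.ts) is the relevant help page for that data set.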