### Ecol592: Introduction to R
### Assignment 3 "Answer" Key
### Due: April 1, 2014

### Standard disclaimer: There are lots of ways to do anything in R, so these "answers" represent my approach, even though many of you came up with alternative (and sometimes much better) solutions.

### For numbers 1 through 3, use the Orange dataset, which is built into R and describes the growth (in circumference) of orange trees through time.
head(Orange)
dim(Orange)

#1) Use aggregate() to figure out what the mean circumference of orange trees is for each age.
aggregate(circumference ~ age, data=Orange, FUN=mean)

#2) Why is aggregate(circumference ~ Tree + age, data=Orange, FUN=sum) a trivial operation?
# Each Tree and age combination already identifies a unique row, so no summarization occurs.
# The command essentially returns the same rows that you passed to it, possibly in a different order.

#3) Plot the circumference of each tree as a function of age. Build a regression line through these data (all of the data, not a line for each tree) that only spans the extent of the independent variable (age) using predict().
# [Subquestion challenge: plot each tree's data in a different color without subsetting each group and plotting them separately. That is, you can do this in one line using plot() if you're clever about it.]
# Basic plot
plot(Orange$age, Orange$circumference, pch=19)
model <- lm(circumference ~ age, data=Orange)
summary(model)
# With no newdata argument, predict() returns the fitted values at the observed ages,
# so the line drawn below spans only the range of the data
Orange.predictions <- predict(model)
x.vector <- Orange$age
lines(x.vector, Orange.predictions)

# Trees in different colors
plot(Orange$age, Orange$circumference, pch=19, col=Orange$Tree)
lines(x.vector, Orange.predictions)

#4) Add a title, x label, and y label to your plot
# (see ?Orange: age is in days and circumference is in mm)
plot(Orange$age, Orange$circumference, pch=19, col=Orange$Tree, main="Growth of Orange Trees Through Time", xlab="Age (days)", ylab="Circumference (mm)")
lines(x.vector, Orange.predictions)

### For #5, use the 'faithful' dataset, which is built into R and describes the activity of the Old Faithful geyser in Yellowstone National Park. The waiting time until the next eruption (in minutes) is predicted by the duration of the current eruption (in minutes).

#5) Plot these data, and add appropriate titles and axis labels.
head(faithful)

# Basic plot
plot(faithful$eruptions, faithful$waiting, pch=19)

# With a title and axis labels
plot(faithful$eruptions, faithful$waiting, pch=19, main="Old Faithful Eruption Waiting Times by Length of Current Eruption", xlab="Length of current eruption (mins)", ylab="Waiting Time (mins)")

### Challenge Questions

#6) Suppose any Old Faithful eruption whose duration is longer than 3 minutes is considered "long" and any eruption whose duration is shorter than 3 minutes is considered "short." Draw a vertical line on your plot from #5 to demarcate this threshold.
plot(faithful$eruptions, faithful$waiting, pch=19, main="Old Faithful Eruption Waiting Times by Length of Current Eruption", xlab="Length of current eruption (mins)", ylab="Waiting Time (mins)")
abline(v=3, col="red", lty="dashed", lwd=3)

#7) Plot "long duration" eruptions in a different color than "short duration" eruptions. Add a legend to describe your plot.
# Set up an empty plot (type="n"), then add each group of points in its own color
plot(faithful$eruptions, faithful$waiting, type="n", main="Old Faithful Eruption Waiting Times by Length of Current Eruption", xlab="Length of current eruption (mins)", ylab="Waiting Time (mins)")
points(faithful$eruptions[faithful$eruptions <= 3], faithful$waiting[faithful$eruptions <= 3], pch=19, col="blue")
points(faithful$eruptions[faithful$eruptions > 3], faithful$waiting[faithful$eruptions > 3], pch=19, col="green")
abline(v=3, col="red", lty="dashed", lwd=3)
# The legend that the question asks for
legend("topleft", legend=c("Short (<= 3 mins)", "Long (> 3 mins)"), pch=19, col=c("blue", "green"))

#8) Figure out how to save your 'faithful' plot to a .pdf file using the pdf() function, which takes a file path name as an argument.
# Did you set your working directory? Then you can just write the final file name instead of the whole path. Remember to include .pdf in your file name.

# file.choose() opens an interactive file browser, which is a handy way to look up a path
file.choose()
setwd("/Users/mikoontz/Desktop/ECOL592 Introduction to R/Assignments/ECOL592 Intro to R Assignment 3")

# Name the file you want to save your plot to
pdf("faithful MJK.pdf")
plot(faithful$eruptions, faithful$waiting, type="n", main="Old Faithful Eruption Waiting Times by Length of Current Eruption", xlab="Length of current eruption (mins)", ylab="Waiting Time (mins)")
points(faithful$eruptions[faithful$eruptions <= 3], faithful$waiting[faithful$eruptions <= 3], pch=19, col="blue")
points(faithful$eruptions[faithful$eruptions > 3], faithful$waiting[faithful$eruptions > 3], pch=19, col="green")
abline(v=3, col="red", lty="dashed", lwd=3)
legend("topleft", legend=c("Short (<= 3 mins)", "Long (> 3 mins)"), pch=19, col=c("blue", "green"))

# Remember to turn the device off to complete the plot. This is when the plotting information is written to the file.
dev.off()
# The console reports the device that is now active: "null device 1" if no other device is open, or something like "quartz 2" on a Mac with a plot window still open

### Use the new.trees dataset.

#9) Using aggregate(), figure out how many tree girths exceed 20 inches by species.
# Find the file
file.choose()
new.trees <- read.csv("/Users/mikoontz/Desktop/ECOL592 Introduction to R/Lectures/ECOL592 Intro to R Lecture 7/trees comparison.csv")
head(new.trees)

# x > 20 is a logical vector, and sum() counts its TRUE values
aggregate(Girth ~ Species, data=new.trees, FUN=function(x) sum(x > 20))

# Or, if there might be NAs in your Girth column
# (the formula interface drops rows with NAs by default, so this only matters if na.action is changed)
aggregate(Girth ~ Species, data=new.trees, FUN=function(x) length(which(x > 20)))

#10) What does the with() function do?
# It lets you access the columns of a data frame without having to specify the data frame name every time.
with(Orange, mean(circumference))

# Especially useful if you have lots of references to the same data frame
with(Orange, plot(x=age, y=circumference, pch=19, col=Tree))
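
# A minimal sketch of what with() is doing, using only the built-in Orange data.
# The names mean.with, mean.dollar, and circumference.cm are made up for this illustration.
# with() evaluates the expression with the data frame's columns visible as variables,
# so these two calls should give the same value:
mean.with <- with(Orange, mean(circumference))
mean.dollar <- mean(Orange$circumference)
identical(mean.with, mean.dollar)

# with() only reads from the data frame. An assignment made inside the expression
# is not kept, so no new column appears in Orange:
with(Orange, circumference.cm <- circumference / 10)
"circumference.cm" %in% names(Orange)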