Monday, February 27, 2017

Inputs and outputs

Inputting data might be one of the most important things to do in R. Without it, no analyses or visualizations can be performed. Yet a lot of introductory tutorials online seem to skip this step in favor of randomly generated data.

Here is my full code for this week's assignment; my thoughts follow it.

install.packages("plyr")

library(plyr) #loads from the default library path, so no machine-specific lib.loc is needed


x <- read.table(file.choose(),header=TRUE,sep=",") #AWESOME!

x #make sure it works

y <- ddply(x, "Sex", transform, Grade.Average = mean(Grade))

y #check it

write.table(y,"Sorted_Average") #where does it put the file? The working directory (Documents for me); check with getwd()

write.table(y,"Sorted_Average", sep=",")

newx <- subset(x,grepl("[iI]",x$Name))

write.table(newx,"DataSubset",sep=",")


I first installed the plyr package, since we'll be using some of its functions.

The read.table and file.choose combination was pretty awesome. I usually set a working directory (via the button in RStudio) and then call read.csv or read.table. This code eliminates a step, which is pretty cool.
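To convince myself that read.table with header=TRUE and sep="," really does the same job as read.csv, here is a small sketch. The column names match the assignment file, but the rows are made up, and a temporary file stands in for whatever path file.choose() would return in an interactive session.

```r
# Hypothetical sample data standing in for the assignment file
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name,Sex,Grade", "Lisa,Female,95", "Raul,Male,85"), tmp)

# Interactively, file.choose() would open a dialog and return a path like tmp
a <- read.table(tmp, header = TRUE, sep = ",")

# read.csv is a wrapper around read.table with header = TRUE and sep = "," as defaults
b <- read.csv(tmp)

# Compare the two data frames
print(all.equal(a, b))
```

For a plain file like this one the two calls should agree; they can differ on files with apostrophes or # characters, because the quote and comment defaults are not the same.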

I found the ddply function interesting. I liked that it groups the data by a certain factor, but I'm not sure how I feel about the average for that factor getting added to every line; that felt a little bit "busy". I was confused at first about how it computed an average for a single value. Then I realized the value repeats: transform computes one mean per Sex group and copies it onto every row of that group, so all the males share one average and all the females share another.
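If the repeated column feels too busy, one option is a compact table with a single row per group. The sketch below is only an illustration with made-up rows (same column names as the assignment data). It uses base R's aggregate for the compact table and ave for the per-row version, and it prints the plyr spelling with summarise only if plyr happens to be installed.

```r
# Hypothetical rows using the same column names as the assignment data
x <- data.frame(
  Name  = c("Lisa", "Raul", "Mia", "Tom"),
  Sex   = c("Female", "Male", "Female", "Male"),
  Grade = c(95, 85, 75, 65)
)

# Base R: one row per Sex, holding that group's mean grade
per_group <- aggregate(Grade ~ Sex, data = x, FUN = mean)
names(per_group)[2] <- "Grade.Average"

# Base R: the per-row version, like the column that transform added
per_row <- ave(x$Grade, x$Sex)  # ave() uses FUN = mean by default

# The plyr spelling of the compact table, if plyr is installed
if (requireNamespace("plyr", quietly = TRUE)) {
  print(plyr::ddply(x, "Sex", plyr::summarise, Grade.Average = mean(Grade)))
}

print(per_group)
print(per_row)
```

Swapping transform for summarise is the only change on the plyr side: transform keeps every original row, while summarise collapses each group to one.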

I wasn't sure at first if my write.table worked. I didn't get an error, but nothing seemed to happen. Then I noticed a file with the name Sorted_Average in the Files tab, and had to track down where it went by searching for the name on my computer. It turns out write.table saves to the current working directory when given a bare file name. Maybe the "change directory" button still has some use for this purpose.
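A way to avoid hunting for the file is to check the working directory first, or to write to an explicit path. This is a minimal sketch: the data frame is made up, tempdir() is a stand-in for a real folder, and the .csv extension is my own addition (the assignment code used no extension).

```r
# Hypothetical result table standing in for y from the assignment
y <- data.frame(Sex = c("Female", "Male"), Grade.Average = c(85, 75))

# Relative file names passed to write.table end up in this folder
print(getwd())

# Build an explicit path so there is no doubt where the file goes
out <- file.path(tempdir(), "Sorted_Average.csv")
write.table(y, out, sep = ",", row.names = FALSE)

# Confirm the file exists and show its full location
print(file.exists(out))
print(normalizePath(out))
```

Setting row.names = FALSE is optional; it just keeps the row numbers out of the file so it reads back cleanly with read.csv.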
