Richardson R blog LIS5937

Wednesday, March 22, 2017

Debugging

Our assignment this week was to find a bug in a function with deliberate bugs.

The function was

tukey_multiple <- function(x) {
   outliers <- array(TRUE,dim=dim(x))
   for (j in 1:ncol(x))
    {
    outliers[,j] <- outliers[,j] && tukey.outlier(x[,j])
    }
outlier.vec <- vector(length=nrow(x))
    for (i in 1:nrow(x))     { outlier.vec[i] <- all(outliers[i,]) } return(outlier.vec) }

Running it as written reports a syntax error:

Error: unexpected symbol in:
"for (i in 1:nrow(x))
{ outlier.vec[i] <- all(outliers[i,]) } return"

I restructured the last few lines so they more closely matched the layout at the top:

tukey_multiple <- function(x) {
outliers <- array(TRUE,dim=dim(x))
for (j in 1:ncol(x))
{
outliers[,j] <- outliers[,j] && tukey.outlier(x[,j])
}
outlier.vec <- vector(length=nrow(x))
for (i in 1:nrow(x))
{
outlier.vec[i] <- all(outliers[i,])
}
return(outlier.vec)
}

With that change the function was created properly. However, applying it to a matrix produced a new error:

tukey_multiple(matrix( rnorm(5*5,mean=0,sd=1), 5, 5))
Error: could not find function "tukey.outlier"

tukey.outlier isn't defined anywhere in my session. I'm not sure if this was deliberate, or if it's part of a package I don't have. A Google search for "tukey.outlier R" didn't reveal anything useful, so I proceeded to try some of R's debugging tools.

> debug(tukey_multiple)
> tukey_multiple(matrix( rnorm(5*5,mean=0,sd=1), 5, 5))
debugging in: tukey_multiple(matrix(rnorm(5 * 5, mean = 0, sd = 1), 5, 5))
debug at #1: {
outliers <- array(TRUE, dim = dim(x))
for (j in 1:ncol(x)) {
outliers[, j] <- outliers[, j] && tukey.outlier(x[, j])
}
outlier.vec <- vector(length = nrow(x))
for (i in 1:nrow(x)) {
outlier.vec[i] <- all(outliers[i, ])
}
return(outlier.vec)
}
Browse[2]>

I'm not sure why mine says Browse[2] when the lecture shows Browse[1], and I'm not totally certain how to proceed from here. However, if the missing function were found, it seems the function should work.
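For anyone who wants to run the function end to end, here is a minimal sketch under two assumptions. First, the tukey.outlier() below is a hypothetical stand-in written for this example (the course's version was never found); it assumes the usual rule of flagging values more than 1.5 times the interquartile range beyond the quartiles. Second, it swaps && for &, since && is meant for single values (R 4.3.0 and later raise an error when it is given longer vectors), so that operator may well be one of the deliberate bugs.

```r
# Hypothetical stand-in: flags values beyond the 1.5 * IQR fences.
tukey.outlier <- function(v) {
  q <- unname(quantile(v, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  v < q[1] - fence | v > q[2] + fence
}

tukey_multiple <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    # & compares element by element; && is for single values only
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  outlier.vec <- vector(length = nrow(x))
  for (i in seq_len(nrow(x))) {
    # a row counts only if it is an outlier in every column
    outlier.vec[i] <- all(outliers[i, ])
  }
  outlier.vec
}

# Small made-up matrix: the last row is extreme in both columns
m <- matrix(c(1, 2, 3, 4, 100,
              2, 3, 4, 5, 200), nrow = 5)
result <- tukey_multiple(m)
print(result)
```

Only the last row should come back TRUE, because it is the only row flagged in both columns.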

Friday, March 10, 2017

Visualization and graphics

This is a topic with which I have struggled in the past. I can do the basics, but I know there is a lot more to it that I have yet to unlock. It seems many in biology who favor R like it for its statistical capabilities, but not so much for the graphical aspect. I hope not to fall into that rut.

For these examples I used the "biomass" dataset on https://vincentarelbundock.github.io/Rdatasets/datasets.html

I kept every plot limited to the same two variables, "Tas" and "year", so I could really explore the base functions of each plotting system. The GitHub code can be found here: https://github.com/jcrichardson617/R_class/blob/master/module9

Using R's built in plot() function, the graph looks fairly simple.

Using the lattice package:

And ggplot2:

Keeping each plot this basic posed little difficulty, and there is of course room for improvement on each. As the instructor of the course is big into visualization, and talks about ggplot a lot, I am inclined to lean that way. Even from these basic plots, though, it seems like ggplot used more code than lattice and plot, so it probably has a steep learning curve.
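As a rough sketch of what the three approaches look like side by side: the data frame below is made up (only the column names "Tas" and "year" follow the post, and the values are not the biomass data), and the lattice and ggplot2 parts assume those packages are installed, so each is skipped if it is missing.

```r
# Made-up stand-in data; only the column names follow the post.
biomass <- data.frame(year = 2000:2009,
                      Tas  = c(12.1, 12.4, 11.9, 12.8, 13.0,
                               12.7, 13.3, 13.1, 13.6, 13.9))

pdf(NULL)  # draw to a null device so no file or window is needed

# Base graphics
plot(Tas ~ year, data = biomass)

# lattice (skipped if the package is not installed)
if (requireNamespace("lattice", quietly = TRUE)) {
  p_lattice <- lattice::xyplot(Tas ~ year, data = biomass)
  print(p_lattice)
}

# ggplot2 (skipped if the package is not installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p_gg <- ggplot2::ggplot(biomass, ggplot2::aes(x = year, y = Tas)) +
    ggplot2::geom_point()
  print(p_gg)
}

invisible(dev.off())
```

Remove the pdf(NULL) and dev.off() lines to see the plots in an interactive session. The ggplot2 call is longer, but each piece (data, aesthetics, geometry) is spelled out separately.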

Monday, February 27, 2017

Inputs and outputs

Inputting data might be one of the most important things to do in R. Without it, no analyses or visualizations can be performed. It seems like a lot of introductory tutorials online skip this step in favor of randomly generated data.

Here is my full code for this week's assignment; my thoughts follow it.

install.packages("plyr")

library("plyr", lib.loc="C:/Program Files/R/R-3.2.3/library")

x <- read.table(file.choose(),header=TRUE,sep=",") #AWESOME!

x #make sure it works

y <- ddply(x, "Sex", transform, Grade.Average = mean(Grade))

y #check it

write.table(y,"Sorted_Average") #where does it put the file? Documents

write.table(y,"Sorted_Average", sep=",")

newx <- subset(x,grepl("[iI]",x$Name))

write.table(newx,"DataSubset",sep=",")

I first installed the package plyr, from which we'll be using some functions.

The read.table and file.choose combination was pretty awesome. I usually set a directory (via the button in RStudio) and then do read.csv or read.table. This code eliminates a step, which is pretty cool.

I found the ddply function interesting. I liked grouping the data by a certain factor, but I'm not sure how I feel about the average for that factor getting added to every line; that felt a little bit "busy". I was confused at first about how it computed an average for a single value, then I realized the averages were all the same within the male and female groups.

I wasn't sure at first if my write.table worked. I didn't get an error, but nothing seemed to happen. Then I noticed a file with the name Sorted_Average in the files tab, and had to track down where it went by searching for the name in my computer's directory. Maybe "change directory" still has some use for this purpose.
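To see why the group average repeats on every row, here is a small sketch that swaps in base R's ave() for ddply (a different function, used here only because it needs no extra package). The data frame is made up; only the column names follow the post.

```r
# Made-up grades; column names follow the post (Name, Sex, Grade).
x <- data.frame(Name  = c("Ann", "Bill", "Cara", "Dmitri"),
                Sex   = c("Female", "Male", "Female", "Male"),
                Grade = c(90, 70, 80, 60))

# ave() returns the group mean repeated for every row of that group
x$Grade.Average <- ave(x$Grade, x$Sex, FUN = mean)
print(x)

# For one row per group instead, aggregate() collapses the groups
per_group <- aggregate(Grade ~ Sex, data = x, FUN = mean)
print(per_group)
```

The aggregate() version may feel less "busy" when only the summary is needed.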
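A short sketch of how to find out where a file lands: a file name without a path is written to the working directory, which getwd() reports. The data frame and the .csv file name below are invented for the example, and tempdir() is used only so nothing is left behind in a real folder.

```r
# Relative file names are written to the working directory
print(getwd())

# Writing to an explicit path removes the guesswork
y <- data.frame(Sex = c("Female", "Male"), Grade.Average = c(85, 65))
out_file <- file.path(tempdir(), "Sorted_Average.csv")
write.table(y, out_file, sep = ",", row.names = FALSE)

# Confirm the file exists and show its full location
print(file.exists(out_file))
print(normalizePath(out_file))

# Reading it back confirms the round trip
back <- read.table(out_file, header = TRUE, sep = ",")
print(back)
```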

Friday, February 24, 2017

S3 and S4 classes

Greetings world (or just Dr. Friedman),

In R class this week we learned some more about R objects and programming with objects. We took a look at S3 and S4 classes, new terms to me. S3 is informal and has been in R since the beginning of the R language. S3 is simpler and less rigorous, but it is "not as safe" as S4. Class reassignment is more difficult in S4 as well. Most of the material this week was new to me. Although the concepts are not trivial, I think I have a hold on them (mostly!). I have enjoyed learning about classes and generic functions because I feel that I better understand how R works behind the scenes. I learned from the lecture and the textbook how to use methods() and make my own generic function. However, I think I need a lot of practice on the latter. Hadley Wickham has an informative post in Advanced R, “OO R”.
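As practice on making a generic function, here is a minimal S3 sketch. The generic name describe() and its methods are invented for this example; the only real machinery is UseMethod(), which picks a method based on the class of the first argument.

```r
# Hypothetical generic: describe() dispatches on the class of its argument.
describe <- function(obj, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(obj, ...) {
  paste("an object of class", class(obj)[1])
}

# Method chosen automatically for data frames
describe.data.frame <- function(obj, ...) {
  paste("a data frame with", nrow(obj), "rows and", ncol(obj), "columns")
}

df_msg  <- describe(data.frame(a = 1:3, b = 4:6))
num_msg <- describe(3.14)
print(df_msg)
print(num_msg)
```

The method names follow the generic.class pattern, which is all S3 needs; nothing is declared formally.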

I'll use a dataset on invasive iguanas to answer the question, which includes an arbitrarily assigned ID, collection date (M/D/Y format), sex, length, mass, and various other morphological features.

iguana <- read.csv("iguana.csv")

Using some functions, we can classify the data:

> isS4(iguana)
[1] FALSE

> mode(iguana)
[1] "list"

> typeof(iguana)
[1] "list"

> class(iguana)
[1] "data.frame"

> is.data.frame(iguana)
[1] TRUE

> is.matrix(iguana)
[1] FALSE

"iguana" is not an S4 object. mode() and typeof() return "list", whereas class() returns "data.frame". As it is a data frame, generic functions with methods for this class will work.

I do not think I have seen an S4 object, as every dataset I have used is subset with "$" rather than "@". I will explore further.
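For comparison, here is a small sketch of what an S4 object looks like. The class "Iguana" and its slots are made up for this example and are only loosely modeled on the dataset described above.

```r
library(methods)  # S4 tools; usually attached already

# Hypothetical S4 class with two typed slots
setClass("Iguana", representation(id = "character", mass = "numeric"))

ig <- new("Iguana", id = "IG001", mass = 1250)

print(isS4(ig))
print(ig@mass)  # slots are accessed with @, not $

# S4 checks slot types when an object is created, which is part of
# why it is described as "safer" than S3
bad <- tryCatch(new("Iguana", id = "IG002", mass = "heavy"),
                error = function(e) "rejected")
print(bad)
```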

Monday, February 13, 2017

Matrices

This week went very smoothly, working with matrices and doing some simple algebra with them. These are all built-in functions in R, most of which I had seen. I made up my own matrix values, as those suggested on Canvas do not work (since 6 does not divide evenly into 100 or 1000).

> A <- matrix(runif(100,1, 400), nrow = 10, ncol = 10)
> B <- matrix(runif(2500, 1, 500), nrow = 50, ncol = 50)

Taking the transpose requires only one letter...

> t(A)

The output will not format well in here, so I will skip pasting it in. Calling t(B) returns the transpose of matrix B.

Determinants are easy as well, following the same idea as the transpose:

> det(A)
[1] -8.837604e+23

> det(B)
[1] 8.667763e+139

The inverse uses a rather oddly named function called "solve". It seems to do a lot. I will have to explore it more to get a better feel for it.

> solve(A)

Again, this is a large output which will not appear well here.
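A quick sketch of the two common uses of solve(), on a smaller 3 x 3 matrix so the output is readable. The seed and the vector b are arbitrary choices for the example.

```r
set.seed(1)  # arbitrary seed so the example is repeatable
A <- matrix(runif(9, 1, 400), nrow = 3, ncol = 3)

# With one argument, solve() returns the inverse
A_inv <- solve(A)
print(round(A_inv %*% A, 10))  # numerically the identity matrix

# With two arguments, solve(A, b) solves A %*% x = b for x
b <- c(1, 2, 3)
x <- solve(A, b)
print(A %*% x)  # reproduces b, up to rounding
```

The name makes more sense in the second form: the inverse is just the special case of solving against the identity matrix.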

Multiplying a matrix by a vector element by element is as simple as any other multiplication (a true matrix product would use %*% instead of *). First we need a vector with as many values as the matrix has columns:

> a <- c(seq(11,20,1))
> a*A

> a*B

This still works, even though "a" has 10 values and B has 50 columns. R automatically recycles the "a" vector to match the size of the "B" matrix.
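A tiny sketch of the difference between the two kinds of multiplication, using made-up 2 x 2 and 2 x 3 matrices so every value can be checked by hand:

```r
A <- matrix(1:4, nrow = 2)  # columns are (1, 2) and (3, 4)
a <- c(10, 20)

elementwise <- a * A    # a is recycled down each column
product     <- A %*% a  # true matrix-vector product, a 2 x 1 matrix

print(elementwise)
print(product)

# Recycling a shorter vector: B has 6 elements, so a is reused three times
B <- matrix(1:6, nrow = 2)
recycled <- a * B
print(recycled)
```

Recycling runs down the columns because R stores matrices column by column. If the matrix size is not a multiple of the vector length, R still recycles but gives a warning.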

That looks like all the calculation needed of us this week. Off to learn more about "solve"!

Monday, February 6, 2017

Boxplots and Histograms

This was a lucky week for me in R class. I have been doing some analyses on my own work lately, and as such have been looking at a lot of histograms and boxplots. The assignment was to create some of these exact things on blood pressures and doctors' ratings of those.

Blood pressures were given in numbers, but ratings were presented as good/bad or low/high. I don't like working with words in R, it's terrifying, so I changed these to numbers instead. Highs and goods became 1's, while lows and bads became 0's.
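A minimal sketch of one way to do that recoding, following the coding given in the figure captions (0 for "bad" or "low", 1 for "good" or "high"). The rating vectors here are made up; the real assignment data are not reproduced.

```r
# Made-up ratings for five patients
first  <- c("bad", "bad", "good", "bad", "good")
second <- c("low", "high", "high", "low", "high")

# Named lookup vector: each word maps to its numeric code
codes <- c(bad = 0, good = 1, low = 0, high = 1)
first_num  <- unname(codes[first])
second_num <- unname(codes[second])
print(first_num)
print(second_num)

# Alternative: keep the words as a factor; boxplot() accepts factors directly
first_factor <- factor(first, levels = c("bad", "good"))
print(table(first_factor))
```

Keeping a factor means the plot axes show the words themselves, so no legend for 0 and 1 is needed.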

Boxplots and histograms provide a simple but powerful way to explore data without delving into statistics too much. It is important to remember to perform these basic procedures even as our datasets become more advanced.

Figure 1. Frequency of hospital visitations for 10 patients.

Figure 2. Histogram of blood pressure for 10 patients.

Figure 3. Boxplot of blood pressures assessed by one doctor. 0 is "bad" and 1 is "good"

Figure 4. Boxplot of blood pressures assessed by a second doctor. 0 is "low" and 1 is "high"

Figure 5. Boxplot of blood pressures assessed by a third doctor. 0 is "low" and 1 is "high"

Friday, February 3, 2017

Functions

Functions are something I love to hate. They can be extremely frustrating, but it's a good feeling when they work well. Our assignment this week was to write a simple function. I wrote one to assess the pass/fail rate on a test, and return the percentage of passing students. Below, I present my workthrough to the final product, which can be found on github HERE

First we need some data.

x <- rnorm(250,75,22)

250 students took a test, the average was a 75, and the standard deviation was 22.

I then wrote a simple function to show which students passed and which failed.

passfail <- function(x) {
ifelse(x>=60,'pass','fail')
}

Then I tested it out:

passfail(x)
[1] "pass" "pass" "pass" "pass" "fail" "fail" "fail" "pass" "fail" "pass" "fail" "pass" "fail" "pass"
[15] "pass" "pass" "pass" "fail" "fail" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "fail"
[29] "fail" "pass" "pass" "pass" "pass" "fail" "pass" "fail" "fail" "pass" "pass" "pass" "fail" "fail"
[43] "pass" "pass" "pass" "fail" "fail" "pass" "pass" "pass" "pass" "fail" "pass" "fail" "pass" "fail"
[57] "fail" "fail" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass"
[71] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "fail" "pass" "fail" "pass" "pass"
[85] "fail" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "fail" "pass" "pass"
[99] "fail" "pass" "pass" "pass" "fail" "fail" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass"
[113] "pass" "fail" "pass" "fail" "fail" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass"
[127] "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "fail" "pass" "fail" "fail" "fail" "pass"
[141] "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "fail" "pass"
[155] "pass" "pass" "pass" "pass" "pass" "pass" "fail" "fail" "pass" "pass" "pass" "fail" "fail" "pass"
[169] "pass" "fail" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass"
[183] "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "fail" "fail" "pass" "pass" "fail" "pass"
[197] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass"
[211] "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[225] "pass" "pass" "pass" "fail" "pass" "pass" "fail" "pass" "pass" "fail" "pass" "fail" "pass" "pass"
[239] "fail" "fail" "fail" "fail" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass"

Cool! It worked. However, when looking at examples I noticed a lot of functions had a "return" call written inside them. Let's do what everyone else does and make the output explicit (I used print() here).

passfail <- function(x) {
score <- ifelse(x>=60,'pass','fail')
print("score")
}
passfail(x)
[1] "score"

Oops! That's not what we want. Turns out "score" should not be in quotation marks.

passfail <- function(x) {
score <- ifelse(x>=60,'pass','fail')
print(score)
}

passfail(x)
[1] "pass" "pass" "pass" "pass" "fail" "fail" "fail" "pass" "fail" "pass" "fail" "pass" "fail" "pass"
[15] "pass" "pass" "pass" "fail" "fail" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "fail"
[29] "fail" "pass" "pass" "pass" "pass" "fail" "pass" "fail" "fail" "pass" "pass" "pass" "fail" "fail"
[43] "pass" "pass" "pass" "fail" "fail" "pass" "pass" "pass" "pass" "fail" "pass" "fail" "pass" "fail"
[57] "fail" "fail" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass"
[71] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "fail" "pass" "fail" "pass" "pass"
[85] "fail" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "fail" "pass" "pass"
[99] "fail" "pass" "pass" "pass" "fail" "fail" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass"
[113] "pass" "fail" "pass" "fail" "fail" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass"
[127] "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "fail" "pass" "fail" "fail" "fail" "pass"
[141] "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "fail" "pass"
[155] "pass" "pass" "pass" "pass" "pass" "pass" "fail" "fail" "pass" "pass" "pass" "fail" "fail" "pass"
[169] "pass" "fail" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass"
[183] "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "fail" "fail" "pass" "pass" "fail" "pass"
[197] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass"
[211] "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[225] "pass" "pass" "pass" "fail" "pass" "pass" "fail" "pass" "pass" "fail" "pass" "fail" "pass" "pass"
[239] "fail" "fail" "fail" "fail" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass"

Working again, good. It's now printing the value of score rather than the word "score", which is what the quotation marks made it do.

This by itself is good, but why not see what percentage actually passed? I worked "percentages" in as a printed value as well:

passfail <- function(x) {
score <- ifelse(x>=60,'pass','fail')
percentages <- ((length(which(score=='pass')))/(length(score)))*100
print(score)
print(percentages)
}

passfail(x)
[1] "pass" "pass" "pass" "pass" "fail" "fail" "fail" "pass" "fail" "pass" "fail" "pass" "fail" "pass"
[15] "pass" "pass" "pass" "fail" "fail" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "fail"
[29] "fail" "pass" "pass" "pass" "pass" "fail" "pass" "fail" "fail" "pass" "pass" "pass" "fail" "fail"
[43] "pass" "pass" "pass" "fail" "fail" "pass" "pass" "pass" "pass" "fail" "pass" "fail" "pass" "fail"
[57] "fail" "fail" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass"
[71] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "fail" "pass" "fail" "pass" "pass"
[85] "fail" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "fail" "pass" "pass"
[99] "fail" "pass" "pass" "pass" "fail" "fail" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass"
[113] "pass" "fail" "pass" "fail" "fail" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass"
[127] "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "fail" "pass" "fail" "fail" "fail" "pass"
[141] "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "fail" "pass"
[155] "pass" "pass" "pass" "pass" "pass" "pass" "fail" "fail" "pass" "pass" "pass" "fail" "fail" "pass"
[169] "pass" "fail" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass"
[183] "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "fail" "fail" "pass" "pass" "fail" "pass"
[197] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass"
[211] "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[225] "pass" "pass" "pass" "fail" "pass" "pass" "fail" "pass" "pass" "fail" "pass" "fail" "pass" "pass"
[239] "fail" "fail" "fail" "fail" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass"

[1] 72.8

We can now see the full list of individual passes and fails, and the percentage of students who passed.

The way it is now is fine perhaps, but it's a bit rigid. 60 points might be a good cutoff for most tests out of 100 points, but maybe a professor gives a very difficult test in which only 40 points are needed to pass. This next step marks the final product, and has an additional argument set by the user, "y", which tells the function how many points equal a "pass". Let's try it out.
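One possible variation, sketched here as an alternative rather than the version from the assignment: instead of printing, the function can hand both results back in a named list, so they can be stored and reused. The five scores are made up to keep the output short.

```r
passfail <- function(x) {
  score <- ifelse(x >= 60, "pass", "fail")
  # mean() of a logical vector is the proportion of TRUE values
  percentage <- mean(score == "pass") * 100
  # Return both pieces in a named list instead of printing them
  list(score = score, percentage = percentage)
}

result <- passfail(c(55, 60, 72, 48, 90))
print(result$percentage)
print(table(result$score))
```

With print() the values only appear on screen; with a returned list, result$percentage can go straight into another calculation.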

passfail <- function(x, y) {
score <- ifelse(x>=y,'pass','fail')
percentages <- ((length(which(score=='pass')))/(length(score)))*100
print(score)
print(percentages)
}

passfail(x, 40)

[1] "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "fail" "pass" "pass" "pass"
[15] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[29] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[43] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[57] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[71] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[85] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[99] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass"
[113] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[127] "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[141] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[155] "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[169] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[183] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[197] "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "fail" "pass" "pass" "pass" "pass"
[211] "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass"
[225] "pass" "pass" "pass" "pass" "pass" "pass" "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass"
[239] "fail" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass" "pass"

[1] 95.6
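A small follow-up sketch: giving the cutoff a default value means the usual 60-point case needs no second argument, while a harder test can still override it. The argument name cutoff and the five scores are invented for this example; the post's version calls the argument "y" and has no default.

```r
# cutoff defaults to 60 but can be overridden per call
passfail <- function(x, cutoff = 60) {
  score <- ifelse(x >= cutoff, "pass", "fail")
  mean(score == "pass") * 100  # percentage of passing scores
}

scores <- c(35, 45, 58, 61, 80)
default_rate <- passfail(scores)               # uses the 60-point cutoff
lenient_rate <- passfail(scores, cutoff = 40)  # harder test, lower bar
print(default_rate)
print(lenient_rate)
```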