Wednesday, March 22, 2017

Debugging

Our assignment this week was to find a bug in a function with deliberate bugs.

The function was

tukey_multiple <- function(x) { 
  
outliers <- array(TRUE,dim=dim(x)) 
   for (j in 1:ncol(x)) 
    { 
    outliers[,j] <- outliers[,j] && tukey.outlier(x[,j]) 
    } 
outlier.vec <- vector(length=nrow(x)) 
    for (i in 1:nrow(x)) 
    { outlier.vec[i] <- all(outliers[i,]) } return(outlier.vec) }



Running it as given produces a parse error:

Error: unexpected symbol in:
"for (i in 1:nrow(x))
{ outlier.vec[i] <- all(outliers[i,]) } return"

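The problem is that return sits on the same line as the closing brace of the for loop, with nothing separating the two statements. I restructured the last few lines so they more closely matched the layout of the top: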

tukey_multiple <- function(x) { 
  outliers <- array(TRUE,dim=dim(x)) 
  for (j in 1:ncol(x)) 
  { 
    outliers[,j] <- outliers[,j] && tukey.outlier(x[,j]) 
  } 
  outlier.vec <- vector(length=nrow(x)) 
  for (i in 1:nrow(x)) 
  { 
    outlier.vec[i] <- all(outliers[i,]) 
  } 
  return(outlier.vec) 
}
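
The parse error can be reproduced in isolation with parse(). This is only a minimal sketch: the one-line function f and the variable names are made up for illustration and are not part of the assignment.

```r
# Hypothetical one-liners: the only difference is the newline before return().
bad_src  <- "f <- function(x) { for (i in 1:2) { x <- x + i } return(x) }"
good_src <- "f <- function(x) { for (i in 1:2) { x <- x + i }\n return(x) }"

# parse() only checks syntax; nothing is evaluated here.
parse_msg <- tryCatch({
  parse(text = bad_src)
  "parsed OK"
}, error = function(e) conditionMessage(e))

good_ok <- tryCatch({
  parse(text = good_src)
  TRUE
}, error = function(e) FALSE)

print(parse_msg)  # the parser's complaint about the symbol after "}"
print(good_ok)
```

Two statements on one line need a newline or a semicolon between them, which is why moving return() to its own line is enough.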

The function was then created without error. However, applying it to a matrix fails:

tukey_multiple(matrix( rnorm(5*5,mean=0,sd=1), 5, 5))
Error: could not find function "tukey.outlier"

tukey.outlier isn't defined anywhere in my session. I'm not sure if this was deliberate, or if it lives in a package I don't have. A Google search for "tukey.outlier R" didn't reveal anything useful, so I moved on to R's debugging tools.
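
Before debugging, one way to confirm that the name really is missing is to ask R directly. A small sketch using exists() and getAnywhere(); the variable names fn_defined and hits are just for illustration.

```r
# Is there a function called tukey.outlier visible from the current session?
fn_defined <- exists("tukey.outlier", mode = "function")
print(fn_defined)

# getAnywhere() also looks inside the namespaces of loaded packages,
# including objects that are not exported.
hits <- getAnywhere("tukey.outlier")
print(length(hits$where))  # number of places where the name was found
```

Neither call searches packages that are installed but not loaded, so a negative result here does not rule out the function living in some package.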

> debug(tukey_multiple)
> tukey_multiple(matrix( rnorm(5*5,mean=0,sd=1), 5, 5))
debugging in: tukey_multiple(matrix(rnorm(5 * 5, mean = 0, sd = 1), 5, 5))
debug at #1: {
    outliers <- array(TRUE, dim = dim(x))
    for (j in 1:ncol(x)) {
        outliers[, j] <- outliers[, j] && tukey.outlier(x[, j])
    }
    outlier.vec <- vector(length = nrow(x))
    for (i in 1:nrow(x)) {
        outlier.vec[i] <- all(outliers[i, ])
    }
    return(outlier.vec)
}
Browse[2]>

I'm not sure why my prompt says Browse[2] while the lecture's says Browse[1], and I'm not totally certain how to proceed from here. However, if the missing function were found, it seems the function should work.
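
Some notes on going further, offered as a sketch rather than the assignment's intended answer. According to the ?browser help page, the number in the prompt is the browser level, and 2 is usual for the outer level when debug() is in use, so Browse[2] is probably nothing to worry about. At the prompt, n runs the next statement, c continues to the end, and Q quits the debugger.

Stepping with n would stop at the tukey.outlier line. The code below assumes a stand-in for that function: the 1.5 * IQR rule used here is a guess at what the course version does, not its actual definition. It also changes && to &, which looks like another bug in the function: & compares element by element, while && expects a single TRUE or FALSE on each side. R 4.3.0 and later raise an error when && is given longer vectors, and older versions used only the first element of each side.

```r
# ASSUMPTION: hypothetical stand-in for the missing tukey.outlier().
# Flags values more than 1.5 * IQR below the first or above the third quartile.
tukey.outlier <- function(v) {
  q <- unname(quantile(v, probs = c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  v < q[1] - fence | v > q[2] + fence
}

tukey_multiple <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    # `&` is elementwise, so every row of column j is tested
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  outlier.vec <- vector(length = nrow(x))
  for (i in seq_len(nrow(x))) {
    # a row counts only if it is an outlier in every column
    outlier.vec[i] <- all(outliers[i, ])
  }
  return(outlier.vec)
}

# Made-up test matrix: the last row is extreme in both columns.
m <- matrix(c(1, 2, 3, 4, 100,
              2, 3, 4, 5, 200), nrow = 5)
result <- tukey_multiple(m)
print(result)
```

A small hand-built matrix is easier to check than rnorm() output, since it is clear in advance which row ought to be flagged.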

Friday, March 10, 2017

Visualization and graphics

This is a topic with which I have struggled in the past. I can do the basics, but I know there is a lot more to it that I have yet to unlock. It seems many in biology who favor R like it for its statistical capabilities, but not so much for its graphics. I hope not to fall into that rut.

For these examples I used the "biomass" dataset on https://vincentarelbundock.github.io/Rdatasets/datasets.html

For each plotting system I plotted the same two variables, "Tas" and "year", so I could really explore the basic functions of each. The code is on GitHub: https://github.com/jcrichardson617/R_class/blob/master/module9

Using R's built-in plot() function, the graph looks fairly simple.

Using the lattice package:

And ggplot2:

Keeping each plot this basic posed little difficulty, and there is of course room for improvement on each. As the instructor of the course is big into visualization, and talks about ggplot2 a lot, I am inclined to lean that way. Even from these basic plots, though, it seems like ggplot2 needed more code than lattice and plot(), so it probably has a steeper learning curve.
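
For comparison, here is roughly what the three calls look like side by side. This is a sketch, not the code from the GitHub link: the data frame is made up, and only the column names "year" and "Tas" come from the post. The lattice and ggplot2 parts run only if those packages are installed.

```r
# ASSUMPTION: invented values standing in for the "biomass" dataset.
df <- data.frame(
  year = 2000:2009,
  Tas  = c(9.1, 9.4, 9.0, 9.8, 9.6, 10.1, 9.9, 10.3, 10.0, 10.4)
)

pdf(NULL)  # null device, so the sketch runs without opening a plot window

# Base graphics: one call with a formula interface.
plot(Tas ~ year, data = df, main = "base plot()")

# lattice: also one call; the returned object is drawn when printed.
if (requireNamespace("lattice", quietly = TRUE)) {
  print(lattice::xyplot(Tas ~ year, data = df, main = "lattice"))
}

# ggplot2: the plot is built up from layers added with `+`.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df, ggplot2::aes(x = year, y = Tas)) +
    ggplot2::geom_point() +
    ggplot2::ggtitle("ggplot2")
  print(p)
}

invisible(dev.off())
```

The ggplot2 version is longer for a plain scatterplot, but each extra piece is a separate layer, which is where the additional flexibility comes from.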