---
title: |
    | STAT 408 - Week 5:
    | R Miscellanea and Debugging R Code
date: "February 8, 2018"
output:
  beamer_presentation:
    theme: "PaloAlto"
    fonttheme: "structuresmallcapsserif"
---


```{r setup, include=FALSE}
library(knitr)
library(formatR)
library(XML)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE)
knitr::knit_hooks$set(mysize = function(before, options, envir) {
  if (before) 
    return(options$size)
})
```

## Course Goals

In this class, we cannot cover every possible situation that you will encounter. The goals are to:

1. give you a broad range of tools that can be employed to manipulate, visualize, and analyze data, and
2. teach you to find help when you or your code "get stuck".


# R Miscellanea - more tools


## Lists

We have used lists a little already, but they are worth discussing in more detail.
Here are some questions to get started.

- Where have lists shown up in this course?
- How do we typically index elements in a list?
- What other functions have we used to manipulate lists?

## Exercise: Lists
Consider the two lists below and write out what R prints for each.

```{r, eval=F, mysize=TRUE, size='\\tiny'}
msu.info <- list( name = c('Waded Cruzado','Andy Hoegh'), 
         degree.from = c('University of Texas at Arlington','Virginia Tech'),
         job.title = c('President', 'Assistant Professor of Statistics'))
msu.info

msu.info2 <- list(c('Waded Cruzado','University of Texas at Arlington',
                     'President'), c('Andy Hoegh',
                  'Virginia Tech','Assistant Professor of Statistics'))
msu.info2
```
\normalsize
What do all of those brackets mean?

## Solution: Lists

```{r, mysize=TRUE, size='\\tiny'}
msu.info <- list( name = c('Waded Cruzado','Andy Hoegh'), 
         degree.from = c('University of Texas at Arlington','Virginia Tech'),
         job.title = c('President', 'Assistant Professor of Statistics'))
msu.info

msu.info2 <- list(c('Waded Cruzado','University of Texas at Arlington',
                     'President'), c('Andy Hoegh',
                  'Virginia Tech','Assistant Professor of Statistics'))
msu.info2
```

## Lists - indexing
With the current lists, we can index elements using the double bracket `[[ ]]` notation. If the elements have names, as in `msu.info`, the `$` operator can be used too.

So the first element of `msu.info` can be accessed by position or by name:
```{r, mysize=TRUE, size='\\tiny'}
msu.info[[1]]
msu.info$name
```

## Exercise: Lists 
Explore the indexing with these commands.
```{r, mysize=TRUE, size='\\tiny', eval=F}
msu.info <- list( name = c('Waded Cruzado','Andy Hoegh'), 
         degree.from = c('University of Texas at Arlington','Virginia Tech'),
         job.title = c('President', 'Assistant Professor of Statistics'))
msu.info[1]
msu.info[[1]]
msu.info$name[2]
msu.info[1:2]
unlist(msu.info)
```

## Solution: Lists 1
```{r, mysize=TRUE, size='\\tiny'}
msu.info[1]
msu.info[[1]]
msu.info$name[2]
```

## Solution: Lists 2
```{r, mysize=TRUE, size='\\tiny'}
msu.info[1:2]
unlist(msu.info)
```
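
## Lists - `[ ]` versus `[[ ]]`

A minimal sketch of the distinction explored in the exercise: single brackets return a shorter list, while double brackets return the element itself. The list `staff` and its contents are hypothetical and used only for illustration.

```{r, mysize=TRUE, size='\\scriptsize'}
# Hypothetical list, used only for illustration
staff <- list(name = c('Ann', 'Bob'), title = c('Chair', 'Dean'))

sub.list <- staff[1]    # single bracket: a list of length 1
element <- staff[[1]]   # double bracket: the character vector itself

class(sub.list)    # "list"
class(element)     # "character"
length(sub.list)   # 1
length(element)    # 2
```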


## Lists  - nested lists

```{r, mysize=TRUE, size='\\footnotesize'}

list(list('a','b'),list('c','d'))
```
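
## Lists - indexing nested lists

As a small sketch, elements of a nested list can be reached by chaining double brackets, one level at a time. The name `nested` is introduced here only for illustration.

```{r, mysize=TRUE, size='\\scriptsize'}
nested <- list(list('a', 'b'), list('c', 'd'))

inner <- nested[[2]]        # the second inner list, holding 'c' and 'd'
first <- nested[[2]][[1]]   # first element of that inner list
first                       # "c"
```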


## Arrays

Arrays generalize matrices to more than two dimensions.

```{r, mysize=TRUE, size='\\tiny'}
array.1 <- array(1:8, dim=c(2,2,2)); array.1
array.1[2,2,1]
```

## Exercise: Arrays

Create an array of dimension 2 x 2 x 3, where each of the three 2 x 2 subarrays (matrices) is the identity matrix.

## Solution: Arrays
Create an array of dimension 2 x 2 x 3, where each of the three 2 x 2 subarrays (matrices) is the identity matrix.

```{r, mysize=TRUE, size='\\tiny'}
array(c(1,0,0,1), dim = c(2,2,3))
```
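
## Arrays - an alternative construction

The solution relies on `array()` recycling the four values to fill all twelve cells. One possible alternative, sketched here, builds the same array explicitly from `diag(2)` and then checks one slice. The name `id.array` is chosen only for this example.

```{r, mysize=TRUE, size='\\scriptsize'}
# Repeat the 2 x 2 identity matrix three times and reshape
id.array <- array(rep(diag(2), times = 3), dim = c(2, 2, 3))

dim(id.array)                     # 2 2 3
all(id.array[, , 3] == diag(2))   # TRUE: the third slice is the identity
```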

## Merge
Another important skill is merging or combining data sets.

Consider the two data frames below. How can we merge them, and what should the dimensions of the merged data frame be?
```{r, mysize=TRUE, size='\\scriptsize'}
df1 <- data.frame(school = c('MSU','VT','Mines'),
            state= c('MT','VA','CO'), stringsAsFactors = F)
df1
df2 <- data.frame(school = c('Mines','MSU','VT'),
            enrollment = c(5794,15688,30598), stringsAsFactors = F)
df2
```

## sort() and order()
One possibility is to use the `sort()` / `order()` functionality as a first step.

```{r, mysize=TRUE, size='\\tiny'}
order(df1$school)
order(df2$school)
df1 <- df1[order(df1$school),]
df1
df2 <- df2[order(df2$school),]
df2
```
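
## sort() versus order()

A short sketch of the difference between the two functions: `sort()` returns the sorted values, while `order()` returns the positions that would sort them, which is what makes it useful for reordering the rows of a data frame. The vector `enrollment` reuses the enrollment values from `df2` in a different order.

```{r, mysize=TRUE, size='\\scriptsize'}
enrollment <- c(15688, 30598, 5794)

sort(enrollment)                # 5794 15688 30598
order(enrollment)               # 3 1 2
enrollment[order(enrollment)]   # same result as sort(enrollment)
```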


## rbind() and cbind()

Now, given that the data frames are both sorted the same way, we can bind the columns together.

```{r}
comb.df <- cbind(df1,df2)   # bind the columns side by side
comb.df
comb.df <- comb.df[,-3]     # drop the duplicated school column
```

## rbind() and cbind()
Now assume we want to add another school to the data frame.

```{r, error=TRUE}
new.school <- c('Luther', 'IA',2337)
rbind(comb.df, new.school)
```
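Note: if your strings are saved as factors, this chunk of code will give you an error. Also, because `c()` coerces 2337 to a character string, the `enrollment` column of the result is character rather than numeric.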
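
## rbind() - keeping column types

Because `c('Luther', 'IA', 2337)` stores all three values as character strings, one possible way to keep `enrollment` numeric is to bind a one-row data frame instead of a vector. This is a sketch; `schools`, `new.row`, and `schools.more` are names introduced only for this example.

```{r, mysize=TRUE, size='\\scriptsize'}
schools <- data.frame(school = c('MSU', 'VT'), state = c('MT', 'VA'),
                      enrollment = c(15688, 30598), stringsAsFactors = FALSE)
# A one-row data frame keeps each value in its own type
new.row <- data.frame(school = 'Luther', state = 'IA', enrollment = 2337,
                      stringsAsFactors = FALSE)

schools.more <- rbind(schools, new.row)
str(schools.more)   # enrollment is still numeric
```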

## join()
We could have also used some of the more advanced merge (join) features from dplyr.

```{r}
library(dplyr)
new.df <- full_join(df1,df2, by='school')
new.df
```
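
## merge() in base R

For comparison, base R's `merge()` covers similar ground without loading dplyr. The sketch below uses two small hypothetical data frames, `left` and `right`, where one school is missing from `right`, to show how an inner join differs from a full join.

```{r, mysize=TRUE, size='\\scriptsize'}
left <- data.frame(school = c('MSU', 'VT', 'Mines'),
                   state = c('MT', 'VA', 'CO'), stringsAsFactors = FALSE)
right <- data.frame(school = c('Mines', 'MSU'),
                    enrollment = c(5794, 15688), stringsAsFactors = FALSE)

inner <- merge(left, right, by = 'school')               # only matching schools
full <- merge(left, right, by = 'school', all = TRUE)    # all schools, NA where missing
nrow(inner)   # 2
nrow(full)    # 3
```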

## Exercise: merging

Combine the two data sets
```{r, mysize=TRUE, size='\\tiny'}
df.cost <- data.frame( ski.resort = c('Bridger Bowl', 'Big Sky', 'Steamboat', 'Jackson'), 
                       ticket.cost = c(60, 'depends',145, 130))
df.acres <- data.frame( ski.hill = c('Bridger Bowl', 'Jackson', 'Steamboat', 'Big Sky'), 
                        skiable.acres = c(2000, "2500+",2965, 5800))

```

## Solution: merging

Combine the two data sets
```{r, mysize=TRUE, size='\\tiny'}
df.cost <- data.frame( ski.resort = c('Bridger Bowl', 'Big Sky', 'Steamboat', 'Jackson'),
                       ticket.cost = c(60, 'depends',145, 130))
df.acres <- data.frame( ski.hill = c('Bridger Bowl', 'Jackson', 'Steamboat', 'Big Sky'), 
                        skiable.acres = c(2000, "2500+",2965, 5800))

kable(full_join(df.cost, df.acres, by = c('ski.resort' = 'ski.hill')))
```

# Debugging R code

## Process for writing code

When writing code (and conducting statistical analyses), an iterative approach is a good strategy.

1. Test each line of code as you write it and, if necessary, confirm that nested functions are giving the desired results.
2. Start simple and then add more complexity.

## Debugging Overview

> Finding your bug is a process of confirming the many things that you believe are true -- until you find one which is not true.
>
> -- Norm Matloff

## Debugging Guide 
We will first focus on debugging when an error or warning is raised.

1. Realize you have a bug (if error or warning, read the message)
2. Make it repeatable
3. Identify the problematic line (using print statements can be helpful)
4. Fix it and test it (evaluate nested functions if necessary)

## Warnings vs. Errors

R flags a problem by printing a message in two cases: warnings and errors.

 - What is the difference between the two?
 - Is the R process treated differently for errors and warnings?
 
## Warnings vs. Errors

- Fatal errors are signaled with `stop()`, which raises an `error` condition and halts execution.
- Warnings are generated with `warning()` and display potential problems. Warnings **do not** stop code from executing.
- Messages are generated with `message()` and pass along information without indicating a problem.
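
## Warnings vs. Errors - a sketch

The three kinds of conditions can be seen side by side in a small function. `CheckValue` is a hypothetical helper written only for this illustration, and `tryCatch()` is used so that the error can be inspected without stopping the script.

```{r, mysize=TRUE, size='\\scriptsize'}
CheckValue <- function(x) {
  if (!is.numeric(x)) stop('x must be numeric')   # error: execution halts here
  if (x < 0) warning('x is negative')             # warning: execution continues
  message('checked value ', x)                    # message: informational only
  sqrt(abs(x))
}

CheckValue(-4)   # warns, prints the message, and still returns 2
# Capture the error message instead of halting
tryCatch(CheckValue('a'), error = function(e) conditionMessage(e))
```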

## Bugs without warning/error

In other cases, we will have bugs in our code that don't necessarily give a warning or an error.

- How do we identify these bugs?
- How can we exit a case where:
    - R is running and may be stuck?
    - the code won't execute because of unmatched parentheses, braces, or brackets?

Note: operations that produce `NA` values often, but not always, return a warning message.
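
## NA values with and without a warning

A brief sketch of the note above: coercing a non-numeric string produces `NA` with a warning, whereas summarizing a vector that already contains `NA` returns `NA` silently.

```{r, mysize=TRUE, size='\\scriptsize'}
as.numeric(c('60', 'depends'))    # 60 NA, with a coercion warning
mean(c(1, NA, 3))                 # NA, with no warning
mean(c(1, NA, 3), na.rm = TRUE)   # 2
```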

## Exercise: Debugging a Warning

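Fix the script that determines if each item in a sequence is less than zero. Note that in R 4.2.0 and later, an `if()` condition of length greater than one is an error rather than a warning.

```{r, error=TRUE, mysize=TRUE, size='\\tiny'}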
val.in <- seq(-1,1,by=.25)

if (val.in < 0){
  print(paste(val.in, 'less than 0'))
}
```

## Solution: Debugging a Warning

```{r, mysize=TRUE, size='\\tiny'}
val.in <- seq(-1,1,by=.25)

ifelse(val.in < 0,paste(val.in, 'less than 0'),paste(val.in, 'greater than (equal to) 0'))
```
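
## Alternative: looping over the values

Another possible fix, sketched here, keeps the original `if` statement but applies it to one value at a time inside a `for` loop, so the condition always has length one. The name `val.check` is introduced only for this example.

```{r, mysize=TRUE, size='\\scriptsize'}
val.check <- seq(-1, 1, by = .25)

for (v in val.check) {
  # v is a single number, so the condition has length one
  if (v < 0) print(paste(v, 'less than 0'))
}
sum(val.check < 0)   # 4 values are negative
```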


## Exercise: Debugging an Error
Identify the issue(s) with this function
```{r, error=TRUE, eval=FALSE, mysize=TRUE, size='\\footnotesize'}
MergeData <- function(data1, data2, key1, key2){
  # function to merge two data sets
  # Args: data1 - first dataset
  #       data2 - second dataset
  #       key1 - key name in first dataset
  #       key2 - key name in second dataset
 # Returns: merged dataframe if key matches, 
  #          otherwise print an error  
  if (key1 = key2){
    data.out <- join(data1,data2, by = key1)
    return(dataout)
  } else {
    stop('keys are not the same')
  }
}
```

## Solution: Debugging an Error
### Step 1 - fix '='
```{r, error=TRUE, eval=FALSE, mysize=TRUE, size='\\footnotesize'}
MergeData <- function(data1, data2, key1, key2){
  # function to merge two data sets
  # Args: data1 - first dataset
  #       data2 - second dataset
  #       key1 - key name in first dataset
  #       key2 - key name in second dataset
  # Returns: merged dataframe if key matches, 
  #          otherwise print an error
  if (key1 == key2){
    data.out <- join(data1,data2, by = key1)
    return(dataout)
  } else {
    stop('keys are not the same')
  }
}
```

## Solution: Debugging an Error
### Step 2 - load dplyr & use full_join()
```{r, error=TRUE, eval=FALSE, mysize=TRUE, size='\\footnotesize'}
MergeData <- function(data1, data2, key1, key2){
  # function to merge two data sets
  # Args: data1 - first dataset
  #       data2 - second dataset
  #       key1 - key name in first dataset
  #       key2 - key name in second dataset
  # Returns: merged dataframe
  library(dplyr)
  if (key1 == key2){
    data.out <- full_join(data1,data2, by = key1)
    return(dataout)
  } else {
    stop('keys are not the same')
  }
}
MergeData(df1,df2,"school","school")
```

## Solution: Debugging an Error
### Step 3 - correct dataout to data.out
```{r, error=TRUE, mysize=TRUE, size='\\footnotesize'}
MergeData <- function(data1, data2, key1, key2){
  # function to merge two data sets
  # Args: data1 - first dataset
  #       data2 - second dataset
  #       key1 - key name in first dataset
  #       key2 - key name in second dataset
  # Returns: merged dataframe
  library(dplyr)
  if (key1 == key2){
    data.out <- full_join(data1,data2, by = key1)
    return(data.out)
  } else {
    stop('keys are not the same')
  }
}
```

## Solution: Debugging an Error
### Step 3 (continued) - test the corrected function
```{r, error=TRUE, mysize=TRUE, size='\\footnotesize'}

MergeData(df1,df2,"school","school")
MergeData(df.cost,df.acres, 'ski.resort','ski.hill')
```
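
## Extension: keys with different names

The second call stops because the key names differ. One possible extension, sketched below, checks that each key exists in its own data set and then merges on the pair. `MergeData2` is a hypothetical variant, and it swaps in base R's `merge()` with `by.x` and `by.y` rather than `full_join()`. The two small data frames reuse values from the ski resort exercise.

```{r, mysize=TRUE, size='\\scriptsize'}
MergeData2 <- function(data1, data2, key1, key2){
  # Hypothetical variant: the keys may have different names
  if (!(key1 %in% names(data1))) stop('key1 not found in data1')
  if (!(key2 %in% names(data2))) stop('key2 not found in data2')
  merge(data1, data2, by.x = key1, by.y = key2, all = TRUE)
}

cost <- data.frame(ski.resort = c('Bridger Bowl', 'Steamboat'),
                   ticket.cost = c(60, 145), stringsAsFactors = FALSE)
acres <- data.frame(ski.hill = c('Steamboat', 'Bridger Bowl'),
                    skiable.acres = c(2965, 2000), stringsAsFactors = FALSE)
ski <- MergeData2(cost, acres, 'ski.resort', 'ski.hill')
ski
```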

# Advanced Debugging

## Overview

We can often fix bugs using the ideas sketched out previously, and this becomes *easier* with more experience coding in R. Trial and error can be very effective, and strategic use of the `print()` function helps to identify where bugs are occurring.

However, R also has more advanced tools to help with debugging code.

- `traceback()`
-  "Rerun with debug"
-  `browser()`

## traceback()
Consider the following code:

```{r, error=TRUE, mysize=TRUE, size='\\small'}
f <- function(a) g(a) 
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d)  "a" + d
f(10)
```

## traceback()
The `traceback()` function shows the sequence of function calls that led to the error (the call stack), along with the line number of each call.

```{r,eval=FALSE}
> traceback()

4: i(c) at #1
3: h(b) at #1
2: g(a) at #1
1: f(10)
```
Note: due to the way that R Markdown is compiled, `traceback()` needs to be run directly in R, not R Markdown.
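
## Inspecting an error without traceback()

Since `traceback()` is interactive, one possible complement inside a script or R Markdown chunk is to capture the error object with `tryCatch()` and inspect it. This sketch repeats the four functions from the example and is not a replacement for the full call stack; the name `err` is introduced here only for illustration.

```{r, mysize=TRUE, size='\\scriptsize'}
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) "a" + d

err <- tryCatch(f(10), error = function(e) e)   # capture instead of halting
conditionMessage(err)   # the text of the error message
conditionCall(err)      # the call that failed: "a" + d
```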

## Browsing on an error

Another option (in RStudio) is to browse on the error. This gives you an interactive way to move through the function calls to identify the location of the problem. The debugger can also be invoked explicitly by flagging a function with `debug()`.

![](images/debug.png)
 
## browser()
 
The browser function can also be used to interactively step through a function.

```{r}
SS <- function(mu, x) {
  browser()
  d <- x - mu
  d2 <- d^2
  ss <- sum(d2)
  ss
}

```
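
## debugonce() as an alternative

Inserting `browser()` requires editing the function. As a sketch of another route, `debugonce()` opens the same interactive browser for a single call without changing the function body. `SS.plain` is a hypothetical copy of `SS` without the `browser()` line, and the `debugonce()` call is left commented out so the chunk also runs non-interactively.

```{r, mysize=TRUE, size='\\scriptsize'}
SS.plain <- function(mu, x) {
  d <- x - mu   # deviations from mu
  sum(d^2)      # sum of squared deviations
}

# debugonce(SS.plain)   # uncomment in an interactive session to step through
SS.plain(mu = 2, x = c(1, 2, 3))   # 2
```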

 
## browser() step 1
 
![](images/browse1.png) 

## browser() step 2
 
![](images/browse2.png) 
 
## browser() step 3
 
![](images/browse3.png)  

## browser() step 4
 
![](images/browse4.png) 

## browser() step 5
 
![](images/browse5.png)