---
title: "Week 2 R Style and Programming"
date: "January 18, 2018"
output:
beamer_presentation:
theme: "PaloAlto"
fonttheme: "structuresmallcapsserif"
---
```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
knitr::knit_hooks$set(mysize = function(before, options, envir) {
if (before)
return(options$size)
})
```
# R Programming Style
## Google's R Style Guide
While there is not universal agreement on programming style, we will adhere to the concepts in the Google R Style Guide: [https://google.github.io/styleguide/Rguide.xml](https://google.github.io/styleguide/Rguide.xml)
## Notation and Naming
- **File Names:** File names should end in .R and be meaningful.
- Good: predict_ad_revenue.R
- Bad: foo.R
- **Identifiers:** Don't use underscores or hypens in identifiers.
- The preferred form for variable names is all lower case letters with words separated with dots (`variable.name`), but `VariableName` is also accepted.
- Function names begin with capital letters and include no dots (`FunctionName`)
## Syntax
- **Spacing:**
- Place spaces around all operators (`==, +, ...`)
- Do not place a space before a comma, but always place one after a comma.
- Place a space before left parenthesis, except in a function call.
- **Assignment:**
- Use `<-` not `=` for assignment.
## Operators in R
- Most mathematical operators are self explanatory, but here are a few more important operators.
- `==` will test for equality. For example to determine if pi equals three, this can be evaluated with `pi == 3` in R and will return `r pi == 3 `. Note this operator returns a logical value.
- `&` is the AND operator, so `TRUE & FALSE` will return `r TRUE & FALSE`.
- `|` is the OR operator, so `TRUE | FALSE` will return `r TRUE | FALSE`.
- `!` is the NOT operator, so `! TRUE` will return `r ! TRUE`.
- `^` permits power terms, so `4 ^ 2` returns `r 4^2` and `4 ^ .5` returns `r 4 ^ .5`.
## Exercise: Order of operations
Note that order of operations is important in writing R code.
```{r opt,eval=F}
4 - 2 ^ 2
(4 - 2) ^ 2
5 * 2 - 3 ^ 2
! TRUE & pi == 3
! (TRUE | FALSE)
```
Evaluate all four expressions. Note `!` is R's not operator.
## Solution: Order of operations
The results of the R code are:
```{r opt2, , mysize=TRUE,size = '\\scriptsize'}
4 - 2 ^ 2
(4 - 2) ^ 2
5 * 2 - 3 ^ 2
! TRUE & FALSE
! (TRUE | FALSE)
```
## Organization: Layout
- **General Layout:**
The general layout of an R script should follow as:
1. Author Comment
2. File description comment, including purpose of program, inputs, and outputs
3. `source()` and `library()` statements
4. Function definitions
5. Executed statements
## Organization: Commenting
- Comment your code. Entire commented lines should begin with `#` and then one space.
- Short comments can be placed after code preceded by two spaces, `#` and then one space.
```{r noeval, eval=FALSE}
# create plot of housing price by zipcode
plot(Seattle$Price ~ Seattle$Zip,
rgb(.5,0,0,.7), # set transparency for points
xlab='zipode')
```
## Organization: Functions
Functions should contain a comments section immediately below the function definition line. These comments should consist of
1. a one-sentence description;
2. a list of the functions arguments, denoted by `Args:`, with a description of each and
3. a description of the return value, denoted by `Returns:`.
The comments should be descriptive enough that the function can be used without reading the function code.
## Overview Functions in R
Functions are a way to save elements of code to be used repeatedly.
```{r}
RollDice <- function(num.rolls){
#
# ARGS:
# RETURNS:
return(sample(6, num.rolls, replace = T))
}
RollDice(2)
```
## Exercise: Function Descriptions
Document this function with
1. a description,
2. summary of input(s)
3. summary of outputs
```{r}
RollDice <- function(num.rolls){
#
# ARGS:
# RETURNS:
return(sample(6, num.rolls, replace = T))
}
```
Note for help with functions in R, type `?sample`.
## Solution: Function Descriptions
```{r}
RollDice <- function(num.rolls){
# function that returns rolls of dice
# ARGS: num.rolls - number of rolls
# RETURNS: vector of num.rolls of a die
return(sample(6, num.rolls, replace = T))
}
RollDice(2)
```
## Parting Words
- Use common sense and *be consistent*.
- If you are editing code, take a few minutes to look at the code around you and mimic the style.
- Enough about writing code; the code itself is much more interesting. Have fun!
# Built in R Functions
## R Functions
- We have seen several standard functions in R. To get more details in R, type `?FunctionName`. This will open up a help window that displays essential characteristics of the function including arguments and values returned. For example, with the `mean` function the following information is shown:
**Description**: function for the (trimmed) arithmetic mean.
**Usage**: mean(x, trim = 0, na.rm = FALSE, ...)
**x**: An R object.
**trim:** the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed.
**na.rm:** a logical value indicating whether NA values should be stripped before the computation proceeds.
## Downloading R Packages
- R has a set of built in functions, which we have used thus far.
- R also has a vast repository of "packages" that contain additional, specialized functions. One example is a graphics packaged called `ggplot2` which we will see later in this class.
- Using these external packages requires two steps:
1. Download the package `install.packages('ggplot2')`. This only needs to be done once.
2. Load the package `library(ggplot2)`. This needs to be done when opening R.
# Writing R Functions
## Exercise: Writing and Documenting a Function
Use the defined style guidelines to create an R script that:
1. Takes a state abbreviations as an input
2. Imports a file available at: [http://math.montana.edu/ahoegh/teaching/stat408/datasets/HousingSales.csv](http://math.montana.edu/ahoegh/teaching/stat408/datasets/HousingSales.csv)
3. Creates a subset of housing sales from that state.
4. Returns a vector with the mean closing price in that state.
Verify your functions works by running it twice using "MT" and "NE" as inputs.
## Solution: Writing and Documenting a Function
```{r, mysize=TRUE, size='\\tiny'}
SummarizeHousingCosts <- function(state){
# computes average sales price in a state
# ARGS: state abbr, such as 'MT' or 'CA'
# RETURNS: vector with average sales price that each state
housing.data <- read.csv(
'http://math.montana.edu/ahoegh/teaching/stat408/datasets/HousingSales.csv')
location <- subset(housing.data, State == state)
mean.price <- mean(location$Closing_Price)
return(mean.price)
}
```
```{r, mysize=TRUE, size='\\footnotesize'}
SummarizeHousingCosts('MT')
SummarizeHousingCosts('NE')
```
## Format of an R function
Here is an example (trivial) R function.
```{r func.def}
SquareRoot <- function(value.in){
# function takes square root of value.
# Args: value.in - numeric value
# Returns: the square root of value.in
return(value.in ^ .5)
}
```
## Square Root Function
Now consider running the function for a few values.
```{r evalSQ}
SquareRoot(9)
SquareRoot(25)
```
Now what happens with `SquareRoot(-1)`?
## Square Root Function
```{r neqSQ}
SquareRoot(-1)
```
What should happen?
## Errors in R functions
Here is an example (trivial) R function.
```{r func.def.new}
SquareRoot <- function(value.in){
# function takes square root of value.
# Args: value.in - numeric value
# Returns: the square root of value.in
if (value.in < 0) stop('argument less than zero')
return(value.in ^ .5)
}
```
## Square Root Function
```{r neqSQ.new,eval=FALSE}
SquareRoot(-1)
```
This returns:
```{r error, eval=FALSE}
> SquareRoot(-1)
Error in SquareRoot(-1) :
argument less than zero
```
## Exercise: Functions Part 2
Now write a function that;
1. Takes daily snowfall total in inches as input
2. Takes day of week as input
3. Returns whether to ski or stay home.
Also include and the `stop()` function for errors.
Test this function with two settings:
- snowfall = 15, day = "Sat"
- snowfall = -1, day = "Mon"
## Solution: Functions Part 2
```{r}
ToSki <- function(snowfall, day){
# determines whether to ski or stay home
# ARGS: snowfall in inches, day as three letter
# abbrwith first letter capitalized
# RETURNS: string stating whether to ski or not
if (snowfall < 0) stop('snowfall should be greater
than or equal to zero inches')
if (day == 'Sat') {
return('Go Ski')
} else if (snowfall > 5) {
return('Go Ski')
} else return('Stay Home')
}
```
## Solution: Functions Part 2 cont..
```{r, error = T, mysize=TRUE, size='\\small'}
ToSki(snowfall = 15, day = "Sat")
ToSki(-1, 'Mon')
```
# Matrix Style Operations
## colMeans, rowSums
R contains a set of built in functions for taking the mean and sum of matrices that have been optimized for speed.
```{r colMeans}
mat1 <- matrix(1:4,ncol=2,nrow=2)
rowMeans(mat1)
colSums(mat1)
```
## Apply
For generic functions, the set of apply commands are extremely useful. They provide a mechanism for matrix style operations similar to the built in `rowMeans()` type of functions.
The apply function has three arguments:
- matrix
- margin (rows=1, columns=2)
- function
```{r apply}
apply(mat1,2,mean)
```
## Aggregate
Another useful function is `aggregate` which can be used to compute summary statistics of dataset by a particular group. Aggregate also has three essential elements.
1. An R object
2. list of groups
3. function
```{r agg}
aggregate(Loblolly$height,by=list(Loblolly$age),mean)
```
## Exercise: Aggregate
Earlier we wrote a function to compute the average housing price for two states, now use aggregate to compute this for all the states in the housing data set.
## Solution: Aggregate
```{r, mysize=TRUE, size='\\tiny'}
housing.prices <- read.csv(
'http://math.montana.edu/ahoegh/teaching/stat408/datasets/HousingSales.csv',
stringsAsFactors = F)
housing.prices.state <- aggregate(housing.prices$Closing_Price,
by=list(housing.prices$State),mean)
head(housing.prices.state)
```
# Tables for R Markdown
## Kable function
Note that output from R can often be hard to read. Luckily there are several options for creating nicely formatted tables. One, which we will use, is the kable function.
## Kable function
```{r kable, mysize=TRUE, size='\\small'}
library(knitr)
kable(aggregate(Loblolly$height,
by=list(Loblolly$age),mean), digits=3,
caption= 'Average height of loblolly pine by age',
col.names = c('Tree Age','Height (ft)'))
```