Please turn in the exam to D2L and include both the R Markdown code and either a Word or PDF file. Please verify that all of the code has compiled and the graphics look like you think they should on your Word or PDF file. If you are having issues with the Word file contorting your images, you may submit the image files directly to D2L as well.

While the exam is open book, meaning you are free to use any resources from class, this is strictly an individual endeavor. Discussing the problems with anyone outside the course instructor is a violation of the MSU student conduct code. The instructor will answer questions related to expectations or understanding of the exam, but will not fix or troubleshoot broken code.

1. (18 points)

The Rubik’s cube was a popular game in the 1980’s which players attempt to match colors on the six sides of a cube. To win the game, a player must have a single color on each side of the cube. For additional details see the wikipedia page: https://en.wikipedia.org/wiki/Rubiks_Cube.

Write a function that:

Verify your function by testing it on the following arrays. Each call should either return ‘This cube is a completed Rubik’s cube’, ‘Try Again: cube is not a completed Rubik’s cube’, or an error. For full credit errors must be returned for all improper inputs. Even if you cannot complete the entire problem, include the code you have for consideration of partial credit.

RubiksCube <- function(cube) {
  # function takes a 3x3x6 array of colors and determines if a proper Rubik's cube with 9 (3x3) elements of each of 6 colors (red, green, yellow, blue, white, orange) has been entered
  # ARGS: cube - a 3x3x6 array of character value of colors
  # RETURNS: message indicating that the array is either a complete cube or error indicating there is the array is not a complete cube
  
  # Verify correct dimensions
  if (!all( dim(cube)[1] == 3 & dim(cube)[2] == 3 & dim(cube)[3] == 6)) { 
    stop('incorrect dimensions')
  }

  # Verify cube contains character strings
  if (!typeof(cube) == "character") {
    stop("values are not character strings")
  }
  
  # verify all colors are present 
  if (! sum(unique(as.character(cube)) %in% c('blue','green','orange','red','white','yellow')) == 6){ 
    stop('all 6 required colors are not present')
  }
  
  # verify all colors show up 9 times
  if (!all(aggregate(rep(1,3*3*6),list(as.character(cube)),sum)[,2] == 9)){
    return("try again: cube is not a completed Rubik's cube")
  }

    # verify that each color shows up once per side
  num.unique.per.side <- rep(0,6)
  for (i in 1:6){
    num.unique.per.side[i] <- length(unique(as.character(cube[,,i])))
  }
  
  if (all(num.unique.per.side == 1)){
    return("This cube is a completed Rubik's cube")
  } else {
    return("try again: cube is not a completed Rubik's cube")
  }
}
array1 <- array('red', dim=c(3,3,6))

array2 <- array(1:6, dim=c(3,3,6))

array3 <- array(c('red','blue','green','yellow','white','orange'), dim=c(3,3,6))

array4 <- array(c('red','blue','green','yellow','white','orange'), dim=c(3,3,3))

array5 <- array(rep(c('red','blue','green','yellow','white','orange'),each=9), dim=c(3,3,6))

array6 <- array(rep(c('duck','goose','eagle','crane','white','orange'),each=9), dim=c(3,3,6))

RubiksCube(array1)
## Error in RubiksCube(array1): all 6 required colors are not present
RubiksCube(array2)
## Error in RubiksCube(array2): values are not character strings
RubiksCube(array3)
## [1] "try again: cube is not a completed Rubik's cube"
RubiksCube(array4)
## Error in RubiksCube(array4): incorrect dimensions
RubiksCube(array5)
## [1] "This cube is a completed Rubik's cube"
RubiksCube(array6)
## Error in RubiksCube(array6): all 6 required colors are not present

3. (22 points)

For this question use the Baltimore Towing data set:

http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/BaltimoreTowing.csv.

# import data
balt.towing <- read.csv('http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/BaltimoreTowing.csv', stringsAsFactors = F)

a. (4 points)

Describe the data set. What does each row / column represent?

# descriptive information about data
str(balt.towing)
## 'data.frame':    30263 obs. of  5 variables:
##  $ vehicleType      : chr  "Van" "Car" "Car" "Car" ...
##  $ vehicleMake      : chr  "LEXUS" "Mercedes" "Chysler" "Chevrolet" ...
##  $ vehicleModel     : chr  "" "" "Cirrus" "Cavalier" ...
##  $ receivingDateTime: chr  "10/24/2010 12:41:00 PM" "04/28/2015 09:27:00 AM" "07/23/2015 07:55:00 AM" "10/23/2010 11:35:00 AM" ...
##  $ totalPaid        : chr  "$322.00" "$130.00" "$280.00" "$1057.00" ...

The dataset is a collection of vehicles that were towed in Baltimore. Each row is an observation (a vehicle that was towed), and each column is a variable (a piece of information about the vehicle that was towed). There are 5 variables providing information about each tow: the vehicle type (a character string describing the vehicle generally - car, van, truck…), the make (character string), the model (character string), the receiving time (character string containing the month, day, year, hour, minute, second, and AM/PM), and the amount paid for the tow (character string containing dollars and cents). There are 30263 observations.

b. (4 points)

Compute how many vehicles are towed each month.

# create variable for month
balt.towing$month <- as.numeric(substr(balt.towing$receivingDateTime,1,2))

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
require(knitr)

# count by month
kable(balt.towing %>% group_by(month) %>% count() )
month n
1 2259
2 2574
3 2564
4 2390
5 2392
6 2617
7 2626
8 2693
9 2494
10 2805
11 2505
12 2344

c. (4 points)

How many vehicles with the vehicleType of SUV were towed (have a receivingDateTime) between 10 PM and 7 AM?

# check for labelling inconsistencies regarding "SUV"

# create variable for hour
balt.towing$hour <- as.numeric(substr(balt.towing$receivingDateTime,12,13))
# create variable for AM/PM
balt.towing$time <- substr(balt.towing$receivingDateTime,21,22)
# convert hour of day to 24h time
balt.towing$hour[balt.towing$time == 'PM' & balt.towing$hour < 12] <- balt.towing$hour[balt.towing$time == 'PM' & balt.towing$hour < 12] + 12
# make 12AM = 0h
balt.towing$hour[balt.towing$time == 'AM' & balt.towing$hour == 12] <- balt.towing$hour[balt.towing$time == 'AM' & balt.towing$hour == 12] - 12

balt.towing %>% filter(hour >=22 | hour < 7) %>% filter(vehicleType == "SUV") %>% count()
## # A tibble: 1 × 1
##       n
##   <int>
## 1  1463

d. (4 points)

Compute the average totalPaid for vehicleType of Van by year. Create a table to summarize these results.

# create variable for dollars
balt.towing$dollars <- as.numeric(substr(balt.towing$totalPaid,2,nchar(balt.towing$totalPaid)))

# create variable for year
balt.towing$year <- as.numeric(substr(balt.towing$receivingDateTime,7,10))

# use dplyr to filter (subset) type to just vans, then group by year, then summarize by mean of dollars
kable(balt.towing %>% filter(vehicleType == 'Van') %>% group_by(year) %>% summarize(mean.dollars = round(mean(dollars), digits=2)))
year mean.dollars
2010 245.67
2011 332.00
2012 348.93
2013 327.67
2014 271.00
2015 293.73
2016 295.58
2017 308.14