Exercise: Tidy Data

Q: Is the following data table in a tidy format?

Date    Big Sky Base  Bridger Bowl Base
Oct 1   0             0
Nov 1   25            12
Dec 1   50            30

If not, reorganize the table into a tidy format.
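One possible answer, sketched in base R: the table above is not tidy because the two location names are stored as column headers rather than as values of a variable. The column names `Location` and `Value` are illustrative choices, since the slide does not say what the numbers measure.

```r
# Wide table as shown on the slide (syntactic column names are an assumption)
wide <- data.frame(
  Date = c("Oct 1", "Nov 1", "Dec 1"),
  Big.Sky.Base = c(0, 25, 50),
  Bridger.Bowl.Base = c(0, 12, 30),
  stringsAsFactors = FALSE
)

# Tidy version: one row per date and location, one column per variable
tidy <- data.frame(
  Date = rep(wide$Date, times = 2),
  Location = rep(c("Big Sky Base", "Bridger Bowl Base"), each = nrow(wide)),
  Value = c(wide$Big.Sky.Base, wide$Bridger.Bowl.Base),
  stringsAsFactors = FALSE
)
tidy
```

The same reshaping can be done with a dedicated reshaping function; the manual construction here is only meant to make the target layout explicit.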

The dataset

First, read in the data set, which is available at http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/BaltimoreTowing.csv.

baltimore.tow <- 
  read.csv('http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/BaltimoreTowing.csv', 
           stringsAsFactors = FALSE)
str(baltimore.tow)
## 'data.frame':    30263 obs. of  5 variables:
##  $ vehicleType      : chr  "Van" "Car" "Car" "Car" ...
##  $ vehicleMake      : chr  "LEXUS" "Mercedes" "Chysler" "Chevrolet" ...
##  $ vehicleModel     : chr  "" "" "Cirrus" "Cavalier" ...
##  $ receivingDateTime: chr  "10/24/2010 12:41:00 PM" "04/28/2015 09:27:00 AM" "07/23/2015 07:55:00 AM" "10/23/2010 11:35:00 AM" ...
##  $ totalPaid        : chr  "$322.00" "$130.00" "$280.00" "$1057.00" ...

Information for a few vehicles

library(knitr)  # kable() is provided by the knitr package
kable(head(baltimore.tow, 20))
vehicleType                vehicleMake  vehicleModel  receivingDateTime       totalPaid
Van                        LEXUS                      10/24/2010 12:41:00 PM  $322.00
Car                        Mercedes                   04/28/2015 09:27:00 AM  $130.00
Car                        Chysler      Cirrus        07/23/2015 07:55:00 AM  $280.00
Car                        Chevrolet    Cavalier      10/23/2010 11:35:00 AM  $1057.00
Car                        Hyundai      Tiburon       10/25/2010 02:49:00 PM  $469.00
SUV                        Toyota       RAV4          10/25/2010 11:12:00 AM  $305.00
Car                        Bmw          325           10/23/2012 07:50:00 PM  $220.00
Car                        Honda        Accord        10/25/2010 02:53:00 PM  $327.00
Car                        Ford         Taurus        12/23/2010 04:09:00 AM  $290.00
SUV                        Ford         Expo          12/23/2010 02:51:00 PM  $230.00
SUV                        Lincoln      Mkx           12/23/2010 01:40:00 PM  $230.00
Car                        Geo          Prizm         12/23/2010 06:45:00 AM  $570.00
Car                        Kia          Spectra       12/23/2010 03:57:00 AM  $280.00
Car                        Nissan                     12/23/2010 05:08:00 AM  $280.00
Pick-up Truck              Dodge        Dakota        12/23/2010 02:05:00 PM  $275.00
Motor Cycle (Street Bike)  Honda        CBR600        12/22/2010 11:09:00 PM  $140.00
Car                        Chrysler     Sebring       12/23/2010 11:45:00 AM  $220.00
Car                        Cadillac     Deville       12/22/2010 09:39:00 PM  $230.00
Car                        Nissan       Maxima        12/22/2010 12:41:00 PM  $275.00
Van                        Dodge        Caravan       12/23/2010 12:21:00 AM  $140.00

Exercise: Using the substr() function

Use the substr() function to extract the year from receivingDateTime and store it as a new variable in the data frame.

baltimore.tow <- 
  read.csv('http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/BaltimoreTowing.csv', 
           stringsAsFactors = FALSE)
# baltimore.tow$Year <- 
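One way to approach this, sketched on three timestamps copied from the table of vehicles rather than on the full file: every receivingDateTime value shown follows the fixed layout MM/DD/YYYY hh:mm:ss AM/PM, so the year sits in characters 7 through 10. The vector `sample.times` is a stand-in introduced only for illustration, and the fixed-width assumption should be checked against the whole data set before relying on it.

```r
# Three timestamps copied from the first rows of the data (illustrative sample)
sample.times <- c("10/24/2010 12:41:00 PM",
                  "04/28/2015 09:27:00 AM",
                  "07/23/2015 07:55:00 AM")

# Characters 7-10 hold the four-digit year in the MM/DD/YYYY layout
Year <- substr(sample.times, start = 7, stop = 10)
Year
```

Applied to the full data frame, the same call would take baltimore.tow$receivingDateTime as its first argument. The result is a character vector; wrap it in as.numeric() if a numeric year is needed.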

strsplit() function

First split on ‘/’:

pieces <- strsplit(as.character(
  baltimore.tow$receivingDateTime[1]), '/')
pieces
## [[1]]
## [1] "10"               "24"               "2010 12:41:00 PM"

strsplit() function: unlist()

pieces.list <- strsplit(as.character(
  baltimore.tow$receivingDateTime), '/')

pieces.mat <- matrix(unlist(pieces.list), ncol = 3,
      nrow = length(pieces.list), byrow = TRUE)

pieces.mat[1:3,1:3]
##      [,1] [,2] [,3]              
## [1,] "10" "24" "2010 12:41:00 PM"
## [2,] "04" "28" "2015 09:27:00 AM"
## [3,] "07" "23" "2015 07:55:00 AM"

Exercise: strsplit() function

The third column of pieces.mat begins with the year, so the year can now be extracted from that column.

#baltimore.tow$Year <- 
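A minimal sketch of one possible solution, rebuilt on three sample timestamps so that it runs without downloading the file. It assumes, as in the output shown earlier, that the third piece of each split always starts with a four-digit year; `sample.times` is an illustrative stand-in for baltimore.tow$receivingDateTime.

```r
# Illustrative sample taken from the first rows of the data
sample.times <- c("10/24/2010 12:41:00 PM",
                  "04/28/2015 09:27:00 AM",
                  "07/23/2015 07:55:00 AM")

# Split on "/" and arrange the pieces as one row per record
pieces.list <- strsplit(sample.times, "/")
pieces.mat <- matrix(unlist(pieces.list), ncol = 3,
                     nrow = length(pieces.list), byrow = TRUE)

# The third column looks like "2010 12:41:00 PM"; keep its first four characters
Year <- substr(pieces.mat[, 3], 1, 4)
Year
```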

Exercise: Delete Misc. Type Vehicles

First, we will delete golf carts, boats, and trailers. There are several ways to do this; consider making a new data frame.

balt.tow.small <-
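One of the several ways, sketched on a made-up data frame: keep only the rows whose vehicleType is not in a vector of unwanted types. The labels "Golf Cart", "Boat", and "Trailer" are hypothetical spellings; the exact values in the real data should be checked first, for example with unique(baltimore.tow$vehicleType).

```r
# Toy data frame standing in for baltimore.tow (values are illustrative)
tow.sample <- data.frame(
  vehicleType = c("Car", "Golf Cart", "SUV", "Boat", "Trailer", "Van"),
  stringsAsFactors = FALSE
)

# Hypothetical labels for the types to drop; verify against the real data
drop.types <- c("Golf Cart", "Boat", "Trailer")

# Keep rows whose type is NOT in drop.types; drop = FALSE keeps a data frame
tow.small.sample <- tow.sample[!(tow.sample$vehicleType %in% drop.types), ,
                               drop = FALSE]
tow.small.sample
```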

Exercise: Aggregate

We have used aggregate() in the past. How can we use it to compute the total number of vehicles towed by group and time of day?

aggregate()

Sketch out code to do this.
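A rough sketch under two assumptions: "group" is taken to mean vehicleType, and "time of day" is taken to mean the AM/PM marker at the end of receivingDateTime. The helper columns `AMPM` and `count` are introduced here for illustration, and the five rows are copied from the table of vehicles so the code runs on its own.

```r
# Five rows copied from the data shown earlier (illustrative sample)
tow.sample <- data.frame(
  vehicleType = c("Van", "Car", "Car", "Car", "SUV"),
  receivingDateTime = c("10/24/2010 12:41:00 PM", "04/28/2015 09:27:00 AM",
                        "07/23/2015 07:55:00 AM", "10/23/2010 11:35:00 AM",
                        "10/25/2010 11:12:00 AM"),
  stringsAsFactors = FALSE
)

# Last two characters of the timestamp are "AM" or "PM"
n <- nchar(tow.sample$receivingDateTime)
tow.sample$AMPM <- substr(tow.sample$receivingDateTime, n - 1, n)

# Summing a column of ones counts the rows in each group
tow.sample$count <- 1
tow.counts <- aggregate(count ~ vehicleType + AMPM, data = tow.sample, FUN = sum)
tow.counts
```

A finer definition of time of day, such as the hour, would need the hour extracted from the timestamp in the same way.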

Exercise: group_by()

Now use group_by() to compute the average towing cost for each vehicle type.
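A possible sketch using dplyr, assuming the package is installed. Because totalPaid is stored as text such as "$322.00", it has to be converted to a number before averaging; the column names `paid` and `mean.paid` are illustrative, and the five rows are a small sample based on values shown earlier.

```r
library(dplyr)

# Small sample based on the data shown earlier (illustrative)
tow.sample <- data.frame(
  vehicleType = c("Van", "Car", "Car", "SUV", "Van"),
  totalPaid = c("$322.00", "$130.00", "$280.00", "$305.00", "$140.00"),
  stringsAsFactors = FALSE
)

# Strip the dollar sign (and any thousands separators) before converting
tow.sample$paid <- as.numeric(gsub("[$,]", "", tow.sample$totalPaid))

# Average cost within each vehicle type
avg.cost <- tow.sample %>%
  group_by(vehicleType) %>%
  summarise(mean.paid = mean(paid))
avg.cost
```

On the full data set, adding na.rm = TRUE to mean() may be needed if any totalPaid values fail to convert.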