Lab 11: Clustering

Turn in one copy for each group. Group members who are not present in class must complete their own lab to receive credit. Please turn in your output as a DOC or PDF file, along with your R Markdown file.

1. Seattle Housing (50 points).

Use a clustering algorithm to create clusters of houses in the Seattle area: http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv. Think carefully about which variables you include in the algorithm and the format of those variables (hint: would the numerical difference between two zipcodes be indicative of differences between houses?). Describe your approach in detail, including which variables you retained, and create a figure showing your clusters.
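As a rough illustration of the zipcode hint, the sketch below treats zipcode as a category rather than a number, standardizes the numeric variables, and runs k-means. It uses a small made-up data frame instead of the real file; the column names (price, sqft_living, bedrooms, zipcode), the choice of k-means, and the number of clusters are all assumptions for illustration, not a required approach.

```r
# Toy stand-in for the housing data. Values and column names are
# made up for illustration and may not match SeattleHousing.csv.
set.seed(408)
houses <- data.frame(
  price       = c(250, 300, 275, 900, 950, 1000, 310, 880) * 1000,
  sqft_living = c(1200, 1500, 1300, 3500, 3800, 4000, 1450, 3600),
  bedrooms    = c(2, 3, 3, 5, 5, 6, 3, 4),
  zipcode     = c(98103, 98103, 98115, 98004, 98004, 98039, 98115, 98039)
)

# Zipcode is a label, not a quantity: 98115 - 98103 = 12 says nothing
# about how different two houses are, so store it as a factor.
houses$zipcode <- factor(houses$zipcode)

# Standardize numeric columns so that price (in dollars) does not
# dominate the Euclidean distances used by k-means.
num_scaled <- scale(houses[, c("price", "sqft_living", "bedrooms")])

# One 0/1 indicator column per zipcode level.
zip_dummies <- model.matrix(~ zipcode - 1, data = houses)

features <- cbind(num_scaled, zip_dummies)

# k = 2 is an arbitrary choice for this toy example.
km <- kmeans(features, centers = 2, nstart = 25)
houses$cluster <- factor(km$cluster)

# A simple figure of the clusters on two of the original variables.
plot(houses$sqft_living, houses$price,
     col = km$cluster, pch = 19,
     xlab = "Living area (sq ft)", ylab = "Price ($)",
     main = "Toy housing clusters")
```

Whether indicator columns for zipcode belong in the distance calculation at all is a judgment call; dropping zipcode, or replacing it with latitude and longitude if the data include them, are equally defensible choices to discuss in your write-up.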

2. Titanic (50 points).

Use a clustering algorithm to create clusters of passengers using the Titanic data. Use the code below to create a subset of the Titanic dataset with the following variables:

Survived: whether the passenger survived the trip
Pclass: passenger class - 1, 2, or 3
Name: passenger name
Sex: gender, male or female
Age: age in years
SibSp: number of siblings / spouses aboard the Titanic
Parch: number of parents / children aboard the Titanic

library(dplyr)

titanic <- read.csv(
  'http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/titanic.csv')
set.seed(11142017)
titanic <- titanic %>%
  filter(!is.na(Age)) %>%
  select(Survived, Pclass, Name, Sex, Age, SibSp, Parch)
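One possible way to proceed from a subset like this is sketched below, using a small made-up data frame with the same column names so that it runs without downloading anything. The choices here are assumptions, not the expected answer: Name is dropped because it identifies a passenger rather than describing one, Sex is recoded as 0/1, Survived is held out so it can be compared against the clusters afterwards, and hierarchical clustering with three clusters is used purely as an example.

```r
# Toy passengers with the same columns as the subset above.
# All values are made up for illustration.
passengers <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 0, 0, 1),
  Pclass   = c(3, 1, 3, 3, 1, 2, 3, 2),
  Name     = paste("Passenger", 1:8),
  Sex      = c("male", "female", "female", "male",
               "female", "male", "male", "female"),
  Age      = c(22, 38, 26, 35, 54, 40, 2, 14),
  SibSp    = c(1, 1, 0, 0, 0, 0, 3, 1),
  Parch    = c(0, 0, 0, 0, 0, 0, 1, 0)
)

# Keep only descriptive variables; leave out Name and Survived.
features <- passengers[, c("Pclass", "Sex", "Age", "SibSp", "Parch")]

# Recode Sex as a 0/1 indicator so distances can be computed.
features$Sex <- as.numeric(features$Sex == "female")

# Standardize so Age (in years) does not dominate the distances.
features_scaled <- scale(features)

# Complete-linkage hierarchical clustering on Euclidean distances,
# cut into k = 3 groups (an arbitrary choice for this example).
hc <- hclust(dist(features_scaled), method = "complete")
passengers$cluster <- cutree(hc, k = 3)

# Compare cluster membership with the held-out survival indicator.
print(table(cluster = passengers$cluster, survived = passengers$Survived))
```

Calling plot(hc) draws the dendrogram, which is one option for a figure; treating Pclass as a category rather than a number is another choice worth justifying in your description.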

Lab 11: Clustering

Group Member Names - here

1. Seattle Housing (50 points).

2. Titanic (50 points).