---
title: "Lab 11: Clustering"
author: 'Group Member Names - here'
output: html_document
---

Turn in one copy for each group. If group members are not present in class, they will be required to complete their own lab to receive credit. Please turn in your output as a DOC or PDF file, along with your R Markdown file.

#### 1. Seattle Housing (50 points).

Use a clustering algorithm to create clusters of houses in the Seattle area: [http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv](http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv).

Think carefully about the variables you include in the algorithm and the format of those variables (hint: would the numerical difference between zip codes be indicative of differences in houses?). Describe your approach in detail, including which variables you retained, and create a figure showing your clusters.

#### 2. Titanic (50 points).

Use a clustering algorithm to create clusters of passengers using the Titanic data. Use the code below to create a subset of the Titanic dataset with the following variables:

- Survived: whether the passenger survived the trip
- Pclass: passenger class (1, 2, or 3)
- Name: passenger name
- Sex: gender, male or female
- Age: age in years
- SibSp: number of siblings/spouses aboard the Titanic
- Parch: number of parents/children aboard the Titanic

```{r}
library(dplyr)

# Read the full Titanic dataset
titanic <- read.csv(
  'http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/titanic.csv')

set.seed(11142017)

# Drop passengers with a missing Age and keep only the variables listed above
titanic <- titanic %>%
  filter(!is.na(Age)) %>%
  select(Survived, Pclass, Name, Sex, Age, SibSp, Parch)
```
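Both problems mix numeric variables with categorical ones stored as numbers or strings (zip code in the housing data; `Pclass` and `Sex` here). One common way to handle this before k-means, sketched below under stated assumptions, is to convert those columns to factors, expand them into 0/1 indicator columns with `model.matrix()`, and standardize every column with `scale()` so that no single variable dominates the Euclidean distance. The helper name `prep_and_cluster()`, the choice of `k = 3`, and the small `toy` data frame are all hypothetical; the toy rows are made up for illustration and are not real passengers. Dropping `Name` (unique to each passenger) and holding out `Survived` is likewise an assumption, not a requirement of the lab.

```{r}
# Hypothetical helper (not part of the lab): treat the listed columns as
# unordered categories, one-hot encode them, standardize, then run k-means.
prep_and_cluster <- function(df, categorical, k = 3, seed = 11142017) {
  # Convert codes/strings to factors so that, e.g., a zip code or a class
  # number is not treated as a meaningful numeric distance
  for (v in categorical) df[[v]] <- factor(df[[v]])

  # Expand factors into 0/1 indicator columns; with "- 1" the first factor
  # keeps all of its levels and later factors drop one reference level
  X <- model.matrix(~ . - 1, data = df)

  # Center and scale each column to mean 0, sd 1
  X <- scale(X)

  # nstart = 25 tries several random starts and keeps the best fit
  set.seed(seed)
  kmeans(X, centers = k, nstart = 25)
}

# Made-up illustrative rows with the same columns as the Titanic subset,
# minus Name and Survived (assumed to be excluded from clustering)
toy <- data.frame(
  Pclass = c(1, 1, 2, 3, 3, 3, 2, 1),
  Sex    = c("female", "male", "female", "male",
             "male", "female", "male", "female"),
  Age    = c(38, 54, 27, 22, 19, 31, 35, 58),
  SibSp  = c(1, 0, 0, 1, 0, 1, 0, 0),
  Parch  = c(0, 0, 2, 0, 0, 1, 0, 0)
)

fit <- prep_and_cluster(toy, categorical = c("Pclass", "Sex"), k = 3)

# Cluster sizes and the (standardized) cluster centers
table(fit$cluster)
fit$centers
```

To apply the same idea to the `titanic` subset created above, one could pass `select(titanic, -Name, -Survived)` with `categorical = c("Pclass", "Sex")`; for the housing data, the zip code column (whatever it is named in the CSV) would go in `categorical`. The value of `k` is a modeling choice worth comparing across several values rather than fixing in advance.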