Turn in one copy per group: a Word or PDF document together with the R Markdown source file. This is due by noon on Sunday, January 21.

Lab Overview

For this lab, you will be exploring a data set containing housing sales in King County, Washington (the greater Seattle area). The intent of this lab is to get a feel for some of the basic features in R and explore this data set.

The entire lab will be worth 100 points. Clarity of code and thoughtful writing, with an emphasis on concise interpretations, will each be considered when grading labs.


Answer the following questions in this R Markdown document. Please include code where necessary.

1. Factors driving housing prices.

Download the Seattle Housing dataset, available at: http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv.

#read.csv( )
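
As a rough sketch of the read step, assuming the file is read directly from the URL above: the object name `seattle` and the three stand-in rows are illustrative assumptions (made-up values, not real sales) that only let the snippet run without a network connection.

```r
# URL as given in the lab text
housing_url <- "http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv"

# With a network connection, the data can be read directly:
# seattle <- read.csv(housing_url)

# Offline stand-in with hypothetical values, used only so this sketch runs
seattle <- read.csv(text = "price,bedrooms,bathrooms
300000,3,1.5
450000,4,2.5
525000,5,3")

head(seattle)  # first few rows
dim(seattle)   # number of rows and columns
```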

a. (5 points)

What format are the following vectors in the housing dataset: price, bedrooms, bathrooms?
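
One possible way to inspect vector formats is sketched below. The data frame is a hypothetical stand-in with made-up values, so the classes it reports need not match those in the real housing data; only the column names `price`, `bedrooms`, and `bathrooms` come from the question.

```r
# Hypothetical stand-in for the housing data (values are made up)
seattle <- data.frame(
  price     = c(300000, 450000, 525000),
  bedrooms  = c(3L, 4L, 5L),
  bathrooms = c(1.5, 2.5, 3)
)

# Class of a single column
class(seattle$price)

# Class of several columns at once
col_classes <- sapply(seattle[, c("price", "bedrooms", "bathrooms")], class)
col_classes

# Compact overview of every column's type and first values
str(seattle)
```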

b. (15 points)

Select a few features in the data set that you think are relevant for determining housing prices. How might each of these influence housing prices?

c. (15 points)

Create two figures, with at least one showing the relationship between a variable in the data set and the housing price.
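
A minimal sketch of two base R figures follows, assuming a data frame named `seattle`. The values are a hypothetical stand-in, and the choice of `bathrooms` and `bedrooms` as plotting variables is only an illustration, not a recommendation.

```r
# Hypothetical stand-in for the housing data (values are made up)
seattle <- data.frame(
  price     = c(300000, 450000, 525000, 380000, 610000, 295000),
  bedrooms  = c(3, 4, 5, 3, 4, 2),
  bathrooms = c(1.5, 2.5, 3, 2, 2.75, 1)
)

# Scatterplot of price against a numeric variable
plot(price ~ bathrooms, data = seattle,
     xlab = "Number of bathrooms",
     ylab = "Sale price (dollars)",
     main = "Sale price by number of bathrooms")

# Side-by-side boxplots of price for each bedroom count
boxplot(price ~ bedrooms, data = seattle,
        xlab = "Number of bedrooms",
        ylab = "Sale price (dollars)",
        main = "Sale price by number of bedrooms")
```

Axis labels with units and a descriptive title help a figure stand on its own for an outside reader.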


d. (15 points)

Summarize the takeaway points from your figures. Each summary should be 3-4 sentences and provide all of the context for your graphics so that an outside observer could understand the story you are illustrating.

e. (20 points)

Choose a variable or set of variables and create a subset of homes from the entire data set. For example, consider homes with more than 3 bedrooms. Then describe the differences between your selected subset of homes and the entire data set. You can do this with numerical summaries, graphical displays, and/or qualitative descriptions.
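
The bedroom example above could be sketched as follows. The data frame is a hypothetical stand-in with made-up values, and the names `seattle`, `large_homes`, and `median_diff` are assumptions for illustration.

```r
# Hypothetical stand-in for the housing data (values are made up)
seattle <- data.frame(
  price     = c(300000, 450000, 525000, 380000, 610000, 295000),
  bedrooms  = c(3, 4, 5, 3, 4, 2),
  bathrooms = c(1.5, 2.5, 3, 2, 2.75, 1)
)

# Keep only homes with more than 3 bedrooms
large_homes <- subset(seattle, bedrooms > 3)

# Compare the size of the subset with the full data set
nrow(large_homes)
nrow(seattle)

# Compare numerical summaries of price
summary(seattle$price)
summary(large_homes$price)

# Difference in median price between the subset and all homes
median_diff <- median(large_homes$price) - median(seattle$price)
median_diff
```

The same comparison works with logical indexing, for example `seattle[seattle$bedrooms > 3, ]`, if `subset()` is not preferred.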

2. Modeling Housing Prices

a. (15 points)

Based on what you have found in this data set, how might you model housing prices (\(Y_{price} = ?\))? Note that I am not asking you to fit a model, but rather to describe important relationships between the variables and housing prices. You may discuss statistical modeling techniques, but we will cover these later in the course.

b. (15 points)

Suppose you have developed a model to predict housing prices in the King County area. How could these results be applied to the Bozeman housing market?