Please use D2L to turn in both the PDF or Word output and your R Markdown file in.

For this lab we will again use the Seattle Housing dataset http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv. Assume your goal is to model housing prices in Seattle using a normal distribution.

Q1.

a. (4 pts)

Write out the sampling model for this dataset, define all variables.

The housing prices are assumed to be normally distributed, so,

$p(\{y_1, \dots, y_n\}|\mu, \sigma^2) = \prod_i p(y_i|\mu, \sigma^2)$ where $$p(y_i|\mu, \sigma^2) \sim N(\mu, \sigma^2)$$, $$y_i$$ is the sales price of house $$i$$, $$\mu$$ is the mean sales price of houses in Seattle, and $$\sigma^2$$ is the variance of housing prices in Seattle.

b. (4 pts)

State and justify prior distributions for $$\mu$$ and $$\sigma^2$$. Plot both of these prior distributions.

Let $$\mu \sim N(500,000; 10,000,000,000)$$. This distribution is centered at $500,000, the standard deviation is 100,000. This places most of the prior mass within $$\pm \200,000$$ from the prior mean. set.seed(03012018) M <- 500000 S.sq <- 1e10 S <- sqrt(S.sq) x <- seq(0, 1000000, by = 10) plot(x, dnorm(x, M, S), type='l', xlab = expression(mu), ylab = expression(paste('p(', mu,')')), main= expression(paste('Prior for ', mu))) text(2e5,3e-6, 'Note scale on y-axis will greatly influence plot \n but all prior values are relatively close together', cex=.7) Let $$\sigma \sim Unif(0,1e8)$$ NOTE THIS IS A PRIOR ON $$\sigma$$ not $$\sigma^2$$. A standard deviation of 1,000,000 would imply that most of the homes are $$\pm$$$2,000,000 from the mean. So setting the upper threshold to be 10,000,000 is conservative.

C <- 1e8
x <- seq(0, C, length.out = 10000)
plot(x, dunif(x, min = 0, max = C), type='l', xlab = expression(sigma), ylab = expression(paste('p(', sigma,')')), main= expression(paste('Prior for ', sigma)))

c. (4 pts)

Using the priors above fit the model in JAGS. Include trace plots and summaries of your output.

Seattle <- read.csv('http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv', stringsAsFactors = F)

## set up data
dataList = list(y = Seattle\$price, Ntotal = nrow(Seattle), M = M, S = S, C = C)

## specify model
modelString = "model {
for ( i in 1:Ntotal ) {
y[i] ~ dnorm(mu, 1/sigma^2) # sampling model
}
mu ~ dnorm(M,1/S^2)
sigma ~ dunif(0,C)
} "
writeLines( modelString, con='NORMmodel.txt')

## Starting points
initsList <- function(){
# function for initializing starting place of theta
# RETURNS: list with random start point for theta
return(list(mu = rnorm(1, mean = M, sd = S), sigma = runif(1,0,C)))
}

## Run JAGS

library(rjags)
## Loading required package: coda
## Linked to JAGS 4.3.0
## Loaded modules: basemod,bugs
library(runjags)
jagsModel <- jags.model( file = "NORMmodel.txt", data = dataList,
inits =initsList, n.chains = 3, n.adapt = 10000)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 869
##    Unobserved stochastic nodes: 2
##    Total graph size: 882
##
## Initializing model
update(jagsModel, n.iter = 10000)
num.mcmc <- 10000
codaSamples <- coda.samples( jagsModel, variable.names = c('mu', 'sigma'), n.iter = num.mcmc)

par(mfcol=c(1,2))
traceplot(codaSamples)