Please use D2L to turn in both the PDF or Word output and your R Markdown file in.

For this lab we will again use the Seattle Housing dataset http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv. Assume your goal is to model housing prices in Seattle using a normal distribution.

Q1.

a. (4 pts)

Write out the sampling model for this dataset, define all variables.

The housing prices are assumed to be normally distributed, so,

\[p(\{y_1, \dots, y_n\}|\mu, \sigma^2) = \prod_i p(y_i|\mu, \sigma^2)\] where \(p(y_i|\mu, \sigma^2) \sim N(\mu, \sigma^2)\), \(y_i\) is the sales price of house \(i\), \(\mu\) is the mean sales price of houses in Seattle, and \(\sigma^2\) is the variance of housing prices in Seattle.

b. (4 pts)

State and justify prior distributions for \(\mu\) and \(\sigma^2\). Plot both of these prior distributions.

Let \(\mu \sim N(500,000; 10,000,000,000)\). This distribution is centered at $500,000, the standard deviation is 100,000. This places most of the prior mass within \(\pm \$200,000\) from the prior mean.

set.seed(03012018)
M <- 500000
S.sq <- 1e10
S <- sqrt(S.sq)
x <- seq(0, 1000000, by = 10)
plot(x, dnorm(x, M, S), type='l', xlab = expression(mu), ylab = expression(paste('p(', mu,')')), main= expression(paste('Prior for ', mu)))
text(2e5,3e-6, 'Note scale on y-axis will greatly influence plot \n but all prior values are relatively close together', cex=.7)

Let \(\sigma \sim Unif(0,1e8)\) NOTE THIS IS A PRIOR ON \(\sigma\) not \(\sigma^2\). A standard deviation of 1,000,000 would imply that most of the homes are \(\pm\) $2,000,000 from the mean. So setting the upper threshold to be 10,000,000 is conservative.

C <- 1e8
x <- seq(0, C, length.out = 10000)
plot(x, dunif(x, min = 0, max = C), type='l', xlab = expression(sigma), ylab = expression(paste('p(', sigma,')')), main= expression(paste('Prior for ', sigma)))

c. (4 pts)

Using the priors above fit the model in JAGS. Include trace plots and summaries of your output.

Seattle <- read.csv('http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv', stringsAsFactors = F)

## set up data
dataList = list(y = Seattle$price, Ntotal = nrow(Seattle), M = M, S = S, C = C)

## specify model
modelString = "model {
  for ( i in 1:Ntotal ) {
    y[i] ~ dnorm(mu, 1/sigma^2) # sampling model
  }
  mu ~ dnorm(M,1/S^2)
  sigma ~ dunif(0,C)
} "
writeLines( modelString, con='NORMmodel.txt')

## Starting points
initsList <- function(){
  # function for initializing starting place of theta
  # RETURNS: list with random start point for theta
  return(list(mu = rnorm(1, mean = M, sd = S), sigma = runif(1,0,C)))
}

## Run JAGS

library(rjags)
## Loading required package: coda
## Linked to JAGS 4.3.0
## Loaded modules: basemod,bugs
library(runjags)
jagsModel <- jags.model( file = "NORMmodel.txt", data = dataList, 
                         inits =initsList, n.chains = 3, n.adapt = 10000)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 869
##    Unobserved stochastic nodes: 2
##    Total graph size: 882
## 
## Initializing model
update(jagsModel, n.iter = 10000)
num.mcmc <- 10000
codaSamples <- coda.samples( jagsModel, variable.names = c('mu', 'sigma'), n.iter = num.mcmc)

par(mfcol=c(1,2))
traceplot(codaSamples)