---
title: "STAT 491 - Lecture 12: Bayesian Regression, Part 2"
output: pdf_document
---

```{r setup, include=FALSE}
set.seed(03292018)
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(rvest)
library(magrittr)
library(stringr)
library(dplyr)
library(rjags)
library(runjags)
```

### Reporting a Bayesian analysis

- This will be important for project summaries; see Chapter 25.2 for additional details.

#### Essential Elements

1. Motivate
\vfill
2. Clearly
\vfill
3. Clearly
\vfill
4. Report
\vfill
5. Interpret
\vfill

#### Optional Elements

1. Robustness
\vfill
2. Posterior
\vfill

## Regression with Multiple Metric Predictors

- The extension to multiple regression is fairly straightforward.
- The sampling model now becomes
\vfill
- We need
\vfill
- First fit a model with additive effects using the Seattle housing dataset.

```{r}
Seattle <- read.csv('http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv',
                    stringsAsFactors = F)
```

\newpage

```{r}
model_string <- "model{
  # Likelihood
  for (i in 1:n){
    Y[i] ~ dnorm(mu[i], 1/sigma^2)
    mu[i] <- beta[1] + beta[2]*x1[i] + beta[3]*x2[i] + beta[4]*x3[i]
  }

  # Prior for beta
  for (j in 1:4){
    beta[j] ~ dnorm(M, 1/S^2)
  }

  # Prior for sigma
  sigma ~ dunif(0, C)
}"
```

##### Compile the Model
```{r}
model <- jags.model(textConnection(model_string),
                    data = list(Y = Seattle$price, n = nrow(Seattle),
                                x1 = Seattle$sqft_living, x2 = Seattle$bedrooms,
                                x3 = Seattle$bathrooms,
                                M = 0, S = 200000, C = 1000000),
                    n.chains = 2, n.adapt = 10000)
```

##### Draw Samples
```{r}
update(model, 10000)
samples <- coda.samples(model, variable.names = c('beta', 'sigma'), n.iter = 50000)
summary(samples)
```

##### Compare with least squares
```{r}
summary(lm(price ~ sqft_living + bedrooms + bathrooms, data = Seattle))
```

### Multiplicative Interaction

- In some situations an additive relationship between the predictors is not appropriate and we need to consider a multiplicative interaction.
- This is written as
\vfill
- We can use the same model statement as before, but define $x_3$ to be the product of bathrooms and square feet of living space.

```{r}
model <- jags.model(textConnection(model_string),
                    data = list(Y = Seattle$price, n = nrow(Seattle),
                                x1 = Seattle$sqft_living, x2 = Seattle$bathrooms,
                                x3 = Seattle$bathrooms * Seattle$sqft_living,
                                M = 0, S = 200000, C = 1000000),
                    n.chains = 2, n.adapt = 50000)
update(model, 10000)
samples <- coda.samples(model, variable.names = c('beta', 'sigma'), n.iter = 50000)
summary(samples)
```

##### Compare with least squares
```{r}
summary(lm(price ~ sqft_living * bathrooms, data = Seattle))
```

### Notes About Correlated Predictors

- When using multiple predictor variables, we need to be cautious about highly correlated variables. This can be detected by assessing the correlation between the predictor values.
\vfill
- When two variables are highly correlated, the interpretation of the variables can be muddied in a regression setting. The textbook uses an example of money spent per pupil and average SAT score. At first glance, spending more money per student gives the illusion of lower SAT scores; however, the confounding variable is the proportion of students taking the SAT.
\vfill
- Bayesian procedures mitigate some of the problems with highly correlated variables, as the variables can "trade off" during iterations of the sampler.
\vfill
- When using JAGS, standardizing the variables can mitigate some of the computational challenges associated with correlated variables (see the sketch below).
\vfill
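A minimal sketch of the two checks mentioned above, assuming the `Seattle` data frame loaded earlier: compute the pairwise correlations among the metric predictors, then standardize them with `scale()` before they would be passed to JAGS. The chunk is not evaluated when the document is knit.

```{r, eval=FALSE}
# Sketch: assess correlation among the metric predictors used above
predictors <- Seattle[, c('sqft_living', 'bedrooms', 'bathrooms')]
round(cor(predictors), 2)   # large off-diagonal values signal collinearity

# Standardize each predictor (mean 0, sd 1); with correlated predictors this
# often improves mixing when the standardized versions are passed to JAGS
x1_std <- as.numeric(scale(Seattle$sqft_living))
x2_std <- as.numeric(scale(Seattle$bedrooms))
x3_std <- as.numeric(scale(Seattle$bathrooms))
```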
### Including Nominal Variables

- When including nominal variables, we need to remember the identifiability issues.
\vfill
- Previously we talked about dummy variables for nominal data. In JAGS, we can actually include a vector that contains an index for the group number.
\vfill
- The code below is adapted from the files freely available on the course webpage.
\vfill

1. Set up data structure

```{r}
Seattle$zipcode <- as.factor(Seattle$zipcode)
y = as.numeric(Seattle[,'price'])
xNom = as.numeric(as.factor(Seattle[,'zipcode']))
xNomlevels = levels(as.factor(Seattle[,'zipcode']))
xMet = as.numeric(Seattle[,'sqft_living'])
Ntotal = length(y)
NxNomLvl = length(unique(xNom))
# Compute scale properties of data, for passing into prior to make the prior
# vague on the scale of the data.
lmInfo = lm(Seattle[,'price'] ~ Seattle[,'sqft_living'] + Seattle[,'zipcode'])
residSD = sqrt(mean(lmInfo$residuals^2)) # residual root mean squared deviation
dataList = list(
  y = y ,
  xNom = xNom ,
  xMet = xMet ,
  xMetMean = mean(xMet) ,
  Ntotal = Ntotal ,
  NxNomLvl = NxNomLvl ,
  # data properties for scaling the prior:
  xMetSD = sd(xMet) ,
  yMean = mean(y) ,
  ySD = sd(y) ,
  residSD = residSD
)
```

2. Specify Model Statement

```{r}
modelstring = "
model {
  for ( i in 1:Ntotal ) {
    # sampling model
    y[i] ~ dnorm( mu[i] , 1/ySigma^2 )
    mu[i] <- a0 + a[xNom[i]] + aMet*( xMet[i] - xMetMean )
  }
  # priors
  ySigma ~ dunif( residSD/100 , ySD*10 )
  a0 ~ dnorm( yMean , 1/(ySD*5)^2 )
  for ( j in 1:NxNomLvl ) { a[j] ~ dnorm( 0.0 , 1/aSigma^2 ) }
  aSigma ~ dgamma( .001 , .001 )
  aMet ~ dnorm( 0 , 1/(2*ySD/xMetSD)^2 )
  # Convert a0,a[] to sum-to-zero b0,b[] :
  b0 <- a0 + mean( a[1:NxNomLvl] ) + aMet*(-xMetMean)
  for ( j in 1:NxNomLvl ) { b[j] <- a[j] - mean( a[1:NxNomLvl] ) }
}
"
writeLines(modelstring, con="TEMPmodel.txt")
```

3. Choose initialization values based on linear model fit.

```{r}
initsList = list(
  a0 = unname(mean(y) - lmInfo$coefficients["Seattle[, \"sqft_living\"]"] *
                mean(Seattle[,'sqft_living'])) ,
  a = unname( c(
    lmInfo$coef["(Intercept)"] +
      mean(Seattle[,'sqft_living']) * lmInfo$coef["Seattle[, \"sqft_living\"]"] ,
    lmInfo$coef["(Intercept)"] +
      mean(Seattle[,'sqft_living']) * lmInfo$coef["Seattle[, \"sqft_living\"]"] +
      lmInfo$coef[grep( "zipcode" , names(lmInfo$coef) )] ) ) - mean(y) ,
  aSigma = sd( c(
    lmInfo$coef["(Intercept)"] +
      mean(Seattle[,'sqft_living']) * lmInfo$coef["Seattle[, \"sqft_living\"]"] ,
    lmInfo$coef["(Intercept)"] +
      mean(Seattle[,'sqft_living']) * lmInfo$coef["Seattle[, \"sqft_living\"]"] +
      lmInfo$coef[grep( "zipcode" , names(lmInfo$coef) )] ) ) ,
  ySigma = residSD ,
  aMet = unname(lmInfo$coefficients["Seattle[, \"sqft_living\"]"])
  # Let JAGS do other parameters automatically...
)
```

4. Run JAGS Model

```{r}
parameters = c( "b0" , "b" , "aMet" , "aSigma" , "ySigma" )
adaptSteps = 500
burnInSteps = 1000
runJagsOut <- run.jags( model="TEMPmodel.txt" ,
                        monitor=parameters ,
                        data=dataList ,
                        inits=initsList ,
                        n.chains=1 ,
                        adapt=adaptSteps ,
                        burnin=burnInSteps ,
                        summarise=FALSE ,
                        plots=FALSE )
codaSamples = as.mcmc.list( runJagsOut )
```

5. JAGS output

```{r}
summary(codaSamples)
```

### Procedure for Expanding Model and Checking Assumptions

- In Bayesian statistics, comparing the data to the posterior predictions, sketched after the list below, is known as a
\vfill
- Steps to extend a model in JAGS:
    - Carefully
    \vfill
    - Be
    \vfill
    - If you are defining your own initial starting values, define initial values for the new parameters and make sure the old parameters still have sensible values.
    \vfill
    - Tell
    \vfill
    - Modify
    \vfill
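A minimal sketch of comparing the observed data to posterior predictions, assuming the `codaSamples`, `y`, `xNom`, `xMet`, and `Ntotal` objects from the nominal-variable example above are still in memory: for a few posterior draws, simulate a replicated set of prices and overlay their densities on the density of the observed prices. The chunk is not evaluated when knitting.

```{r, eval=FALSE}
# Sketch: compare observed prices to prices replicated from the posterior
post <- as.matrix(codaSamples)        # columns: b0, b[1], ..., aMet, ySigma
draw_id <- sample(nrow(post), 20)     # a handful of posterior draws

plot(density(y), lwd = 2, main = "Observed vs. replicated prices", xlab = "price")
for (s in draw_id) {
  # mean for each house: mu_i = b0 + b[zipcode_i] + aMet * sqft_i
  mu_rep <- post[s, "b0"] + post[s, paste0("b[", xNom, "]")] + post[s, "aMet"] * xMet
  y_rep  <- rnorm(Ntotal, mean = mu_rep, sd = post[s, "ySigma"])
  lines(density(y_rep), col = "grey")  # replicated data set from draw s
}
```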
### Shrinkage of Regression Coefficients / Variable Selection

- In many research settings, there is a large selection of predictors that could plausibly be used to model the outcome of interest.
\vfill
- When using many predictors,
\vfill
- If we are interested in explaining the variation in the predicted variable, we'd like the description of the data to emphasize the predictors that are closely related to variation in the predicted variable and de-emphasize weak predictors.
\vfill
- One solution is to use a
\vfill
- This prior is very similar to the idea of a .
\vfill
- Based on the idea that many potential predictors are not useful for modeling the response of interest, we want a way to "select" which variables to use.

\newpage

- Some researchers favor the regularization approach for model selection. While this may not give actual zeros, the values of unimportant variables are shrunk toward zero.
\vfill
- In many fields, variable selection is a common procedure, with the belief that the true regression coefficients should be 0 or large.
\vfill
- There are several different variable selection approaches in classical statistics.
    - \vfill
    - \vfill
    - \vfill
\vfill
- The key to Bayesian variable selection is that each predictor has a regression coefficient and an inclusion indicator.
\vfill
- The regression coefficient has the same interpretation as in previous models.
\vfill
- The inclusion indicator is 1 if the predictor is included in the model and 0 otherwise.
\vfill
- This can be expressed mathematically as:
\vfill
- The combination of $\delta_j$ values for all $j$ constitutes a model.
\vfill
- In a Bayesian framework, priors can be placed on the inclusion indicators. This will give a posterior probability of inclusion for each variable. Additional details and JAGS code are available in Section 18.4 of DBDA; a rough sketch of the idea follows below.
\vfill

\newpage
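A rough sketch of the inclusion-indicator idea, not the code from DBDA 18.4: each coefficient $\beta_j$ is paired with a Bernoulli indicator $\delta_j$, and the product $\delta_j \beta_j$ drops predictor $j$ whenever $\delta_j = 0$, so the posterior mean of `delta[j]` approximates the posterior probability that predictor $j$ belongs in the model. The names `selection_string` and `x_std`, the choice of predictors, and the specific priors are illustrative assumptions here. The chunk is not evaluated when knitting.

```{r, eval=FALSE}
# Sketch: Bayesian variable selection with inclusion indicators
selection_string <- "model{
  for (i in 1:n){
    Y[i] ~ dnorm(mu[i], 1/sigma^2)
    mu[i] <- beta0 + inprod(gamma[1:p], x[i, 1:p])
  }
  beta0 ~ dnorm(0, 1/10^2)
  for (j in 1:p){
    beta[j] ~ dnorm(0, 1/10^2)       # regression coefficient, as before
    delta[j] ~ dbern(0.5)            # inclusion indicator: 1 = predictor in model
    gamma[j] <- delta[j] * beta[j]   # coefficient is zeroed out when delta[j] = 0
  }
  sigma ~ dunif(0, 10)
}"

# Standardized response and predictors (recommended for correlated predictors)
x_std <- scale(cbind(Seattle$sqft_living, Seattle$bedrooms, Seattle$bathrooms))
sel_model <- jags.model(textConnection(selection_string),
                        data = list(Y = as.numeric(scale(Seattle$price)),
                                    x = x_std, n = nrow(Seattle), p = ncol(x_std)),
                        n.chains = 2, n.adapt = 5000)
sel_samples <- coda.samples(sel_model, variable.names = c('beta', 'delta', 'sigma'),
                            n.iter = 10000)
summary(sel_samples)   # posterior mean of delta[j]: estimated inclusion probability
```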