---
title: "STAT 491 - Lecture 11: Bayesian Regression"
output: pdf_document
---

```{r setup, include=FALSE}
set.seed(03292018)
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(rvest)
library(magrittr)
library(stringr)
library(dplyr)
library(rjags)
```


## Generalized Linear Models
Generalized linear models (GLMs) are a class of techniques that include linear regression, logistic regression, and Poisson regression. There are three defining features for a GLM:

1. A 
\vfill
2. A 
\vfill
3. A 
\vfill

### Variable types and sampling models

In a GLM setting there are two sets of variables:

1. *Predicted Variable:* 
\vfill
2. *Predictor Variable:* 
\vfill
- When constructing a GLM, we need to think about the sampling model, $p(y|X,\theta)$. The sampling model is now conditional on not only the unknown parameters, $\theta$, but also the set of predictor variables, $X$. 
\vfill
- Scale type of variables. The structure of the sampling model will be largely dependent on the measurement scale of the predicted variable. Consider a set of athletes participating in a mountain bike race. The results could be reported as a time for completing the race, place in the race, or team participating on. These three different results are examples of metric, ordinal, and nominal scales. 
\vfill
    - metric: 
\vfill
    - ordinal:
\vfill
    - nominal: 
\vfill
- The variable type of the predicted variable will ultimately have a sampling distribution, but we will talk about this later.

\newpage

### Linear Combinations of Predictors
The main element of linear modeling is a relationship between predictors and the predicted variable.
```{r}
Seattle <- read.csv('http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv',
                    stringsAsFactors = F)
```

Let the predicted, or dependent variable, be the price of the home sale denoted as $y$.

#### Metric Predictors
- *Linear function of a single metric predictor*: Initially consider a single metric variable as the predictor, say square footage of the home. 
    - A linear relationship is
\vfill
    - The general mathematical form for a linear function is
    \vfill
    - If we plot values y and x we should see a straight line.
\vfill
```{r, fig.align='center', fig.width=6, fig.height = 4, echo=F}
ggplot(aes(y=price, x=sqft_living), data=Seattle) + geom_point() + geom_smooth(method = 'loess')+geom_smooth(method = 'lm', col='red' ) + ggtitle(label = 'Housing Prices in King County Washington') + ylab('Closing price ($)') + xlab('House size (square feet)')
```
    - In this case, the figure suggests that a linear relationship might not be the best representation of the data. So what do we do??
\vfill

\newpage
- *Additive combination of metric predictors:* often multiple predictors can be used for prediction. Here consider also using bedrooms and bathrooms as predictor variables. Note we can also use a squared term for House size in this same framework. \vfill
    - Consider predicting, y, with with a generic set of predictors, x, according to 
    \vfill
    - An assumption of this model is that the predictors have an additive relationship, meaning that an increase in one predictor variable will result in the same increase in the predicted variable for any value of the other parameters.
    \vfill
    - Non linear effects can be modeled using this framework with a polynomial basis function. For instance consider 
    \vfill
- *Nonadditive combination of metric predictors:* The combined influence of two parameters does not have to be additive. There can be an interaction effect between two parameters. For instance, consider zipcode and square footage relative to price. The additive relationship assumes that each additional foot of space in the house has the same increase in house price for each zipcode. \vfill
    - A multiplicative combination of predictors captures *interactions* between terms and can be formulated as
    \vfill
    - Higher level interactions will add many parameters to the model and can make interpreting the results difficult.

\newpage

![Multiplicative Interaction](images/interact.png)

#### Nominal Predictors
Similarly we can consider nominal predictors in a linear model.
\vfill
- *Linear model for a single nominal predictor:* In this case the nominal predictor has a step or deflection on the predicted variable.  \vfill
    - Let the levels of the nominal variable be denoted as $\tilde{x} = \{x_{[1]}, \dots, x_{[L]}\}$. 
    \vfill
    - Then 
    \vfill
    \newpage
    - Using nominal values in a linear model requires a constraint on the values. Two common options are to set $\beta_0=$, then the $\beta_{[i]}$ values represent group means, say for a zipcode. Another option would be to constrain $\sum_j=1^L \beta_{[L]} = 0$.
    \vfill
- *Additive and Nonadditive combinations of mixed-type predictors:* The same principle applies as before when looking at combinations of nominal predictors as well as combinations of nominal and metric predictors. \vfill


### Linking combined predictors to predicted data
- Given a linear combination of predictors, the final step is to map that relationship to the predicted variable.
\vfill
- Formally, this mapping using a link function. Let $f()$ be an inverse link function, then 
\vfill
 - An inverse link function maps the predictors to the same 'space' or support as the predicted variable. \vfill
 - In a traditional linear model (multiple linear regression), the link function is 
 \vfill
 - When the predicted variable is a binary outcome, the 
 \vfill
 \vfill
- The logit function,
\vfill
- The 
\vfill

- The link function helps to establish the central tendency of the data, but we still need to account for noisy data. In other words, we will use a probability distribution function to map $\mu$ to the predicted data $y$ as $y \sim pdf(\mu, ..[\text{other parameters}])$.

\newpage
#### Formal Expression of the GLM
The GLM can be written as as two equations: 

1. $\mu = f(X\beta)$ and 
\vfill
2. $y \sim pdf(\mu, [\text{parameters}])$
\vfill
\vfill

- A table below shows  some of the models types that we will explore in this class. 
\vfill

 y             $y \sim pdf()$                                   $\mu = f(X\beta)$
-----------   ------------------------------------------------  -----------------------
Metric                                 
Dichotomous     
Count           
------------  ------------------------------------------------  -----------------------
\vfill

## Metric Predicted Variable with One Metric Predictor
- Consider predicting a metric variabel with a single metric predictor. 
\vfill

![Regression Visual](images/regression.png)
\newpage
- Below the code scrapes movie information for the highest earning movie released in March. This dataset contains total earnings and the total number of movies released that month.
\vfill
- With this model, we are using the identity function as the link to fit a simple linear regression model.
\vfill
```{r, fig.align='center'}
#Note this table may update every month
movies <- "http://www.boxofficemojo.com/monthly/"
movie.table <- read_html(movies) %>% html_nodes('table') %>% `[[`(4) %>% html_table() %>% tbl_df()
colnames(movie.table) <- movie.table[1,]
movie.table <- movie.table %>% slice(-1) %>% select(Year, `Total Gross`, Movies, `#1 Movie`)
movie.table$`Total Gross` <- as.numeric(str_replace_all(movie.table$`Total Gross`, "[$,]",''))
movie.table$Movies <- as.numeric(movie.table$Movies)
movie.table
```
\vfill
```{r, fig.align='center', fig.width=6, fig.height=4, echo=F}
ggplot(aes(y =`Total Gross`, x=Movies), data=movie.table) + geom_point() + ylim(0,max(movie.table$`Total Gross`)) + geom_smooth(method='lm') + ggtitle(label ='Total Gross Earnings for #1 Movie Opening in March') + ylab('Total Gross Earnings (million $)') + xlab('Number of Movies Released')
```

\newpage

### Model Formulation for Bayesian Regression

Nothing that we have talked about is inherently Bayesian, but we can formulate this in a similar framework.

1. **Sampling Model:**  \vfill
2. **Priors:** What do we need priors on in this model:
    -  \vfill
    -  \vfill
3. **Posterior**:  \vfill

- Standardizing the data. It can be a good idea to standardize the data before running MCMC code or even regression in a frequentist setting. Standardizing data means re-scaling the data relative to the mean and standard deviation $z_x = \frac{(x - \mu_x)}{\sigma_x}$, where $\mu_x$ is the mean of x and $\sigma_x$ is the standard deviation of x. \vfill
- Recall the t-distribution, this can also be used for a regression setting as the sampling model.

![Robust Regression Visual](images/t_regression.png)
\newpage

#### JAGS
Let $y_i$ be the total gross earnings of movie $i$ and let $x_i$ be the number of movies released in that month. Then use the sampling model
$$y_i|\beta, \sigma^2 \sim Normal(\beta_0 + x_i \beta_1).$$
All variables are centered and scaled. We use the following priors $\beta_j \sim N(0,100^2)$ and $\sigma \sim Unif(0, 100,000 )$.

##### Standardize Variables
```{r}
#Y <- (movie.table$`Total Gross` - mean(movie.table$`Total Gross`)) / sd(movie.table$`Total Gross`)
#x <- (movie.table$Movies - mean(movie.table$Movies)) / sd(movie.table$Movies)
#n <- length(Y)
```


##### Model Specification
```{r}
model_string <- "model{

# Likelihood
for (i in 1:n){
  Y[i] ~ dnorm(mu[i], 1/sigma^2)
  mu[i] <- beta[1] + beta[2]*x[i]
}

# prior for beta
for (j in 1:2){
  beta[j] ~ dnorm(M,1 / S^2)
}

# Prior for Sigma
sigma ~ dunif(0,C)
}"
```

##### Compile the Model
```{r}
model <- jags.model(textConnection(model_string), data=list(Y=movie.table$`Total Gross`,
        n=nrow(movie.table), x=movie.table$Movies, M=0, S=100, C=100000), n.adapt = 1000)
```

\newpage

##### Draw Samples
```{r}
update(model, 1000)
samples <- coda.samples(model, variable.names = c('beta','sigma'), n.iter = 20000)
summary(samples)
#plot(samples)
```

##### Compare with least squares
```{r}
summary(lm(`Total Gross` ~ Movies, data = movie.table))
```
\newpage

## Lab Exercises

##### 1. Explore the Seattle Housing dataset graphically and choose a metric variable to use to model housing prices.

```{r}
Seattle <- read.csv('http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv',
                    stringsAsFactors = F)
str(Seattle)
```

##### 2. Fit a t-distribution regression model for housing price.
Specify the sampling model and all necessary prior distributions.

##### 3. Summarize the results from this model.