---
title: "STAT 491 - Lecture 8"
output: pdf_document
---

```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
```

# Normal Distribution

Recall that the pdf of the normal distribution can be written as:
$$p(y|\mu,\sigma^2) = $$
\vfill

Now assume we are interested in modeling a continuous quantity using the normal distribution as a sampling model; in Lab 6 we will look at housing prices in the Seattle area.

- What are the parameters that we want to estimate in this setting?
\vfill

- What parameters require prior distributions?
\vfill

- What are reasonable prior distributions for these parameters?
\vfill

## Estimating the mean and standard deviation of a normal distribution

The goal will be to estimate the posterior distribution:
$$p(\mu,\sigma^2|\{y_1, \dots, y_n\}) = \frac{p(\{y_1, \dots, y_n\}|\mu, \sigma^2)p(\mu,\sigma^2)}{\int \int p(\{y_1, \dots, y_n\}|\mu, \sigma^2)p(\mu,\sigma^2) \; d\mu \; d\sigma^2}$$
\vfill

- What is $p(\{y_1, \dots, y_n\}|\mu, \sigma^2)$?
\begin{eqnarray*}
p(\{y_1, \dots, y_n\}|\mu, \sigma^2) &=& \\
\end{eqnarray*}
\vfill
\vfill
\vfill
\newpage

- How do we write $p(\mu,\sigma^2)$?
\vfill
\vfill

- What is $\int \int p(\{y_1, \dots, y_n\}|\mu, \sigma^2)p(\mu,\sigma^2) \; d\mu \; d\sigma^2$?
\vfill
\vfill

### Special case with $\sigma^2$ known

There is a special case that assumes $\sigma^2$ is known, which permits a conjugate prior for $\mu$. It can be shown (DBDA p. 452) that $p(\mu) \sim Normal()$ is a conjugate prior in this situation. However, outside of textbook examples, we are never given $\sigma^2$.
\vfill

### Joint Priors

Commonly, independent priors are placed on $\mu$ and $\sigma^2$ such that $p(\mu,\sigma^2) = p(\mu) \times p(\sigma^2)$.

- $p(\mu) \sim$
\vfill
\vfill

- What impact do you suppose
\vfill
\vfill

- What impact do you suppose
\vfill
\vfill

- $p(\sigma^2) \sim$
\vfill
\vfill
\vfill
\vfill
\newpage

### JAGS

Recall that in the beta-binomial setting we discussed five steps for fitting Bayesian models in JAGS:

1. Load data and store as a list object.
2. Specify the model as a text variable.
3. Initialize chains by specifying a starting point.
4. Generate MCMC chains.
5. Examine and summarize results.

#### 1. Load data

First we will simulate data to use for this example; this is useful because we know the true $\mu$ and $\sigma^2$ values.

```{r}
set.seed(02162018)
num.obs <- 50
true.mu <- 0
true.sigmasq <- 1
y <- rnorm(num.obs, mean = true.mu, sd = sqrt(true.sigmasq))
```

We specify our priors at this point too.

```{r}
M <- 0       # prior mean for mu
S <- 100     # prior standard deviation for mu
C <- 100000  # upper bound of the uniform prior on sigma
```

The data will be stored as a list for use in JAGS.

```{r}
dataList = list(y = y, Ntotal = num.obs, M = M, S = S, C = C)
```

#### 2. Specify Model

One thing to note is that JAGS uses the precision as the second term in a normal density function.

```{r}
modelString = "model {
  for ( i in 1:Ntotal ) {
    y[i] ~ dnorm(mu, 1/sigma^2) # sampling model
  }
  mu ~ dnorm(M, 1/S^2)
  sigma ~ dunif(0, C)
} "
writeLines(modelString, con = 'NORMmodel.txt')
```

#### 3. Specify Starting Point

In complicated models, the choice of starting points can be very important. Consider a random starting point.

```{r}
initsList <- function(){
  # function for initializing the starting place of a chain
  # RETURNS: list with random starting points for mu and sigma
  return(list(mu = rnorm(1, mean = 0, sd = 100), sigma = runif(1, 0, 1000)))
}
```
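As a side note (an editorial addition, not part of the original handout), starting points that are extremely far from the data can slow the adaptation phase. A common alternative, sketched below, is to generate dispersed starting values centered on rough data-based estimates; the function name `initsList.data` and the amount of dispersion are illustrative choices, not requirements.

```{r}
# illustrative sketch (editorial addition): dispersed, data-based starting values
initsList.data <- function(){
  list(mu = mean(y) + rnorm(1, mean = 0, sd = sd(y)),   # center near the sample mean
       sigma = sd(y) * runif(1, 0.5, 2))                # scale near the sample sd
}
initsList.data()
```

Either this function or `initsList()` above can be passed to `jags.model()` through its `inits` argument.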
\newpage
#### 4. Generate MCMC chains

Now, finally, we can start running the MCMC chains. First the chains are initialized and a burn-in period is run.

```{r, message=F}
library(rjags)
library(runjags)
jagsModel <- jags.model(file = "NORMmodel.txt", data = dataList, inits = initsList,
                        n.chains = 2, n.adapt = 100)
update(jagsModel, n.iter = 500)
```

Now the chains are run for a longer period of time (think of the traveling politician).

```{r}
num.mcmc <- 1000
codaSamples <- coda.samples(jagsModel, variable.names = c('mu', 'sigma'), n.iter = num.mcmc)
```

#### 5. Evaluate Model

Now we can use the `coda` package to visualize the results from the MCMC. Note that `codaSamples` is a list that contains an entry for each separate chain. Within each entry there is a column for each variable, $\mu$ and $\sigma$.

```{r, fig.align='center', fig.width=5}
par(mfcol=c(1,2))
traceplot(codaSamples)
```

```{r, fig.align='center', fig.width=5}
par(mfcol=c(1,2))
densplot(codaSamples)
```

`coda` can also summarize the **marginal distributions of each parameter** using intervals.

```{r}
HPDinterval(codaSamples)
summary(codaSamples)
```

\newpage

### Summarizing a distribution with posterior samples

Now we have a collection of posterior samples to represent the joint posterior distribution $p(\mu, \sigma^2|\{y_1, \dots, y_n\})$ as well as the marginal posterior distributions $p(\mu|\{y_1, \dots, y_n\})$ and $p(\sigma^2|\{y_1, \dots, y_n\})$.

- How should we summarize these results?
\vfill
\vfill

- Suppose that a collaborator wanted to know whether the mean of $y$ is greater than -0.2. How would we answer this from a classical standpoint?
\vfill

```{r}
t.test(y, mu = -.2, alternative = 'greater')
```

- The interpretation here is that
\vfill
\vfill

- We could also use a confidence interval; the 95% one-sided confidence interval for $\mu$ would be (`r t.test(y, mu = -.2, alternative = 'greater')$conf.int[1]`, $\infty$). What is the interpretation here?
\vfill
\vfill

- As a Bayesian, we can actually construct an interval that has a probabilistic interpretation. For instance, there is a 95% probability that $\mu$ is in the following interval: (`r HPDinterval(codaSamples)[[1]][1,]`).

- Furthermore, we can also use our posterior samples to calculate $Pr(\mu > -0.2) =$ `r mean(codaSamples[[1]][,'mu'] > -0.2)`.

\newpage

```{r, fig.align='center', fig.width=5, echo=F}
par(mfcol=c(1,1))
prob.greater <- mean(codaSamples[[1]][,'mu'] > -0.2)
hist(codaSamples[[1]][,'mu'], main=expression(paste('p(',mu,'|y)')), prob=T,
     xlab=expression(mu), breaks='FD', xlim = c(-.6,.6), ylim=c(0,2.75))
abline(v=-0.2, lwd=2, col='red')
arrows(x0 = -.4, y0 = 1.4, x1 = -0.3, y1 = 0.1, col='gray', lwd = 2)
text(x=-.4, y=2, paste('Prob in this area \n is ', round(1-prob.greater,3)))
```
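A brief aside (an editorial addition, not part of the original notes): posterior samples also make it easy to compute equal-tailed credible intervals or one-sided bounds directly with `quantile()`. For example, a 95% one-sided lower bound for $\mu$ parallels the one-sided confidence interval above, while keeping the direct probability interpretation.

```{r}
mu.samples <- codaSamples[[1]][,'mu']
quantile(mu.samples, probs = c(0.025, 0.975))  # equal-tailed 95% credible interval
quantile(mu.samples, probs = 0.05)             # 95% one-sided lower bound for mu
```

Unlike the HPD interval returned by `HPDinterval()`, the equal-tailed interval places 2.5% of the posterior probability in each tail.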
### Posterior Predictive Distribution

Another valuable tool in Bayesian statistics is the posterior predictive distribution.

- The posterior predictive distribution can be written as:
\vfill
\vfill
where $y^{*}$ is interpreted as a new observation and $p(\theta|y)$ is the posterior for the parameter $\theta$ given that data $y$ have been observed.

- The posterior predictive distribution allows us to test whether our sampling model and observed data are reasonable. We will talk more about this later.
\vfill

- The posterior predictive distribution can also be used to make probabilistic statements about the next response, rather than the group mean.
\vfill

- When $p(\theta|y)$ does not have a standard form
\vfill
\vfill
\newpage

Here is an example that uses the posterior samples of $\mu$ and $\sigma$:

```{r}
posterior.mu <- codaSamples[[1]][,'mu']
posterior.sigma <- codaSamples[[1]][,'sigma']
# one posterior predictive draw for each posterior sample of (mu, sigma)
posterior.pred <- rnorm(num.mcmc, mean = posterior.mu, sd = posterior.sigma)
prob.greater <- mean(posterior.pred > -0.2)
hist(posterior.pred, main='p(y*|y)', prob=T, xlab='y*', breaks='FD',
     sub='notice different scale')
abline(v=-0.2, lwd=2, col='red')
arrows(x0 = -1.5, y0 = 0.3, x1 = -1, y1 = 0.1, col='gray', lwd = 2)
text(x=-1.5, y=.375, paste('Prob in this area \n is ', round(1-prob.greater,3)))
```
\vfill
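Tying this back to the earlier bullet about checking whether the sampling model and observed data are reasonable, a minimal posterior predictive check (an editorial sketch; the plotting choices are assumptions, not part of the original notes) is to overlay the observed data on the posterior predictive draws:

```{r, fig.align='center', fig.width=5}
# minimal posterior predictive check (editorial sketch):
# do the observed data look like a plausible draw from p(y*|y)?
hist(posterior.pred, prob = TRUE, breaks = 'FD',
     main = 'Posterior predictive check', xlab = 'y')
lines(density(y), col = 'blue', lwd = 2)   # kernel density of the observed data
legend('topright', legend = c('posterior predictive draws', 'observed y'),
       col = c('black', 'blue'), lwd = c(1, 2), bty = 'n')
```

Here both distributions are centered near 0 with a standard deviation near 1, which is reassuring given how the data were simulated.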