---
title: "Lecture 5"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

#### Posterior Sampling from Normal

Now we seek to create draws from the *joint posterior distribution* $\hspace{4cm}$ and the *marginal posterior distributions* $\hspace{4cm}$ and $\hspace{4cm}.$ Note the marginal posterior distributions would be used to calculate quantities such as $\hspace{4cm}$. Using a Monte Carlo procedure, we can simulate samples from the joint posterior using the following algorithm:

1. Simulate
\vfill
2. Simulate
\vfill
3. Repeat
\vfill

Note that each pair $\hspace{4cm}$ is a sample from the joint posterior distribution and that $\{\sigma^2_1, \dots, \sigma^2_m\}$ and $\{\theta_1, \dots, \theta_m\}$ are samples from the respective marginal posterior distributions. The R code for this follows as:

```{r}
#### Posterior Sampling with Normal Model
set.seed(09182017)

# true parameters from normal distribution
sigma.sq.true <- 1
theta.true <- 0

# generate data
num.obs <- 100
y <- rnorm(num.obs, mean = theta.true, sd = sqrt(sigma.sq.true))

# specify terms for priors
nu.0 <- 1
sigma.sq.0 <- 10
mu.0 <- 0
kappa.0 <- 1

# compute terms in posterior
kappa.n <- kappa.0 + num.obs
nu.n <- nu.0 + num.obs
s.sq <- var(y) # sum((y - mean(y))^2) / (num.obs - 1)
sigma.sq.n <- (1 / nu.n) * (nu.0 * sigma.sq.0 + (num.obs - 1) * s.sq +
  (kappa.0 * num.obs) / kappa.n * (mean(y) - mu.0)^2)
mu.n <- (kappa.0 * mu.0 + num.obs * mean(y)) / kappa.n

# simulate from posterior
#install.packages("LearnBayes")
library(LearnBayes) # for rigamma
num.sims <- 1000
sigma.sq.sims <- theta.sims <- rep(0, num.sims)
for (i in 1:num.sims){
  sigma.sq.sims[i] <- rigamma(1, nu.n / 2, sigma.sq.n * nu.n / 2)
  theta.sims[i] <- rnorm(1, mu.n, sqrt(sigma.sq.sims[i] / kappa.n))
}
```

\newpage

```{r, fig.align='center', fig.width=5, fig.height=3}
## Plots
library(grDevices) # for rgb
plot(sigma.sq.sims, theta.sims, pch = 16, col = rgb(.1, .1, .8, .05),
     ylab = expression(theta), xlab = expression(sigma^2), main = 'Joint Posterior')
points(1, 0, pch = 14, col = 'black')
par(mfcol = c(1, 2))
hist(sigma.sq.sims, prob = T, main = expression('Marginal Posterior of' ~ sigma^2),
     xlab = expression(sigma^2))
abline(v = 1, col = 'red', lwd = 2)
hist(theta.sims, prob = T, main = expression('Marginal Posterior of' ~ theta),
     xlab = expression(theta))
abline(v = 0, col = 'red', lwd = 2)
```
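
As one illustration of the kinds of quantities the marginal posterior draws can be used to compute, the short chunk below uses `theta.sims` and `sigma.sq.sims` from above to approximate posterior means, 95% credible intervals, and the posterior probability that $\theta$ exceeds zero; these particular summaries are only examples.

```{r}
# posterior summaries computed from the Monte Carlo draws
mean(theta.sims)
quantile(theta.sims, probs = c(.025, .975))
mean(sigma.sq.sims)
quantile(sigma.sq.sims, probs = c(.025, .975))
# posterior probability that theta is greater than zero
mean(theta.sims > 0)
```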
It is important to note that the prior structure is very specific in this case, where $p(\theta|\sigma^2)$ is a function of $\sigma^2$. For most prior structures this type of direct conditional sampling scheme is not available, and we need to use Markov Chain Monte Carlo methods.

#### Posterior Sampling with the Gibbs Sampler

In the previous section we modeled the uncertainty in $\theta$ as a function of $\sigma^2$, where $p(\theta|\sigma^2) = $
\vfill

In some situations this makes sense, but in others the uncertainty in $\theta$ may be specified independently of $\sigma^2$. Mathematically, this translates to $p(\sigma^2,\theta)$
\vfill
\vfill

\newpage

A common \emph{semiconjugate} set of prior distributions is:
\begin{eqnarray*}
\theta &\sim& \\
\sigma^2 &\sim&
\end{eqnarray*}
\vfill

Now when $Y_1,\dots, Y_n|\theta,\sigma^2 \sim N(\theta,\sigma^2)$, then $\theta|\sigma^2,y_1, \dots, y_n \sim N(\mu_n,\tau_n^2)$, where
\begin{equation*}
\mu_n = \hspace{6cm} \text{and} \hspace{2cm} \tau_n^2 = \hspace{4cm}
\end{equation*}
\vfill

In the conjugate case where $\tau_0^2$ was proportional to $\sigma^2$, samples from the joint posterior can be taken using the Monte Carlo procedure demonstrated before. However, when $\tau_0^2$ is not proportional to $\sigma^2$, the marginal posterior density of $\sigma^2$ is not an inverse gamma distribution, or any other named distribution that permits easy sampling.
\vfill

Suppose that you know the value of $\theta$. Then the conditional distribution of $\sigma^2$ is:
\begin{eqnarray*}
p(\sigma^2|\theta, y_1, \dots, y_n) &\propto& p(y_1,\dots, y_n|\theta,\sigma^2) p(\sigma^2)\\
&\propto& \hspace{12cm}\\
&\propto& \hspace{12cm}
\end{eqnarray*}
which is the kernel of an inverse gamma distribution. So $\sigma^2|\theta,y_1, \dots, y_n \sim InvGamma(\nu_n/2,\nu_n \sigma_n^2(\theta)/2)$, where $\nu_n = \nu_0 + n$, $\sigma^2_n(\theta) = \frac{1}{\nu_n}\left[\nu_0\sigma_0^2 + n s^2_n(\theta)\right]$, and $s_n^2(\theta) = \sum (y_i - \theta)^2/n$ is the unbiased estimate of $\sigma^2$ if $\theta$ were known.
\vfill
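
As a quick illustration of this result, the sketch below draws a single value of $\sigma^2$ from this full conditional in R. It assumes the data `y` and the prior values `nu.0` and `sigma.sq.0` from the earlier chunk carry over to this prior specification, and `theta.fixed` is a hypothetical value of $\theta$ used only for illustration.

```{r}
# one draw of sigma^2 from its full conditional, holding theta fixed
# (theta.fixed is a hypothetical value, for illustration only)
theta.fixed <- 0.5
nu.n <- nu.0 + num.obs
s.sq.n.theta <- sum((y - theta.fixed)^2) / num.obs
sigma.sq.n.theta <- (nu.0 * sigma.sq.0 + num.obs * s.sq.n.theta) / nu.n
rigamma(1, nu.n / 2, nu.n * sigma.sq.n.theta / 2)
```

Repeated with updated values of $\theta$, this is essentially the $\sigma^2$ step of the Gibbs sampler coded below.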
Now can we use the full conditional distributions to draw samples from the joint posterior?
\vfill

Suppose we had $\sigma^{2(1)}$, a single sample from the marginal posterior distribution $p(\sigma^2|y_1, \dots, y_n)$. Then we could sample:
\begin{equation*}
\theta^{(1)} \sim \hspace{6cm}
\end{equation*}
and $\{\theta^{(1)}, \sigma^{2(1)}\}$ would be a sample from the joint posterior distribution $p(\theta,\sigma^2|y_1, \dots, y_n)$. Now using $\theta^{(1)}$ we can generate another sample of $\sigma^2$ from
\begin{equation*}
\sigma^{2(2)} \sim \hspace{6cm}
\end{equation*}
This sample $\{ \theta^{(1)}, \sigma^{2(2)}\}$ would also be a sample from the joint posterior distribution, and the process continues iteratively. However, we don't actually have $\sigma^{2(1)}$.
\vfill

\newpage

#### Gibbs Sampler

The distributions $p(\theta|y_1, \dots, y_n, \sigma^2)$ and $p(\sigma^2|y_1, \dots, y_n, \theta)$ are known as the full conditional distributions; that is, they condition on all other parameters and the data. The Gibbs sampler uses these full conditional distributions, and the procedure follows as:

1. sample $\theta^{(i+1)}$ from
\vfill
2. sample $\sigma^{2(i+1)}$ from
\vfill
3. let
\vfill

```{r}
######### First Gibbs Sampler
set.seed(09182017)

### simulate data
num.obs <- 100
mu.true <- 0
sigmasq.true <- 1
y <- rnorm(num.obs, mu.true, sqrt(sigmasq.true))
mean.y <- mean(y)
var.y <- var(y)
library(LearnBayes) # for rigamma

### initialize vectors and set starting values and priors
num.sims <- 1000
Phi <- matrix(0, nrow = num.sims, ncol = 2)
Phi[1, 1] <- 0 # initialize theta
Phi[1, 2] <- 1 # initialize sigmasq
mu.0 <- 0
tausq.0 <- 1
nu.0 <- 1
sigmasq.0 <- 10

for (i in 2:num.sims){
  # sample theta from its full conditional
  mu.n <- (mu.0 / tausq.0 + num.obs * mean.y / Phi[(i-1), 2]) /
    (1 / tausq.0 + num.obs / Phi[(i-1), 2])
  tausq.n <- 1 / (1 / tausq.0 + num.obs / Phi[(i-1), 2])
  Phi[i, 1] <- rnorm(1, mu.n, sqrt(tausq.n))
  # sample sigmasq from its full conditional (inverse gamma)
  nu.n <- nu.0 + num.obs
  sigmasq.n.theta <- 1 / nu.n * (nu.0 * sigmasq.0 + sum((y - Phi[i, 1])^2))
  Phi[i, 2] <- rigamma(1, nu.n / 2, nu.n * sigmasq.n.theta / 2)
}
```

\newpage

```{r, fig.align='center', fig.width=6, fig.height=6}
par(mfcol = c(2, 2))
# plot joint posterior
plot(Phi[1:5, 1], Phi[1:5, 2], xlim = range(Phi[, 1]), ylim = range(Phi[, 2]),
     pch = c('1','2','3','4','5'), cex = .8, ylab = expression(sigma^2),
     xlab = expression(theta), main = 'Joint Posterior', sub = 'first 5 samples')
plot(Phi[1:10, 1], Phi[1:10, 2], xlim = range(Phi[, 1]), ylim = range(Phi[, 2]),
     pch = as.character(1:10), cex = .8, ylab = expression(sigma^2),
     xlab = expression(theta), main = 'Joint Posterior', sub = 'first 10 samples')
plot(Phi[1:100, 1], Phi[1:100, 2], xlim = range(Phi[, 1]), ylim = range(Phi[, 2]),
     pch = 16, col = rgb(0, 0, 0, 1), cex = .8, ylab = expression(sigma^2),
     xlab = expression(theta), main = 'Joint Posterior', sub = 'first 100 samples')
plot(Phi[, 1], Phi[, 2], xlim = range(Phi[, 1]), ylim = range(Phi[, 2]),
     pch = 16, col = rgb(0, 0, 0, .25), cex = .8, ylab = expression(sigma^2),
     xlab = expression(theta), main = 'Joint Posterior', sub = 'all samples')
points(0, 1, pch = 'X', col = 'red', cex = 2)
```

\newpage

```{r, fig.align='center', fig.width=5.5, fig.height=5.5}
par(mfcol = c(2, 2))
# plot marginal posterior of theta
hist(Phi[, 1], xlab = expression(theta),
     main = expression('Marginal Posterior of ' ~ theta), probability = T)
abline(v = mu.true, col = 'red', lwd = 2)
# plot marginal posterior of sigmasq
hist(Phi[, 2], xlab = expression(sigma^2),
     main = expression('Marginal Posterior of ' ~ sigma^2), probability = T)
abline(v = sigmasq.true, col = 'red', lwd = 2)
# plot trace plots
plot(Phi[, 1], type = 'l', ylab = expression(theta), main = expression('Trace plot for ' ~ theta))
abline(h = mu.true, lwd = 2, col = 'red')
plot(Phi[, 2], type = 'l', ylab = expression(sigma^2), main = expression('Trace plot for ' ~ sigma^2))
abline(h = sigmasq.true, lwd = 2, col = 'red')
# compute posterior means and quantiles
colMeans(Phi)
apply(Phi, 2, quantile, probs = c(.025, .975))
```

\newpage

So what do we do about the starting point? We will see that, given a reasonable starting point, the algorithm converges to the true posterior distribution. Hence the first (few) iterations are regarded as the burn-in period and are discarded (as they have not yet reached the true posterior).
\vfill

#### More on the Gibbs Sampler

The algorithm previously detailed is called the \emph{Gibbs Sampler} and generates a dependent sequence of parameters $\{\phi_1, \phi_2, \dots, \phi_n\}$. This is in contrast to the Monte Carlo procedure we previously detailed, including the situation where $\theta|\sigma^2 \sim N(\mu_0, \sigma^2 / \kappa_0)$.
\vfill
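
One quick way to see this dependence is the sample autocorrelation function of the draws; the sketch below uses the `Phi` matrix from the Gibbs sampler above. (Independent Monte Carlo draws, such as `theta.sims` from the first section, would show autocorrelations near zero at every lag.)

```{r, fig.align='center', fig.width=5, fig.height=3}
# autocorrelation of the Gibbs draws of theta; dependence between successive
# draws would appear as nonzero autocorrelation at low lags
acf(Phi[, 1], main = expression('ACF of Gibbs draws of ' ~ theta))
```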
The Gibbs Sampler is a basic Markov Chain Monte Carlo (MCMC) algorithm. A Markov chain is a stochastic process in which the current state depends only on the previous state. Formally,
\vfill
\vfill

Depending on class interests, we may return to the theory of MCMC later in the course, but the basic ideas are:
\vfill

That is, the sampling distribution of the draws from the MCMC algorithm approaches the desired target distribution (generally a posterior distribution in Bayesian statistics) as the number of samples $j$ goes to infinity. This limit does not depend on the starting value $\phi^{(0)}$, but poor starting values will require more iterations before convergence. Note this will be more problematic when we consider another MCMC algorithm, the Metropolis-Hastings sampler.
\vfill

Given the equation above, for most functions $g(.)$:
\vfill

Thus we can approximate expectations of functions of $\phi$ using the sample average of the MCMC draws, similar to the Monte Carlo procedures presented earlier.
\vfill
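
As a concrete sketch of this, using the `Phi` draws from the Gibbs sampler above (the choices of $g$ and the burn-in length of 100 iterations are arbitrary illustrations):

```{r}
# approximate posterior expectations with sample averages of the Gibbs draws,
# after discarding a short burn-in (100 iterations is an arbitrary choice here)
Phi.post <- Phi[-(1:100), ]
mean(Phi.post[, 1])        # approximates E(theta | y1,...,yn)
mean(sqrt(Phi.post[, 2]))  # approximates E(sigma | y1,...,yn), with g(sigma^2) = sqrt(sigma^2)
mean(Phi.post[, 1] > 0)    # approximates Pr(theta > 0 | y1,...,yn)
```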