---
title: "Lecture 5"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

#### Posterior Sampling from Normal

Now we seek to create draws from the *joint posterior distribution* $\hspace{4cm}$ and the *marginal posterior distributions* $\hspace{4cm}$ and $\hspace{4cm}.$ Note the marginal posterior distributions would be used to calculate quantities such as $\hspace{4cm}$. Using a Monte Carlo procedure, we can simulate samples from the joint posterior using the following algorithm:

1. Simulate
\vfill
2. Simulate
\vfill
3. Repeat
\vfill

Note that each pair $\hspace{4cm}$ is a sample from the joint posterior distribution and that $\{\sigma^2_1, \dots, \sigma^2_m\}$ and $\{\theta_1, \dots, \theta_m\}$ are samples from the respective marginal posterior distributions. The R code for this follows as:

```{r}
#### Posterior Sampling with Normal Model
set.seed(09182017)

# true parameters from normal distribution
sigma.sq.true <- 1
theta.true <- 0

# generate data
num.obs <- 100
y <- rnorm(num.obs, mean = theta.true, sd = sqrt(sigma.sq.true))

# specify terms for priors
nu.0 <- 1
sigma.sq.0 <- 10
mu.0 <- 0
kappa.0 <- 1

# compute terms in posterior
kappa.n <- kappa.0 + num.obs
nu.n <- nu.0 + num.obs
s.sq <- var(y) # sum((y - mean(y))^2) / (num.obs - 1)
sigma.sq.n <- (1 / nu.n) * (nu.0 * sigma.sq.0 + (num.obs - 1) * s.sq +
  (kappa.0 * num.obs) / kappa.n * (mean(y) - mu.0)^2)
mu.n <- (kappa.0 * mu.0 + num.obs * mean(y)) / kappa.n

# simulate from posterior
#install.packages("LearnBayes")
library(LearnBayes) # for rigamma
num.sims <- 1000
sigma.sq.sims <- theta.sims <- rep(0, num.sims)
for (i in 1:num.sims){
  sigma.sq.sims[i] <- rigamma(1, nu.n / 2, sigma.sq.n * nu.n / 2)
  theta.sims[i] <- rnorm(1, mu.n, sqrt(sigma.sq.sims[i] / kappa.n))
}
```

\newpage

```{r, fig.align='center', fig.width=5, fig.height=3}
## Plots
library(grDevices) # for rgb
plot(sigma.sq.sims, theta.sims, pch = 16, col = rgb(.1, .1, .8, .05),
     ylab = expression(theta), xlab = expression(sigma^2), main = 'Joint Posterior')
points(1, 0, pch = 14, col = 'black')
par(mfcol = c(1, 2))
hist(sigma.sq.sims, prob = T, main = expression('Marginal Posterior of' ~ sigma^2),
     xlab = expression(sigma^2))
abline(v = 1, col = 'red', lwd = 2)
hist(theta.sims, prob = T, main = expression('Marginal Posterior of' ~ theta),
     xlab = expression(theta))
abline(v = 0, col = 'red', lwd = 2)
```
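
As one illustration of the kinds of quantities the marginal posterior draws can be used to compute, the short chunk below uses `theta.sims` and `sigma.sq.sims` from above to approximate posterior means, 95% credible intervals, and the posterior probability that $\theta$ exceeds zero; these particular summaries are only examples.

```{r}
# posterior summaries computed from the Monte Carlo draws
mean(theta.sims)
quantile(theta.sims, probs = c(.025, .975))
mean(sigma.sq.sims)
quantile(sigma.sq.sims, probs = c(.025, .975))
# posterior probability that theta is greater than zero
mean(theta.sims > 0)
```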
It is important to note that the prior structure is very specific in this case, where $p(\theta|\sigma^2)$ is a function of $\sigma^2$. For most prior structures this type of direct conditional sampling scheme is not available, and we need to use Markov Chain Monte Carlo methods.

#### Posterior Sampling with the Gibbs Sampler

In the previous section we modeled the uncertainty in $\theta$ as a function of $\sigma^2$, where $p(\theta|\sigma^2) = $
\vfill

In some situations this makes sense, but in others the uncertainty in $\theta$ may be specified independently of $\sigma^2$. Mathematically, this translates to $p(\sigma^2,\theta)$
\vfill
\vfill

\newpage

A common \emph{semiconjugate} set of prior distributions is:
\begin{eqnarray*}
\theta &\sim& \\
\sigma^2 &\sim&
\end{eqnarray*}
\vfill

Now when $Y_1,\dots, Y_n|\theta,\sigma^2 \sim N(\theta,\sigma^2)$, then $\theta|\sigma^2,y_1, \dots, y_n \sim N(\mu_n,\tau_n^2)$, where
\begin{equation*}
\mu_n = \hspace{6cm} \text{and} \hspace{2cm} \tau_n^2 = \hspace{4cm}
\end{equation*}
\vfill

In the conjugate case where $\tau_0^2$ was proportional to $\sigma^2$, samples from the joint posterior can be taken using the Monte Carlo procedure demonstrated before. However, when $\tau_0^2$ is not proportional to $\sigma^2$, the marginal posterior density of $\sigma^2$ is not an inverse gamma distribution, or any other named distribution that permits easy sampling.
\vfill

Suppose that you know the value of $\theta$. Then the conditional distribution of $\sigma^2$ is:
\begin{eqnarray*}
p(\sigma^2|\theta, y_1, \dots, y_n) &\propto& p(y_1,\dots, y_n|\theta,\sigma^2) p(\sigma^2)\\
&\propto& \hspace{12cm}\\
&\propto& \hspace{12cm}
\end{eqnarray*}
which is the kernel of an inverse gamma distribution. So $\sigma^2|\theta,y_1, \dots, y_n \sim InvGamma(\nu_n/2,\nu_n \sigma_n^2(\theta)/2)$, where $\nu_n = \nu_0 + n$, $\sigma^2_n(\theta) = \frac{1}{\nu_n}\left[\nu_0\sigma_0^2 + n s^2_n(\theta)\right]$, and $s_n^2(\theta) = \sum (y_i - \theta)^2/n$ is the unbiased estimate of $\sigma^2$ if $\theta$ were known.
\vfill
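
As a quick illustration of this result, the sketch below draws a single value of $\sigma^2$ from this full conditional in R. It assumes the data `y` and the prior values `nu.0` and `sigma.sq.0` from the earlier chunk carry over to this prior specification, and `theta.fixed` is a hypothetical value of $\theta$ used only for illustration.

```{r}
# one draw of sigma^2 from its full conditional, holding theta fixed
# (theta.fixed is a hypothetical value, for illustration only)
theta.fixed <- 0.5
nu.n <- nu.0 + num.obs
s.sq.n.theta <- sum((y - theta.fixed)^2) / num.obs
sigma.sq.n.theta <- (nu.0 * sigma.sq.0 + num.obs * s.sq.n.theta) / nu.n
rigamma(1, nu.n / 2, nu.n * sigma.sq.n.theta / 2)
```

Repeated with updated values of $\theta$, this is essentially the $\sigma^2$ step of the Gibbs sampler coded below.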
Now can we use the full conditional distributions to draw samples from the joint posterior?
\vfill

Suppose we had $\sigma^{2(1)}$, a single sample from the marginal posterior distribution $p(\sigma^2|y_1, \dots, y_n)$. Then we could sample:
\begin{equation*}
\theta^{(1)} \sim \hspace{6cm}
\end{equation*}
and $\{\theta^{(1)}, \sigma^{2(1)}\}$ would be a sample from the joint posterior distribution $p(\theta,\sigma^2|y_1, \dots, y_n)$. Now using $\theta^{(1)}$ we can generate another sample of $\sigma^2$ from
\begin{equation*}
\sigma^{2(2)} \sim \hspace{6cm}
\end{equation*}
This sample $\{ \theta^{(1)}, \sigma^{2(2)}\}$ would also be a sample from the joint posterior distribution, and the process continues iteratively. However, we don't actually have $\sigma^{2(1)}$.
\vfill

\newpage

#### Gibbs Sampler

The distributions $p(\theta|y_1, \dots, y_n, \sigma^2)$ and $p(\sigma^2|y_1, \dots, y_n, \theta)$ are known as the full conditional distributions; that is, they condition on all other parameters and the data. The Gibbs sampler uses these full conditional distributions, and the procedure follows as:

1. sample $\theta^{(i+1)}$ from
\vfill
2. sample $\sigma^{2(i+1)}$ from
\vfill
3. let
\vfill

```{r}
######### First Gibbs Sampler
set.seed(09182017)

### simulate data
num.obs <- 100
mu.true <- 0
sigmasq.true <- 1
y <- rnorm(num.obs, mu.true, sqrt(sigmasq.true))
mean.y <- mean(y)
var.y <- var(y)
library(LearnBayes) # for rigamma

### initialize vectors and set starting values and priors
num.sims <- 1000
Phi <- matrix(0, nrow = num.sims, ncol = 2)
Phi[1, 1] <- 0 # initialize theta
Phi[1, 2] <- 1 # initialize sigmasq
mu.0 <- 0
tausq.0 <- 1
nu.0 <- 1
sigmasq.0 <- 10

for (i in 2:num.sims){
  # sample theta from its full conditional
  mu.n <- (mu.0 / tausq.0 + num.obs * mean.y / Phi[(i-1), 2]) /
    (1 / tausq.0 + num.obs / Phi[(i-1), 2])
  tausq.n <- 1 / (1 / tausq.0 + num.obs / Phi[(i-1), 2])
  Phi[i, 1] <- rnorm(1, mu.n, sqrt(tausq.n))
  # sample sigmasq from its full conditional (inverse gamma)
  nu.n <- nu.0 + num.obs
  sigmasq.n.theta <- 1 / nu.n * (nu.0 * sigmasq.0 + sum((y - Phi[i, 1])^2))
  Phi[i, 2] <- rigamma(1, nu.n / 2, nu.n * sigmasq.n.theta / 2)
}
```

\newpage

```{r, fig.align='center', fig.width=6, fig.height=6}
par(mfcol = c(2, 2))
# plot joint posterior
plot(Phi[1:5, 1], Phi[1:5, 2], xlim = range(Phi[, 1]), ylim = range(Phi[, 2]),
     pch = c('1','2','3','4','5'), cex = .8, ylab = expression(sigma^2),
     xlab = expression(theta), main = 'Joint Posterior', sub = 'first 5 samples')
plot(Phi[1:10, 1], Phi[1:10, 2], xlim = range(Phi[, 1]), ylim = range(Phi[, 2]),
     pch = as.character(1:10), cex = .8, ylab = expression(sigma^2),
     xlab = expression(theta), main = 'Joint Posterior', sub = 'first 10 samples')
plot(Phi[1:100, 1], Phi[1:100, 2], xlim = range(Phi[, 1]), ylim = range(Phi[, 2]),
     pch = 16, col = rgb(0, 0, 0, 1), cex = .8, ylab = expression(sigma^2),
     xlab = expression(theta), main = 'Joint Posterior', sub = 'first 100 samples')
plot(Phi[, 1], Phi[, 2], xlim = range(Phi[, 1]), ylim = range(Phi[, 2]),
     pch = 16, col = rgb(0, 0, 0, .25), cex = .8, ylab = expression(sigma^2),
     xlab = expression(theta), main = 'Joint Posterior', sub = 'all samples')
points(0, 1, pch = 'X', col = 'red', cex = 2)
```

\newpage

```{r, fig.align='center', fig.width=5.5, fig.height=5.5}
par(mfcol = c(2, 2))
# plot marginal posterior of theta
hist(Phi[, 1], xlab = expression(theta),
     main = expression('Marginal Posterior of ' ~ theta), probability = T)
abline(v = mu.true, col = 'red', lwd = 2)
# plot marginal posterior of sigmasq
hist(Phi[, 2], xlab = expression(sigma^2),
     main = expression('Marginal Posterior of ' ~ sigma^2), probability = T)
abline(v = sigmasq.true, col = 'red', lwd = 2)
# plot trace plots
plot(Phi[, 1], type = 'l', ylab = expression(theta), main = expression('Trace plot for ' ~ theta))
abline(h = mu.true, lwd = 2, col = 'red')
plot(Phi[, 2], type = 'l', ylab = expression(sigma^2), main = expression('Trace plot for ' ~ sigma^2))
abline(h = sigmasq.true, lwd = 2, col = 'red')
# compute posterior means and quantiles
colMeans(Phi)
apply(Phi, 2, quantile, probs = c(.025, .975))
```

\newpage

So what do we do about the starting point? We will see that, given a reasonable starting point, the algorithm converges to the true posterior distribution. Hence the first (few) iterations are regarded as the burn-in period and are discarded (as they have not yet reached the true posterior).
\vfill

#### More on the Gibbs Sampler

The algorithm previously detailed is called the \emph{Gibbs Sampler} and generates a dependent sequence of parameters $\{\phi_1, \phi_2, \dots, \phi_n\}$. This is in contrast to the Monte Carlo procedure we previously detailed, including the situation where $\theta|\sigma^2 \sim N(\mu_0, \sigma^2 / \kappa_0)$.
\vfill
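
One quick way to see this dependence is the sample autocorrelation function of the draws; the sketch below uses the `Phi` matrix from the Gibbs sampler above. (Independent Monte Carlo draws, such as `theta.sims` from the first section, would show autocorrelations near zero at every lag.)

```{r, fig.align='center', fig.width=5, fig.height=3}
# autocorrelation of the Gibbs draws of theta; dependence between successive
# draws would appear as nonzero autocorrelation at low lags
acf(Phi[, 1], main = expression('ACF of Gibbs draws of ' ~ theta))
```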
The Gibbs Sampler is a basic Markov Chain Monte Carlo (MCMC) algorithm. A Markov chain is a stochastic process in which the current state depends only on the previous state. Formally,
\vfill
\vfill

Depending on class interests, we may return to the theory of MCMC later in the course, but the basic ideas are:
\vfill

That is, the sampling distribution of the draws from the MCMC algorithm approaches the desired target distribution (generally a posterior distribution in Bayesian statistics) as the number of samples $j$ goes to infinity. This limit does not depend on the starting value $\phi^{(0)}$, but poor starting values will require more iterations before convergence. Note this will be more problematic when we consider another MCMC algorithm, the Metropolis-Hastings sampler.
\vfill

Given the equation above, for most functions $g(.)$:
\vfill

Thus we can approximate expectations of functions of $\phi$ using the sample average of the MCMC draws, similar to the Monte Carlo procedures presented earlier.
\vfill
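
As a concrete sketch of this, using the `Phi` draws from the Gibbs sampler above (the choices of $g$ and the burn-in length of 100 iterations are arbitrary illustrations):

```{r}
# approximate posterior expectations with sample averages of the Gibbs draws,
# after discarding a short burn-in (100 iterations is an arbitrary choice here)
Phi.post <- Phi[-(1:100), ]
mean(Phi.post[, 1])        # approximates E(theta | y1,...,yn)
mean(sqrt(Phi.post[, 2]))  # approximates E(sigma | y1,...,yn), with g(sigma^2) = sqrt(sigma^2)
mean(Phi.post[, 1] > 0)    # approximates Pr(theta > 0 | y1,...,yn)
```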