---
title: "STAT 491 - Lecture 8"
output: pdf_document
---

```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
```

# Normal Distribution

Recall that the pdf of the normal distribution can be written as:
$$p(y|\mu,\sigma^2) = $$
\vfill

Now assume we are interested in modeling a continuous quantity using the normal distribution as a sampling model; in Lab 6 we will look at housing prices in the Seattle area.

- What are the parameters that we want to estimate in this setting?
\vfill

- What parameters require prior distributions?
\vfill

- What are reasonable prior distributions for these parameters?
\vfill

## Estimating the mean and standard deviation of a normal distribution

The goal will be to estimate the posterior distribution:
$$p(\mu,\sigma^2|\{y_1, \dots, y_n\}) = \frac{p(\{y_1, \dots, y_n\}|\mu, \sigma^2)p(\mu,\sigma^2)}{\int \int p(\{y_1, \dots, y_n\}|\mu, \sigma^2)p(\mu,\sigma^2) \; d\mu \; d\sigma^2}$$
\vfill

- What is $p(\{y_1, \dots, y_n\}|\mu, \sigma^2)$?
\begin{eqnarray*}
p(\{y_1, \dots, y_n\}|\mu, \sigma^2) &=& \\
\end{eqnarray*}
\vfill
\vfill
\vfill
\newpage

- How do we write $p(\mu,\sigma^2)$?
\vfill
\vfill

- What is $\int \int p(\{y_1, \dots, y_n\}|\mu, \sigma^2)p(\mu,\sigma^2) \; d\mu \; d\sigma^2$?
\vfill
\vfill

### Special case with $\sigma^2$ known

There is a special case that assumes $\sigma^2$ is known, which permits a conjugate prior for $\mu$. It can be shown (DBDA p. 452) that $p(\mu) \sim Normal()$ is a conjugate prior in this situation. However, outside of textbook examples, we are never given $\sigma^2$.
\vfill

### Joint Priors

Commonly, independent priors are placed on $\mu$ and $\sigma^2$ such that $p(\mu,\sigma^2) = p(\mu) \times p(\sigma^2)$.

- $p(\mu) \sim$
\vfill
\vfill

- What impact do you suppose
\vfill
\vfill

- What impact do you suppose
\vfill
\vfill

- $p(\sigma^2) \sim$
\vfill
\vfill
\vfill
\vfill
\newpage

### JAGS

Recall that in the beta-binomial setting we discussed five steps for fitting Bayesian models in JAGS:

1. Load data and store as a list object.
2. Specify the model as a text variable.
3. Initialize chains by specifying a starting point.
4. Generate MCMC chains.
5. Examine and summarize results.

#### 1. Load data

First we will simulate data to use for this example; this is useful because we know the true $\mu$ and $\sigma^2$ values.

```{r}
set.seed(02162018)
num.obs <- 50
true.mu <- 0
true.sigmasq <- 1
y <- rnorm(num.obs, mean = true.mu, sd = sqrt(true.sigmasq))
```

We specify our priors at this point too.

```{r}
M <- 0       # prior mean for mu
S <- 100     # prior standard deviation for mu
C <- 100000  # upper bound of the uniform prior on sigma
```

The data will be stored as a list for use in JAGS.

```{r}
dataList = list(y = y, Ntotal = num.obs, M = M, S = S, C = C)
```

#### 2. Specify Model

One thing to note is that JAGS uses the precision as the second term in a normal density function.

```{r}
modelString = "model {
  for ( i in 1:Ntotal ) {
    y[i] ~ dnorm(mu, 1/sigma^2) # sampling model
  }
  mu ~ dnorm(M, 1/S^2)
  sigma ~ dunif(0, C)
} "
writeLines(modelString, con = 'NORMmodel.txt')
```

#### 3. Specify Starting Point

In complicated models, the choice of starting points can be very important. Consider a random starting point.

```{r}
initsList <- function(){
  # function for initializing the starting place of a chain
  # RETURNS: list with random starting points for mu and sigma
  return(list(mu = rnorm(1, mean = 0, sd = 100), sigma = runif(1, 0, 1000)))
}
```
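As a side note (an editorial addition, not part of the original handout), starting points that are extremely far from the data can slow the adaptation phase. A common alternative, sketched below, is to generate dispersed starting values centered on rough data-based estimates; the function name `initsList.data` and the amount of dispersion are illustrative choices, not requirements.

```{r}
# illustrative sketch (editorial addition): dispersed, data-based starting values
initsList.data <- function(){
  list(mu = mean(y) + rnorm(1, mean = 0, sd = sd(y)),   # center near the sample mean
       sigma = sd(y) * runif(1, 0.5, 2))                # scale near the sample sd
}
initsList.data()
```

Either this function or `initsList()` above can be passed to `jags.model()` through its `inits` argument.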
\newpage
#### 4. Generate MCMC chains

Now, finally, we can start running the MCMC chains. First the chains are initialized and a burn-in period is run.

```{r, message=F}
library(rjags)
library(runjags)
jagsModel <- jags.model(file = "NORMmodel.txt", data = dataList, inits = initsList,
                        n.chains = 2, n.adapt = 100)
update(jagsModel, n.iter = 500)
```

Now the chains are run for a longer period of time (think of the traveling politician).

```{r}
num.mcmc <- 1000
codaSamples <- coda.samples(jagsModel, variable.names = c('mu', 'sigma'), n.iter = num.mcmc)
```

#### 5. Evaluate Model

Now we can use the `coda` package to visualize the results from the MCMC. Note that `codaSamples` is a list that contains an entry for each separate chain. Within each entry there is a column for each variable, $\mu$ and $\sigma$.

```{r, fig.align='center', fig.width=5}
par(mfcol=c(1,2))
traceplot(codaSamples)
```

```{r, fig.align='center', fig.width=5}
par(mfcol=c(1,2))
densplot(codaSamples)
```

`coda` can also summarize the **marginal distributions of each parameter** using intervals.

```{r}
HPDinterval(codaSamples)
summary(codaSamples)
```

\newpage

### Summarizing a distribution with posterior samples

Now we have a collection of posterior samples to represent the joint posterior distribution $p(\mu, \sigma^2|\{y_1, \dots, y_n\})$ as well as the marginal posterior distributions $p(\mu|\{y_1, \dots, y_n\})$ and $p(\sigma^2|\{y_1, \dots, y_n\})$.

- How should we summarize these results?
\vfill
\vfill

- Suppose that a collaborator wanted to know whether the mean of $y$ is greater than -0.2. How would we answer this from a classical standpoint?
\vfill

```{r}
t.test(y, mu = -.2, alternative = 'greater')
```

- The interpretation here is that
\vfill
\vfill

- We could also use a confidence interval; the 95% one-sided confidence interval for $\mu$ would be (`r t.test(y, mu = -.2, alternative = 'greater')$conf.int[1]`, $\infty$). What is the interpretation here?
\vfill
\vfill

- As a Bayesian, we can actually construct an interval that has a probabilistic interpretation. For instance, there is a 95% probability that $\mu$ is in the following interval: (`r HPDinterval(codaSamples)[[1]][1,]`).

- Furthermore, we can also use our posterior samples to calculate $Pr(\mu > -0.2) =$ `r mean(codaSamples[[1]][,'mu'] > -0.2)`.

\newpage

```{r, fig.align='center', fig.width=5, echo=F}
par(mfcol=c(1,1))
prob.greater <- mean(codaSamples[[1]][,'mu'] > -0.2)
hist(codaSamples[[1]][,'mu'], main=expression(paste('p(',mu,'|y)')), prob=T,
     xlab=expression(mu), breaks='FD', xlim = c(-.6,.6), ylim=c(0,2.75))
abline(v=-0.2, lwd=2, col='red')
arrows(x0 = -.4, y0 = 1.4, x1 = -0.3, y1 = 0.1, col='gray', lwd = 2)
text(x=-.4, y=2, paste('Prob in this area \n is ', round(1-prob.greater,3)))
```
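A brief aside (an editorial addition, not part of the original notes): posterior samples also make it easy to compute equal-tailed credible intervals or one-sided bounds directly with `quantile()`. For example, a 95% one-sided lower bound for $\mu$ parallels the one-sided confidence interval above, while keeping the direct probability interpretation.

```{r}
mu.samples <- codaSamples[[1]][,'mu']
quantile(mu.samples, probs = c(0.025, 0.975))  # equal-tailed 95% credible interval
quantile(mu.samples, probs = 0.05)             # 95% one-sided lower bound for mu
```

Unlike the HPD interval returned by `HPDinterval()`, the equal-tailed interval places 2.5% of the posterior probability in each tail.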
### Posterior Predictive Distribution

Another valuable tool in Bayesian statistics is the posterior predictive distribution.

- The posterior predictive distribution can be written as:
\vfill
\vfill
where $y^{*}$ is interpreted as a new observation and $p(\theta|y)$ is the posterior for the parameter $\theta$ given that data $y$ have been observed.

- The posterior predictive distribution allows us to test whether our sampling model and observed data are reasonable. We will talk more about this later.
\vfill

- The posterior predictive distribution can also be used to make probabilistic statements about the next response, rather than the group mean.
\vfill

- When $p(\theta|y)$ does not have a standard form
\vfill
\vfill
\newpage

Here is an example that uses the posterior samples of $\mu$ and $\sigma$:

```{r}
posterior.mu <- codaSamples[[1]][,'mu']
posterior.sigma <- codaSamples[[1]][,'sigma']
# one posterior predictive draw for each posterior sample of (mu, sigma)
posterior.pred <- rnorm(num.mcmc, mean = posterior.mu, sd = posterior.sigma)
prob.greater <- mean(posterior.pred > -0.2)
hist(posterior.pred, main='p(y*|y)', prob=T, xlab='y*', breaks='FD',
     sub='notice different scale')
abline(v=-0.2, lwd=2, col='red')
arrows(x0 = -1.5, y0 = 0.3, x1 = -1, y1 = 0.1, col='gray', lwd = 2)
text(x=-1.5, y=.375, paste('Prob in this area \n is ', round(1-prob.greater,3)))
```
\vfill
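Tying this back to the earlier bullet about checking whether the sampling model and observed data are reasonable, a minimal posterior predictive check (an editorial sketch; the plotting choices are assumptions, not part of the original notes) is to overlay the observed data on the posterior predictive draws:

```{r, fig.align='center', fig.width=5}
# minimal posterior predictive check (editorial sketch):
# do the observed data look like a plausible draw from p(y*|y)?
hist(posterior.pred, prob = TRUE, breaks = 'FD',
     main = 'Posterior predictive check', xlab = 'y')
lines(density(y), col = 'blue', lwd = 2)   # kernel density of the observed data
legend('topright', legend = c('posterior predictive draws', 'observed y'),
       col = c('black', 'blue'), lwd = c(1, 2), bty = 'n')
```

Here both distributions are centered near 0 with a standard deviation near 1, which is reassuring given how the data were simulated.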