--- title: "Lab 4 - Part 2: Key" output: html_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) knitr::opts_chunk$set(warning = FALSE) library(datasets) library(ggplot2) library(dplyr) ``` This lab is an in-class demo for 536 students. It should be turned in along with Lab 4. #### Recursive Bayesian Modeling - The classic example of a state-space model is concerned with tracking the position of an object that is observed with errors. The same framework can be used in several other cases, such as the price of avocados that evolve in time. ##### Sampling Model - To illustrate recursive Bayesian analysis and build the necessary skills for state-space modeling, we will start with a situation where the object is not moving in time. The book presents the idea that "you were lost at sea, on a small island, and $\theta$ was your unknown position (univariate, say distance from the coast)." The observations are then modeled as: $$y_t = \theta + \epsilon_t, \; \; \; \epsilon_t \sim N(0,\sigma^2) \; \text{and $\epsilon_t$ are iid.}$$ in other words $$p(y|\theta) \sim N(\theta,\sigma^2) $$ This model is known as the **sampling model**. ##### Prior Distribution - To estimate the position of the object in a Bayesian framework, it is necessary to start with a prior belief about the location of $\theta$. This is constructed as a distribution $$p(\theta) \sim N(\mu_0, C_0),$$ which is called a **prior distribution**. ##### Posterior Distribution - The final step in any Bayesian analysis is calculating, or sampling from, the **posterior distribution.** The posterior distribution, $p(\theta|y)$ can be thought of an update to the prior belief after observing the data ($y$). - In this setting with a normal sampling model and a normal prior distribution, the posterior is also from the normal family. - In particular, the posterior distribution can be calculated in a recursive framework after observing each successive data point using the equation 1.3 in the DLM text, where the posterior $p(\theta|y_1, \dots, y_n) \sim N(\mu_n, C_n).$ \begin{eqnarray*} \mu_n &=& \frac{C_{n-1}}{C_{n-1} + \sigma^2} y_n + \left(1 - \frac{C_{n-1}}{C_{n-1}+\sigma^2}\right) \mu_{n-1}\\ &=& \mu_{n-1} + \frac{C_{n-1}}{C_{n-1} + \sigma^2}(y_n - \mu_{n-1}) \end{eqnarray*} and \begin{eqnarray*} C_n &=& \left(\frac{1}{\sigma^2} + \frac{1}{C_{n-1}}\right)^{-1} \\ &=& \frac{\sigma^2 C_{n-1}}{\sigma^2 + C_{n-1}} \end{eqnarray*} ##### Demonstration 1. Start with prior distribution, say $\theta \sim N(\mu_0 = 0,C_0 =100)$. ```{r} set.seed(09212018) library(dplyr) library(ggplot2) mu0 = 0 C0 = 100 x <- seq(-25,25, by=1) prior.dist <- data.frame(x=x, p.x = dnorm(x,mean=mu0,sd=sqrt(C0))) fig <- ggplot(data=prior.dist, aes(x=x,y=p.x)) + geom_line() + ggtitle('Prior Distribution') fig ``` 2. Simulate one response from the sampling model, $p(y|\theta)$. Let $\theta = 10$ and $\sigma^2 = 4$. ```{r} theta.true = 10 sigma.sq.true = 4 y1 <- rnorm(1, mean=theta.true, sd = sqrt(sigma.sq.true)) fig <- fig + annotate("text", x = y1, y = 0, label = "1", color = 'dodgerblue4') + ggtitle('Prior Distribution and first observed point') fig ``` 3. 
3. Update the posterior distribution after one point.

```{r}
mu1 <- mu0 + (C0 / (C0 + sigma.sq.true)) * (y1 - mu0)
C1 <- (sigma.sq.true * C0) / (sigma.sq.true + C0)
post.dist1 <- data.frame(x = x, p.x = dnorm(x, mean = mu1, sd = sqrt(C1)))
fig <- fig +
  geom_line(data = post.dist1, aes(x = x, y = p.x), color = 'red') +
  ggtitle('Updated posterior distribution') +
  labs(caption = 'prior distribution in black, posterior after one point in red')
fig
```

4. Now continue this process with two more data points, as sketched below. Note that the posterior $p(\theta|y_1)$ serves as the prior for the second data point; hence the idea of recursive updating.
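One possible continuation for step 4 (a sketch: the object names `y2`, `mu2`, `C2`, `y3`, `mu3`, `C3` and the plotting colors are choices made here, not prescribed by the lab):

```{r}
# Second point: the posterior (mu1, C1) acts as the prior
y2 <- rnorm(1, mean = theta.true, sd = sqrt(sigma.sq.true))
mu2 <- mu1 + (C1 / (C1 + sigma.sq.true)) * (y2 - mu1)
C2 <- (sigma.sq.true * C1) / (sigma.sq.true + C1)

# Third point: the posterior (mu2, C2) acts as the prior
y3 <- rnorm(1, mean = theta.true, sd = sqrt(sigma.sq.true))
mu3 <- mu2 + (C2 / (C2 + sigma.sq.true)) * (y3 - mu2)
C3 <- (sigma.sq.true * C2) / (sigma.sq.true + C2)

post.dist2 <- data.frame(x = x, p.x = dnorm(x, mean = mu2, sd = sqrt(C2)))
post.dist3 <- data.frame(x = x, p.x = dnorm(x, mean = mu3, sd = sqrt(C3)))

fig <- fig +
  geom_line(data = post.dist2, aes(x = x, y = p.x), color = 'orange') +
  geom_line(data = post.dist3, aes(x = x, y = p.x), color = 'forestgreen') +
  annotate("text", x = y2, y = 0, label = "2", color = 'dodgerblue4') +
  annotate("text", x = y3, y = 0, label = "3", color = 'dodgerblue4') +
  ggtitle('Recursively updated posterior distributions') +
  labs(caption = 'posteriors after 1, 2, and 3 points in red, orange, and green')
fig
```

With each observation the posterior mean moves toward $\theta = 10$ and the posterior variance shrinks, since $C_n < C_{n-1}$.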
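The same recursion can be wrapped in a small helper so the updating extends to any number of observations. This is a sketch; the function `recursive_update` and its interface are introduced here for illustration and are not from the DLM text.

```{r}
# Sketch of a general updater (the name recursive_update is chosen here):
# given observations y, sampling variance sigma.sq, and prior mean/variance
# (mu0, C0), return the posterior mean and variance after each observation.
recursive_update <- function(y, sigma.sq, mu0, C0) {
  n <- length(y)
  mu <- numeric(n)
  C <- numeric(n)
  mu.prev <- mu0
  C.prev <- C0
  for (t in seq_len(n)) {
    # Equation 1.3: shift the previous mean toward y_t ...
    mu[t] <- mu.prev + (C.prev / (C.prev + sigma.sq)) * (y[t] - mu.prev)
    # ... and shrink the variance
    C[t] <- (sigma.sq * C.prev) / (sigma.sq + C.prev)
    mu.prev <- mu[t]
    C.prev <- C[t]
  }
  data.frame(t = seq_len(n), mu = mu, C = C)
}

# Sanity check: the rows should reproduce (mu1, C1), (mu2, C2), and (mu3, C3)
recursive_update(c(y1, y2, y3), sigma.sq.true, mu0, C0)
```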