This lab is an in-class demo for students in 536. It should be turned in along with Lab 4.

Recursive Bayesian Modeling

Sampling Model
  • To illustrate recursive Bayesian analysis and build the necessary skills for state-space modeling, we will start with a situation where the object is not moving in time. The book presents the idea that “you were lost at sea, on a small island, and \(\theta\) was your unknown position (univariate, say distance from the coast).” The observations are then modeled as \[y_t = \theta + \epsilon_t, \; \; \; \epsilon_t \sim N(0,\sigma^2) \; \text{and $\epsilon_t$ are iid;}\] in other words, \[y_t | \theta \sim N(\theta,\sigma^2).\] This model is known as the sampling model.
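As a quick illustration, the sampling model can be simulated directly in R; the sketch below uses arbitrary values for \(\theta\) and \(\sigma^2\) (not taken from the text) just to show what a set of observations looks like.
# sketch only: theta and sigma.sq are arbitrary illustrative values
theta <- 10
sigma.sq <- 4
y <- rnorm(5, mean = theta, sd = sqrt(sigma.sq))  # five iid draws of y_t = theta + eps_t
y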
Prior Distribution
  • To estimate the position of the object in a Bayesian framework, it is necessary to start with a prior belief about the location of \(\theta\). This belief is expressed as a distribution, \[\theta \sim N(\mu_0, C_0),\] which is called the prior distribution.
Posterior Distribution
  • The final step in any Bayesian analysis is calculating, or sampling from, the posterior distribution. The posterior distribution, \(p(\theta|y)\), can be thought of as an update to the prior belief after observing the data (\(y\)).

  • In this setting with a normal sampling model and a normal prior distribution, the posterior is also from the normal family.

  • In particular, the posterior distribution can be calculated in a recursive framework, updating after each successive data point using Equation 1.3 in the DLM text, where the posterior is \(\theta|y_1, \dots, y_n \sim N(\mu_n, C_n):\) \[\begin{eqnarray*} \mu_n &=& \frac{C_{n-1}}{C_{n-1} + \sigma^2} y_n + \left(1 - \frac{C_{n-1}}{C_{n-1}+\sigma^2}\right) \mu_{n-1}\\ &=& \mu_{n-1} + \frac{C_{n-1}}{C_{n-1} + \sigma^2}(y_n - \mu_{n-1}) \end{eqnarray*}\] and \[\begin{eqnarray*} C_n &=& \left(\frac{1}{\sigma^2} + \frac{1}{C_{n-1}}\right)^{-1} \\ &=& \frac{\sigma^2 C_{n-1}}{\sigma^2 + C_{n-1}} \end{eqnarray*}\]
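These two updates are easy to wrap in a small helper function. The sketch below is one possible transcription of Equation 1.3; the function name recursive_update and its argument names are my own, not from the text.
recursive_update <- function(y.new, mu.prev, C.prev, sigma.sq) {
  # weight placed on the new observation (the "gain")
  gain <- C.prev / (C.prev + sigma.sq)
  # mean update: shift the previous mean toward the new observation
  mu.new <- mu.prev + gain * (y.new - mu.prev)
  # variance update: combination of prior and sampling variances,
  # smaller than either one
  C.new <- (sigma.sq * C.prev) / (sigma.sq + C.prev)
  list(mu = mu.new, C = C.new)
}
Calling this function repeatedly, feeding each posterior back in as the next prior, reproduces the recursion demonstrated below.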
Demonstration
  1. Start with a prior distribution, say \(\theta \sim N(\mu_0 = 0, C_0 = 100)\).
set.seed(09212018)
library(dplyr)
library(ggplot2)
# prior: theta ~ N(mu0, C0)
mu0 <- 0
C0 <- 100
# evaluate the prior density on a grid of theta values for plotting
x <- seq(-25, 25, by = 1)
prior.dist <- data.frame(x = x, p.x = dnorm(x, mean = mu0, sd = sqrt(C0)))
fig <- ggplot(data = prior.dist, aes(x = x, y = p.x)) + geom_line() + ggtitle('Prior Distribution')
fig

  2. Simulate one response from the sampling model, \(p(y|\theta)\). Let \(\theta = 10\) and \(\sigma^2 = 4\).
# true (but, in practice, unknown) parameter values
theta.true <- 10
sigma.sq.true <- 4
# draw one observation from the sampling model
y1 <- rnorm(1, mean = theta.true, sd = sqrt(sigma.sq.true))
# mark the observed point on the prior plot
fig <- fig + annotate("text", x = y1, y = 0, label = "1", color = 'dodgerblue4') +
  ggtitle('Prior Distribution and first observed point')
fig

  3. Update the posterior distribution after one data point.
# posterior mean: prior mean shifted toward y1 by the gain C0 / (C0 + sigma^2)
mu1 <- mu0 + (C0 / (C0 + sigma.sq.true)) * (y1 - mu0)
# posterior variance: smaller than both C0 and sigma^2
C1 <- (sigma.sq.true * C0) / (sigma.sq.true + C0)
post.dist1 <- data.frame(x = x, p.x = dnorm(x, mean = mu1, sd = sqrt(C1)))
fig <- fig + geom_line(data = post.dist1, aes(x = x, y = p.x), color = 'red') +
  ggtitle('Updated posterior distribution') +
  labs(caption = 'prior distribution in black, posterior after one point in red')
fig

  4. Now continue this process with 2 more data points. Note that the posterior \(p(\theta|y_1)\) serves as the prior for the second data point; hence the idea of recursive updating. A sketch of this continuation is given below.
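One possible sketch, reusing the objects created above (the new object names y2, mu2, C2, etc., and the color choice are my own):
# second observation: p(theta | y1) ~ N(mu1, C1) now plays the role of the prior
y2 <- rnorm(1, mean = theta.true, sd = sqrt(sigma.sq.true))
mu2 <- mu1 + (C1 / (C1 + sigma.sq.true)) * (y2 - mu1)
C2 <- (sigma.sq.true * C1) / (sigma.sq.true + C1)
# third observation: p(theta | y1, y2) ~ N(mu2, C2) is the prior for y3
y3 <- rnorm(1, mean = theta.true, sd = sqrt(sigma.sq.true))
mu3 <- mu2 + (C2 / (C2 + sigma.sq.true)) * (y3 - mu2)
C3 <- (sigma.sq.true * C2) / (sigma.sq.true + C2)
# overlay the posterior after three observations
post.dist3 <- data.frame(x = x, p.x = dnorm(x, mean = mu3, sd = sqrt(C3)))
fig + geom_line(data = post.dist3, aes(x = x, y = p.x), color = 'darkgreen') +
  ggtitle('Posterior after three observations') +
  labs(caption = 'prior in black, posterior after one point in red, after three in green')
With each additional observation, the posterior should concentrate around the true value of 10 as its variance shrinks.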