This lab is an in-class demo for students in 536. It should be turned in along with Lab 4.

Recursive Bayesian Modeling

Sampling Model
  • To illustrate recursive Bayesian analysis and build the necessary skills for state-space modeling, we will start with a situation where the object is not moving in time. The book presents the idea that “you were lost at sea, on a small island, and \(\theta\) was your unknown position (univariate, say distance from the coast).” The observations are then modeled as \[y_t = \theta + \epsilon_t, \; \; \; \epsilon_t \sim N(0,\sigma^2) \; \text{and $\epsilon_t$ are iid;}\] in other words, \[y_t | \theta \sim N(\theta,\sigma^2).\] This model is known as the sampling model.
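As a quick illustration, the sampling model can be simulated directly in R; the sketch below uses arbitrary values for \(\theta\) and \(\sigma^2\) (not taken from the text) just to show what a set of observations looks like.
# sketch only: theta and sigma.sq are arbitrary illustrative values
theta <- 10
sigma.sq <- 4
y <- rnorm(5, mean = theta, sd = sqrt(sigma.sq))  # five iid draws of y_t = theta + eps_t
y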
Prior Distribution
  • To estimate the position of the object in a Bayesian framework, it is necessary to start with a prior belief about the location of \(\theta\). This belief is expressed as a distribution, \[\theta \sim N(\mu_0, C_0),\] which is called the prior distribution.
Posterior Distribution
  • The final step in any Bayesian analysis is calculating, or sampling from, the posterior distribution. The posterior distribution, \(p(\theta|y)\), can be thought of as an update to the prior belief after observing the data (\(y\)).

  • In this setting with a normal sampling model and a normal prior distribution, the posterior is also from the normal family.

  • In particular, the posterior distribution can be calculated in a recursive framework, updating after each successive data point using Equation 1.3 in the DLM text, where the posterior is \(\theta|y_1, \dots, y_n \sim N(\mu_n, C_n):\) \[\begin{eqnarray*} \mu_n &=& \frac{C_{n-1}}{C_{n-1} + \sigma^2} y_n + \left(1 - \frac{C_{n-1}}{C_{n-1}+\sigma^2}\right) \mu_{n-1}\\ &=& \mu_{n-1} + \frac{C_{n-1}}{C_{n-1} + \sigma^2}(y_n - \mu_{n-1}) \end{eqnarray*}\] and \[\begin{eqnarray*} C_n &=& \left(\frac{1}{\sigma^2} + \frac{1}{C_{n-1}}\right)^{-1} \\ &=& \frac{\sigma^2 C_{n-1}}{\sigma^2 + C_{n-1}} \end{eqnarray*}\]
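These two updates are easy to wrap in a small helper function. The sketch below is one possible transcription of Equation 1.3; the function name recursive_update and its argument names are my own, not from the text.
recursive_update <- function(y.new, mu.prev, C.prev, sigma.sq) {
  # weight placed on the new observation (the "gain")
  gain <- C.prev / (C.prev + sigma.sq)
  # mean update: shift the previous mean toward the new observation
  mu.new <- mu.prev + gain * (y.new - mu.prev)
  # variance update: combination of prior and sampling variances,
  # smaller than either one
  C.new <- (sigma.sq * C.prev) / (sigma.sq + C.prev)
  list(mu = mu.new, C = C.new)
}
Calling this function repeatedly, feeding each posterior back in as the next prior, reproduces the recursion demonstrated below.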
Demonstration
  1. Start with a prior distribution, say \(\theta \sim N(\mu_0 = 0, C_0 = 100)\).
set.seed(09212018)
library(dplyr)
library(ggplot2)
# prior: theta ~ N(mu0, C0)
mu0 <- 0
C0 <- 100
# evaluate the prior density on a grid of theta values for plotting
x <- seq(-25, 25, by = 1)
prior.dist <- data.frame(x = x, p.x = dnorm(x, mean = mu0, sd = sqrt(C0)))
fig <- ggplot(data = prior.dist, aes(x = x, y = p.x)) + geom_line() + ggtitle('Prior Distribution')
fig

  2. Simulate one response from the sampling model, \(p(y|\theta)\). Let \(\theta = 10\) and \(\sigma^2 = 4\).
# true (but, in practice, unknown) parameter values
theta.true <- 10
sigma.sq.true <- 4
# draw one observation from the sampling model
y1 <- rnorm(1, mean = theta.true, sd = sqrt(sigma.sq.true))
# mark the observed point on the prior plot
fig <- fig + annotate("text", x = y1, y = 0, label = "1", color = 'dodgerblue4') +
  ggtitle('Prior Distribution and first observed point')
fig

  3. Update the posterior distribution after one data point.
# posterior mean: prior mean shifted toward y1 by the gain C0 / (C0 + sigma^2)
mu1 <- mu0 + (C0 / (C0 + sigma.sq.true)) * (y1 - mu0)
# posterior variance: smaller than both C0 and sigma^2
C1 <- (sigma.sq.true * C0) / (sigma.sq.true + C0)
post.dist1 <- data.frame(x = x, p.x = dnorm(x, mean = mu1, sd = sqrt(C1)))
fig <- fig + geom_line(data = post.dist1, aes(x = x, y = p.x), color = 'red') +
  ggtitle('Updated posterior distribution') +
  labs(caption = 'prior distribution in black, posterior after one point in red')
fig

  4. Now continue this process with 2 more data points. Note that the posterior \(p(\theta|y_1)\) serves as the prior for the second data point; hence the idea of recursive updating. A sketch of this continuation is given below.
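One possible sketch, reusing the objects created above (the new object names y2, mu2, C2, etc., and the color choice are my own):
# second observation: p(theta | y1) ~ N(mu1, C1) now plays the role of the prior
y2 <- rnorm(1, mean = theta.true, sd = sqrt(sigma.sq.true))
mu2 <- mu1 + (C1 / (C1 + sigma.sq.true)) * (y2 - mu1)
C2 <- (sigma.sq.true * C1) / (sigma.sq.true + C1)
# third observation: p(theta | y1, y2) ~ N(mu2, C2) is the prior for y3
y3 <- rnorm(1, mean = theta.true, sd = sqrt(sigma.sq.true))
mu3 <- mu2 + (C2 / (C2 + sigma.sq.true)) * (y3 - mu2)
C3 <- (sigma.sq.true * C2) / (sigma.sq.true + C2)
# overlay the posterior after three observations
post.dist3 <- data.frame(x = x, p.x = dnorm(x, mean = mu3, sd = sqrt(C3)))
fig + geom_line(data = post.dist3, aes(x = x, y = p.x), color = 'darkgreen') +
  ggtitle('Posterior after three observations') +
  labs(caption = 'prior in black, posterior after one point in red, after three in green')
With each additional observation, the posterior should concentrate around the true value of 10 as its variance shrinks.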