--- title: "STAT 491 - Lecture 5" date: February 1, 2018 output: pdf_document --- ```{r setup, include=FALSE} library(knitr) knitr::opts_chunk$set(echo = TRUE) set.seed(01152018) ``` # Ch.5 Bayes Rule Compare the two probability statements: \vfill \vfill The first probability statement considers two possible outcomes: \vfill The second probability statement incorporates some additional information (data) into the probability statement, \vfill Bayes rule is the mathematical foundation for re-allocating credibility (or probability) when conditioning on data. \vfill ### Bayes Rule and Conditional Probability - Recall: the conditional probability $P(A|B) = \frac{P(A \cap B)}{P(B)}$. From here we do some algebra to obtain Bayes rule. \vfill \vfill - Either of the last two equations are called **Bayes Rule**, named after Thomas Bayes. ### Bayes Rule with two-way discrete table \newpage | | | *COLUMN* | | | | |:---------------------:|-------|:----------:|------|:------:|----------------------| | *ROW* | | c | | **Marginal** | | | | | | | | r | | p(r,c) = p(r\|c)p(c) = p(c\|r)p(r) | | p(r) | | | | | | | | **Marginal** | | p(c) | | | \vfill Recall the following two-way table: | | | Hair Color | | | | |:---------------------:|-------|:----------:|------|:------:|----------------------| | Eye Color | Black | Brunette | Red | Blond | Marginal (eye color) | | Brown | 0.11 | 0.20 | 0.04 | 0.01 | 0.37 | | Blue | 0.03 | 0.14 | 0.03 | 0.16 | 0.36 | | Hazel | 0.03 | 0.09 | 0.02 | 0.02 | 0.16 | | Green | 0.01 | 0.05 | 0.02 | 0.03 | 0.11 | | Marginal (hair color) | 0.18 | 0.48 | 0.12 | 0.21 | 1.0 | Previously we calculated: - What is the probability of a person having red hair given that they have blue eyes \vfill We now see that this is a simple illustration of Bayes rule. \vfill A classic example of Bayes rule focuses on diagnosing a rare disease. There are a few important values we need to state: \vfill - Let $\theta$ be \vfill - Let $T$ be \vfill - Let $Pr(\theta = Yes) = p_\theta$ be \vfill - Let $Pr(Test = Yes |\theta = Yes) =p_{T+}$ be \vfill - Let $Pr(Test = Yes | \theta = No) = p_{T-}$ be \vfill - **Question:** do we need, - $Pr(\theta = No)$ - $Pr(Test = No |\theta = Yes)$ - $Pr(Test = No| \theta = No)$ \vfill \newpage Assume we are testing citizens for Extra Sensory Perception (ESP). The ultimate goal will be to determine the probability that an individual has ESP if they test positive for ESP. Mathematically this is stated as $Pr(\theta=Yes|Test=Yes)$. \vfill First using the generic probability from the previous page, compute $Pr(\theta=Yes|Test=Yes)$. \begin{eqnarray*} Pr(\theta=Yes|Test=Yes) &=& \end{eqnarray*} \vfill \vfill \vfill Now to make this concrete assume: - The rate of ESP in the population is 1 in \vfill - The hit rate of the test is \vfill - The false detection rate is 1 in \vfill - **Question:** Before doing any math, what is your guess for the probability that a person receiving a positive test actually has ESP? \vfill ```{r} p.theta <- 1 / 100000 p.t.plus <- 9999 / 10000 p.t.minus <- 1 / 10000 p.theta.true <- p.t.plus * p.theta / (p.t.plus * p.theta + p.t.minus * (1 - p.theta)) ``` It turns out that the probability that a person actually has ESP given they had a positive test is $Pr(\theta = Yes | Test = Yes) =$ \vfill This example allows us to understand the mechanisms behing Bayes rule. 
\newpage

### Bayes rule with parameters and data

The previous example was essentially a probability exercise; we were not doing Bayesian statistical analysis per se, but rather just using Bayes rule. Bayesian statistical analysis refers to a fairly specific application of this theorem where:
\vfill

- \vfill
- \vfill
- Bayes rule is used to convert the prior belief on the parameters *and* the statistical model into a **posterior** belief $p(\theta|\mathcal{D})$.
\vfill
\vfill
\vfill

## Example of Bayesian Analysis on a binary outcome

Consider estimating the probability that a die will roll a six, and recall the 5 steps in a Bayesian analysis:

1. Identify the data relevant to the research question.
2. Define a descriptive model for the relevant data.
3. Specify a prior distribution on the parameters.
4. Use Bayesian inference to re-allocate credibility across parameter values.
5. Check that the posterior predictions mimic the data with reasonable accuracy.
\vfill

\newpage

### 1. Identify the data relevant to the research question.

- What data do we need to determine the probability that a die lands on a six?
\vfill

### 2. Define a descriptive model for the relevant data.

A descriptive model, denoted $p(\mathcal{D}|\theta)$, is needed for the die rolling experiment.

- What is $\mathcal{D} = \{d_1, d_2, \dots, d_n\}$?
\vfill
- What is $\theta$?
\vfill
- What is a descriptive model for $p(\mathcal{D}|\theta)$?
\vfill
\vfill
\vfill
\vfill

This model is related to a binomial distribution and will be the mathematical machinery that allows us to update prior beliefs in a formulaic manner.

\newpage

### 3. Specify a prior distribution on the parameters

Here are a couple of reasonable prior distributions on $\theta$, the probability of rolling a 6.

```{r, echo = F}
par(mfcol = c(1, 2))
x <- seq(0, 1, by = .001)
alpha <- 2; beta <- 10
max.x <- max(dbeta(x, shape1 = alpha, shape2 = beta))
# Flat Beta(1, 1) prior: all values of theta equally credible
plot(x, dbeta(x, shape1 = 1, shape2 = 1), type = 'l', ylim = c(0, max.x),
     ylab = '', xlab = expression(theta))
# Informative Beta(2, 10) prior: credibility concentrated on small theta
plot(x, dbeta(x, shape1 = alpha, shape2 = beta), type = 'l', ylim = c(0, max.x),
     ylab = '', xlab = expression(theta))
```

Discuss the implications behind each figure.
\vfill
\vfill
\vfill

### 4. Use Bayesian inference to re-allocate credibility across parameter values.

Recall, the goal of this analysis was to learn about $\theta$, the probability of rolling a six. Specifically, we are interested in the posterior distribution $p(\theta|\mathcal{D})$.
\vfill
\vfill

Let's assume a few data collection procedures (a conjugate-update sketch follows this list):

```{r,echo=F}
ten.rolls <- 8
twentyfive.rolls <- 20
onehundred.rolls <- 85
```

1. 10 rolls of the die, with
2. 25 rolls of the die, with
3. 100 rolls of the die, with
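For a beta prior, this re-allocation has a closed form: with a $Beta(a, b)$ prior and $y$ sixes in $n$ rolls, the posterior is $Beta(a + y, b + n - y)$. The sketch below assumes the blanks above are filled with the counts from the hidden chunk (8, 20, and 85 sixes); `posterior.pars` is a helper function introduced here for illustration, not from the text.

```{r}
# Conjugate beta-binomial update: a Beta(a, b) prior combined with
# y sixes in n rolls yields a Beta(a + y, b + n - y) posterior.
posterior.pars <- function(a, b, y, n) {
  c(shape1 = a + y, shape2 = b + n - y)
}
posterior.pars(a = 1, b = 1, y = ten.rolls, n = 10)         # flat prior
posterior.pars(a = alpha, b = beta, y = ten.rolls, n = 10)  # Beta(2, 10) prior
posterior.pars(a = alpha, b = beta, y = onehundred.rolls, n = 100)
```

These shape parameters are exactly the ones plotted in the posterior panels on the following pages.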
\newpage

With 10 rolls

```{r, echo=F, fig.width = 10, fig.height=12, fig.align='center'}
max.x <- 12
par(mfcol = c(3, 2))
# Left column: flat Beta(1, 1) prior, scaled likelihood, posterior.
# The normalized likelihood of y sixes in n rolls is Beta(y + 1, n - y + 1).
plot(x, dbeta(x, shape1 = 1, shape2 = 1), type = 'l', ylim = c(0, max.x),
     ylab = '', xlab = expression(theta), main = 'flat prior')
plot(x, dbeta(x, shape1 = ten.rolls + 1, shape2 = 10 - ten.rolls + 1),
     type = 'l', ylim = c(0, max.x), ylab = '', xlab = expression(theta),
     main = 'likelihood')
plot(x, dbeta(x, shape1 = ten.rolls + 1, shape2 = 10 - ten.rolls + 1),
     type = 'l', ylim = c(0, max.x), ylab = '', xlab = expression(theta),
     main = 'posterior')
# Right column: informative Beta(2, 10) prior, likelihood, posterior.
plot(x, dbeta(x, shape1 = alpha, shape2 = beta), type = 'l',
     ylim = c(0, max.x), ylab = '', xlab = expression(theta),
     main = 'informative prior')
plot(x, dbeta(x, shape1 = ten.rolls + 1, shape2 = 10 - ten.rolls + 1),
     type = 'l', ylim = c(0, max.x), ylab = '', xlab = expression(theta),
     main = 'likelihood')
plot(x, dbeta(x, shape1 = ten.rolls + alpha, shape2 = 10 - ten.rolls + beta),
     type = 'l', ylim = c(0, max.x), ylab = '', xlab = expression(theta),
     main = 'posterior')
```

\newpage

Then with 100 rolls

```{r, echo=F, fig.width = 10, fig.height=12, fig.align='center'}
max.x <- 12
par(mfcol = c(3, 2))
# Left column: flat prior, likelihood, posterior.
plot(x, dbeta(x, shape1 = 1, shape2 = 1), type = 'l', ylim = c(0, max.x),
     ylab = '', xlab = expression(theta), main = 'flat prior')
plot(x, dbeta(x, shape1 = onehundred.rolls + 1, shape2 = 100 - onehundred.rolls + 1),
     type = 'l', ylim = c(0, max.x), ylab = '', xlab = expression(theta),
     main = 'likelihood')
plot(x, dbeta(x, shape1 = onehundred.rolls + 1, shape2 = 100 - onehundred.rolls + 1),
     type = 'l', ylim = c(0, max.x), ylab = '', xlab = expression(theta),
     main = 'posterior')
# Right column: informative prior, likelihood, posterior.
plot(x, dbeta(x, shape1 = alpha, shape2 = beta), type = 'l',
     ylim = c(0, max.x), ylab = '', xlab = expression(theta),
     main = 'informative prior')
plot(x, dbeta(x, shape1 = onehundred.rolls + 1, shape2 = 100 - onehundred.rolls + 1),
     type = 'l', ylim = c(0, max.x), ylab = '', xlab = expression(theta),
     main = 'likelihood')
plot(x, dbeta(x, shape1 = onehundred.rolls + alpha, shape2 = 100 - onehundred.rolls + beta),
     type = 'l', ylim = c(0, max.x), ylab = '', xlab = expression(theta),
     main = 'posterior')
```

\newpage

#### Influence of the sample size and prior on the posterior
\vfill

With Bayesian statistics there is an interplay between the strength of our prior beliefs and the amount of data collected.
\vfill

- The posterior distribution can be considered as a weighted average between the prior distribution and the data (made concrete in the sketch after this list).
\vfill
- \vfill
- \vfill
\vfill
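For the beta-binomial model above, the weighted-average claim is exact: the posterior mean is $E[\theta|\mathcal{D}] = w \cdot \frac{y}{n} + (1 - w) \cdot \frac{a}{a+b}$, where $w = n / (n + a + b)$, so the sample proportion receives more weight as $n$ grows. The chunk below is a minimal sketch of this identity, reusing the counts and the $Beta(2, 10)$ prior defined earlier; `post.mean` is a helper introduced here for illustration.

```{r}
# Posterior mean of Beta(a + y, b + n - y), written as a weighted average
# of the sample proportion y/n and the prior mean a/(a + b).
post.mean <- function(a, b, y, n) {
  w <- n / (n + a + b)  # weight on the data; increases with sample size
  w * (y / n) + (1 - w) * a / (a + b)
}
post.mean(alpha, beta, ten.rolls, 10)          # n = 10: prior pulls the mean down
post.mean(alpha, beta, onehundred.rolls, 100)  # n = 100: the data dominate
```

With 10 rolls the posterior mean sits well below the sample proportion of 0.8, while with 100 rolls it lands near the sample proportion of 0.85, matching the figures above.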