---
title: "STAT 491 - Lecture 9: T-tests"
output: pdf_document
---

```{r setup, include=FALSE}
library(knitr)
library(rjags)
library(runjags)
library(ggplot2)
library(gridExtra)
knitr::opts_chunk$set(echo = TRUE)
```


### Posterior Predictive Distribution

Another valuable tool in Bayesian statistics is the posterior predictive distribution.

- The posterior predictive distribution can be written as:

$$p(y^{*}|y) = \int p(y^{*}|\theta) p(\theta|y) d\theta$$
where $y^{*}$ is interpreted as a new observation and $p(\theta|y)$ is the posterior for the parameter $\theta$ given that data $y$ have been observed.
\vfill
- The posterior predictive distribution allows us to test whether our sampling model and observed data are reasonable. We will talk more about this later.
\vfill
- The posterior predictive distribution can also be used to make probabilistic statements about the next response, rather than the group mean. In our continuing example, we could calculate the probability of the next observed data point being greater than -0.2.
\vfill
- When $p(\theta|y)$ does not have a standard form, samples from this distribution can be inserted into the sampling model. This sampling procedure is a Monte Carlo approach for this integration.
\vfill

```{r, echo=F, messages=F, results='hide'}
set.seed(02162018)
num.obs <- 50
true.mu <- 0
true.sigmasq <- 1

y <- rnorm(num.obs, mean = true.mu, sd = sqrt(true.sigmasq))
M <- 0
S <- 100
C <- 100000
dataList = list(y = y, Ntotal = num.obs, M = M, S = S, C = C)
modelString = "model {
  for ( i in 1:Ntotal ) {
    y[i] ~ dnorm(mu, 1/sigma^2) # sampling model
  }
  mu ~ dnorm(M,1/S^2)
  sigma ~ dunif(0,C)
} "
writeLines( modelString, con='NORMmodel.txt')
initsList <- function(){
  # function for initializing starting place of theta
  # RETURNS: list with random start point for theta
  return(list(mu = rnorm(1, mean = 0, sd = 100), sigma = runif(1,0,1000)))
}
jagsModel <- jags.model( file = "NORMmodel.txt", data = dataList, 
                         inits =initsList, n.chains = 2, n.adapt = 100)
update(jagsModel, n.iter = 500)
num.mcmc <- 1000
codaSamples <- coda.samples( jagsModel, variable.names = c('mu', 'sigma'), n.iter = num.mcmc)
```

```{r}
posterior.mu <- codaSamples[[1]][,'mu']
posterior.sigma <- codaSamples[[1]][,'sigma']
posterior.pred <- rnorm(num.mcmc, mean = posterior.mu, sd = posterior.sigma)
prob.greater <- mean(posterior.pred > -0.2)
```


```{r, fig.align='center',fig.width=5, echo=F}
hist(posterior.pred, main='p(y*|y)', prob=T,
     xlab=expression(y),breaks='FD')
abline(v=-0.2,lwd=2, col='red')
arrows(x0 = -1.5, y0 = 0.3 , x1 = -1, y1 = 0.1 , col='gray', lwd = 2)
text(x=-1.5, y=.375, paste('Prob in this area \n is ', round(1-prob.greater,3)))
```

\newpage

## T - distribution

- While the normal distribution is often used for modeling continuous data, an alternative is the t-distribution.  *Q:* Where have you seen a t-distribution before and what properties does it have?
\vfill
- The t-distribution has an interesting history.
\vfill
- The t-distribution is
\vfill
- The \vfill

    - normal = 1.96
    - t(50) = 
    - t(40) = 
    - t(30) = 
    - t(20) = 
    - t(10) = 
    - t(5) = 
    - t(3) = 
    - t(1) 
\vfill
-  When the degrees of freedom get large
\vfill
\newpage
```{r, warning=F, echo=F, fig.align='center', fig.width=7}
set.seed(03112018)
normal <- data.frame(rnorm(10000))
colnames(normal) <- 'vals'
t20 <- data.frame(rt(10000, df = 20))
t3 <- data.frame(rt(10000, df = 3))
colnames(normal) <- colnames(t20) <- colnames(t3) <- 'vals'
min.val <- min(t20,t3, normal) - 1
max.val <- max(t20,t3, normal) + 1
plot1 <- ggplot(data=normal, aes(vals)) + geom_histogram(bins = 100) + xlim(min.val, max.val) + labs(subtitle = "Normal Distribution")
plot2  <- ggplot(data=t20, aes(vals)) + geom_histogram(bins = 100) + xlim(min.val, max.val) + labs(subtitle = "t(20) Distribution")
plot3 <- ggplot(data=t3, aes(vals)) + geom_histogram(bins = 100) + xlim(min.val, max.val) + labs(subtitle = "t(3) Distribution")
grid.arrange(plot1, plot2, plot3, nrow=3)
```


### Bayesian modeling with t-distribution
- Sampling model $y \sim t(\mu, \sigma^2, \nu)$

- This requires a prior distribution on:
    - \vfill
    -  \vfill
    -  \vfill
    \newpage
    ```{r, warning=F, echo=F, fig.align='center', fig.width=6}
exp.samples <- data.frame(rexp(10000))
exp.samples2 <- data.frame(rexp(10000, rate = 10))
exp.samples3 <- data.frame(rexp(10000, rate = .1))
max.val <- max(max(cbind(exp.samples, exp.samples2, exp.samples3)))
colnames(exp.samples) <-colnames(exp.samples2) <-colnames(exp.samples3) <- 'vals'
plot1 <- ggplot(data=exp.samples, aes(vals)) + geom_histogram(bins = 100) + labs(subtitle = "rate = 1")
plot2  <- ggplot(data=exp.samples2, aes(vals)) + geom_histogram(bins = 100) + labs(subtitle = "rate = 10")
plot3 <- ggplot(data=exp.samples3, aes(vals)) + geom_histogram(bins = 100) + labs(subtitle = "rate = .1")
grid.arrange(plot2, plot1, plot3, nrow=3)
```

After looking at the figures, what do you think the mean of the exponential distribution is? 

##### JAGS code

```{r, fig.align=='center', fig.width=5}
t.samples <- data.frame(rt(500, df = 3))
colnames(t.samples) <- 'vals'
ggplot(data=t.samples, aes(vals)) + geom_histogram(bins = 100) + 
  labs(subtitle = "samples from t(3) distribution")

#Prior parameters
M <- 0
S <- 100
C <- 10
rate <- .1

# Store data
dataList = list(y = t.samples$vals, Ntotal = nrow(t.samples), M = M, S = S, C = C, rate = rate)

# Model String
modelString = "model {
  for ( i in 1:Ntotal ) {
    y[i] ~ dt(mu, 1/sigma^2, nu) # sampling model
  }
  mu ~ dnorm(M,1/S^2)
  sigma ~ dunif(0,C)
  nu <- nuMinusOne + 1 # transform to guarantee n >= 1
  nuMinusOne ~ dexp(rate)
} "
writeLines( modelString, con='Tmodel.txt')

# initialization
initsList <- function(){
  # function for initializing starting place of theta
  # RETURNS: list with random start point for theta
  return(list(mu = rnorm(1, mean = M, sd = S), sigma = runif(1,0,C), 
              nuMinusOne = rexp(1, rate=rate) ))
}

# Runs JAGS Model
jagsT <- jags.model( file = "Tmodel.txt", data = dataList, inits =initsList, 
                     n.chains = 2, n.adapt = 1000)
update(jagsT, n.iter = 1000)

num.mcmc <- 1000
codaSamples <- coda.samples( jagsT, variable.names = c('mu', 'sigma','nu'), n.iter = num.mcmc)

par(mfcol=c(1,3))
traceplot(codaSamples)
densplot(codaSamples)
HPDinterval(codaSamples)
```

\newpage

## Lab Exercise 1
### 1. 
Simulate 100 responses from a Cauchy distribution, t distribution with $\mu$ = 1, $\sigma^2$=1 and $\nu=1$, and describe this data with a plot and brief description of the data.

### 2. 
Use JAGS to fit a normal sampling model and the following priors for this data. 

 - $p(\mu) \sim N(0,10^2)$
 - $p(\sigma) \sim U(0,1000)$

Discuss the posterior HDIs for $\mu$ and $\sigma$.

### 3. 
Use JAGS to fit a t sampling model and the following priors for this data. 

 - $p(\mu) \sim N(0,10^2)$
 - $p(\sigma) \sim U(0,1000)$
 - $p(\nu) \sim E_+(.1)$, where $E_+(.1)$ is a shifted exponential with rate = .1.

Discuss the posterior HDIs for $\mu$, $\sigma$, and $\nu$.


### 4. 
Use the following code to create posterior predictive distributions for part 2 and part 3. Note: your data and coda objects may need to be renamed for this to work. Compare the data and the posterior predictive model curves with posterior predictive models. Note this is the final step in Bayesian data analysis: verifying that our model / prior selection is an accurate representation of the data. 

```{r, eval=F}
# Posterior Predictive Normal
post.pred.normal <- rnorm(num.mcmc, coda.norm[[1]][,'mu'], coda.norm[[1]][,'sigma'] )
# Posterior Predictive t
post.pred.t <- rt(num.mcmc, df = coda.t[[1]][,'nu']) * coda.t[[1]][,'sigma'] + coda.t[[1]][,'mu']
data.comb <- data.frame(vals = c(t.samples$vals, post.pred.normal, post.pred.t), 
                        model = c(rep('data',100), rep('normal', num.mcmc), rep('t',num.mcmc)))

ggplot(data.comb, aes(vals, ..density.., colour = model)) + geom_freqpoly() +
  ggtitle('Comparison of Posterior Predictive Distributions') 
```
\newpage

## Estimation with Two Groups
A common use of the t-distribution is to make comparisons between two groups. For instance, we may be interested to determine if the mean height of two groups of OK Cupid users are different.

We can write this model out as \vfill
From a Bayesian perspective, this model will require priors on:

 - \vfill
 - \vfill
 - \vfill

### An aside on Null Hypothesis Significance Testing (NHST) (Ch. 11)

- What is the purpose of NHST? 
\vfill
- For instance, consider estimating whether a die has a fair probability of rolling a 6 ($\theta = 1/6$). 

    - Then if we roll the die several times  \vfill
    - If the actual number is far greater or less than our expectation, \vfill
    - To do this, we compute the exact probabilities of getting all outcomes. \vfill
    - The null hypothesis is commonly rejected \vfill
- It is important to note that calculating the probability of all outcomes requires both the sampling and testing procedure. 
\vfill
- We are not going to get into the details, but section 11.1 in the texbook details a situation where a coin is flipped 24 times and results in 7 heads. The goal is determine if the coin is fair. Depending on the sampling procedure used, the p-value can range from .017 to .103 with this dataset.
\vfill
\newpage

### Bayesian Approach to Testing a Point Hypothesis

Consider the die rolling example. What value for ($\theta$) would be says is meaningfully different than $\theta = 1/6$ = `r round(1/6, 3)`?
\vfill
\vfill
- This range around the specified value is \vfill
- Given  \vfill
- A parameter value is declared to be accepted for practical purposes if that value's \vfill
- When the HDI and ROPE overlap, with the ROPE not completely containing the HDI, then neither of the above rules is satisfied and we withhold a decision.
\vfill
- Note that the NHST regime provides no way to confirm a theory, rather just the ability to reject the null hypothesis.
\vfill
\newpage

### Lab Questions
Use the OK Cupid dataset and test the following claim, the mean height OK Cupid respondents reporting their body type as athletic is different than 70.5 inches (this value is arbitrary, but is approximately the mean height of all men in the sample). Interpret the results for each scenario.
```{r}
okc <- read.csv('http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/OKCupid_profiles_clean.csv', stringsAsFactors = F)
library(dplyr)
okc.athletic <- okc %>% filter(body_type == 'athletic' & sex == 'm')
okc %>% filter(sex == 'm') %>% summarise(mean(height))
```

1. Use `t.test()` with a two-sided procedure. 

2. Fit a Bayesian model for $\mu$ with a ROPE of $\pm$ .5 inch. Use the following priors: $p(\mu) \sim N(70.5, 10^2),$ $p(\sigma) \sim Unif(0,20)$, $p(\nu) \sim E_{+}(.1)$ and a t-sampling model.


3. Fit a Bayesian model for $\mu$ with a ROPE of $\pm$ .05 inch
 \newpage
 
### Back to the two-sample case
- Now consider whether their is a significant height difference between male OK Cupid respondents self-reporting their body type as "athletic" and those self-reporting their body type as "fit"

- From the frequentist paradigm, this can be accomplished using the `t.test()` function.
```{r}
okc.fit <- okc %>% filter(sex == 'm' & body_type == 'fit')
t.test(x= okc.athletic$height, y = okc.fit$height)
```

- It is important to note there is no analog to ROPE with the t-test. \vfill

- Here is the Bayesian attempt, using JAGS. We want the posterior of $\mu_{ath} - \mu_{fit}$ for inferences. 

```{r}
M <- 70.5
S <- 10
C <- 20
rate <- .1

# Store data
dataList = list(x = okc.athletic$height, y = okc.fit$height, M = M, S = S, C = C, rate = rate)

# Model String
modelString <- "model {
  for(i in 1:length(x)) {
    x[i] ~ dt( mu_x , 1/sigma_x^2 , nu )

  }
  x_pred ~ dt( mu_x , 1/sigma_x^2 , nu ) # posterior predictive for x

  for(i in 1:length(y)) {
    y[i] ~ dt( mu_y , 1/sigma_y^2 , nu )
  }
  y_pred ~ dt( mu_y , 1/sigma_y^2 , nu ) # posterior predictive for y

  mu_diff <- mu_x - mu_y

  # The priors
  mu_x ~ dnorm( M , 1/S^2 )
  sigma_x ~ dunif( 0 , C )

  mu_y ~ dnorm( M , 1/S^2  )
  sigma_y ~ dunif( 0 , C )

  nu <- nuMinusOne+1
  nuMinusOne ~ dexp(rate)
}"
writeLines( modelString, con='TwoSampleT.txt')

# initialization
initsList <- function(){
  # function for initializing starting place of theta
  # RETURNS: list with random start point for theta
  return(list(mu_x = rnorm(1, mean = M, sd = S), sigma_x = runif(1,0,C), 
              mu_y = rnorm(1, mean = M, sd = S), sigma_y = runif(1,0,C), 
              nuMinusOne = rexp(1, rate=rate) ))
}

# Runs JAGS Model
jagsT <- jags.model( file = "TwoSampleT.txt", data = dataList, inits =initsList, 
                     n.chains = 3, n.adapt = 1000)
update(jagsT, n.iter = 1000)

coda.t <- coda.samples( jagsT, variable.names = c('mu_x', 'sigma_x','nu', 'mu_y', 'sigma_y', 'mu_diff'), n.iter = num.mcmc)

HPDinterval(coda.t)
```

Given that the HDI for the difference in mean heights is `r HPDinterval(coda.t[[1]])['mu_diff',]` the interpretation here depends on our ROPE.