---
title: "Lab 9 SOLUTIONS"
header-includes:
  - \usepackage{placeins}
date: "March 7, 2018"
output: pdf_document
---

# 3 (a) & (b)

```{r}
source("http://www.math.montana.edu/parker/courses/STAT411/diagANOVA.r")
```

```{r, tidy = TRUE, results='hold'}
# Get data
d <- read.csv("http://www.math.montana.edu/parker/courses/STAT411/Lab9_Claims.CSV")

# Fit the SLR
m <- lm(log(claim.rate) ~ Avg.Temp, data = d)
summary(m)
```

\pagebreak

# 3 (c) & (d)

```{r, tidy = TRUE}
plot(log(claim.rate) ~ Avg.Temp, data = d, las = 1, xlab = "Average Monthly Temp (in F)", 
     ylab = "Log(Claim Rate) (per 100 Employees)")
# Proper axis labels (with units) are important!

abline(m, lwd = 2, col = "blue")
# abline() adds the estimated line from the SLR to the scatterplot in blue.  
# Passing the fitted model uses the unrounded estimates; the form
# abline(intercept, slope) also works if you type the estimates in by hand.
```

# 3 (e) 

```{r}
# Model Diagnostics
diagANOVA(m)
```
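
`diagANOVA()` comes from the sourced course script, whose contents are not shown here. If that script cannot be reached, base R offers a comparable check. The chunk below is a minimal sketch, not a reproduction of `diagANOVA()`: the helper name `diag_base` is hypothetical, and it simply draws the residuals-versus-fitted plot and the normal Q-Q plot that `plot()` provides for `lm` objects.

```{r}
# Hypothetical helper (NOT part of the course script): base-R residual diagnostics
diag_base <- function(fit) {
  old_par <- par(mfrow = c(1, 2))  # two panels side by side
  on.exit(par(old_par))            # restore the previous layout when done
  plot(fit, which = 1)             # residuals vs. fitted: linearity and constant variance
  plot(fit, which = 2)             # normal Q-Q plot: normality of the residuals
  invisible(fit)                   # return the model invisibly
}
```

Calling `diag_base(m)` after the model has been fit would draw both panels for the SLR above.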


# 3 (f)

So the fitted model is 
  $$\hat\mu\{\ln(\text{claim rate}) \mid \text{temperature}\} = 2.367725 - 0.027564 \times \text{temperature}$$
where $\hat \mu$ is the __estimated__ *mean* monthly log-claim rate per 100 employees and temperature is the average monthly temperature (in degrees Fahrenheit).  

\vspace{1cm}

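To express the model on the original (exponential) scale, we rewrite the equation in terms of the median. If the log claim rates are symmetrically distributed about their mean, as the regression model assumes, then the mean and the median of the log claim rate coincide. Because $\ln$ is an increasing function, the median of the logs is also the log of the median:
  $$\hat\mu\{\ln(\text{claim rate}) \mid \text{temperature}\} = \widehat{Median}\{\ln(\text{claim rate}) \mid \text{temperature}\} = \ln\left(\widehat{Median}\{\text{claim rate} \mid \text{temperature}\}\right).$$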

Now exponentiate both sides:
\begin{eqnarray*}
\widehat{Median}\{\text{claim rate} \mid \text{temperature}\} &=& e^{2.367725 - 0.027564 \times \text{temperature}}\\
&=& e^{2.367725}\times e^{- 0.027564 \times \text{temperature}}\\
&=& \hat C\times e^{\hat r\times \text{temperature}}
\end{eqnarray*}

So, in general

\begin{center}
$\hat C=e^{\hat \beta_0}$ and $\hat r = \hat\beta_1$.
\end{center}

```{r}
# Exponentiate the y-intercept to find C hat
exp(coef(m)[1])
```

\vspace{1cm}

So the fitted model on the original scale is 
$$\widehat{Median}\{\text{claim rate} \mid \text{temperature}\} = 10.6730801 \, e^{- 0.027564 \times \text{temperature}},$$

where $\widehat{Median}$ is the __estimated__ *median* monthly insurance claim rate per 100 employees.  
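
To see what the back-transformed model implies, the sketch below evaluates the fitted median curve using the estimates reported above. The function name `median_claim_rate` and the two temperatures are illustrative assumptions, not part of the lab; check that any temperature you plug in lies within the observed data range before interpreting the result.

```{r}
# Log-scale estimates copied from the summary(m) output reported above
b0 <- 2.367725   # estimated intercept
b1 <- -0.027564  # estimated slope

# Estimated median claim rate (per 100 employees) at a given temperature (in F)
median_claim_rate <- function(temp) exp(b0) * exp(b1 * temp)

# Hypothetical temperatures, chosen only for illustration
temps <- c(40, 41)
est <- median_claim_rate(temps)
est

# Ratio of estimated medians for a 1 degree F increase;
# this equals exp(b1) no matter which starting temperature is used
est[2] / est[1]
```

The ratio is $e^{\hat r} \approx 0.973$, so each additional degree F multiplies the estimated median claim rate by about 0.973, a decrease of roughly 2.7%.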

# 3 (g)

A 95% CI for $C$, the true median claim rate per 100 employees at 0 degrees F (estimating it is very unreliable due to extrapolation!), is found by exponentiating the endpoints of the 95% CI for the intercept $\beta_0$:

\vspace{1cm}

```{r}
# confidence intervals for model coefficients (95% by default) 
confint(m)

# exponentiate the confidence interval for the intercept (since C = exp(beta_0))
exp(confint(m)[1, ])
```
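
Exponentiating the endpoints is valid because $e^x$ is an increasing function, so it preserves the order of the inequalities and the confidence level carries over unchanged. As a short worked step, using the intercept endpoints 1.89179444 and 2.84365494 reported by `confint(m)`:

\begin{eqnarray*}
1.89179444 \le \beta_0 \le 2.84365494 &\iff& e^{1.89179444} \le e^{\beta_0} \le e^{2.84365494}\\
&\iff& 6.6 \le C \le 17.2 \quad \text{(approximately)}
\end{eqnarray*}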

So if extrapolation makes sense (which I do not believe that it does, but I am giving you a conclusion here so you have a worked example), then we are 95\% confident that the true median number of insurance claims at 0 degrees F is between 6.6 and 17.2 per 100 employees.

Similarly, if extrapolation makes sense (again, I do not believe that it does), then we are 95\% confident that the true exponential rate $r$ at which the median insurance claim rate changes with temperature is between $-0.035$ and $-0.02$ per degree F. Put another way, we are 95\% confident that the true median log claim rate decreases by between 0.02 and 0.035 for every 1 degree F increase in temperature.
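
The rate interval can also be read multiplicatively on the original scale. This is a sketch that uses the rounded bounds 0.02 and 0.035 from the text, so the percentages are approximate. A 1 degree F increase multiplies the median claim rate by $e^{r}$:

\begin{eqnarray*}
\frac{Median\{\text{claim rate} \mid \text{temperature} + 1\}}{Median\{\text{claim rate} \mid \text{temperature}\}} &=& \frac{C\, e^{r(\text{temperature} + 1)}}{C\, e^{r \times \text{temperature}}} \;=\; e^{r},\\
e^{-0.035} \approx 0.966 &\le& e^{r} \;\le\; e^{-0.02} \approx 0.980
\end{eqnarray*}

So, under the same extrapolation caveat, each 1 degree F increase in temperature is associated with a decrease of roughly 2.0% to 3.4% in the median claim rate.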