---
title: "Lab 9 SOLUTIONS"
header-includes:
  - \usepackage{placeins}
date: "March 7, 2018"
output: pdf_document
---

# 3 (a) & (b)

```{r}
source("http://www.math.montana.edu/parker/courses/STAT411/diagANOVA.r")
```

```{r, tidy = TRUE, results='hold'}
# Get data
d <- read.csv("http://www.math.montana.edu/parker/courses/STAT411/Lab9_Claims.CSV")

# Fit the SLR
m <- lm(log(claim.rate) ~ Avg.Temp, data = d)
summary(m)
```

\pagebreak

# 3 (c) & (d)

```{r, tidy = TRUE}
plot(log(claim.rate) ~ Avg.Temp, data = d, las = 1,
     xlab = "Average Monthly Temp (in F)",
     ylab = "Log(Claim Rate) (per 100 Employees)")
# Proper axis labels (with units) are important!

abline(2.37, -0.028, lwd = 2, col = "blue")
# abline() adds the estimated line from the SLR to the scatterplot in blue.
# This should have the form: abline(intercept,slope)
```

# 3 (e)

```{r}
# Model Diagnostics
diagANOVA(m)
```

# 3 (f)

So the fitted model is $$\hat\mu\{\ln(\text{claim rate}) \mid \text{temperature}\}= 2.367725 - 0.027564 \times \text{temperature},$$ where $\hat \mu$ is the __estimated__ *mean* monthly log claim rate per 100 employees and temperature is the average monthly temperature (in degrees Fahrenheit).

\vspace{1cm}

To get the exponential model, we'll rewrite the equation in terms of the median. Assuming the log claim rates are symmetrically distributed about their mean, the mean and the median of the log claim rate coincide, and the median of a log is the log of the median: $$\hat\mu\{\ln(\text{claim rate}) \mid \text{temperature}\}=\widehat{Median}\{\ln(\text{claim rate}) \mid \text{temperature}\}=\ln(\widehat{Median}\{\text{claim rate} \mid \text{temperature}\}).$$

Now exponentiate both sides:
\begin{eqnarray*}
\widehat{Median}\{\text{claim rate} \mid \text{temperature}\}&=& e^{2.367725 - 0.027564 \times \text{temperature}}\\
&=&e^{2.367725}\times e^{- 0.027564 \times \text{temperature}}\\
&=&\hat C\times e^{\hat r\times \text{temperature}}
\end{eqnarray*}

So, in general
\begin{center}
$\hat C=e^{\hat \beta_0}$ and $\hat r = \hat\beta_1$.
\end{center}

```{r}
# Exponentiate the y-intercept to find C hat
exp(coef(m)[1])
```

\vspace{1cm}

So the fitted model is $$\widehat{Median}\{\text{claim rate} \mid \text{temperature}\}= 10.6730801 \times e^{- 0.027564 \times \text{temperature}},$$ where $\widehat{Median}$ is the __estimated__ *median* monthly insurance claim rate per 100 employees.

# 3 (g)

A 95% CI for $C$, the true median claim rate at 0 degrees F (a very unreliable quantity, since estimating it requires extrapolation!), is found by:

\vspace{1cm}

```{r}
# confidence intervals for model coefficients (95% by default)
confint(m)

# exponentiate confidence interval for intercept (since C = exp(intercept))
exp(c(1.89179444, 2.84365494))
```

So if extrapolation makes sense (which I do not believe that it does, but I am giving you a conclusion here so you have a worked example), then we are 95\% confident that the true median insurance claim rate at 0 degrees F is between 6.6 and 17.2 claims per 100 employees.

And, with the same caveat about extrapolation, we are 95\% confident that the true median insurance claim rate decreases at an exponential rate between 0.02 and 0.035 per degree F. Put another way, we are 95\% confident that the true mean log claim rate decreases by between 0.02 and 0.035 for every 1 degree F increase in temperature.
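
\vspace{1cm}

The slope interval can also be read on the original (unlogged) scale. The following is only a sketch: it takes the rounded endpoints 0.02 and 0.035 quoted above at face value, rather than the unrounded `confint(m)` output, so the last digit may differ slightly. Comparing the fitted median at temperature $+1$ with the fitted median at temperature shows that each 1 degree F increase multiplies the median claim rate by $e^{\hat r}$:
\begin{eqnarray*}
\frac{\hat C\times e^{\hat r\times (\text{temperature}+1)}}{\hat C\times e^{\hat r\times \text{temperature}}}&=& e^{\hat r} = e^{-0.027564}\approx 0.973\\
e^{-0.035}&\approx& 0.966\\
e^{-0.02}&\approx& 0.980
\end{eqnarray*}

Under these assumptions, the median claim rate is estimated to fall by about 2.7\% for each 1 degree F increase in temperature, with an approximate 95\% interval of a 2.0\% to 3.4\% decrease.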