--- title: "HW 8 Solutions, 23 points" header-includes: \usepackage{placeins} \usepackage{verbatim} date: "April 2, 2018" output: pdf_document --- ```{r, echo = FALSE} source("http://www.math.montana.edu/parker/courses/STAT411/diagANOVA.r") library(Sleuth3) ``` # Formatting - 1pt # 1 (2 pts) The scatterplots of lifespans and $\log$(Lifespans) as functions of metabolic rate and $\log$(metabolic rate) are displayed below. A SLR was fit to Lifespans as a function of Metab (without any log transformations), giving the regression equation \begin{center} $\hat{\mu}\{\text{lifespan|metabolic rate}\} = 10.3 + 0.000387(\text{metabolic rate}).$ \end{center} The SLR line was added to the first scatterplot below. In this first scatterplot (Lifespan \& Metabolic Rate) we can see that the regression line does not fit the untransformed data well. Instead, these scatterplots suggest that we should log-transform both lifespan and metabolic rate to best satisfy linearity. ```{r, echo = FALSE, fig.width = 12, fig.height = 10} par(mfrow = c(2,2)) plot(ex0826$Metab, ex0826$Life, xlab = "Average Lifespan (years)", ylab = "Average Metabolic Rate (kJ/day)", las = 1) # (original scale) y verses x abline(10.3, 3.87e-4, col="red") # adding the regression line plot(log10(ex0826$Metab), ex0826$Life, xlab = "Log Average Lifespan (log-years)", ylab = "Average Metabolic Rate (kJ/day)", las = 1) # log(y) verses x plot(log10(ex0826$Metab), log10(ex0826$Life), xlab = "Log Average Lifespan (log-years)", ylab = "Log Average Metabolic Rate (log-kJ/day)", las = 1) # log(y) verses log(x) plot(ex0826$Metab, log10(ex0826$Life), xlab = "Average Lifespan (years)", ylab = "Log Average Metabolic Rate (log-kJ/day)", las = 1) # y verses log(x) ``` \newpage # 2 (1 pt) As suggested by the scatterplots, residual plots (in the Appendix) confirm that regression fit in \#1 is poor. We see a long right tail in the Normal Q-Q plot, indicating a potential violation of the Normality assumption. In the scatterplot of the original data, we see a potential non-linear relationship, which is also seen in the residual verses fitted plot. Additionally, in both the scatterplot of the original data and the residual verses fitted plot we see a possible funnel shape (increasing variance with larger fitted values), which violates the assumption of constant variance. We need more information regarding how the animals in each species were sampled in order to assess independence of these data. For example, if some species were sampled from the same geographic location, this might indicate a violation of independence. # 3 (2 pts) SLRs were fit to each of the 3 candidate models (i.e., $y$ vs $\log_{10}(x)$; $\log_{10}(y)$ vs $x$ and $\log_{10}(y)$ vs $\log_{10}(x)$), and residual and normal probability plots were generated for each model. These plots show that the SLR that models $\log$(Lifespans) as a function of $\log$(metabolic rate) best satisfies the regression assumptions. In the residual plots for $\log$(Lifespans) as a function of $\log$(metabolic rate), it seems that we have remedied the violations of constant variance and linearity assumptions. The Normal Q-Q plot still displays some right skew, but in my opinion this does not suggest a violation from the assumption of Normally distributed residuals. # 4 (1 pt) Of the 4 models considered, the best SLR model appears to be $$\mu\{\log_{10}(Life|Metab)\} = \beta_0 + \beta_1\log_{10}(Metab),$$ with an estimated equation of $$\hat{\mu}\{\log_{10}(Life|Metab)\} = -0.09221 + 0.388215 * log_{10}(Metab),$$ # 5 (1 pt) R reports $R^2 = 0.779$ so the correlation is $\sqrt{0.779}=0.883$. \newpage # 6 (1 pt) The data are plotted in a scatterplot below. ```{r, echo = FALSE, fig.height = 5, fig.width = 6} par(mfrow = c(1, 1)) plot(log10(ex0826$Metab), log10(ex0826$Life), xlab = "Log Average Lifespan (log-years)", ylab = "Log Average Metabolic Rate (log-kJ/day)", las = 1) # log(y) verses log(x) abline(-0.09221, 0.38215, col="red") ``` # 7 (2 pts, 1pt for correct derivation, 1pt for correct answer) We have to exponentiate both sides to get this written as $\mu\{Life|Metab\}$. First, under the assumption of normality, then $$\mu\{\log_{10}(Life|Metab)\} = {\rm Median}\{\log_{10}(Life|Metab)\}=\log_{10}({\rm Median}\{Life|Metab\}).$$ So, \begin{eqnarray*} \log_{10}({\rm Median}\{Life|Metab\})&=&\beta_0 + \beta_1\log_{10}(Metab)\\ {\rm Median}\{Life|Metab\}&=&10^{\beta_0 + \beta_1\log_{10}(Metab)}\\ &=&10^{\beta_0} \times Metab^{\beta_1} \end{eqnarray*} The fit SLR equation is $$\hat \mu\{\log_{10}(Life)|Metab\} = -0.09221 + 0.38215 * \log_{10}(Metab).$$ Plugging these estimate for $\beta_0$ and $\beta_1$ into the last equation we get that: $$\widehat{\rm Median}\{Life|Metab\}= 0.8087048 \times Metab^{0.382}$$ # 8 (5 pts, 1pt for correct test, 1pt for assumptions, 1pt for correct test conclusion, 1pt for correct CI, 1pt for correct CI interpretation) To test whether there is a association between lifespan and metabolic rate is the same as testing whether there is a linear relationship between log-metabolic rate and log-lifespan. We perform the following 6 steps: \begin{itemize} \item $H_0: \beta_1 = 0$ vs. $H_a:\beta_1 \neq 0$. \item We checked the SLR assumptions above and found no evidence to suggest that either linearity, normality, independence, or constant variance are violated. \item This is a $t$-test, with test statistic (from the summary of the log-log model in the Appendix) of $t = 18.11,$ which follows a $t_{23}$ distribution. \item The two-sided $p$-value is tiny, $p$-value$< 2\times 10^{-16}$ \item Because the $p$-value is tiny, we have strong evidence against $H_0$ and decide to reject $H_0$. \item There is evidence to suggest that there is a linear relationship between log-life span and log-metabolism. The estimated equation that describes this relationship is $\widehat{\rm Median}\{Life|Metab\}=0.809\times Metab^{0.382}$, with a 95\% CI of [0.34, 0.42] for the exponential rate of increase of lifespan as $Metab$ increases. \end{itemize} # 9 (2 pts, 1/2pt for each of the 4 ideas: sampling, population, study type, causation/association) This was an observational study, so we may only state that increased metabolism is associated different longer life spans. It is not clear how one would conduct an experiment to assess whether faster metabolic rates *cause* longer life spans. Because it is not clear how mammals were chosen to be included in the data set and it is not possible to randomize metabolic rates, then we can only state that the association between metabolism and life spans can be generalized to this sample of mammals. \newpage # 10 (2 pts, 1pt for correct CI, 1pt for correct interpretation) An individual 95\% CI for the median lifespan of a mammal with a metabolic rate of 100 (i.e., $\log_{10} x = 2$) was found by fitting the model (i.e., the "shift trick") \begin{eqnarray*} \log_{10}({\rm Median}\{Life|Metab\})&=&\beta_0 + \beta_1\left(\log_{10}(Metab)-2\right) \\ {\rm Median}\{Life|Metab\}&=& 10^{\beta_0 + \beta_1(\log_{10}(Metab)-\log_{10}(100))} \\ &=& 10^{\beta_0} \times \left(Metab/100\right)^{\beta_1} \\ &=& 10^{\beta_0} \times \left(100/100\right)^{\beta_1} \\ &=& 10^{\beta_0} \times 1^{\beta_1} \\ &=& 10^{\beta_0} \end{eqnarray*} (see Appendix for model summary, CI, and exponentiation). A 95\% CI for $\beta_0$ is [0.623 0.721]. A 95\% CI for $10^{\beta_0}$ is $[10^{0.6226789}, 10^{0.7214959}] = [4.194487, 5.266182]$. Hence we are 95\% confident that the true median lifespan for mammals with a metabolic rate of 100 (kJ per day) is between 4.2 and 5.3 years. # 11 (1 pt) A band of Workman-Hotelling CIs that maintain a family-wise confidence level of 95\%, was found and added to the scatterplot in the Appendix. # 12 (1 pt) The Workman-Hotelling 95\% CI for the mean lifespan of mammals with a metabolic rate of 100 ($\log_{10}(Metab) = 2$) is LARGER than the individual 95\% CI we found in \#10 for the median lifespan of a mammal with a metabolic rate of 100 because the Workman-Hotelling CIs maintains a 95\% family-wise confidence level over __all__ metabolic rates. # 13 (1 pt) A band of individual 95\% PIs for lifespan, was found and added to the scatterplot in the Appendix. \newpage \section{Appendix} \subsection{Housekeeping} ```{r, eval = FALSE} source("http://www.math.montana.edu/parker/courses/STAT411/diagANOVA.r") library(Sleuth3) ``` # Inspect data ```{r} summary(ex0826) dim(ex0826) ``` # 1 ```{r, eval = FALSE} par(mfrow = c(2,2)) plot(ex0826$Metab, ex0826$Life, xlab = "Average Lifespan (years)", ylab = "Average Metabolic Rate (kJ/day)", las = 1) # (original scale) y verses x abline(10.3, 3.87e-4, col="red") # adding the regression line plot(log10(ex0826$Metab), ex0826$Life, xlab = "Log Average Lifespan (log-years)", ylab = "Average Metabolic Rate (kJ/day)", las = 1) # log(y) verses x plot(log10(ex0826$Metab), log10(ex0826$Life), xlab = "Log Average Lifespan (log-years)", ylab = "Log Average Metabolic Rate (log-kJ/day)", las = 1) # log(y) verses log(x) plot(ex0826$Metab, log10(ex0826$Life), xlab = "Average Lifespan (years)", ylab = "Log Average Metabolic Rate (log-kJ/day)", las = 1) # y verses log(x) ``` ```{r} m2 <- lm(Life ~ Metab, data = ex0826) summary(m2) ``` # 2 & 3 ```{r} diagANOVA(m2) diagANOVA(lm(Life ~ log10(Metab), data = ex0826)) diagANOVA(lm(log10(Life) ~ Metab, data = ex0826)) m3 <- lm(log10(Life) ~ log10(Metab), data = ex0826) diagANOVA(m3) ``` # 4 & 5 ```{r} summary(m3) confint(m3) ``` # 6 ```{r, eval = FALSE} par(mfrow = c(1, 1)) plot(log10(ex0826$Metab), log10(ex0826$Life), xlab = "Log Average Lifespan (log-years)", ylab = "Log Average Metabolic Rate (log-kJ/day)", las = 1) # log(y) verses log(x) abline(-0.09221, 0.38215, col="red") ``` # 10 Implementing the "shift-trick" to assess the median lifespan of mammals with a metabolic rate of 100. ```{r} m4 <- lm(log10(Life) ~ I(log10(Metab)-2),data = ex0826) # Shifts intercept to be for a mammal with a metabolic rate of 100 (log10(100) = 2) summary(m4) confint(m4) 10^c(0.6226789, 0.7214959) ``` # 11 & 13 The Workman-Hotelling and PI bands: ```{r, fig.width = 7, fig.height = 6} new <- data.frame(Metab = seq(5, 165000, length = 100)) est.mean.ses <- predict(m3, newdata = new, se.fit = T, interval = "confidence") head(data.frame(new, est.mean.ses$fit)) conf.BAND.WH.low <- est.mean.ses$fit[ ,1] - sqrt(2*qf(.95, 2, 93))*est.mean.ses$se.fit conf.BAND.WH.hi <- est.mean.ses$fit[ ,1] + sqrt(2*qf(.95, 2, 93))*est.mean.ses$se.fit plot(Life ~ Metab, data = ex0826) lines(new$Metab, 10^est.mean.ses$fit[ ,1], lty = 1, lwd = 2) # add the fitted line lines(new$Metab, 10^conf.BAND.WH.low, lty = 4, lwd = 2, col = 4) # lower WH CL lines(new$Metab, 10^conf.BAND.WH.hi, lty = 4, lwd = 2, col = 4) # upper WH CL # Now add the PIs: pred.ses <- predict(m3, newdata = new, se.fit = T, interval = "prediction") head(data.frame(new, pred.ses$fit)) lines(new$Metab, 10^pred.ses$fit[,2], lty = 2, lwd = 2, col = 3) # lower ind. PL lines(new$Metab, 10^pred.ses$fit[,3], lty = 2, lwd = 2, col = 3) # upper ind. PL legend("bottomright", legend = c("Predicted lifespan", "Family of 95% CIs for median \nlifespan of all mammals", "Individual 95% PIs for one mammal"), lty = c(1, 4, 2), col = c(1, 4, 3), lwd = 2) # Get PI for mammals with Metab = 100 predict(m3, newdata = data.frame(Metab = 100), se.fit = T, interval = "prediction") 10^c(0.2410635, 1.103111) ```