\documentclass[10pt,titlepage]{article}
\usepackage{amsmath}
\usepackage{graphicx}
\allowdisplaybreaks
\def\ds{\displaystyle}

\jot=.2in \pagestyle{empty} \setlength{\topmargin}{-0.5in}
\setlength{\textheight}{9.5in} \setlength{\oddsidemargin}{-0.3in}
\setlength{\evensidemargin}{-0.2in} \setlength{\textwidth}{6.9in}
\font\heada=cmbx10 scaled\magstep3 \font\headb=cmsl10
scaled\magstep1 \font\headc=cmr8 \pretolerance=10000
\setlength{\parindent}{2 em}


\begin{document}
\begin{center}
{\heada Project 11 - Simple Linear Regression}\\
{\headb Statistics 401: Fall 2007}\\
{\it Due Friday, May 4}
\end{center}
\smallskip

\noindent This project is not a required part of the course
curriculum.   If completed, your lowest project score will be
replaced with your grade on this project.

\bigskip
\noindent Use R to complete this project. Attach all R commands used
to complete this project in an appendix, and annotate each command
with its problem number.

\begin{enumerate}
\item Do problem 5.4 on page 194 of your textbook.

\item In the following questions, you will fit and analyze two simple linear regression (SLR) models that use \underline{distance} from
campus (in miles) to explain \underline{rent}s (in dollars) around
Montana State University. The data, collected from the class on April
30, 2007, is available at the STAT401 web site.

\begin{enumerate}

\item \label{scatplot} Display a scatter-plot of the amount of rent ($y$) versus distance from MSU
($x$).  Plot the least squares regression line on this same plot
using the \texttt{abline()} command. Include this scatter-plot in your
report, and label and reference it properly.

\item Calculate the sample correlation coefficient $r$.  What parameter does $r$ estimate? You may
use R for the calculation.

\item Give the \underline{form}, \underline{direction} and
\underline{strength} of the relationship between $x$ and $y$.   For
each, indicate what output you are using.

\item Give the SLR \underline{model} which describes the amount of rent as a linear
function of distance.  (The model consists of the parameters
$\beta_0$, $\beta_1$ and $\sigma$,  NOT the estimates for these
parameters!)


\item Use R to fit two different SLR models to describe rent as a function of distance from campus.

\begin{enumerate}
\item Fit the first SLR using all 10 data points.

\item According to the ``1.5 IQR rule,'' identify the outlier in the data
set (with respect to the $x$ variable).


\item Remove this outlier from the data set, then fit an SLR
model to this ``new'' data set.

\item Fill in the following table with the results from the two regressions.   The $p$-value
is for the ``slope test'' (the test of $H_0\colon \beta_1 = 0$).   Use the \texttt{anova()} command to get the
$MSE$.

\begin{center}
\begin{tabular}{|l|c|c|c|c|}
  \hline
               & Intercept & Distance & $p$-value & $\sigma\approx \sqrt{MSE}$  \\\hline
  with outlier &  &  &  & \\
  outlier removed &  &  &  & \\
  \hline
\end{tabular}
\end{center}

\end{enumerate}
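
As a rough sketch of how these steps might be organized in R, the commands below apply the 1.5 IQR rule to the $x$ variable and collect the four table entries for each fit. The data frame \texttt{rentals} and its ten values are made up for illustration only (they are not the class data), and the helper \texttt{slr\_row()} is a hypothetical name, not a built-in R function.

\begin{verbatim}
# Made-up illustrative data (NOT the class data): 10 rentals
rentals <- data.frame(
  distance = c(0.2, 0.5, 0.8, 1.0, 1.2, 1.5, 1.8, 2.0, 2.5, 12.0),
  rent     = c(450, 420, 400, 410, 380, 390, 360, 350, 340, 300)
)

# 1.5 IQR rule on the x variable: flag points beyond
# Q1 - 1.5*IQR or Q3 + 1.5*IQR
q      <- quantile(rentals$distance, c(0.25, 0.75))
fence  <- c(q[1] - 1.5 * IQR(rentals$distance),
            q[2] + 1.5 * IQR(rentals$distance))
is_out <- rentals$distance < fence[1] | rentals$distance > fence[2]

# Hypothetical helper: fit rent ~ distance and return one table row
slr_row <- function(d) {
  fit <- lm(rent ~ distance, data = d)
  mse <- anova(fit)["Residuals", "Mean Sq"]   # MSE from the ANOVA table
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    p_value   = summary(fit)$coefficients["distance", "Pr(>|t|)"],
    sigma_hat = sqrt(mse))
}

results <- rbind(with_outlier    = slr_row(rentals),
                 outlier_removed = slr_row(rentals[!is_out, ]))
print(results)
\end{verbatim}

Only the layout of \texttt{results} carries over to the project; the numbers themselves come from the made-up data.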


\item Is the assumption that a simple random sample (SRS) was taken satisfied?  To what
population, if any, can these results be extended?


\item Comment on each model, indicating whether there is a significant linear relationship between rent and
distance at a significance level of $\alpha=.05$.   Indicate which R
output justifies your answer.

\item Add the least squares regression line for the model without the outlier to the scatter-plot in
\eqref{scatplot}.


\item Was the outlier influential on the results of the regression?
Explain.





\end{enumerate}
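
One possible way to draw both regression lines on a single scatter-plot, and to compute $r$, is sketched below. The data are again made up for illustration (not the class data), and the cutoff \texttt{distance < 10} used to drop the outlier is an assumption that only makes sense for these made-up values.

\begin{verbatim}
# Made-up illustrative data (NOT the class data)
rentals <- data.frame(
  distance = c(0.2, 0.5, 0.8, 1.0, 1.2, 1.5, 1.8, 2.0, 2.5, 12.0),
  rent     = c(450, 420, 400, 410, 380, 390, 360, 350, 340, 300)
)

fit_all  <- lm(rent ~ distance, data = rentals)
# Assumed cutoff, valid only for the made-up data above
fit_trim <- lm(rent ~ distance, data = rentals, subset = distance < 10)

plot(rent ~ distance, data = rentals,
     xlab = "Distance from campus (miles)",
     ylab = "Rent (dollars)")
abline(fit_all,  lty = 1)   # line fitted to all points
abline(fit_trim, lty = 2)   # line fitted with the outlier removed
legend("topright", legend = c("all points", "outlier removed"),
       lty = c(1, 2))

# Sample correlation coefficient
r <- cor(rentals$distance, rentals$rent)
print(r)
\end{verbatim}

Using different line types (\texttt{lty}) with a legend is one simple way to keep the two fits distinguishable in a printed report.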


\newpage
\item In the following questions, you will fit and analyze three
different simple linear regression models to explain global warming
over the years 1890--1980 (solar data is not readily available past
1980).  In the data file, available at the STAT401 web site, mean
global \underline{temp}erature is given as a change in degrees
Celsius from the 1951--1980 average; atmospheric \underline{carbon}
is in parts per million; \underline{solar} magnetic cycle length is
in years; and \underline{time} is in years.  If you are interested,
I gathered the data in the following manner:
\begin{itemize}

\item  Temperature measurements from 1890--2000 are published by NASA's Goddard Institute for Space
Studies in Figure 1 at {\small http://data.giss.nasa.gov/gistemp/2005/}.

\item Carbon measurements from 1950 to the present are from the well-known ``Keeling Curve,'' as reported in
the movie {\em An Inconvenient Truth}.  For carbon measurements from 1890--1990, see Figures 1 and 12 in
Robinson's ``Environmental Effects of Increased Atmospheric Carbon
Dioxide'' paper at {\small http://www.oism.org/pproject/review.pdf}.

\item Solar magnetic cycle length measurements from 1890--1980 were published in a 1991 {\em Science}
paper by E. Friis-Christensen and K. Lassen,
``Length of the solar cycle: an indicator of solar activity closely
associated with climate''  (254, pp.~698--700).  See the first
figure at {\small http://www.tmgnow.com/repository/solar/lassen1.html} or Figure 3 of
Robinson's paper given above.

\end{itemize}

\begin{enumerate}


\item Give three scatter-plots of the temperature change ($y$) versus each of the following predictors: carbon,
solar magnetic cycle length, and time.  Plot the least squares
regression line in each.  Include these scatter plots in your
report, and reference and label them properly.

\item Find the simple linear least squares regression line to explain global
temperature change with each of the following
predictors: carbon, solar magnetic cycle length, and time. Fill in
the following table with the results.  The $p$-value is for the
``slope test.''


\begin{center}
\begin{tabular}{||l|c|c|c|c|c||}
  \hline
               & $\beta_0$ estimate & $\beta_1$ estimate & $p$-value & $\sigma\approx \sqrt{MSE}$  & $R^2$\\\hline
  carbon &  &  &  &  &\\
  solar &  &  &  & & \\
  time & & & & & \\\hline
\end{tabular}
\end{center}

\item For each predictor, explain whether there is a significant linear relationship with global warming (use a significance level of $\alpha=.05$).
Indicate which R output you are using to justify your answer.

\item \label{bestSLR} Which of these SLR models is the best for predicting
mean global temperature change?   Indicate which R output you are
using to justify your answer.


\end{enumerate}
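
A minimal sketch of how the table entries could be collected for all three predictors at once is given below. The data frame \texttt{climate} is simulated here as a stand-in (it is not the project data file); its column names are assumed to match the underlined variable names above, and \texttt{slr\_summary()} is a hypothetical helper.

\begin{verbatim}
# Simulated stand-in data (NOT the project data file)
set.seed(401)
climate <- data.frame(time = 1890:1980)
climate$carbon <- 290 + 0.5  * (climate$time - 1890) + rnorm(91, sd = 2)
climate$solar  <- 11  - 0.01 * (climate$time - 1890) + rnorm(91, sd = 0.3)
climate$temp   <- -0.4 + 0.005 * (climate$time - 1890) + rnorm(91, sd = 0.1)

# Hypothetical helper: fit temp ~ predictor and return one table row
slr_summary <- function(predictor) {
  fit <- lm(reformulate(predictor, response = "temp"), data = climate)
  s   <- summary(fit)
  c(b0        = unname(coef(fit)[1]),
    b1        = unname(coef(fit)[2]),
    p_value   = s$coefficients[predictor, "Pr(>|t|)"],
    sigma_hat = s$sigma,        # residual standard error, sqrt(MSE)
    r_squared = s$r.squared)
}

tab <- t(sapply(c("carbon", "solar", "time"), slr_summary))
print(signif(tab, 3))
\end{verbatim}

With the actual data, \texttt{climate} would instead be read from the course data file; the simulated values say nothing about the real relationships.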

\item Interpret the slope of the SLR for temperature and carbon in
terms of the problem.

\item Use the appropriate SLR from \#3 to predict how far the mean global temperature in 1945,
the year that World War II ended, was from the 1951--1980 average.

\item Use the appropriate SLR from \#3 to determine the year in which mean global
temperature will be 1 degree Celsius higher than the 1951--1980 mean.  Why is
this a dubious question to try to answer with this model?


\item In order to study how carbon and solar magnetic cycle length {\em together} predict global warming,
fit a {\em multiple linear regression} (MLR) to the data.

\begin{enumerate}
\item Fit the MLR in R by running
\begin{verbatim}
  mlr = lm(temp ~ carbon + solar)
\end{verbatim}

Include the R-code and R output in the appendix of your report.

\item Does this model agree with your conclusion in \eqref{bestSLR}?

\item Notice that the coefficient (or ``slope'') of the carbon term
in the SLR (for temperature and carbon in problem \#3) is different
from that in the MLR. Explain why these estimates are different.

\end{enumerate}
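
To place the two carbon coefficients side by side, something like the following sketch may help. It uses a simulated stand-in data frame \texttt{climate} (not the project data), and it passes the data through the \texttt{data} argument of \texttt{lm()}, which is an alternative to having the variables already in the workspace.

\begin{verbatim}
# Simulated stand-in data (NOT the project data file)
set.seed(401)
climate <- data.frame(time = 1890:1980)
climate$carbon <- 290 + 0.5  * (climate$time - 1890) + rnorm(91, sd = 2)
climate$solar  <- 11  - 0.01 * (climate$time - 1890) + rnorm(91, sd = 0.3)
climate$temp   <- -0.4 + 0.005 * (climate$time - 1890) + rnorm(91, sd = 0.1)

slr <- lm(temp ~ carbon,         data = climate)
mlr <- lm(temp ~ carbon + solar, data = climate)

print(summary(mlr))

# Carbon coefficient from each fit, side by side
cmp <- rbind(SLR = coef(slr)["carbon"], MLR = coef(mlr)["carbon"])
print(cmp)
\end{verbatim}

The simulated numbers are for illustration only and should not be used to answer the questions above.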


\end{enumerate}

\end{document}

