\documentclass[12pt,titlepage]{article}
\usepackage{amsmath}
\usepackage{graphicx}
\allowdisplaybreaks

\jot=.2in \pagestyle{plain} \setlength{\topmargin}{-0.5in}
% \setlength{\footheight}{0 in}
\setlength{\textheight}{9.5 in}
\setlength{\oddsidemargin}{-0.1in}
\setlength{\evensidemargin}{-0.2in}
\setlength{\textwidth}{6.5in}
\font\heada=cmbx10 scaled\magstep3
\font\headb=cmsl10 scaled\magstep1
\font\headc=cmr8
\pretolerance=10000
\setlength{\parindent}{2 em}
%\input macros
\newdimen\digitwidth
\newdimen\minuswidth
\setbox0=\hbox{\rm0}
\digitwidth=\wd0
\setbox1=\hbox{$-$}
\minuswidth=\wd1
\newdimen\starr
\setbox2=\hbox{${}^*$}
\starr=\wd2

{\catcode`?=\active
\def?{\kern\digitwidth}
\catcode`@=\active
\def@{\kern\minuswidth}
\catcode`|=\active
\def|{\kern\starr}}



\begin{document}
\begin{center}
{\heada Project 6 - Estimation}\\
{\headb Statistics 401: Fall 2006}\\
{\it Due Monday, March 26}
\end{center}
\bigskip

\noindent Justify your answers.  Feel free to use R for
computations.  As always, properly label figures and reference them
from the body of your report.

\begin{enumerate}
\item Do problem 9.2 on page 367 of your textbook.

\item The Environmental Protection Agency has
established an air quality standard for lead of 1.5 $\mu$g/m$^3$.
Listed below are measured amounts of lead (in micrograms per cubic
meter or $\mu$g/m$^3$) in the air recorded at Building 5 of the
World Trade Center site on different days immediately following the
destruction caused by the terrorist attacks of September 11, 2001.
After the collapse of the two World Trade Center buildings, there
was considerable concern about the quality of the air. The data file
``lead.txt'' can be found on the STAT 401 website.

\begin{center}
5.40 \ 1.10 \ 0.42 \ 0.73 \ 0.51 \ 1.10 \ 0.66 \ 1.02 \ 0.45 \ 0.69
\ 0.72 \ 0.55
\end{center}

\begin{enumerate}
\item Give a point estimate of the population standard deviation $\sigma$.
What point estimator did you use to obtain your estimate?

\item Give a point estimate of the population median $\tilde \mu$.
What point estimator did you use to obtain your estimate?

\item Give a point estimate of the population mean $\mu$.  What point
estimator did you use to obtain your estimate?

\item Are there any outliers in this sample?  If so, identify each outlier and state whether it is a mild or an extreme outlier.
Clearly indicate which rule you are using to answer this question.

\item Suppose that the population distribution of lead levels is
symmetric but with heavier tails than a normal distribution.  Give a
point estimate of $\mu$, other than the sample median $\tilde x$,
which has some protection against the presence of outliers in the
sample. Recall that we talked about such statistics in Chapter 4.

\item Suppose that the lead measurements are normally distributed.  Then
the true 95th percentile of the lead distribution is $\mu + 1.645\sigma$
(i.e., 95\% of the lead measurements are less than this value).  Compute a
point estimate of this percentile.

\end{enumerate}
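
\medskip

\noindent {\em Note}: As a reminder only, one widely used convention for
classifying outliers is the boxplot rule sketched below.  It may not be the
rule adopted in your textbook, and the symbols $Q_1$, $Q_3$, IQR and $k$
are notation assumed here for illustration.  With sample quartiles $Q_1$ and
$Q_3$ and interquartile range $\mathrm{IQR}=Q_3-Q_1$, the fences are
\[
Q_1 - k\,\mathrm{IQR} \quad\text{and}\quad Q_3 + k\,\mathrm{IQR},
\qquad k = 1.5 \ \text{(inner fences)}, \quad k = 3 \ \text{(outer fences)}.
\]
Under this convention, an observation outside the inner fences but inside
the outer fences is a mild outlier, and an observation outside the outer
fences is an extreme outlier.  Different quartile definitions can give
slightly different fences, so state which definition you use.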



\item Do problem 9.14 on page 379.

\item Give one advantage and one disadvantage of using a 99\% confidence interval instead
of a 90\% confidence interval.


\item Read the March 2007 {\em Discover} article ``Scents and Scents-Ability''
available at the STAT 401 website.  The following questions pertain
to the first experiment, in which thirty-two Berkeley undergrads
volunteered.
\begin{enumerate}
\item Give the individuals being measured.

\item Give the variable being measured, and give the sample space
of all possible outcomes.

\item Give a point estimate for the true proportion of all humans who
could ``sniff their way along a scent trail.''

\item Is the sample size large enough to assume that the sample
proportion $p$ has an approximately normal distribution?  Why or why
not?  Name the theorem that justifies your answer.

\item If the true proportion of humans who can ``sniff
their way along a scent trail'' is $\pi=0.6$, then give the sampling
distribution of $p$, including its mean $\mu_p$ and standard deviation
$\sigma_p$.

\item Construct a 95\% confidence interval (CI) for $\pi$.

\item Interpret the CI in terms of the problem.

\item To what population would it be reasonable to generalize the CI
estimate?

\item If the researchers want to cut the margin of error for the
95\% CI in half, how many sniffing humans should they observe?

\end{enumerate}
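
\medskip

\noindent {\em Note}: The standard large-sample results for a sample
proportion are summarized below as a reminder.  This is a sketch under the
usual assumptions (a random sample of size $n$, with $n$ large enough for
the normal approximation), and your textbook may use a different interval
formula.  The symbol $z^*$ for the standard normal critical value is
notation assumed here.
\[
\mu_p = \pi, \qquad
\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}, \qquad
\text{CI: } \ p \pm z^*\sqrt{\frac{p(1-p)}{n}}.
\]
Because the margin of error $z^*\sqrt{p(1-p)/n}$ is proportional to
$1/\sqrt{n}$, shrinking it by a given factor requires a
disproportionately larger sample.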




\item Read the January 2007 {\em Discover} article ``Power of Hallucinogenic
Mushrooms Revealed'' available at the STAT 401 website.  Suppose that
your advisor is skeptical of the claim that psilocybin, the active
ingredient in hallucinogenic mushrooms, would cause a ``complete mystical
experience'' in 60\% of all humans.  Give the sample size required for a
new experiment here at MSU to estimate, to within 0.05 (5 percentage
points) with 95\% confidence, the true proportion of humans who would
have a ``complete mystical experience'' after taking psilocybin.
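
\medskip

\noindent {\em Note}: A common way to plan the sample size for estimating
a proportion, sketched here under the large-sample normal approximation,
is to set the margin of error equal to the desired bound and solve for
$n$.  The symbols $B$ (the desired bound) and $z^*$ (the standard normal
critical value) are notation assumed here, and $\pi$ stands for a planning
value of the true proportion.
\[
B = z^*\sqrt{\frac{\pi(1-\pi)}{n}}
\quad\Longrightarrow\quad
n = \pi(1-\pi)\left(\frac{z^*}{B}\right)^2 ,
\]
with $n$ rounded up to the next whole number.  When no planning value is
available, $\pi = 0.5$ gives the most conservative (largest) sample size.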

\item My friends Ben and Julia are moving to Clemson, South Carolina
in July.   They are looking for places to live that are not too far
from Clemson University where they are both to be employed.  They
find a house that they both like, but they disagree on how long
it'll take to get to the university.   Ben, ever the optimist,
guesses that it is a 20-minute commute.  Julia, the realist,
suggests that it is a 38-minute drive (ahhhh, married life).

\begin{enumerate}
\item Assuming that these two guesses are a simple random sample (SRS), you will construct a 75\%
confidence interval for the true average time it'll take for the
commute.  Give the critical value that must be used to construct
this 75\% CI.  {\em Hint}: Use R's {\bf qt} function as outlined in
the Chapter 9 notes.


\item Assuming that these two guesses are an SRS, give a 75\%
confidence interval for the true average time it'll take for the
commute.


\item What (besides being an SRS) must we assume about the guesses so that the 75\% CI is
valid?

\item Why is the 75\% CI so wide?

\item In order to satisfy their curiosity (Ben and Jules are both
statisticians), the happy couple want to construct a 90\% confidence
interval for the true average commute time with a margin of error of
5 minutes.  How large an SRS must be collected?
\end{enumerate}
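
\medskip

\noindent {\em Note}: For reference, the usual one-sample $t$ interval and
the usual sample-size formula for a mean are sketched below; your Chapter 9
notes may present them differently.  The symbols $t^*$, $z^*$, $B$ and
$\alpha$ are notation assumed here: $t^*$ is the upper $\alpha/2$ critical
value of the $t$ distribution with $n-1$ degrees of freedom for a
$100(1-\alpha)\%$ interval, and $B$ is the desired margin of error.
\[
\bar x \pm t^*\frac{s}{\sqrt{n}}, \qquad
n = \left(\frac{z^*\sigma}{B}\right)^2 .
\]
In R, $t^*$ corresponds to \texttt{qt(1 - alpha/2, df = n - 1)}, where
\texttt{alpha} and \texttt{n} are placeholders that you must supply.  The
sample-size formula needs a planning value for $\sigma$ (for example, the
sample standard deviation $s$ from a pilot sample), and $n$ is rounded up
to the next whole number.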



\item At the Center for Biofilm Engineering on MSU's campus, researchers
study the thin layers (or slime) of bacteria that form on various
surfaces, such as pipes and catheters.  In experiments, microbiologists
estimate the density (per mm$^3$) of the bacteria that have formed a
biofilm.  The data file ``bacteria.txt'' can be found on the
STAT 401 website; the bacteria densities are in millions per
mm$^3$.


\begin{enumerate}

\item We wish to construct a 95\% CI for $\mu$, the true mean density of bacteria in a biofilm.
What must be assumed about the data so that the CI is valid?  Why?


\item Does the evidence suggest that the data are not normally distributed?  Use the techniques from Chapter 7, including appropriate graphs and the correlation test,
to answer this question.

\item Transform the data, using the Box-Cox method to determine the appropriate transformation.  As in the Chapter 7 handout, the R command

\begin{verbatim}
library(MASS)  # boxcox() is provided by the MASS package
boxcox(density ~ 1,plotit=TRUE,lambda=seq(-1.5,1.5,.01))
\end{verbatim}

\noindent looks for the optimal $\lambda$ value between $-1.5$ and
$1.5$.  For this problem, you will need to use a larger range of
$\lambda$ values.  Include the Box-Cox plot in your report and specify
which value of $\lambda$ you are using for the transformation.

\item Use plots and the correlation test to check that the transformed data are approximately normal.

\item \label{transCI} Let $X$ denote the original, untransformed data and let $Y=X^\lambda$ be the transformed data.
Create a 95\% CI for $\mu_Y$.


\item Let $X$ denote the original, untransformed data and let $Y=X^\lambda$ be the transformed data.
Create a 95\% CI for $\mu_X$, the mean density of bacteria in the
biofilm.

\bigskip

Start with the CI for the transformed mean from part~\ref{transCI},
then back-transform the endpoints of the CI. For example, if you use
$\lambda = \frac{1}{2}$ (square root) as the optimal power to
transform the data to a normal distribution and the CI calculated
from the transformed data is (2.05, 4.26), then you can
back-transform the endpoints by raising each endpoint to the power
$\frac{1}{\lambda}$, or in this example, squaring each
endpoint, so the CI on the original scale is (4.20, 18.15).

\item Confirm that the back-transformed CI for $\mu_X$ is appropriate.   {\em Hint}: Estimate
$\frac{\sigma_X^2}{\mu_X^2}$ where $\mu_X$ and $\sigma_X$ are
parameters of the original bacteria density measurements $X$.

\item Interpret the 95\% CI in the context of this problem.

\end{enumerate}
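
\medskip

\noindent {\em Note}: One way to see why the ratio
$\sigma_X^2/\mu_X^2$ matters for the back-transformed interval is the
second-order Taylor sketch below.  It is offered as an informal argument,
not necessarily the one used in the course notes; it assumes $\lambda \neq
0$, and the expectation notation $E(\cdot)$ is introduced here.
Expanding $g(x)=x^\lambda$ about $\mu_X$, with
$g''(x)=\lambda(\lambda-1)x^{\lambda-2}$, gives
\[
\mu_Y = E\left(X^\lambda\right)
\approx \mu_X^\lambda + \frac{\lambda(\lambda-1)}{2}\,\mu_X^{\lambda-2}\sigma_X^2
= \mu_X^\lambda\left[1 + \frac{\lambda(\lambda-1)}{2}\,
\frac{\sigma_X^2}{\mu_X^2}\right].
\]
When $\sigma_X^2/\mu_X^2$ is small, the bracketed factor is close to 1, so
$\mu_Y^{1/\lambda} \approx \mu_X$ and back-transforming the endpoints of
the CI for $\mu_Y$ gives a reasonable interval for $\mu_X$.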

\end{enumerate}
\end{document}

