\documentclass[11pt,titlepage]{article}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{verbatim}
%\allowdisplaybreaks

%\pagestyle{plain}

\jot=.2in \pagestyle{plain} \setlength{\topmargin}{-.5in}
\setlength{\textheight}{9 in} \setlength{\oddsidemargin}{-0.2in}
\setlength{\evensidemargin}{-0.2in} \setlength{\textwidth}{6.75in}
\font\heada=cmbx10 scaled\magstep3 \font\headb=cmsl10
scaled\magstep1 \font\headc=cmr8 \pretolerance=10000 %\raggedright
\setlength{\parindent}{2 em}
%\input macros
\newdimen\digitwidth
\newdimen\minuswidth
\setbox0=\hbox{\rm0}
\digitwidth=\wd0
\setbox1=\hbox{$-$}
\minuswidth=\wd1
\newdimen\starr
\setbox2=\hbox{${}^*$}
\starr=\wd2

{\catcode`?=\active
\def?{\kern\digitwidth}
\catcode`@=\active
\def@{\kern\minuswidth}
\catcode`|=\active
\def|{\kern\starr}}


\begin{document}
\noindent {\heada Chapter 1 - Data}\\
\noindent Read Sections 1.1--1.3.

\vspace{0.1in}

\noindent \underline{\bf Statistics} consists of three major areas:
\begin{itemize}
\item Data Collection (sampling plans and experimental designs) \vspace{-0.1in}
\item Descriptive Statistics (numerical and graphical summaries) \vspace{-0.1in}
\item Inferential Statistics (confidence intervals and hypothesis testing)
\end{itemize}
\vspace{0.05in}

\noindent Statistical procedures are part of the
\underline{\bf Scientific Method} (steps 2--5 below), first espoused by Sir Francis
Bacon (1561--1626), who wrote ``to learn the secrets of nature
involves collecting data and carrying out experiments.''  The
modern methodology is:\vspace{-0.1in}
\begin{enumerate}
\item Observe some phenomenon \vspace{-0.1in}

\item State a hypothesis explaining the phenomenon\vspace{-0.1in}

\item Collect data\vspace{-0.1in}

\item Analyze the data and test: do the data support the hypothesis?\vspace{-0.1in}

\item Draw a conclusion.  If the test fails, go back to step 2.
\end{enumerate}

\noindent If you encounter a ``scientific claim'' that you disagree
with, scrutinize the steps of the scientific method used to reach it.
``Statistics don't lie, but liars do statistics.'' -- Mark Twain.

\vspace{0.1in}

\noindent \underline{\bf Individuals} or \underline{\bf Cases} or \underline{\bf Units}: The objects from which data is collected.
Individuals may be people, places, animals, things, or time periods.
\vspace{0.1in}

\noindent \underline{\bf Variable}: Any characteristic of an
individual that can be measured or recorded. \vspace{0.1in}

\noindent \underline{\bf Two Types of Variables}:
\begin{itemize}
\item {\bf Categorical} or {\bf Qualitative} - The possible values are
{\it categories} or {\it levels}.  Beware: some category names are actually
numbers (e.g., zip codes and dates). \vspace{-.1in}

\item {\bf Numerical} or {\bf Quantitative} - The possible values are
{\it numbers} so that mathematical operations, such as averaging,
make sense!
\end{itemize}
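\noindent The warning about category names that are numbers matters in R, which will average any column of numbers it is given.  As a minimal sketch, using a few made-up zip codes (hypothetical values chosen only for illustration), the base R function {\bf factor()} declares such a variable categorical:

\begin{verbatim}
# Hypothetical zip codes: category labels that happen to be numbers
zip <- c(90210, 10001, 90210, 60601)
mean(zip)        # R computes an "average zip code", which is meaningless

# Declare the variable categorical by converting it to a factor
zip.f <- factor(zip)
levels(zip.f)    # the distinct categories
table(zip.f)     # counts per category: a sensible summary
\end{verbatim}

\noindent Once {\bf zip.f} is a factor, {\bf mean()} no longer averages it (it returns NA with a warning), which is the behavior we want for a categorical variable.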


\begin{verse}
\noindent {\bf \underline{QUESTION}: Categorical or Numerical?}
\begin{enumerate}
\item Lifetime of a battery:
\item Type of battery:
\item Distance to school:
\item UPC code on a box of cereal:
\end{enumerate}
\end{verse}
\vspace{0.05in}

\noindent \underline{\bf Two Types of Numerical Variables}:
\begin{itemize}
\item {\bf Discrete} - The possible values are isolated points on
the number line.  Discrete variables can be either:
\vspace{-0.1in}
\begin{itemize}
\item  {\bf finite} (e.g., the number of beers left in a six pack:
0, 1, 2, 3, 4, 5, or 6)\vspace{-0.1in}

\item {\bf infinite} (e.g., the number of (full) minutes until the
next terrorist attack: 0, 1, 2, 3, $\ldots$).
\end{itemize}
\item {\bf Continuous} - The possible values form an interval on
the number line (e.g., the distance in feet between any two students in
this classroom lies in the interval $[0,50)$, that is, all real
numbers between 0 and 50, including 0 and excluding 50).
\end{itemize}

\begin{verse}
\noindent {\bf \underline{QUESTION}: Discrete or Continuous?}
\begin{enumerate}
\item Amount of money on you:
\item Your height:
\item Reaction time:
\item Number of children you have:
\end{enumerate}
\end{verse}
\vspace{0.05in}

\noindent \underline{\bf Population}: The entire group of
individuals that we want information about. For example: all
grizzly bears in Yellowstone National Park; all G.E. light bulbs
(made now and in the future); all tosses of a weighted die.

\vspace{0.05in}

\noindent \underline{\bf Sample}: A part of the population from
which data is collected.  For example: 22 tagged grizzly bears in
Yellowstone National Park; one box of G.E. light bulbs; 100 tosses of
a weighted die. \vspace{0.05in}

\noindent Typically, it is unrealistic to obtain a {\bf census} (i.e., data from the
entire population of interest).  Instead, one collects data from a sample
and uses the sample results to draw conclusions about the
population. This process is called \underline{\bf Inference}.
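\noindent Inference can be previewed with a small simulation in R.  The sketch below assumes a hypothetical weighted die (the probabilities are invented for illustration, so here the ``population'' is known), draws a sample of 100 tosses with the base R function {\bf sample()}, and uses the sample proportion of sixes to estimate the population proportion.  In a real study the population value is unknown, which is exactly why inference is needed.

\begin{verbatim}
set.seed(1)   # makes the random draws reproducible

# Hypothetical weighted die: a six comes up with probability 0.30
probs <- c(0.14, 0.14, 0.14, 0.14, 0.14, 0.30)

# The sample: 100 tosses of that die
tosses <- sample(1:6, size = 100, replace = TRUE, prob = probs)

# Sample result used to estimate the population proportion of sixes
p.hat <- mean(tosses == 6)
p.hat       # estimate from the sample
probs[6]    # true population value (known only because we simulated it)
\end{verbatim}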
\vspace{0.05in}

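\noindent \underline{\bf Explanatory Variable vs. Response Variable}: One or more variables
({\bf explanatory variables}) are used to predict or explain the values of another variable
({\bf response variable}).  For example, in the cottage cheese experiment described later in this chapter, {\bf Treatment} (with or without PC) is an explanatory variable and cooking {\bf Time} is the response variable.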
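\noindent In R (installation is described below), the roles of explanatory and response variables are usually written as a formula, \verb|response ~ explanatory|.  A minimal sketch using {\bf sleep}, a data set that ships with R (the increase in hours of sleep for 10 patients under two drugs), treats the drug {\bf group} as the explanatory variable and {\bf extra} as the response:

\begin{verbatim}
# 'sleep' is built into R: extra = increase in hours of sleep,
# group = which of two drugs was given, ID = patient
# Formula notation: response ~ explanatory
group.means <- aggregate(extra ~ group, data = sleep, FUN = mean)
group.means

# The same roles in a side-by-side boxplot
boxplot(extra ~ group, data = sleep,
        xlab = "group (explanatory)",
        ylab = "extra hours of sleep (response)")
\end{verbatim}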


\section*{Obtaining and Installing R}

\begin{enumerate}
\item Visit http://cran.r-project.org. This is the website for the Comprehensive R Archive Network (CRAN), from which you can download R and R packages.

\item The first box on this page is labeled {\em Download and Install
R}.  In that box, click on the appropriate link.  For example, Mac
users will click on {\em Download R for (Mac) OS X} and Microsoft Windows users will
click on the link {\em Download R for Windows}.  The rest of these
instructions are specific to Windows users.

\item On the new page, click on the link named {\em base}.

\item On the new page, the link {\em README.R-3.3.1} provides a brief synopsis of
installation and other instructions for R version 3.3.1 for Windows.
You shouldn't need to look at this file, but take a look if you get
into trouble.

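\item Click on the link {\em Download R 3.3.1 for Windows} to download the executable file R-3.3.1-win.exe to the hard drive on your computer.  (Newer versions of R have since been released, so the version number in the link and file name may differ.)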

\item Exit from your web browser.  Open Windows Explorer, go to the
folder in which you saved R-3.3.1-win.exe, and run the program.

\item You will be guided through the installation by a Setup
Wizard.

\item There are many excellent resources for using R. One
interactive site is at
http://www.math.csi.cuny.edu/Statistics/R/simpleR,
called ``Simple R'' by John Verzani.

\item Special-purpose software routines are bundled as separate
``packages.''  Some packages are installed automatically along with
R. To install additional packages, start R on your PC
and click on the {\em Packages} menu at the
top of the screen.  From the drop-down menu, click on {\em Install
package(s) ...} and then choose the package(s) that you want to
download. The packages that you will need to download for this
course are the following:

\begin{itemize}
\item lattice

\item pastecs

\end{itemize}
MASS is another package that we will be using; you do NOT need
to download it because it is included with the standard installation of R.
\end{enumerate}
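\noindent The menu steps above can also be done from the R console.  This is a minimal sketch: the loop installs {\bf lattice} and {\bf pastecs} only if they are missing (this needs an Internet connection; the mirror address {\tt https://cloud.r-project.org} is one choice among many CRAN mirrors), and {\bf library()} then loads packages for the current session.  Installing is done once, but loading must be repeated in every new R session; load {\bf pastecs} the same way, with \verb|library(pastecs)|, once it is installed.

\begin{verbatim}
# Install lattice and pastecs only if they are not already installed
for (p in c("lattice", "pastecs")) {
  if (!requireNamespace(p, quietly = TRUE)) {
    # report a failed download instead of stopping
    tryCatch(install.packages(p, repos = "https://cloud.r-project.org"),
             error = function(e) message("Could not install ", p, ": ",
                                         conditionMessage(e)))
  }
}

# Load packages for this session (repeat in every new session)
library(lattice)
library(MASS)      # comes with R, but still has to be loaded
\end{verbatim}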

\section*{Entering Data into R}

\noindent {\it Lactococcus lactis} and {\it Leuconostoc citrovorum} are two common bacteria used for making cottage cheese.  While developing a new type of cottage cheese, a large dairy producer has added the fungus {\it Penicillium candidum} (PC), typically used to make Brie, to its cottage cheese recipe.  One part of the process used to make cottage cheese involves cooking curdled milk for about an hour.  A researcher is interested in determining whether adding PC increases the cooking time. Seven dairy facilities (referred to as A, B, \ldots, G) each make two batches of
cottage cheese, one with and one without the fungus PC.  The cooking time in minutes was recorded for each batch. The
results of the experiment are in a text file called ``dairy.txt'',
which is shown below:

\verbatiminput{data/dairy.txt}

\noindent Text data files that are tab- or space-delimited can be
imported into R.  Because spaces and tabs separate the columns, the names of the variables in
the file cannot contain spaces (e.g., don't use ``Cook Time''). To get dairy.txt into R, execute the following
command:

\begin{verbatim}
> D = read.table("dairy.txt",header=TRUE)
\end{verbatim}

\noindent {\bf read.table} is a {\em function}, and the {\em
parameter} {\bf header=TRUE} tells R that the first line of the
file contains the names of the variables in each of the columns of data.
If R cannot find the file, you will see an error like:

\begin{verbatim}
Error in file(file, "r") : unable to open connection
In addition:
Warning message: cannot open file `dairy.txt'
\end{verbatim}

\noindent The above error occurred because dairy.txt was not in
the {\bf working directory}.  To change the working directory to
the one where dairy.txt resides, in R, click on {\bf File
$\to$ Change dir...} and a {\bf Choose Directory}
window will appear. In this window, you can directly enter the
directory that contains dairy.txt on your computer, or you can click
the Browse button to find the directory. Once you have found the
directory that contains dairy.txt, click OK in the Browse window
(if you used it) and then click OK in the
{\bf Choose Directory} window. Now we can try to read the data
into R again.
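\noindent The same steps can be carried out with commands: {\bf getwd()} reports the working directory, {\bf setwd()} changes it, and {\bf file.exists()} checks that a file is there.  So that the sketch runs on any computer, it saves R's built-in {\bf sleep} data to a temporary folder instead of using dairy.txt; for your own files you would call {\bf setwd()} with the folder that contains them, for example a hypothetical \verb|setwd("C:/stat/data")| (R accepts forward slashes in Windows paths).

\begin{verbatim}
# Where is R looking for files right now?
old.dir <- getwd()
old.dir

# For a self-contained demo, write R's built-in 'sleep' data to a
# space-delimited text file in a temporary folder
demo.dir <- tempdir()
write.table(sleep, file.path(demo.dir, "sleep.txt"),
            quote = FALSE, row.names = FALSE)

# Command equivalent of File -> Change dir...
setwd(demo.dir)
file.exists("sleep.txt")   # TRUE: the file is in the working directory

# Now read.table() finds the file by its name alone
S <- read.table("sleep.txt", header = TRUE)
head(S)

setwd(old.dir)             # go back to the original working directory
\end{verbatim}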

\begin{verbatim}
> D = read.table("dairy.txt",header=TRUE)
\end{verbatim}

\noindent The R-variable {\bf D} that contains the data is called
a {\bf data frame}.  We could have used any variable name, such as
``DairyData'' or ``CCheese'', but I don't like to type much, so I used
``D''.  Note that you cannot have spaces in your R-variable names!
Type the variable name at the R prompt to see what the data looks
like:

\verbatiminput{Rout1.txt}

\noindent To access the individual columns of the data in D, type

\verbatiminput{Rout2.txt}

\noindent Or you can execute

\verbatiminput{Rout3.txt}
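\noindent For reference, base R offers several equivalent ways to pull a single column out of a data frame.  The sketch below shows them on R's built-in {\bf sleep} data; with the dairy data the column name would be {\bf Time} instead of {\bf extra}.

\begin{verbatim}
# Three equivalent ways to extract one column from a data frame
sleep$extra           # by name with $
sleep[["extra"]]      # by name with [[ ]]
sleep[, "extra"]      # by [row, column]; a blank row index means all rows

# with() evaluates an expression using the data frame's columns
with(sleep, mean(extra))
\end{verbatim}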

\noindent R is case-sensitive!  The upper and lower-case letters
in the variable name must be EXACTLY as given in the data file or
R will not find it.  For example,

\begin{verbatim}
> TIME
Error: object "TIME" not found
> D$time
NULL
\end{verbatim}
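\noindent When R cannot find a variable, a quick check is to list the exact column names with {\bf names()}.  A short sketch with R's built-in {\bf sleep} data, whose response column is spelled {\bf extra} in lower case:

\begin{verbatim}
names(sleep)                 # exact, case-sensitive column names
"extra" %in% names(sleep)    # TRUE
"Extra" %in% names(sleep)    # FALSE: upper vs. lower case matters
sleep$Extra                  # NULL: no column has exactly this name
\end{verbatim}

\noindent As with \verb|D$time| above, a misspelled column after \verb|$| silently gives {\bf NULL} rather than an error, so it is worth checking the names first.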

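\noindent Notice that R recognizes that {\bf Dairy} and {\bf
Treatment} are categorical variables and gives the {\em levels} or
categories associated with each.  The variable {\bf Time} is
recognized as a quantitative variable.  (This is the behavior of the R version used in these notes.  Starting with R 4.0.0, {\bf read.table} keeps text columns as character strings by default; add the argument {\bf stringsAsFactors=TRUE} to get categorical variables with levels as shown above.)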
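\noindent To see how a text column becomes a categorical variable, the following sketch reads a tiny made-up battery data set (hypothetical values, typed in through the {\bf text} argument of {\bf read.table}) and asks for factors explicitly with \verb|stringsAsFactors = TRUE|.  The function {\bf str()} then shows which columns are factors and which are numeric.

\begin{verbatim}
# Made-up battery data (Lifetime in hypothetical hours)
batt <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "Type Lifetime
alkaline 10.2
lithium 14.8
alkaline 9.7
lithium 15.1")

str(batt)                    # Type is a factor with 2 levels; Lifetime is num
levels(batt$Type)            # the categories of the categorical variable
is.numeric(batt$Lifetime)    # TRUE: a numerical variable
\end{verbatim}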

\vspace{.1in}

\noindent In addition to {\bf read.table}, we will be using many
other functions that R has available.  For instance, {\bf mean()}
calculates the mean and {\bf median()} calculates the median, while the
functions {\bf sd()} and {\bf var()} calculate the standard
deviation and variance, respectively.  For example:

\begin{verbatim}
> mean(Time)
[1] 63.5
> median(Time)
[1] 66
> sd(Time)
[1] 12.91243
> sd(Dairy)
Error in var(as.vector(x), na.rm = na.rm) :
        missing observations in cov/cor
In addition: Warning message: NAs introduced by coercion
\end{verbatim}

\noindent The command {\bf sd(Dairy)} yields an error because {\bf
Dairy} is a categorical variable, and a standard deviation only makes sense for numerical data.
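\noindent A simple safeguard is to check a variable with {\bf is.numeric()} before computing a numerical summary, and to summarize a categorical variable by counting its levels with {\bf table()}.  The sketch uses R's built-in {\bf sleep} data, where {\bf extra} is numerical and {\bf group} is categorical:

\begin{verbatim}
is.numeric(sleep$extra)   # TRUE:  extra is numerical, so sd() makes sense
is.numeric(sleep$group)   # FALSE: group is a factor (categorical)

sd(sleep$extra)           # spread of the numerical variable
table(sleep$group)        # counts per level summarize the categorical one
\end{verbatim}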

\vspace{.1in}

\noindent Oftentimes, it is a good idea to store a result in an
R-variable so that you can refer to it later. Then you can type
the new variable name to see what is stored in it.  For example,

\begin{verbatim}
> Time.mean = mean(Time)
> Time.mean
[1] 63.5
> Time.mean/10 +100
[1] 106.35
\end{verbatim}

\noindent The last command shows that R-variables can be used with
the mathematical operators +, -, * and /.  To compute the mean and
standard deviation of the cook times of cottage cheese with PC and without PC, execute

\begin{verbatim}
> tapply(Time,Treatment,mean)
withoutPC    withPC
 61.14286  65.85714
> tapply(Time,Treatment,sd)
withoutPC    withPC
 12.62839  13.74080
\end{verbatim}
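\noindent The function {\bf tapply(X, INDEX, FUN)} splits the values in {\bf X} into groups defined by the categorical variable {\bf INDEX} and then applies the function {\bf FUN} to each group.  The sketch below shows this on R's built-in {\bf sleep} data and repeats the two steps by hand with {\bf split()} and {\bf sapply()}:

\begin{verbatim}
# tapply(X, INDEX, FUN): split X by INDEX, then apply FUN to each group
means <- tapply(sleep$extra, sleep$group, mean)
sds   <- tapply(sleep$extra, sleep$group, sd)
means
sds

# The same two steps done by hand
groups <- split(sleep$extra, sleep$group)   # one numeric vector per group
sapply(groups, mean)
\end{verbatim}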

\noindent Does this suggest that adding the fungus PC increases the
cook time of cottage cheese?


\section*{Exercises}

Starting on page 56, do Problems 1.3, 1.5, and 1.7.


\end{document}