# Lab 7 - Inference for Proportions in R

## R Functions for Inference

In the last lab, you were introduced to two R functions for inference with proportions:

- *binom.test*
- *prop.test*

The first function conducts exact inference using the binomial distribution, whereas
the second uses a normal approximation. Because exact binomial inference with *binom.test*
applies only to a single proportion, for a difference of proportions we either conduct a
randomization test (see Lab 5) or use the *prop.test* function and its normal approximation.
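To see the two approaches side by side, here is a minimal sketch that runs both functions on a single proportion. The counts (10 successes out of 15) are the dolphin-group counts used later in this lab; the null value p = 0.5 and the one-sided alternative are arbitrary choices for illustration only, not part of the study.

```r
# Single proportion: 10 successes in 15 trials (dolphin group counts from this lab).
# The null value p = 0.5 is an illustrative assumption.
x <- 10
n <- 15

# Exact test based on the binomial distribution
exact_test <- binom.test(x, n, p = 0.5, alternative = "greater")

# Normal-approximation test (continuity correction applied by default)
approx_test <- prop.test(x, n, p = 0.5, alternative = "greater")

# Same sample proportion, but the p-values come from different distributions
c(estimate = unname(exact_test$estimate),
  exact_p  = exact_test$p.value,
  approx_p = approx_test$p.value)
```

The exact p-value is simply the binomial upper-tail probability P(X >= 10) for X ~ Binomial(15, 0.5), while *prop.test* approximates that tail with a normal curve.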

## Inference for categorical data

We are going to work through the Inference for categorical data lab by Andrew Bray. But first, let's learn more about the syntax for the *prop.test* function when testing a difference in proportions. (For the syntax when testing a
single proportion, see Lab 6.)

The *prop.test* function works with summary data: it doesn't take the raw data table, where each row in
the table corresponds to one observational unit and its categorical response. Instead,
it needs either the number of successes and the sample size, or a summary table of
the data. Let's practice using data from the dolphin case study (described here).

```r
dolphins <- read.table("http://www.math.montana.edu/shancock/courses/stat401/data/dolphins.txt", header=TRUE)
```

Take a look at the raw data. Notice that none of the values in the data set are numbers; we are measuring two categorical variables on each individual.

```r
dolphins
```

The research hypothesis was that swimming with dolphins would increase the rate of
improvement in depression symptoms, i.e., the probability of improvement in the dolphin
group would be higher than that in the control group. To use the *prop.test* function, we first need to summarize the data into a two-way table.

```r
table(dolphins)
```
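Because the real column names and category labels in `dolphins.txt` are not shown here, the sketch below uses a small hypothetical data frame (columns `group` and `outcome`, labels `Dolphin`/`Control` and `Improved`/`NotImproved`, all assumptions) rebuilt from the counts used in this lab. It shows how *table* orders its rows and columns, and how to pull the successes and sample sizes out in the order you want.

```r
# Hypothetical raw data with one row per subject. The column names (group, outcome)
# and labels are assumptions; check names(dolphins) for the real ones.
raw <- data.frame(
  group   = rep(c("Dolphin", "Control"), each = 15),
  outcome = c(rep("Improved", 10), rep("NotImproved", 5),
              rep("Improved", 3),  rep("NotImproved", 12))
)

# Two-way table: rows come from the first column, columns from the second.
# Categories are sorted alphabetically, so "Control" comes before "Dolphin".
tab <- table(raw)
tab

# Successes and sample sizes, reordered so the dolphin group is listed first
successes <- tab[c("Dolphin", "Control"), "Improved"]
sizes     <- rowSums(tab)[c("Dolphin", "Control")]
successes
sizes
```

The alphabetical ordering is the reason to index by name rather than rely on position: the vectors can then be passed to *prop.test* knowing which group is group 1.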

Then, we enter a vector of the number of successes in each group as the first argument
to *prop.test*, and a vector of the number of observations in each group as the second. Make sure
to use the same group ordering in both vectors: `c(successes in group 1, successes in group 2)`
and `c(observations in group 1, observations in group 2)`.

```r
# Dolphin group first (10 of 15 improved), control group second (3 of 15 improved)
prop.test(c(10, 3), c(15, 15), alternative = "greater", conf.level = 0.95)
```
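The ordering matters because *prop.test* always works with group 1 minus group 2. As a quick check, the sketch below lists the groups in the opposite order; the matching one-sided alternative then flips from "greater" to "less", and the p-value is unchanged.

```r
# Dolphin group listed first: H_A is p_dolphin - p_control > 0
dolphin_first <- prop.test(c(10, 3), c(15, 15), alternative = "greater")

# Control group listed first: the difference is now p_control - p_dolphin,
# so the same research hypothesis corresponds to alternative = "less"
control_first <- prop.test(c(3, 10), c(15, 15), alternative = "less")

# Same test, same p-value; only the sign convention of the difference changed
c(dolphin_first$p.value, control_first$p.value)
```

Mixing the two orderings (for example, successes in one order and sample sizes in the other) would silently test a different, meaningless comparison, so it is worth double-checking the `estimate` component of the output.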

Check that the sample estimates in the output match the two sample proportions of successes from the two groups. Note that by default, *prop.test* applies what is called a "continuity correction" (Yates' correction), which shrinks each absolute difference between an observed and an expected cell count by up to 0.5 before computing the test statistic; it can be turned off with `correct = FALSE`. For smaller samples, this makes the normal approximation to the difference in sample proportions slightly more accurate. However, for this data set, are our conditions for using the normal approximation met? (No! Why not? What inference method should we be using instead?)
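To see what the continuity correction does in practice, the sketch below runs the same one-sided test with and without it. Setting `correct = FALSE` drops the correction; because the correction shrinks the chi-squared statistic, the corrected test is more conservative and reports a larger p-value.

```r
# Default: continuity correction applied (correct = TRUE)
with_cc <- prop.test(c(10, 3), c(15, 15), alternative = "greater")

# Same test without the continuity correction
without_cc <- prop.test(c(10, 3), c(15, 15), alternative = "greater",
                        correct = FALSE)

# The corrected statistic is smaller, so its p-value is larger
rbind(statistic = c(with_cc = unname(with_cc$statistic),
                    without_cc = unname(without_cc$statistic)),
      p.value   = c(with_cc$p.value, without_cc$p.value))
```

Neither version fixes the underlying problem raised above: if the sample-size conditions for the normal approximation are not met, the correction only adjusts the approximation rather than replacing it.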

One can also give the table directly to *prop.test*, but it will treat the rows as the two groups and the first column as the "success"
category. Thus, the following command tests whether the proportion who did *not* improve is greater in the dolphin group than in the control group, which is not our
research hypothesis:

```r
prop.test(table(dolphins), alternative = "greater", conf.level = 0.95)
```
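One way to use the table form safely is to build the 2 x 2 count matrix yourself in the layout *prop.test* expects: groups in the rows (group 1 first) and successes in the first column. The sketch below does this with the counts from this lab; the row and column labels are illustrative assumptions, not the labels in `dolphins.txt`.

```r
# Hypothetical 2 x 2 count matrix laid out the way prop.test expects:
# rows = groups (dolphin first), column 1 = successes, column 2 = failures.
counts <- matrix(c(10, 3,    # improved: dolphin, control
                   5, 12),   # not improved: dolphin, control
                 nrow = 2,
                 dimnames = list(group   = c("Dolphin", "Control"),
                                 outcome = c("Improved", "NotImproved")))
counts

# Tests H_A: p_dolphin > p_control, matching the research hypothesis
from_matrix <- prop.test(counts, alternative = "greater")

# Same result as giving the successes and sample sizes as vectors
from_vectors <- prop.test(c(10, 3), c(15, 15), alternative = "greater")
c(from_matrix$p.value, from_vectors$p.value)
```

An existing table can be rearranged the same way by indexing, for example `tab[c(2, 1), ]` to swap its rows or `tab[, c(2, 1)]` to swap its columns, where `tab` is a hypothetical name for the saved output of *table*.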

Work through Andrew Bray's Inference for categorical data lab, but instead of using his *inference* function (which is not a built-in R function; it loads with the .Rdata file for the lab),
practice using the *prop.test* function. If you are unable to load the data set from his lab page, use the following
R command instead:

```r
atheism <- read.table("http://www.math.montana.edu/shancock/courses/stat401/data/atheism.txt", header=TRUE)
```
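For the comparisons in that lab, the counts have to be computed from the raw data before calling *prop.test*. The helper below is a hypothetical sketch: it assumes the atheism data has columns named `nationality`, `response`, and `year`, and that the success category is labelled `"atheist"`. Check `names(atheism)` and `unique(atheism$response)` before relying on it.

```r
# Hypothetical helper: compare the proportion of one response category between
# two years for a single country. Column names and the "atheist" label are
# assumptions about the data file.
compare_years <- function(df, country, years = c(2005, 2012),
                          success = "atheist", ...) {
  sub <- df[df$nationality == country & df$year %in% years, ]
  # Successes and sample sizes, in the same order as `years`
  x <- sapply(years, function(y) sum(sub$response[sub$year == y] == success))
  n <- sapply(years, function(y) sum(sub$year == y))
  # Extra arguments (alternative, conf.level, correct, ...) go to prop.test
  prop.test(x, n, ...)
}
```

Under those assumptions, a call such as `compare_years(atheism, "Spain")` would compare the two years for one country, with the earlier year as group 1.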