R Functions for Inference

In the last lab, you were introduced to two R functions for inference with proportions: binom.test and prop.test.

The first function conducts exact inference using the binomial distribution, whereas the second uses the normal approximation. Since binom.test handles only the single proportion case, inference on a difference of proportions requires either a randomization test (see Lab 5) or the prop.test function, which uses the normal approximation.
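As a minimal sketch of how the two approaches compare for a single proportion, the following uses made-up counts (10 successes in 15 trials) and an assumed null value of 0.5; these numbers are for illustration only. It also assumes the exact function from the last lab is binom.test.

```r
# Illustrative counts only (not taken from a data set in this lab)
successes <- 10
trials <- 15

# Exact test based on the binomial distribution
exact <- binom.test(successes, trials, p = 0.5, alternative = "two.sided")

# Normal-approximation test (continuity correction is applied by default)
normal <- prop.test(successes, trials, p = 0.5, alternative = "two.sided")

# Both report the same sample proportion, but the p-values generally differ
exact$estimate
normal$estimate
c(exact = exact$p.value, normal = normal$p.value)
```

With small samples like this one, the exact p-value is the one to trust; the two should agree more closely as the sample size grows.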

Inference for categorical data

We are going to work through the Inference for categorical data lab by Andrew Bray. But first, let's learn more about the syntax for the prop.test function when testing a difference in proportions. (For the syntax when testing a single proportion, see Lab 6.)

The prop.test function uses summary data: it does not take the raw data table, in which each row corresponds to one observational unit and its categorical responses. Instead, it needs either the number of successes and the sample size, or a summary table of the data. Let's practice using data from the dolphin case study (described here).

dolphins <- read.table("http://www.math.montana.edu/shancock/courses/stat401/data/dolphins.txt", header=TRUE)

Take a look at the raw data - notice that none of the values in the data set are numbers; we are measuring two categorical variables on each individual.
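A few standard commands for inspecting a data frame are sketched below. So that the example runs without a network connection, it builds a small stand-in data frame, dolphins_demo. The column names (treatment, response) and their labels are assumptions and may not match those in dolphins.txt; the counts (10 of 15 improved in the dolphin group, 3 of 15 in the control group) are the ones used with prop.test later in this lab.

```r
# Stand-in for the raw data: one row per participant (hypothetical names/labels)
dolphins_demo <- data.frame(
  treatment = rep(c("Dolphin", "Control"), each = 15),
  response  = c(rep(c("Improved", "Not improved"), times = c(10, 5)),
                rep(c("Improved", "Not improved"), times = c(3, 12))),
  stringsAsFactors = TRUE
)

head(dolphins_demo)     # first six rows
str(dolphins_demo)      # variable types: both are factors, not numbers
summary(dolphins_demo)  # counts in each category of each variable
```

The same three commands can be applied to the dolphins object once it has been read in.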


The research hypothesis was that swimming with dolphins would increase the rate of improvement in depression symptoms, i.e., the probability of improvement in the dolphin group would be higher than that in the control group. To use the prop.test function, we first need to summarize the data into a two-way table.
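One way to build the two-way table is with table(). The sketch below again uses a stand-in data frame whose column names and labels are hypothetical (those in dolphins.txt may differ); with the real data, table(dolphins) plays the same role.

```r
# Stand-in for the raw data (hypothetical column names and labels)
dolphins_demo <- data.frame(
  treatment = rep(c("Dolphin", "Control"), each = 15),
  response  = c(rep(c("Improved", "Not improved"), times = c(10, 5)),
                rep(c("Improved", "Not improved"), times = c(3, 12)))
)

# Rows = treatment group, columns = response category
two_way <- table(dolphins_demo$treatment, dolphins_demo$response)
two_way
addmargins(two_way)  # same table with row and column totals

# Pull out the pieces prop.test needs
improved <- two_way[, "Improved"]  # number of successes in each group
group_n  <- rowSums(two_way)       # number of observations in each group
```

Note that table() orders the categories alphabetically, so check which group ends up in which row before copying the counts.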


Then, we enter a vector of the number of successes in each group as the first argument to prop.test, and a vector of the number of observations in each group as the second. Make sure to be consistent with the ordering of the two vectors: c(number of successes in group 1, number of successes in group 2), c(number of observations in group 1, number of observations in group 2).

# Successes (improved) and group sizes, both in the order dolphin, control
prop.test(c(10, 3), c(15, 15), alternative = "greater", conf.level = 0.95)

Check that the sample estimates in the output match the sample proportions of successes in the two groups. Note that by default, prop.test performs the test using what is called a "continuity correction" (Yates' correction), which subtracts 0.5 from the absolute difference between each observed and expected cell count before computing the test statistic; set correct = FALSE to turn it off. For smaller samples, this makes the normal approximation to the difference in sample proportions slightly more accurate. However, for this data set, are our conditions for using the normal approximation met? (No! Why not? What inference method should we be using instead?)
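These checks can be sketched in R as follows. The variable names are only illustrative, the threshold of at least 10 successes and 10 failures per group is an assumed rule of thumb (use whichever condition your course states), and the hand computation reflects our understanding of how prop.test applies the continuity correction in the two-group case.

```r
x <- c(10, 3)    # successes: dolphin, control
n <- c(15, 15)   # group sizes: dolphin, control
result <- prop.test(x, n, alternative = "greater", conf.level = 0.95)

# 1. The sample estimates should equal the sample proportions
p_hat <- x / n
result$estimate

# 2. Success-failure counts in each group (assumed threshold: at least 10)
counts <- rbind(successes = x, failures = n - x)
counts
conditions_met <- all(counts >= 10)

# 3. Continuity-corrected test statistic computed by hand
p_pool   <- sum(x) / sum(n)                       # pooled proportion under H0
observed <- cbind(x, n - x)
expected <- cbind(n * p_pool, n * (1 - p_pool))
x_sq     <- sum((abs(observed - expected) - 0.5)^2 / expected)
c(by_hand = x_sq, prop_test = unname(result$statistic))
```

Rerunning prop.test with correct = FALSE and dropping the 0.5 in the hand computation shows the uncorrected version of the same statistic.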

One can also give the table directly to prop.test, but it will treat the rows as the two groups and the first column as the "success" category. Thus, the following command tests whether the proportion who did not improve is greater in the dolphin group than in the control group, which is not the research hypothesis:

prop.test( table(dolphins), alternative = "greater", conf.level = 0.95)
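If you prefer to pass a table, one option is to reorder its rows and columns first, so that the group of interest is the first row and the "success" category is the first column. A minimal sketch, using a hand-entered matrix of the dolphin counts with hypothetical labels (the labels in dolphins.txt may differ):

```r
# Counts with "Not improved" as the first column (hypothetical labels)
wrong_order <- matrix(c( 5, 10,
                        12,  3),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(group    = c("Dolphin", "Control"),
                                      response = c("Not improved", "Improved")))

# Reorder by name: dolphin group first, "Improved" as the success column
right_order <- wrong_order[c("Dolphin", "Control"),
                           c("Improved", "Not improved")]

from_table  <- prop.test(right_order, alternative = "greater", conf.level = 0.95)
from_counts <- prop.test(c(10, 3), c(15, 15),
                         alternative = "greater", conf.level = 0.95)

# The two ways of entering the data should give the same test
all.equal(from_table$p.value, from_counts$p.value)
```

Indexing by name rather than by position makes the intended ordering explicit and guards against the alphabetical ordering that table() uses.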

Work through Andrew Bray's Inference for categorical data lab, but instead of using his inference function (which is not a built-in R function - it loads with the .Rdata file for the lab), practice using the prop.test function. If you are unable to load the data set from his lab page, use the following R command instead:

atheism <- read.table("http://www.math.montana.edu/shancock/courses/stat401/data/atheism.txt", header=TRUE)