Randomization Tests in R

Sampling distributions describe the behavior of a sample statistic across many many samples from the population or random process. If we had the entire population, we could generate sampling distributions through simulation as in the introduction to sampling distributions. However, in practice, we don't have the population (or else why would we conduct inference!). Thus, we need to use mathematical probability results for sampling distributions, or use a chance model to simulate a sampling distribution assuming some null hypothesis.

Though online applets, such as these, can perform randomization and simulation-based hypothesis tests for us, we would like to program our own randomization/simulation-based tests using R. This will also give us a deeper understanding of randomization/simulation-based inference methods.

Consider the following case study from class:

Swimming with dolphins can certainly be fun, but is it also therapeutic for patients suffering from clinical depression? To investigate this possibility, researchers recruited 30 subjects aged 18-65 with a clinical diagnosis of mild to moderate depression. Subjects were required to discontinue use of any antidepressant drugs or psychotherapy four weeks prior to the experiment, and throughout the experiment. These 30 subjects went to an island off the coast of Honduras, where they were randomly assigned to one of two treatment groups. Both groups engaged in the same amount of swimming and snorkeling each day, but one group (the animal care program) did so in the presence of bottlenose dolphins and the other group (outdoor nature program) did not. At the end of two weeks, each subjects’ level of depression was evaluated, as it had been at the beginning of the study, and it was determined whether they showed substantial improvement (reducing their level of depression) by the end of the study  (Antonioli and Reveley, 2005).

Observed data: The researchers found that 10 of 15 subjects in the dolphin therapy group showed substantial improvement, compared to 3 of 15 subjects in the control group.

The null hypothesis in this study is that swimming with dolphins does not improve depression symptoms in the population of people aged 18-65 with a clinical diagnosis of mild to moderate depression. Under this assumption, the 10+3 =13 subjects who showed improvement would have improved under either condition, and the other 17 subjects who did not improve would have done so regardless of the treatment. Thus, let's simulate what would happen if we randomly assigned 13 "improvers" and 17 "non-improvers" to two groups of 15:

subjects <- c(rep("improver",13), rep("nonimprover",17))
dolphin <- sample(subjects, 15, replace = FALSE)
# The number of "improvers" that ended up in the dolphin group due to random chance:
sum(dolphin == "improver")

Now, let's use a for loop to repeat this simulation many times and keep track of the number of "improvers" that ended up in the dolphin group each time:

counts <- NULL
for(i in 1:10000){
dolphin <- sample(subjects, 15, replace = FALSE)
counts[i] <- sum(dolphin == "improver")
hist(counts, breaks=(0:15+.5), col="turquoise",
xlab="Number of improvers in dolphin group",
main="Distribution of number of improvers in dolphin group")


  1. Use the simulated counts to calculate a p-value, i.e., the proportion of the time we would see data like our observed data (10 of 15 improving in the dolphin group) or more extreme (more than 10 improving in the dolphin group), assuming that dolphins have no effect on depression symptoms. How does this p-value help you address the research question? What is your conclusion?
  2. Perform the same simulation, but instead of tracking the number of improvers in the dolphin group as your statistic, track the difference in proportion of improvers between the two groups. Do you get the same p-value?
  3. Try coding a simulation test for one of the other two cases in the case study handout.