R Functions for Inference

We will be using the t.test function to conduct inference (hypothesis tests and confidence intervals) for three scenarios with a quantitative response:

  1. single mean
  2. paired mean difference
  3. difference in two means

Look up the help file for this function to get a sense of the input arguments:

?t.test

For the following sections, let's use the anorexia data set from lecture. Read this data set into your R session:

Anorexia <- read.table("http://www.math.montana.edu/shancock/courses/stat401/data/Anorexia.txt", header=TRUE)

Single Mean

The primary diagnosis for anorexia is a weight that is less than 85% of what is considered normal for that person's height and age. In this population of patients, researchers consider a patient anorexic if his or her weight is lower than 90 lbs. Is there evidence that the mean weight of the population these patients were sampled from is below 90 lbs prior to treatment? Let's use the t.test function to answer this question.

t.test(Anorexia$Before, alternative = "less", mu = 90, conf.level = 0.95)

The first argument to the t.test function is the vector of responses. We then specify the direction of the alternative ("less", "greater", or "two.sided", the default), the null value, and a confidence level. (The t.test function output gives both a hypothesis test and a confidence interval for the parameter.) The sample mean weight before treatment was 82.44 lbs, and we have strong evidence that the true mean weight of the population from which we took this sample is less than 90 lbs.
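To see what the t.test function is computing, the one-sample t statistic can be reproduced by hand. The sketch below uses a small made-up vector of weights (hypothetical values, not the Anorexia data; the names w, res, t.stat, and p.val are only for illustration) and compares the hand calculation with the components stored in the object that t.test returns.

```r
# Hypothetical weights in lbs (made up for illustration; not the Anorexia data)
w <- c(80.5, 84.9, 81.5, 82.6, 79.9, 88.7, 94.9, 76.3, 81.0, 80.5)

# Store the result instead of just printing it
res <- t.test(w, alternative = "less", mu = 90, conf.level = 0.95)

# One-sample t statistic by hand: (sample mean - null value) / standard error
n      <- length(w)
t.stat <- (mean(w) - 90) / (sd(w) / sqrt(n))

# Lower-tail p-value from a t distribution with n - 1 degrees of freedom
p.val <- pt(t.stat, df = n - 1)

# Components stored in the returned object
res$statistic   # t statistic (should match t.stat)
res$parameter   # degrees of freedom (n - 1)
res$p.value     # p-value (should match p.val)
res$estimate    # sample mean
res$conf.int    # confidence interval
```

Storing the result in an object is handy when you need only one piece of the output, such as the p-value, in later calculations.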

Note that with a one-sided alternative, R returns a "one-sided" confidence interval. If we want a two-sided confidence interval, we need to specify a two-sided alternative:

t.test(Anorexia$Before, alternative = "two.sided", mu = 90, conf.level = 0.95)
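As a rough illustration of the difference between the two intervals, the sketch below compares them on a made-up vector of weights (hypothetical values, not the Anorexia data). With alternative = "less", the lower endpoint is -Inf and the upper bound uses the 0.95 quantile of the t distribution; the two-sided interval uses the 0.975 quantile.

```r
# Hypothetical weights in lbs (made up for illustration; not the Anorexia data)
w <- c(80.5, 84.9, 81.5, 82.6, 79.9, 88.7, 94.9, 76.3, 81.0, 80.5)

one.sided <- t.test(w, alternative = "less", mu = 90)
two.sided <- t.test(w, alternative = "two.sided", mu = 90)

# With alternative = "less", the interval is (-Inf, upper bound)
one.sided$conf.int

# With alternative = "two.sided", both endpoints are finite
two.sided$conf.int

# Upper endpoints by hand: sample mean + t quantile * standard error
n  <- length(w)
se <- sd(w) / sqrt(n)
upper.one <- mean(w) + qt(0.95,  df = n - 1) * se   # one-sided bound
upper.two <- mean(w) + qt(0.975, df = n - 1) * se   # two-sided upper endpoint
```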

Paired Mean Difference

Let's consider only patients on the Family treatment.

Fam <- Anorexia[ Anorexia$Therapy == "Family", ]

Was the average weight gain in this group large enough to provide evidence that the Family treatment worked? We can use the t.test function in two ways to answer this question. First, we could take the difference in weight (After - Before) for each patient, which is stored in the column Y, and then use the t.test function as if it's a single mean scenario with a zero null value.

t.test(Fam$Y, alternative="greater", mu = 0, conf.level = 0.95)

Or, we can give the t.test function the before weights and the after weights and tell R that the data are paired.

t.test(Fam$After, Fam$Before, alternative="greater", mu = 0, conf.level = 0.95, paired = TRUE)

The default null value is zero, and the default confidence level is 0.95, so we could have run the above code without those arguments.

t.test(Fam$After, Fam$Before, alternative="greater", paired = TRUE)
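The two approaches give identical results because a paired t-test is a one-sample t-test on the differences. A minimal check, using made-up before and after weights (hypothetical values, not the Anorexia data):

```r
# Hypothetical before/after weights in lbs for six patients (made up)
before <- c(83.8, 83.3, 86.0, 82.5, 86.7, 79.6)
after  <- c(95.2, 94.3, 91.5, 91.9, 100.3, 76.7)

# Approach 1: one-sample test on the differences (after - before)
by.diff <- t.test(after - before, alternative = "greater", mu = 0)

# Approach 2: give both vectors and tell R the data are paired
by.pair <- t.test(after, before, alternative = "greater", paired = TRUE)

# Same t statistic, degrees of freedom, and p-value from both approaches
c(by.diff$statistic, by.pair$statistic)
c(by.diff$parameter, by.pair$parameter)
c(by.diff$p.value,   by.pair$p.value)
```

Note that the order of the two vectors matters in the paired call: R computes first argument minus second argument, so after must come first for alternative = "greater" to test for a weight gain.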

Difference in Means

To compare two means from independent groups, we give the t.test function the two vectors of responses as the first and second arguments. We could specify paired = FALSE as an argument, but this is the default, so it is unnecessary. Here is the R code to calculate a hypothesis test of H0: "The true mean weight gain is equal between the Family and Control treatments" versus Ha: "The true mean weight gain is greater on the Family treatment than on the Control treatment". R will also output a confidence interval for the true difference in means (first argument - second argument).

Gains.Fam <- Anorexia$Y[ Anorexia$Therapy == "Family" ]
Gains.Cont <- Anorexia$Y[ Anorexia$Therapy == "Control" ]
t.test(Gains.Fam, Gains.Cont, alternative="greater")
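By default, t.test runs the Welch version of the two-sample test (var.equal = FALSE), which does not assume the two groups have equal variances. The sketch below, using made-up weight gains (hypothetical values and object names, not the Anorexia data), reproduces the Welch t statistic by hand, shows the pooled-variance alternative, and shows the formula interface for data stored in "long" format with one response column and one group column.

```r
# Hypothetical weight gains in lbs for two independent groups (made up)
gain.a <- c(11.4, 11.0, 5.5, 9.4, 13.6, -2.9, -0.1, 7.4)
gain.b <- c(-0.5, -9.3, -5.4, 12.3, -2.0, -10.2, -12.2, 11.6, -7.1, 6.2)

# Default: Welch test (var.equal = FALSE)
welch <- t.test(gain.a, gain.b, alternative = "greater")

# Welch t statistic by hand: difference in means over its standard error
se     <- sqrt(var(gain.a) / length(gain.a) + var(gain.b) / length(gain.b))
t.stat <- (mean(gain.a) - mean(gain.b)) / se

# Pooled-variance version; assumes the two population variances are equal
pooled <- t.test(gain.a, gain.b, alternative = "greater", var.equal = TRUE)

# Formula interface: response ~ group, with the data in "long" format.
# Groups are ordered alphabetically, so the difference is A - B.
long <- data.frame(
  gain  = c(gain.a, gain.b),
  group = rep(c("A", "B"), times = c(length(gain.a), length(gain.b)))
)
by.formula <- t.test(gain ~ group, data = long, alternative = "greater")
```

The formula version is convenient when the data frame already has a grouping column, since it avoids building the two vectors by hand.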

Inference for quantitative data

We are going to work through the Inference for quantitative data lab by Andrew Bray. Instead of using his inference function (which is not a built-in R function; it loads with the .Rdata file for the lab), practice using the t.test function. If you are unable to load the data set from his lab page, use the following R command instead:

nc <- read.table("http://www.math.montana.edu/shancock/courses/stat401/data/nc.txt", header=TRUE)