R Functions for Inference

We have been introduced to three built-in R functions that will conduct normal-based hypothesis tests and confidence intervals for particular scenarios:

prop.test
  • One-sample z-test: binary variable
  • Two-sample z-test: binary response variable
    and binary explanatory variable
  Conditions: At least 10 "successes" and 10 "failures" in each group.

chisq.test
  • Chi-squared goodness-of-fit test: one categorical variable
  • Chi-squared test of independence: two categorical variables
  Conditions: At least 5 expected counts in each cell of the table.

t.test
  • One-sample t-test: quantitative variable
  • Paired t-test: difference in two quantitative variables
    among pairs
  • Two-sample t-test: quantitative response variable
    and binary explanatory variable
  Conditions: Either the quantitative response variable(s) (or differences)
  need to be approximately normal, or your sample size(s) should be
  at least 30 (larger if the data are heavily skewed).


We practiced using prop.test in Lab 4. In this lab, we will practice using the chisq.test and t.test functions. Look up their help files to get a sense of the inputs.

?chisq.test
?t.test

Chi-squared Tests

Goodness of fit test for a single categorical variable

In this section, we are going to practice summarizing, visualizing, and analyzing categorical data using the chisq.test function. We will start with one-way tables and move to two-way tables:

Are the tulip colors in a field equally distributed? Consider the tulip data below from a random sample of tulips in the field:

Color Count
Red 123
White 102
Yellow 81

 Let's summarize and visualize the data.

counts <- c(123, 102, 81)
n <- sum(counts)
# Sample proportions:
counts/n
barplot(counts/n, names.arg = c("Red","White","Yellow"), ylab="Proportion",
main = "Observed Distribution of Tulip Colors")

It seems that Red is the most common color, occurring 40% of the time, and Yellow the least common at 26%. Does this mean tulip colors in the entire field are not equally distributed? Or did this just happen by chance? Let's test the hypotheses

H0: pred = 1/3, pwhite = 1/3, pyellow = 1/3

Ha: At least two probabilities differ.

First, do the test calculations "by hand":

# Expected counts
expected <- n*c(1/3, 1/3, 1/3)
expected
# Chi-squared test statistic
stat <- sum( (counts - expected)^2/expected )
stat
# p-value
pchisq(stat, df = 2, lower.tail=FALSE)

With a p-value of 0.0132, there is fairly strong evidence that the tulip colors are not equally distributed (do not occur in the same proportions) across the entire field. Specifically how do the observed counts differ from what we'd expect to see under the null?

residuals <- (counts - expected)/sqrt(expected)
residuals

We can also plot the expected counts versus the observed counts in a bar plot.

barplot(rbind(counts, expected), names.arg = c("Red","White","Yellow"), beside=TRUE,
ylab="Count", legend.text=c("Observed","Expected"), col=c(3,5),
main="Observed versus Expected Counts")

There were more Red tulips than we would expect, and fewer White tulips than we would expect, if the colors were equally distributed. Note that the squared residuals add up to the chi-squared test statistic.

Note: The R function "rbind" stands for "row bind". It creates a matrix where each row is one argument in the function. When you give the "barplot" function a matrix, it plots the columns of the matrix for each category.
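As a quick sanity check on the statement above, we can verify that the squared residuals sum to the chi-squared test statistic. This sketch repeats the tulip counts so it runs on its own:

```r
counts <- c(123, 102, 81)                # Red, White, Yellow
expected <- sum(counts) * c(1/3, 1/3, 1/3)
residuals <- (counts - expected)/sqrt(expected)
stat <- sum((counts - expected)^2/expected)
sum(residuals^2)                         # same value as stat
all.equal(sum(residuals^2), stat)        # TRUE
```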

Exercise

Based on the tulip data above, perform a goodness-of-fit test to determine if the distribution pred = .40, pwhite = .35, pyellow = .25 is a reasonable fit to the data.

We can use the chisq.test function to check our calculations above:

chisq.test(counts, p=c(1/3,1/3,1/3), correct=FALSE)
chisq.test(counts, p=c(.4,.35,.25), correct=FALSE)

Chi-squared tests of independence for two categorical variables

Returning to the tulip example, suppose we wanted to know whether the distribution of tulip colors differed across three fields, A, B, and C. We already looked at data from field A. Here is a summary of the entire data set:

  Field
Color A B C
Red 123 240 100
White 102 200 35
Yellow 81 163 47

 Let's first visualize the data.

red <- c(123,240,100)
white <- c(102,200,35)
yellow <- c(81,163,47)
tulips <- rbind(red, white, yellow)
tulips
barplot(tulips, names.arg=c("A","B","C"), col=c("red","white","yellow"),
ylab="Frequency", xlab="Field", beside=TRUE,
main="Number of tulips by color and field")

Note that a barplot of counts is not really a fair comparison since the sample sizes differ across fields. Instead, we'd like to look at the conditional proportions within fields:

# Number of tulips collected in each field
colSums(tulips)
tulips.prop <- rbind(red/colSums(tulips), white/colSums(tulips), yellow/colSums(tulips))
barplot(tulips.prop, names.arg=c("A","B","C"), col=c("red","white","yellow"),
ylab="Conditional Proportions", xlab="Field", beside=FALSE,
main="Distribution of Colors by Field")
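An equivalent, more concise way to compute the conditional proportions is the built-in prop.table function, with margin = 2 to divide each column (field) by its column total. The counts below repeat the table above so the snippet is self-contained:

```r
red <- c(123, 240, 100)
white <- c(102, 200, 35)
yellow <- c(81, 163, 47)
tulips <- rbind(red, white, yellow)
# Divide each column by its column sum to get conditional proportions:
prop.table(tulips, margin = 2)
```

Each column of the result should sum to 1, matching tulips.prop above.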

The tulip color distribution seems very similar between fields A and B, but C appears to have a larger proportion of red tulips. Is this due to chance, or do the distributions truly differ?

H0: Field and tulip color are independent.

Ha: Field and tulip color are dependent.

The chisq.test function will do our test calculations for us. 

chisq.test(tulips, correct=FALSE)
chisq.test(tulips, correct=FALSE)$observed
chisq.test(tulips, correct=FALSE)$expected

These data provide evidence to suggest that the distribution of tulip color is dependent on field.
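To see which cells drive this dependence, we can also extract the Pearson residuals from the chisq.test result; large positive values mark cells with more tulips than expected under independence. The counts are repeated so the snippet runs on its own:

```r
red <- c(123, 240, 100)
white <- c(102, 200, 35)
yellow <- c(81, 163, 47)
tulips <- rbind(red, white, yellow)
# Pearson residuals: (observed - expected)/sqrt(expected)
chisq.test(tulips, correct = FALSE)$residuals
```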

Exercise

Calculate the expected counts, the chi-squared test statistic, and the p-value for the previous example without using the chisq.test function.

t-Tests

For the following sections, let's use the anorexia data set from lecture. Read this data set into your R session:

Anorexia <- read.table("http://www.math.montana.edu/shancock/data/Anorexia.txt", header=TRUE)

Single Mean

The primary diagnosis for anorexia is a weight that is less than 85% of what is considered normal for that person's height and age. In this population of patients, researchers consider a patient anorexic if his or her weight is lower than 90 lbs. Is there evidence that this sample of patients is anorexic prior to treatment? Let's use the t.test function to answer this question.

t.test(Anorexia$Before, alternative = "less", mu = 90, conf.level = 0.95)

The first argument to the t.test function is the vector of responses. We then specify the direction of the alternative ("less", "greater", or "two.sided" (default)), the null value, and a confidence level. (The t.test function output gives both a hypothesis test and a confidence interval for the parameter.) The sample mean weight before treatment was 82.44 lbs, and we have strong evidence that the true mean weight of the population from which this sample was taken is less than 90 lbs.

Note that with a one-sided alternative, R returns a "one-sided" confidence interval. If we want a two-sided confidence interval, we need to specify a two-sided alternative:

t.test(Anorexia$Before, alternative = "two.sided", mu = 90, conf.level = 0.95)
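The object t.test returns is a list, so individual pieces of the output can be extracted by name. A minimal sketch, using a small vector of made-up weights (the numbers are illustrative, not from the Anorexia data):

```r
weights <- c(80, 85, 78, 90, 82)   # hypothetical values for illustration
res <- t.test(weights, mu = 90, alternative = "two.sided")
res$estimate    # sample mean
res$p.value     # p-value of the test
res$conf.int    # two-sided 95% confidence interval
```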

Paired Mean Difference

Let's consider only patients on the Family treatment.

Fam <- Anorexia[ Anorexia$Therapy == "Family", ]
Fam

Was the average weight gain in this group large enough to provide evidence that the Family treatment worked? We can use the t.test function in two ways to answer this question. First, we could use the difference in weight for each patient (After - Before, stored as the variable Y in this data set) and apply the t.test function as if it's a single mean scenario with a zero null value.

t.test(Fam$Y, alternative="greater", mu = 0, conf.level = 0.95)

Or, we can give the t.test function the after weights and the before weights and tell R that the data are paired.

t.test(Fam$After, Fam$Before, alternative="greater", mu = 0, conf.level = 0.95, paired = TRUE)

The default null value is zero, and the default confidence level is 0.95, so we could have run the above code without those arguments.

t.test(Fam$After, Fam$Before, alternative="greater", paired = TRUE)

Difference in Means

To compare two means from independent groups, we give the t.test function the two vectors of responses as the first and second argument. We could specify "paired = FALSE" as an argument, but this is the default, so it is unnecessary. Here is the R code to calculate a hypothesis test of H0: "The true mean weight gain is equal between the Family and Control treatments" versus Ha: "The true mean weight gain is greater on the Family treatment than the Control". R will also output a confidence interval for the true difference in means (first argument - second argument).

Gains.Fam <- Anorexia$Y[ Anorexia$Therapy == "Family" ]
Gains.Cont <- Anorexia$Y[ Anorexia$Therapy == "Control" ]
t.test(Gains.Fam, Gains.Cont, alternative="greater")
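Note that for two independent samples, t.test performs the Welch (unequal-variance) t-test by default; if you want the pooled (equal-variance) version, add var.equal = TRUE. A sketch with made-up weight gains (illustrative numbers, not the Anorexia data):

```r
x <- c(11, 5, 9, 13, 7)   # hypothetical gains, group 1
y <- c(4, 6, 2, 8, 5)     # hypothetical gains, group 2
t.test(x, y, alternative = "greater")                    # Welch (default)
t.test(x, y, alternative = "greater", var.equal = TRUE)  # pooled variance
```

The two versions give the same sample means but different degrees of freedom, so the p-values generally differ slightly.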

Inference for quantitative data

For extra practice, work through the Inference for quantitative data lab by Andrew Bray, but instead of using his inference function (which is not a built-in R function - it loads with the .Rdata file for the lab), practice using the t.test function. If you are unable to load the data set from his lab page, use the following R command instead:

nc <- read.table("http://www.math.montana.edu/shancock/data/nc.txt", header=TRUE)