R Functions for Inference

In Lab 1, we learned that the form of an R function is:

function.name( arg.name=value, ... )

R has many built-in functions, including ones that conduct statistical inference (hypothesis tests and confidence intervals) on a given data set. In this lab, we'll explore two R functions for inference:

binom.test
prop.test

The first function conducts exact inference using a binomial distribution, whereas the second function uses the normal approximation to conduct inference. To see the arguments to these functions, look at their help files.

?binom.test
?prop.test

Binomial Inference: Hypothesis Testing

Charles Darwin did an experiment comparing the heights of cross- and self-pollinated plants (Zea mays, or corn, to be precise; see "The Effects of Cross and Self Fertilisation in the Vegetable Kingdom", 1876). Enter the data into your R session by copying and pasting the following:

Darwin <- structure(
      list(Pot = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4),
       Crossed = c(23.5, 12, 21, 22, 19.125, 21.5, 22.125, 20.375, 
                   18.25, 21.625, 23.25, 21, 22.125, 23, 12), 
          Self = c(17.375, 20.375, 20, 20, 18.375, 18.625, 18.625, 
                   15.25, 16.5, 18, 16.25, 18, 12.75, 15.5, 18)), 
         .Names = c("Pot", "Crossed", "Self"), 
          class = "data.frame", 
       row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", 
                      "11", "12", "13", "14", "15"))

Remember - to look at an object just enter its name: Darwin. The resulting data table should look like this:

       Pot Crossed   Self 
     1   1  23.500 17.375
     2   1  12.000 20.375
     3   1  21.000 20.000
     4   2  22.000 20.000
     5   2  19.125 18.375
     6   2  21.500 18.625
     7   3  22.125 18.625
     8   3  20.375 15.250
     9   3  18.250 16.500
    10   3  21.625 18.000
    11   3  23.250 16.250
    12   4  21.000 18.000
    13   4  22.125 12.750
    14   4  23.000 15.500
    15   4  12.000 18.000

Darwin was interested in demonstrating that cross-pollinated plants were healthier than self-pollinated ones. The standard approach to questions like this one is to frame it as a hypothetical: if there were really no difference between the two groups, what would we expect to see?

To translate this into a question about the binomial distribution, start by converting the pairs of values above to Bernoulli trials. Call it a success (1) if the crossed plant is taller than the self-pollinated plant with which it is paired, and a failure (0) otherwise.

attach(Darwin)
X <- sum(Crossed > Self)

You should have 13 successes out of the 15 trials. Our hypothetical scenario, known as the null hypothesis, corresponds to the claim that the population proportion of 1's is 1/2. In other words, the cross-pollinated and self-pollinated plants are equally likely to be the taller of the two.

Assuming independence, we have a set of 15 Bernoulli trials, of which 13 were successes.
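
If you would like to see the individual trials rather than just the count, one way to do so (not needed for anything below) is:

as.numeric(Crossed > Self)   # the 15 Bernoulli trials, written as 1's and 0's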

p-values

The p-value is an attempt to directly measure the improbability of an observed result, under the model of the null hypothesis. With Darwin's data, we could compute the probability of seeing exactly 13 successes in 15 trials if self-pollinated and cross-pollinated plants are equally likely to be taller:

dbinom(13,15,1/2)

However, for a large number of trials, even the most likely outcome can be quite improbable. For example, try computing the probability of tossing exactly 5000 heads in 10000 tosses of a fair coin.
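
One way to do this (the exact number is unimportant; notice only how small it is):

dbinom(5000, 10000, 1/2)   # roughly 0.008, even though 5000 heads is the single most likely outcome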

What we really want here is the probability of a result at least as unlikely as the one we observed. For a symmetric distribution, this is the set of outcomes at least as far from the expected count as the one we got. The expected value for n=15 and p=1/2 is 7.5, so we want the probability of seeing a result at least as far from 7.5 as 13 is: that includes 13, 14, and 15 in one direction, and 2, 1, and 0 in the other.

Note: In general, a result "at least as unlikely as the one we observed" should be defined by its probability under H0, not by its distance from the expected value. In a symmetric distribution these are the same, but in a skewed distribution they may differ! Can you see how to compute the p-value directly from the definition? For our example with n=15, p=1/2, and X=13, try

p <- dbinom(0:15,15,1/2)
sum(p[p <= dbinom(13,15,1/2)])

Here is a plot of the distribution with a line showing the probability of our observed result:

barplot(dbinom(0:15,15,1/2), names.arg=c(0:15), space=0)
abline(h = dbinom(13,15,1/2))  

Note: abline(h=y) adds a horizontal line at height y; more generally, abline(b,m) adds a line with intercept b and slope m.

The desired calculation is easy to perform in R: just add up the binomial probabilities corresponding to those outcomes:

sum(dbinom(c(0:2,13:15),15,1/2))

Alternatively, we could use the pbinom function for this computation:

pbinom(2,15,1/2) + (1 - pbinom(12,15,1/2))

Either way, you should get the same answer: 0.00738..., which is less than a 1 percent chance. This quantity is called a p-value, which is shorthand for `the probability of seeing a result at least as unlikely as the one observed, if the null hypothesis were really true'.

WARNING: the p-value is NOT `the probability that the null hypothesis is correct'. That is not, in general, computable without more information: specifically, an initial probability (prior) for the truth of the null hypothesis, to be updated by Bayes' rule.
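
To make the distinction concrete, here is a small sketch. The 50/50 prior and the single alternative value p=2/3 are invented purely for illustration; the point is only that a posterior probability for the null hypothesis comes from Bayes' rule applied to a prior, and is a different quantity from the p-value.

prior <- c(H0 = 0.5, H1 = 0.5)            # a hypothetical prior on the two hypotheses
lik   <- c(H0 = dbinom(13, 15, 1/2),      # probability of the data under H0: p = 1/2
           H1 = dbinom(13, 15, 2/3))      # ... and under one particular alternative, p = 2/3
post  <- prior * lik / sum(prior * lik)   # Bayes' rule
post["H0"]                                # posterior probability of H0 - not the p-value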

Rejection regions and significance level

Rather than calculating a p-value, the classical hypothesis test starts with construction of a rejection region - a set of possible values of the statistic for which we would reject the null hypothesis. The size of the rejection region is determined by the significance level, the probability of getting an outcome in the rejection region, computed under the null hypothesis.

Suppose we wish to choose a rejection region of size no bigger than alpha=0.05. To see which values to include, take a look at the binomial probabilities, rounded to 5 decimal places for ease of reading:

round(dbinom(0:15,15,1/2),5)

Let's try using 0:3 and 12:15 for our rejection region: R={0,1,2,3,12,13,14,15}. To compute the effective significance level for this rejection region, create a vector containing the values in the rejection region, and then sum the dbinom values for those outcomes:

R <- c(0:3,12:15)
sum(dbinom(R,15,1/2))

The significance level for this rejection region is 0.035. You should verify that if we make the rejection region any larger, then alpha will be bigger than 0.05.
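
One quick check: adding even one more outcome (say 4, the next most likely on the lower side) already pushes the level past .05.

sum(dbinom(c(0:4, 12:15), 15, 1/2))   # about 0.077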

Now, we look at the observed value, 13, and note that it is in the rejection region. The conclusion: we reject the null hypothesis p=1/2 at the 0.035 significance level. Either p is not 1/2 or we were unlucky.

Power

It is easy to compute the power (probability of rejecting the null hypothesis, given an alternative value for p). We just need to sum the binomial probabilities for the rejection region, using the alternative value of p. Suppose that we were interested in the alternative "p=2/3":

sum(dbinom(R,15,2/3))

It is often useful to compute the power for many values of "p" at once, and plot against "p". Here is a function that will do that for a given rejection region RR or a given significance level alpha:

bin.pwr <- function(n, p0=1/2, RR=NULL, alpha=.05, COL="red")
{
  # plot power vs p for H0: p=p0, default p0=1/2
  # RR, if present, is the chosen rejection region
  # alpha is the target significance level

  if(is.null(RR)){
    # no rejection region supplied: build one from the least likely
    # outcomes under H0, stopping before the total probability exceeds alpha
    X <- 0:n
    db <- dbinom(X, n, p0)
    sdb <- sort(db)
    cb <- cumsum(sdb)
    m <- max((1:(n+1))[cb <= alpha])
    R <- X[db <= sdb[m]]
  } else {
    R <- RR
  }
  Alpha <- sum(dbinom(R, n, p0))      # the attained significance level
  p <- seq(.01, .99, .01)             # grid of alternative values of p
  P <- p
  for(i in 1:length(p)){
    P[i] <- sum(dbinom(R, n, p[i]))   # power at the alternative p[i]
  }
  plot(p, P, ylab="Power", type="l", col=COL)
  title(paste("Power curve for n=", n, "   alpha =", round(Alpha, 4)))
}

The type="l" in the plot command is an "ell", not a "one". It tells the plot function to make a line plot. To use the function with a specified rejection region, create a dataset R with the values defining the rejection region, and enter the sample size n, for example:

R <- c(0:3,12:15)
bin.pwr(15,p0=1/2,RR=R)

The minimum point on this curve is the value at p=1/2, the null hypothesis. The power increases for alternatives farther away from 1/2. Does that make sense to you?

You may also use the function with a specified target significance level without specifying a rejection region:

bin.pwr(15,p0=1/2,alpha=.05)

Try it with some different sample sizes and rejection regions:

par(mfrow=c(2,2))
R10 <- c(0:1,9:10)
bin.pwr(10,RR=R10)
R20 <- c(0:5,15:20)
bin.pwr(20,RR=R20)
R40 <- c(0:13,27:40)
bin.pwr(40,RR=R40)
R80 <- c(0:30,50:80)
bin.pwr(80,RR=R80)

How does the power curve change as the sample size gets larger? (We haven't held alpha exactly fixed, but the attained levels, shown in each plot title, are all between about .02 and .04.)

Testing other null hypotheses

Finally, we could test some other null hypothesis here, such as p=3/4 instead of p=1/2. The procedure is the same; the only change is that the calculations are now based on the assumption that p=3/4. What should we choose for the rejection region?

> # Binomial(15,3/4) probabilities
> cbind(0:15,round(dbinom(0:15,15,3/4),5))
      [,1]    [,2]
 [1,]    0 0.00000
 [2,]    1 0.00000
 [3,]    2 0.00000
 [4,]    3 0.00001
 [5,]    4 0.00010
 [6,]    5 0.00068
 [7,]    6 0.00340
 [8,]    7 0.01311
 [9,]    8 0.03932
[10,]    9 0.09175
[11,]   10 0.16515
[12,]   11 0.22520
[13,]   12 0.22520
[14,]   13 0.15591
[15,]   14 0.06682
[16,]   15 0.01336

barplot(dbinom(0:15,15,3/4),names.arg=c(0:15))
abline(dbinom(7,15,3/4),0)   # horizontal line at the height of dbinom(7,15,3/4)

The tails of the distribution are no longer the same: we need to choose the rejection region asymmetrically. For example, to achieve a significance level no bigger than .05, we could include in the rejection region {0:7, 15}. Verify that the significance level (the sum of the probabilities of outcomes in the rejection region, computed under the null hypothesis) is about .03. If we included 8 in the rejection region, the significance level would exceed .05; and if we let {0:8} be the rejection region, then we would be including an outcome (8) with greater probability than the one we left out (15), which would not make sense. Outcomes with smaller probability must be stronger evidence against the null hypothesis!
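
One way to carry out that check, summing the probabilities of the outcomes in {0:7, 15} under the null hypothesis p=3/4:

sum(dbinom(c(0:7, 15), 15, 3/4))   # about 0.031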

Question: Back to Darwin's data: can we reject the null hypothesis that p=3/4? Verify that the p-value for testing the null hypothesis that p=3/4 is about .38.  
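
If you want to check your answer, the definition-based calculation used earlier for p=1/2 carries over directly; binom.test(13, 15, p=3/4) should give essentially the same number.

p <- dbinom(0:15, 15, 3/4)
sum(p[p <= dbinom(13, 15, 3/4)])   # about 0.38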

The binom.test() function

The R binom.test() function does the computations discussed above. The only disadvantage of using it is that you don't see explicitly what the rejection region is. Read the help file:

?binom.test

Try it with Darwin's data:

          binom.test(13,15,p=1/2)

If you don't understand the arguments or the printed output for the binom.test() function, go back and reread the help file now!

Binomial Inference: Confidence Intervals

A confidence interval is simply the set of plausible values for a parameter, based on the observed data. Once you understand the idea of hypothesis testing, this is straightforward, at least in principle: you just have to find the set of values of p for which you would not reject the corresponding null hypothesis. Let's see how it goes with Darwin's data...

Direct search

We need to find two values for p: the largest plausible value and the smallest plausible value. Again, "implausible" is defined in terms of tail events with probability smaller than .025 on each side (.05 in total). To simplify the search, note that for p=1/2, the probability of observing 13 or more successes was already well below .025. So we need to find two p's: the smallest one for which the probability of at least 13 successes is no smaller than .025, and the largest one for which the probability of 13 or fewer successes is no smaller than .025.

Finding the lower bound for p

A simple-minded iterative search will do the job: we already know that p=1/2 is too small.

> 1 - pbinom(12,15,1/2)
[1] 0.003692627

Try p=0.6:

> 1 - pbinom(12,15,.6)
[1] 0.027114

It's a bit too big; we want .025.

> 1 - pbinom(12,15,.59)
[1] 0.02270254

Close! The answer is somewhere between .59 and .60:

> 1 - pbinom(12,15,.595)
[1] 0.02482434

I'll leave it as an exercise for the reader to refine the value beyond this precision.

Finding the upper bound for p

Now, for the upper end of the confidence interval, we need to find the largest plausible p, in other words, the largest p for which P(X <= 13) is at least .025.

Let's start with a value close to 1:

> pbinom(13,15,.98)
[1] 0.03533831

We are close to .025, but will need to search a bit.

> pbinom(13,15,.99)
[1] 0.009629773

Hence, the upper bound is somewhere between p=.98 and p=.99. Again, I'll leave it as an exercise for the reader. When you have found it to the desired accuracy, say 3 significant digits, you will have what is called a 95% confidence interval for the unknown population proportion p.

Again - the 95% CI is just the set of null hypothesis values of p that would not be rejected at the .05 significance level.
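
If you would rather not iterate by hand, a short numerical search will finish the job. This is just a sketch: uniroot() finds where a function crosses zero on a given interval, and the bracketing intervals below come from the searches we just did.

uniroot(function(p) 1 - pbinom(12, 15, p) - .025, c(.5, .7))$root    # lower bound, about 0.595
uniroot(function(p) pbinom(13, 15, p) - .025, c(.9, .999))$root      # upper bound, about 0.983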

The binom.test function

The binom.test function not only tests your favorite null hypothesis, it also constructs exactly the confidence interval we have been computing "by hand".

               binom.test(13,15)

Try it out, and compare the result to the values we found by hand above.
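
If you want to reuse the interval in later calculations, you can extract it from the object that binom.test() returns (a list of class "htest"):

binom.test(13, 15)$conf.int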

To produce a 90% confidence interval, use

               binom.test(13,15,conf.level=.90)

Approximate Confidence Intervals

The central limit theorem guarantees that as the sample size increases, sample means are approximately normally distributed. The standard 95% confidence interval for a sample mean that is exactly normally distributed is just the sample mean plus or minus 1.96 times the standard deviation of the sample mean (1.96 is the .975 quantile of the standard normal distribution, qnorm(.975)). For the binomial, the sample proportion is a sample mean: the average of the individual Bernoulli trials, each equal to 1 or 0.

The approximate 95% confidence interval for Darwin's data is:

       P <- 13/15
       se <- sqrt( P*(1-P)/15)
       P + 1.96*c(-1,1)*se 

The "c(-1,1)" causes R to compute both the lower and upper endpoints at the same time. You could do it in two steps:

       P - 1.96*se 
       P + 1.96*se

This is not very close to the correct interval (roughly [.59, .98]); in fact, the upper endpoint of the approximate interval is greater than 1. That should be no surprise; a sample size of 15 is not very large. For larger samples, and for values of p closer to 1/2, the approximate confidence interval will be close to the exact interval.
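
To see the approximation improve, compare the two intervals for a made-up larger sample - say 60 successes in 100 trials (these numbers are hypothetical, chosen only so that the sample proportion is nearer 1/2):

       binom.test(60, 100)$conf.int              # exact interval
       .6 + 1.96*c(-1,1)*sqrt(.6*.4/100)         # normal approximation - quite close this time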

There is another R function that produces an approximate CI for binomial proportions: prop.test(). To test the null hypothesis for a specific value of p, with observed data of x successes out of n trials, use:

      prop.test(x, n, p)

The prop.test() function also returns an approximate CI, level .95 by default. For example, with Darwin's data:

      prop.test(13,15)
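
As with binom.test(), you can ask for a different confidence level, and you can also turn off the continuity correction to see its effect; both arguments are described in ?prop.test.

      prop.test(13, 15, conf.level=.90)
      prop.test(13, 15, correct=FALSE)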

This lab was adapted from Albyn Jones' Math 141 Lab Notes, Reed College.