First, we looked at the ex0222 data set from the Sleuth3 package:
library(Sleuth3)
# ex0222 # Typing this line prints the full data set (~2500 rows)
ex0222[1:10,] # This line prints only the first 10 rows
## Gender Arith Word Parag Math AFQT
## 1 male 19 27 14 14 70.3
## 2 female 23 34 11 20 60.4
## 3 male 30 35 14 25 98.3
## 4 female 30 35 13 21 84.7
## 5 female 13 30 11 12 44.5
## 6 female 8 15 6 4 4.0
## 7 female 10 17 6 7 11.8
## 8 male 4 17 6 6 8.9
## 9 male 12 33 13 11 44.7
## 10 male 3 11 5 6 2.8
Now on to the data for the Box-Cox example. First load the biofilm data, look at it, and plot it:
# Get biofilm data
d=read.csv("http://www.math.montana.edu/parker/courses/STAT411/Chapter3_2sampleBoxCox.csv")
d
## Group Number
## 1 chlorine 37
## 2 chlorine 21
## 3 chlorine 44
## 4 control 8234552
## 5 control 5566343
## 6 control 7899143
# Plot the data
hist(d$Number)
Here are some errors you may run into when trying to get the boxcox() function to work:
# First error because we need library(MASS) loaded
# boxcox(Number)
# Error in boxcox(Number) : could not find function "boxcox"
library(MASS)
# Second error because Number is a column of the data frame d, not an object in the workspace
#boxcox(Number)
#Error in boxcox(Number) : object 'Number' not found
# Third error because boxcox() needs a formula such as Number ~ 1 (or a fitted lm), not a bare column name
#boxcox(Number,data=d)
#Error in boxcox(Number, data = d) : object 'Number' not found
OK, we finally get Box-Cox output, treating the data as if they came from one sample!
boxcox(Number ~ 1,data=d)
But the data are from two samples, so tell boxcox() by putting Group on the right-hand side of the formula:
boxcox(Number ~ Group,data=d)
# Zoom in to lambda values between -1/2 and 1/2
boxcox(Number ~ Group,data=d,lambda=seq(-.5,.5,.1))
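Instead of reading the plot by eye, the maximizing \(\lambda\) and its approximate 95% interval can be pulled from the object that boxcox() returns. This is only a sketch: the data frame is retyped from the printout above so the chunk runs on its own, the finer grid (step 0.01) and the names bc, lambda_hat, and ci are choices made here for illustration, and the interval is cut off at the edges of the grid.

```r
library(MASS)

# Biofilm data retyped from the printout above (so no download is needed)
d <- data.frame(
  Group  = c("chlorine", "chlorine", "chlorine", "control", "control", "control"),
  Number = c(37, 21, 44, 8234552, 5566343, 7899143)
)

# plotit = FALSE returns the lambda grid (x) and profile log-likelihood (y) without plotting
bc <- boxcox(Number ~ Group, data = d, lambda = seq(-.5, .5, .01), plotit = FALSE)

# Lambda with the largest log-likelihood on this grid
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% CI: lambdas whose log-likelihood is within qchisq(.95, 1)/2 of the maximum
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, 1) / 2])

lambda_hat
ci
```

If 0 falls inside ci, that agrees with choosing the log transform.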
Box-Cox suggests a log transform, i.e. \(\lambda=0\). So log-transform and then apply Box-Cox again.
boxcox(log10(Number) ~ Group,data=d,lambda=seq(-.5,.5,.1))
# Change the zoom
boxcox(log10(Number) ~ Group,data=d)
boxcox(log10(Number) ~ Group,data=d,lambda=seq(0,3,.1))
We should really apply qqnorm() and qqline() to assess normality. But the Box-Cox CI for the log-transformed data includes \(\lambda=1\) (i.e. no further transformation is needed), which suggests that the log-transformed data are approximately normal.
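A minimal sketch of that normality check is below. It assumes we look at the residuals from the two-group fit (each log10 count minus its group mean) rather than the raw values, since the two groups have very different means. The data frame is retyped from the printout above, and the names fit and res are just for illustration. With only 3 observations per group, the plot can only flag gross departures from normality.

```r
# Biofilm data retyped from the printout above
d <- data.frame(
  Group  = c("chlorine", "chlorine", "chlorine", "control", "control", "control"),
  Number = c(37, 21, 44, 8234552, 5566343, 7899143)
)

# Fit the two-group model on the log10 scale and take the residuals
fit <- lm(log10(Number) ~ Group, data = d)
res <- residuals(fit)

# Normal Q-Q plot of the residuals with a reference line
qqnorm(res)
qqline(res)
```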
Now apply a 2-sample t-test to the log-transformed data:
t.test(log10(Number) ~ Group,data=d)
##
## Welch Two Sample t-test
##
## data: log10(Number) by Group
## t = -48.129, df = 3.1287, p-value = 1.351e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.686763 -4.996514
## sample estimates:
## mean in group chlorine mean in group control
## 1.511291 6.852930
# 95% CI for mean log10(chlorine Num.) - mean log10(control Num.)
c( -5.686763, -4.996514)
## [1] -5.686763 -4.996514
# Anti-log10-transform (raise 10 to the CI endpoints) to get a 95% CI for (median chlorine Num.)/(median control Num.)
10^c( -5.686763, -4.996514)
## [1] 2.057013e-06 1.008059e-05
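Retyping the interval endpoints by hand invites typos. One alternative, sketched below, is to save the t-test result and back-transform its conf.int component directly. The data frame is retyped from the printout above, and the names tt, ci_log10, and ci_ratio are chosen here for illustration.

```r
# Biofilm data retyped from the printout above
d <- data.frame(
  Group  = c("chlorine", "chlorine", "chlorine", "control", "control", "control"),
  Number = c(37, 21, 44, 8234552, 5566343, 7899143)
)

# Save the Welch two-sample t-test instead of only printing it
tt <- t.test(log10(Number) ~ Group, data = d)

# 95% CI on the log10 scale: mean log10(chlorine) - mean log10(control)
ci_log10 <- as.numeric(tt$conf.int)

# Back-transform to a 95% CI for (median chlorine Num.)/(median control Num.)
ci_ratio <- 10^ci_log10

ci_log10
ci_ratio
```

On the original scale, the interval says the median count with chlorine is estimated to be roughly 2 to 10 millionths of the median control count.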