First, we looked at the ex0222 data set from the Sleuth3 package:

library(Sleuth3)
# ex0222   # If you type this line in you get ~2500 lines of data
ex0222[1:10,]   # This line only outputs 10 lines of data
##    Gender Arith Word Parag Math AFQT
## 1    male    19   27    14   14 70.3
## 2  female    23   34    11   20 60.4
## 3    male    30   35    14   25 98.3
## 4  female    30   35    13   21 84.7
## 5  female    13   30    11   12 44.5
## 6  female     8   15     6    4  4.0
## 7  female    10   17     6    7 11.8
## 8    male     4   17     6    6  8.9
## 9    male    12   33    13   11 44.7
## 10   male     3   11     5    6  2.8

Now on to the data for the Box-Cox example. First load the biofilm data, look at it, and plot it:

# Get biofilm data

d
##      Group  Number
## 1 chlorine      37
## 2 chlorine      21
## 3 chlorine      44
## 4  control 8234552
## 5  control 5566343
## 6  control 7899143
# Plot the data
hist(d$Number)
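
The code that loads the biofilm data is not shown above. As a minimal sketch, assuming the six printed rows are the complete data set, the data frame d can be rebuilt by hand so that the rest of the example runs:

```r
# Rebuild the biofilm data frame from the six rows printed above.
# Assumption: those six rows are the whole data set.
d <- data.frame(
  Group  = rep(c("chlorine", "control"), each = 3),
  Number = c(37, 21, 44, 8234552, 5566343, 7899143)
)

# Check the structure: one grouping column, one numeric count column
str(d)
```

The counts in the two groups differ by about five orders of magnitude, which is why a histogram on the raw scale is hard to read and why a transformation is worth considering.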

Here are some errors we hit while trying to get the boxcox() function to work:

# First error because we need library(MASS) loaded
# boxcox(Number)
# Error in boxcox(Number) : could not find function "boxcox"

library(MASS)

# Second error because we need to specify the dataframe
#boxcox(Number)
#Error in boxcox(Number) : object 'Number' not found

# Third error because boxcox() needs a formula, so we add the '~ 1' syntax
#boxcox(Number,data=d)
#Error in boxcox(Number, data = d) : object 'Number' not found

OK, we finally get Box-Cox output, treating the data as if they were from one sample!

boxcox(Number ~ 1,data=d)
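
For reference, boxcox() profiles the log-likelihood over the Box-Cox family of power transformations, indexed by lambda. The sketch below writes that family out directly; the helper name box_cox is hypothetical and is not part of MASS.

```r
# Hypothetical helper (not part of MASS): the Box-Cox power transform.
# lambda = 0 is defined as the natural log, the limit of the power form.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))  # the transform is only defined for positive y
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

box_cox(c(1, 10, 100), 0)   # lambda = 0: log transform
box_cox(c(1, 10, 100), 1)   # lambda = 1: y - 1, so the shape is unchanged
box_cox(c(1, 4, 9), 0.5)    # lambda = 1/2: 2 * (sqrt(y) - 1)
```

So a Box-Cox plot that peaks near lambda = 0 points to a log transform, and one whose interval covers lambda = 1 points to leaving the data as they are.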

But the data are from two samples, so tell boxcox() by putting Group on the right-hand side of the formula:

boxcox(Number ~ Group,data=d)

# Zoom in to lambda values between -1/2 and 1/2
boxcox(Number ~ Group,data=d,seq(-.5,.5,.1))
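
The plots can also be read off numerically. This is a minimal sketch, assuming the six rows printed earlier are the complete data set; the names bc, lambda_hat and lambda_ci are illustrative. It uses the plotit = FALSE argument of boxcox() to get the grid of lambda values and their profile log-likelihoods back as a list.

```r
library(MASS)  # provides boxcox()

# Biofilm data rebuilt from the rows printed earlier (assumed complete)
d <- data.frame(
  Group  = rep(c("chlorine", "control"), each = 3),
  Number = c(37, 21, 44, 8234552, 5566343, 7899143)
)

# plotit = FALSE returns the lambda grid (x) and the profile
# log-likelihood at each grid point (y) instead of drawing the plot
bc <- boxcox(Number ~ Group, data = d,
             lambda = seq(-2, 2, 0.01), plotit = FALSE)

# Grid value with the largest profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: all lambdas whose log-likelihood is within
# qchisq(0.95, 1) / 2 of the maximum (the cutoff drawn on the plot)
in_ci <- bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2
lambda_ci <- range(bc$x[in_ci])

lambda_hat
lambda_ci
```

The interval is limited to the grid that was searched, so if an endpoint equals -2 or 2 the grid should be widened.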

Box-Cox suggests a log transform, i.e. $$\lambda=0$$. So log-transform and then apply Box-Cox again.

boxcox(log10(Number) ~ Group,data=d,seq(-.5,.5,.1))

# Change the zoom
boxcox(log10(Number) ~ Group,data=d)

boxcox(log10(Number) ~ Group,data=d,seq(0,3,.1))

We should really apply qqnorm() and qqline() to assess normality. But the Box-Cox confidence interval for $$\lambda$$ now includes $$\lambda=1$$ (no further transformation needed), which suggests that the log-transformed data are approximately normal.
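
A minimal sketch of that normality check, assuming the six rows printed earlier are the complete data set. Because there are two groups, the sketch plots the residuals from the two-group model (each value minus its group mean on the log10 scale) and not the raw values; the names fit and res are illustrative.

```r
# Biofilm data rebuilt from the rows printed earlier (assumed complete)
d <- data.frame(
  Group  = rep(c("chlorine", "control"), each = 3),
  Number = c(37, 21, 44, 8234552, 5566343, 7899143)
)

# Fit group means on the log10 scale and keep the residuals
fit <- lm(log10(Number) ~ Group, data = d)
res <- residuals(fit)

# Normal Q-Q plot of the residuals with a reference line
qqnorm(res, main = "Normal Q-Q plot: residuals of log10(Number)")
qqline(res)
```

With only six observations the plot gives a rough check at best, so it cannot confirm normality, only flag gross departures.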

Now apply a two-sample t-test to the log-transformed data:

t.test(log10(Number) ~ Group,data=d)
##
##  Welch Two Sample t-test
##
## data:  log10(Number) by Group
## t = -48.129, df = 3.1287, p-value = 1.351e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.686763 -4.996514
## sample estimates:
## mean in group chlorine  mean in group control
##               1.511291               6.852930
# 95% CI for mean log10(chlorine Number) - mean log10(control Number)
c( -5.686763, -4.996514)  
## [1] -5.686763 -4.996514
# Anti-log10-transform to get 95% CI for (median chlorine Num)/ (median control Num)
10^c( -5.686763, -4.996514)  
## [1] 2.057013e-06 1.008059e-05
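
The interval does not have to be retyped by hand. This is a minimal sketch, assuming the six rows printed earlier are the complete data set, that pulls the confidence interval out of the object returned by t.test() and back-transforms it; the names tt, log_ci and ratio_ci are illustrative.

```r
# Biofilm data rebuilt from the rows printed earlier (assumed complete)
d <- data.frame(
  Group  = rep(c("chlorine", "control"), each = 3),
  Number = c(37, 21, 44, 8234552, 5566343, 7899143)
)

# Welch two-sample t-test on the log10 scale (same call as above)
tt <- t.test(log10(Number) ~ Group, data = d)

# 95% CI for the difference in mean log10 counts (chlorine - control)
log_ci <- as.numeric(tt$conf.int)

# Back-transform to a 95% CI for the ratio of medians (chlorine / control)
ratio_ci <- 10^log_ci
ratio_ci
```

If the rebuilt data match the original, ratio_ci should agree with the two values printed above.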