We start with an analysis data on rats where the experimental units are 12 thirsty albino rats who are trained to press a lever to get water prior to the experiment. Their pre-experiment pressing rate is recorded as low (1), medium (2), or high (3). They are then injected with one of four levels of a drug where 0 is a control saline solution, the other values are mg per kg of the rat's weight. This was a cross-over design replicated twice, so each rat has 8 measurements of postRate (number of lever presses per second), two at each of four drug levels, and the order of treatments was randomized for each rat. Time intervals between treatments were long enough to remove any carryover effects.
First we look at a plot
separating the post rate response by preRate and dose.
I see that prerate has a strong effect, dose only kicks in at the highest level, and there is little or no interaction.
Write out a model for these data using preRate as a three-level factor and drug as a four-level factor. Include distributions for all random components (assuming normality throughout).
yijk = β 0 + βi + τ j + b k + ε ijk where k is ratID number from 1 to 12, i is preRate from 1 to 3, and j is dose from 1 to 4.
bk ˜
iid N(0, σ2rat), for each rat,
εijk
˜ iid N(0, σ2)
Skip: Fit the above model to the
data using PROC GLM in SAS using a
repeated statement for ratID. Explain the results.
Fit the above model to the data
using PROC MIXED in SAS using a random
term for each rat. Explain results and compare to those just above.
By fitting a random term for each rat ID, we create a compound symmetric variance-covariance structure. Observations within each rat are correlated, but between rats are uncorrelated. SAS estimates the correlation rat-to-rat variance as 0.005146, compared to underlying variance estimated as 0.001393. Correlation is the former over the sum, 0.787. Prerats 1 (low) and 2 (medium) have negative estimates because they are quite a bit lower than the mean for prerate 3 (high) ( -0.45, SE = .037, and -0.21, SE = 0.37 respectively). The effect of dose shows up because we have contrasts between the high dose of 1.8 and control (0.27, se=0.01), low dose (0.28, 0.01) and dose = 1 (0.28, se = 0.01).
These results do agree with those of R.
We then examined the Sitka data (from the MASS library in R) on the growth of 79 sitka spruce trees.
Here's a plot of the data, and the code I used to create it.

PROC SGpanel data= sitka;
panelBy treat;
series x= time y = size /group = tree;
run;
Use PROC
GLM to fit a quadratic model across all the data. Update the
model adding treatment effects which allow the intercept, slope, or
quadratic coefficients to depend on treatment.
I did this in Proc Mixed to get AIC which is 818
Using PROC
MIXED
Add AR1 correlation structure (within a tree).
AIC drops to -54.9
Add compound symmetric correlation (within a tree).
AIC = 94.1
Add symmetric correlation (within a tree).
AIC = -74.1Which surprises me, as all covariances are pretty close to the estimate of CS which is 0.37
Compare the four models using AIC.
Reduce the model one term at a time until all terms have small p-values in the t-tests.
I took out treat*timesq because the pvalue was 0.19. Then all terms had small p-values.
Plot the residuals versus fitted and normal quantile plots. Discuss any problems.

How does the ozone treatment affect growth of these trees? Does SAS give results just like those of R?
As we saw with R, there is a quadratic pattern to growth and the slope for size depends on the treatment.
In a study of soil properties, samples were taken on a 10 point by 25 point grid. Download the data here We'll work with two variables: response Ca (calcium concentration) and predictor pH (low numbers are acidic, high numbers basic, 7 is neither).
Make a scatterplot of the two variables and fit a model for Ca based on pH. (Choose the form of the model based on the scatterplot.) Print the estimated coefficients and discuss the relationship.

Still looks linear.
Fit the five forms of spatial correlation available in the nlme library. Compare them with each other and with the original model. Do any of the spatial correlation fits improve AIC by more than 2 units? Which is the best of the 5 fits?
|
Model |
AIC |
AIC from R |
|---|---|---|
|
No correlation |
69.3 |
73.3 |
|
Linear spatial |
-351.5 |
-64.9 |
|
Exponential spatial |
-369.3 |
-96.5 |
|
Gaussian spatial |
-365.8 |
-84.1 |
|
Spherical spatial |
-365 |
-97.1 |
All are way better than the original. With SAS, Exponential is best by a whisker, whereas in R it was spherical, but the two were close in each case.
Compare results with those we got last semester in HW6 using R.
We have to add the LOCAL option to get a nugget. Then the SAS output agrees well with R.
/****************** Rats repeated measure *****/ PROC Import out = rats datafile="~/Projects/stat506/HW3/drugResponse.csv" dbms=dlm replace; delimiter=',' ; getnames=yes; run; *PROC means; *run; PROC sgpanel data = rats; panelby dose / columns = 4; vbox PostRate /category= PreRate; run; PROC sgpanel data = rats; panelby prerate / columns = 3; vbox PostRate /category= dose; run; PROC mixed data = rats; class dose prerate; model postrate = prerate dose/ solution ss1; random intercept/ subject = ratID; run;
|
Dimensions |
|
|
|
Covariance Parameters |
2 |
|
|
Columns in X |
8 |
|
|
Columns in Z Per Subject |
1 |
|
|
Subjects |
24 |
|
|
Max Obs Per Subject |
4 |
|
|
Covariance Parameter Estimates |
|
||
|
Cov Parm |
Subject |
Estimate |
|
|
Intercept |
ratID |
0.005146 |
|
|
Residual |
|
0.001393 |
|
|
Solution for Fixed Effects |
|
|||||||
|
Effect |
dose |
preRate |
Estimate |
Standard Error |
DF |
t Value |
Pr > |t| |
|
|
Intercept |
|
|
0.9905 |
0.02702 |
21 |
36.65 |
<.0001 |
|
|
preRate |
|
1 |
-0.4453 |
0.03706 |
69 |
-12.02 |
<.0001 |
|
|
preRate |
|
2 |
-0.2100 |
0.03706 |
69 |
-5.67 |
<.0001 |
|
|
dose |
0 |
|
0.2687 |
0.01077 |
69 |
24.95 |
<.0001 |
|
|
dose |
1 |
|
0.2825 |
0.01077 |
69 |
26.22 |
<.0001 |
|
|
dose |
0.5 |
|
0.2767 |
0.01077 |
69 |
25.68 |
<.0001 |
|
/********************* Tree growth *****/
PROC Import out = Sitka datafile="~/Projects/stat506/HW3/sitka.csv"
dbms=dlm replace;
delimiter=',' ;
getnames=yes;
run;
PROC means;
run;
PROC SGpanel data= sitka;
panelBy treat;
series x= time y = size /group = tree;
run;
data sitka;
set sitka;
timeSq = (time - 202)**2;
run;
Proc Mixed data = sitka;
class treat;
model size = treat|time treat|timeSq;
* random intercept / subject = tree type = ;
run;
Proc Mixed data = sitka;
class treat tree;
model size = treat|time treat|timeSq / htype=1;
repeated / subject = tree type = cs;
run;
PROC Mixed data = sitka;
class treat tree;
model size = treat|time treat|timeSq / htype=1;
repeated /subject = tree type = un;
run;
proc Mixed data = sitka;
class treat tree;
model size = treat|time treat|timeSq / htype=1;
repeated /subject = tree type = ar(1) ;
run;
PROC Mixed data = sitka;
class treat tree;
model size = treat|time timeSq / htype=1;
repeated /subject = tree type = un;
run;
17:29
Sunday, January 27, 2013
No correlation:
|
Covariance Parameter Estimates |
|
|
|
Cov Parm |
Estimate |
|
|
Residual |
0.3945 |
|
|
AIC (smaller is better) |
818.2 |
Compound symmetric
|
Covariance Parameter Estimates |
|
||
|
Cov Parm |
Subject |
Estimate |
|
|
CS |
tree |
0.3722 |
|
|
Residual |
|
0.02616 |
|
|
AIC (smaller is better) |
94.1 |
|
|
1 |
2 |
3 |
4 |
5 |
|
1 |
0.44 |
|
|
|
|
|
2 |
0.4 |
0.39 |
|
|
|
|
3 |
0.37 |
0.37 |
0.36 |
|
|
|
4 |
0.37 |
0.37 |
0.37 |
0.41 |
|
|
5 |
0.37 |
0.37 |
0.4 |
0.4038 |
0.41 |
|
AIC (smaller is better) |
-74.1 |
Autoregressive:
|
AR(1) |
tree |
0.9710 |
|
AIC (smaller is better) |
-54.9 |
|
AIC (smaller is better) |
-94.6 |
|
Type 1 Tests of Fixed Effects |
||||
|---|---|---|---|---|
|
Effect |
Num DF |
Den DF |
F Value |
Pr > F |
|
treat |
1 |
77 |
1.67 |
0.1996 |
|
Time |
1 |
77 |
1275.25 |
<.0001 |
|
Time*treat |
1 |
77 |
15.27 |
0.0002 |
|
timeSq |
1 |
77 |
273.85 |
<.0001 |
/****************** Soils spatial ********************************/
PROC Import out = soils datafile="~/Projects/stat506/HW3/soils.dat"
dbms=dlm replace;
delimiter=' ' ;
getnames=yes;
run;
PROC means;
run;
PROC SGPLOT data = soils;
scatter x = Ca y = pH / markerattrs=(symbol=CircleFilled)
transparency=0.7;
run;
Proc mixed data = soils;
model pH = Ca ;
run;
Proc mixed data = soils;
model pH = Ca ;
repeated /local subject = intercept type = sp(exp) (column row);
run;
Proc mixed data = soils;
model pH = Ca ;
repeated / local subject = intercept type = sp(lin) (column row);
run;
Proc mixed data = soils;
model pH = Ca ;
repeated / local subject = intercept type = sp(gau) (column row);
run;
Proc mixed data = soils;
model pH = Ca ;
repeated /local subject = intercept type = sp(sph) (column row);
run;
No Covariance:
|
Covariance Parameter Estimates |
||
|
Cov Parm |
Estimate |
|
|
Residual |
0.07436 |
|
|
AIC (smaller is better) |
69.3 |
Exponential
|
Covariance Parameter Estimates |
|
||
|
Cov Parm |
Subject |
Estimate |
|
|
Variance |
Intercept |
0.01859 |
|
|
SP(EXP) |
Intercept |
2.4626 |
|
|
Residual |
|
0.003757 |
|
|
AIC (smaller is better) |
-369.3 |
|
Covariance Parameter Estimates |
|
||
|
Cov Parm |
Subject |
Estimate |
|
|
Variance |
Intercept |
0.01173 |
|
|
SP(GAU) |
Intercept |
2.6093 |
|
|
Residual |
|
0.008834 |
|
|
AIC (smaller is better) |
-365.8 |
Linear
|
Covariance Parameter Estimates |
|
||
|
Cov Parm |
Subject |
Estimate |
|
|
Variance |
Intercept |
0.009241 |
|
|
SP(LIN) |
Intercept |
0.1822 |
|
|
Residual |
|
0.01008 |
|
|
AIC (smaller is better) |
-351.5 |
Spherical
|
Covariance Parameter Estimates |
|
||
|
Cov Parm |
Subject |
Estimate |
|
|
Variance |
Intercept |
0.04191 |
|
|
SP(SPH) |
Intercept |
14.1681 |
|
|
Residual |
|
0.005827 |
|
|
AIC (smaller is better) |
-365 |