Homework #3 Solutions STAT 506 Spring 2013

  1. We start with an analysis data on rats where the experimental units are 12 thirsty albino rats who are trained to press a lever to get water prior to the experiment. Their pre-experiment pressing rate is recorded as low (1), medium (2), or high (3). They are then injected with one of four levels of a drug where 0 is a control saline solution, the other values are mg per kg of the rat's weight. This was a cross-over design replicated twice, so each rat has 8 measurements of postRate (number of lever presses per second), two at each of four drug levels, and the order of treatments was randomized for each rat. Time intervals between treatments were long enough to remove any carryover effects.

    1. First we look at a plot separating the post rate response by preRate and dose.

I see that prerate has a strong effect, dose only kicks in at the highest level, and there is little or no interaction.


    1. Write out a model for these data using preRate as a three-level factor and drug as a four-level factor. Include distributions for all random components (assuming normality throughout).


yijk = β 0 + βi + τ j + b k + ε ijk where k is ratID number from 1 to 12, i is preRate from 1 to 3, and j is dose from 1 to 4.

bk ˜ iid N(0, σ2rat), for each rat,
εijk ˜ iid N(0, σ2)

    1. Skip: Fit the above model to the data using PROC GLM in SAS using a repeated statement for ratID. Explain the results.

    2. Fit the above model to the data using PROC MIXED in SAS using a random term for each rat. Explain results and compare to those just above.

By fitting a random term for each rat ID, we create a compound symmetric variance-covariance structure. Observations within each rat are correlated, but between rats are uncorrelated. SAS estimates the correlation rat-to-rat variance as 0.005146, compared to underlying variance estimated as 0.001393. Correlation is the former over the sum, 0.787. Prerats 1 (low) and 2 (medium) have negative estimates because they are quite a bit lower than the mean for prerate 3 (high) ( -0.45, SE = .037, and -0.21, SE = 0.37 respectively). The effect of dose shows up because we have contrasts between the high dose of 1.8 and control (0.27, se=0.01), low dose (0.28, 0.01) and dose = 1 (0.28, se = 0.01).


    1. These results do agree with those of R.


  1. We then examined the Sitka data (from the MASS library in R) on the growth of 79 sitka spruce trees.

    1. Here's a plot of the data, and the code I used to create it.





PROC SGpanel data= sitka;

panelBy treat;

series x= time y = size /group = tree;

run;


    1. Use PROC GLM to fit a quadratic model across all the data. Update the model adding treatment effects which allow the intercept, slope, or quadratic coefficients to depend on treatment.

I did this in Proc Mixed to get AIC which is 818


    1. Using PROC MIXED

      1. Add AR1 correlation structure (within a tree).

AIC drops to -54.9

      1. Add compound symmetric correlation (within a tree).

AIC = 94.1

      1. Add symmetric correlation (within a tree).

AIC = -74.1Which surprises me, as all covariances are pretty close to the estimate of CS which is 0.37

      Compare the four models using AIC.

    1. Reduce the model one term at a time until all terms have small p-values in the t-tests.

      I took out treat*timesq because the pvalue was 0.19. Then all terms had small p-values.

    2. Plot the residuals versus fitted and normal quantile plots. Discuss any problems.





    1. How does the ozone treatment affect growth of these trees? Does SAS give results just like those of R?

As we saw with R, there is a quadratic pattern to growth and the slope for size depends on the treatment.


  1. In a study of soil properties, samples were taken on a 10 point by 25 point grid. Download the data here We'll work with two variables: response Ca (calcium concentration) and predictor pH (low numbers are acidic, high numbers basic, 7 is neither).

    1. Make a scatterplot of the two variables and fit a model for Ca based on pH. (Choose the form of the model based on the scatterplot.) Print the estimated coefficients and discuss the relationship.



Still looks linear.



    1. Fit the five forms of spatial correlation available in the nlme library. Compare them with each other and with the original model. Do any of the spatial correlation fits improve AIC by more than 2 units? Which is the best of the 5 fits?

Model

AIC

AIC from R

No correlation

69.3

73.3

Linear spatial

-351.5

-64.9

Exponential spatial

-369.3

-96.5

Gaussian spatial

-365.8

-84.1

Spherical spatial

-365

-97.1

All are way better than the original. With SAS, Exponential is best by a whisker, whereas in R it was spherical, but the two were close in each case.



    1. Compare results with those we got last semester in HW6 using R.

We have to add the LOCAL option to get a nugget. Then the SAS output agrees well with R.




 /******************  Rats repeated measure  *****/
PROC Import out = rats datafile="~/Projects/stat506/HW3/drugResponse.csv" 
  dbms=dlm    replace;
  delimiter=',' ;
  getnames=yes;
run;
*PROC means;
*run;
PROC sgpanel data = rats;
   panelby dose / columns = 4;
   vbox PostRate /category= PreRate;
run;
PROC sgpanel data = rats;
   panelby prerate / columns = 3;
   vbox PostRate /category= dose;
run;

PROC mixed data = rats;
   class dose prerate;
   model postrate = prerate dose/ solution ss1;
   random intercept/ subject =  ratID;
   run;


Dimensions


Covariance Parameters

2

Columns in X

8

Columns in Z Per Subject

1

Subjects

24

Max Obs Per Subject

4



Covariance Parameter Estimates


Cov Parm

Subject

Estimate

Intercept

ratID

0.005146

Residual


0.001393



Solution for Fixed Effects


Effect

dose

preRate

Estimate

Standard Error

DF

t Value

Pr > |t|

Intercept



0.9905

0.02702

21

36.65

<.0001

preRate


1

-0.4453

0.03706

69

-12.02

<.0001

preRate


2

-0.2100

0.03706

69

-5.67

<.0001

dose

0


0.2687

0.01077

69

24.95

<.0001

dose

1


0.2825

0.01077

69

26.22

<.0001

dose

0.5


0.2767

0.01077

69

25.68

<.0001




/********************* Tree growth *****/

PROC Import out = Sitka datafile="~/Projects/stat506/HW3/sitka.csv"

dbms=dlm replace;

delimiter=',' ;

getnames=yes;

run;

PROC means;

run;


PROC SGpanel data= sitka;

panelBy treat;

series x= time y = size /group = tree;

run;

data sitka;

set sitka;

timeSq = (time - 202)**2;

run;

Proc Mixed data = sitka;

class treat;

model size = treat|time treat|timeSq;

* random intercept / subject = tree type = ;

run;

Proc Mixed data = sitka;

class treat tree;

model size = treat|time treat|timeSq / htype=1;

repeated / subject = tree type = cs;

run;

PROC Mixed data = sitka;

class treat tree;

model size = treat|time treat|timeSq / htype=1;

repeated /subject = tree type = un;

run;

proc Mixed data = sitka;

class treat tree;

model size = treat|time treat|timeSq / htype=1;

repeated /subject = tree type = ar(1) ;

run;

PROC Mixed data = sitka;

class treat tree;

model size = treat|time timeSq / htype=1;

repeated /subject = tree type = un;

run;

17:29 Sunday, January 27, 2013


No correlation:


Covariance Parameter Estimates


Cov Parm

Estimate

Residual

0.3945

AIC (smaller is better)

818.2



Compound symmetric

Covariance Parameter Estimates


Cov Parm

Subject

Estimate

CS

tree

0.3722

Residual


0.02616

AIC (smaller is better)

94.1



Unstructured symmetric


1

2

3

4

5

1

0.44





2

0.4

0.39




3

0.37

0.37

0.36



4

0.37

0.37

0.37

0.41


5

0.37

0.37

0.4

0.4038

0.41



AIC (smaller is better)

-74.1





Autoregressive:



AR(1)

tree

0.9710

AIC (smaller is better)

-54.9





Unstructured after dropping timeSq*treatment

AIC (smaller is better)

-94.6

Correlations are the same.


Type 1 Tests of Fixed Effects

Effect

Num DF

Den DF

F Value

Pr > F

treat

1

77

1.67

0.1996

Time

1

77

1275.25

<.0001

Time*treat

1

77

15.27

0.0002

timeSq

1

77

273.85

<.0001






/******************   Soils  spatial ********************************/
PROC Import out = soils datafile="~/Projects/stat506/HW3/soils.dat" 
  dbms=dlm    replace;
  delimiter=' ' ;
  getnames=yes;
run;
PROC means;
run;

PROC SGPLOT data = soils;
  scatter x = Ca  y = pH / markerattrs=(symbol=CircleFilled) 
                  transparency=0.7;
  run;


 Proc mixed data = soils;
   model pH = Ca ;
  run;
    
 Proc mixed data = soils;
  model pH = Ca ;
  repeated /local   subject = intercept type = sp(exp) (column row);
  run;
  
Proc mixed data = soils;
  model pH = Ca ;
  repeated / local  subject = intercept type = sp(lin) (column row);
  run;

  
 Proc mixed data = soils;
  model pH = Ca ;
  repeated / local  subject = intercept type = sp(gau) (column row);
  run;
  
  
 Proc mixed data = soils;
  model pH = Ca ;
  repeated /local   subject = intercept type = sp(sph) (column row);
  run;
  

No Covariance:

Covariance Parameter Estimates

Cov Parm

Estimate


Residual

0.07436


AIC (smaller is better)

69.3



Exponential

Covariance Parameter Estimates


Cov Parm

Subject

Estimate

Variance

Intercept

0.01859

SP(EXP)

Intercept

2.4626

Residual


0.003757


AIC (smaller is better)

-369.3



Gaussian

Covariance Parameter Estimates


Cov Parm

Subject

Estimate

Variance

Intercept

0.01173

SP(GAU)

Intercept

2.6093

Residual


0.008834


AIC (smaller is better)

-365.8



Linear

Covariance Parameter Estimates


Cov Parm

Subject

Estimate

Variance

Intercept

0.009241

SP(LIN)

Intercept

0.1822

Residual


0.01008



AIC (smaller is better)

-351.5



Spherical



Covariance Parameter Estimates


Cov Parm

Subject

Estimate

Variance

Intercept

0.04191

SP(SPH)

Intercept

14.1681

Residual


0.005827


AIC (smaller is better)

-365








Author: Jim Robison-Cox
Last Updated: