Homework #2 Solutions STAT 506 Spring 2013

  1. Refer back to HW2 of Stat 505. Reanalyze the same data in SAS.
    Researchers in Jordan are interested in plants useable for animal fodder which require little moisture. They tested four plant species in a greenhouse experiment varying the daily watering from 50 to 650 mm in 100 mm increments. Within each species, water amounts were allocated at random. The response is dry biomass. Read in the data from here .

    1. Plot the data in a manner which allows us to easily compare median biomass for each species as a function of water.




    1. Fit a model with an intercept and a slope for each plant.
      yi = β0 + β1 xi + α 0j[i] + α 1j[i] xi + εi
      In this notation, i is the row number, j[i] is the plant species of the plant in row i (j = 1 to 4), β's are overall effects, and α's are adjustments for each plant species.

      1. Fit this model in SAS and show the estimated coefficients.

        Parameter

        Estimate

         

        Standard Error

        t Value

        Pr > |t|

        Intercept

        3.995279483

        B

        0.35384655

        11.29

        <.0001

        species 1

        2.389633552

        B

        0.50041460

        4.78

        <.0001

        species 2

        2.337465373

        B

        0.50041460

        4.67

        <.0001

        species 3

        0.901910823

        B

        0.50041460

        1.80

        0.0744

        water

        0.005930727

        B

        0.00087779

        6.76

        <.0001

        water*species 1

        0.002272476

        B

        0.00124138

        1.83

        0.0700

        water*species 2

        0.001774422

        B

        0.00124138

        1.43

        0.1559

        water*species 3

        0.000286433

        B

        0.00124138

        0.23

        0.8180

        Note:

        The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

      2. What combination of Greek letters is estimated by each coefficient shown?

        Parameter

        Estimate

        β0 + α 04

        3.995

        α 01 - α 04

        2.39

        α 02 - α 04

        2.34

        α 03 - α 04

        0.90

        β1 + α 14

        0.0059

        α 11 - α 14

        0.0023

        α 12 - α 14

        0.0018

        α 13 - α 14

        0.00029

      3. Provide either the Type I or Type III output table, and explain why you think this table is preferred.



Source

DF

Type I SS

Mean Square

F Value

Pr > F

species

3

199.8131803

66.6043934

77.18

<.0001

water

1

220.4026955

220.4026955

255.40

<.0001

water*species

3

4.1443638

1.3814546

1.60

0.1937


We need to use Tyoe I sums of squares because it makes no sense to look at the need for a main effect when the model includes an interaction of that main effect and some other predictor.



      1. Plot the four lines.



Slopes appear to be quite similar, which makes sense since the interaction term has a large p-value.

      1. Provide the default diagnostic plots.




No problem with the diagnostic plots. These data look quite “normal” and seem to have constant spread.


    2. Refer back to HW4 of Stat 505. Reanalyze the warpbreaks data in SAS.

    Provide estimates of the effects of tension and wool type.




Note: SAS does not sort the observations, so the first 3 boxes are for wool type A, second three for wool type B. If you want to combine over wool type, you have to run PROC SORT first.


    1. Using a transformation on the response.


R-Square

Coeff Var

Root MSE

lbreaks Mean

0.336295

11.53818

0.373994

3.241362

Rsquared and residual std error agree with R.


Source

DF

Type I SS

Mean Square

F Value

Pr > F

wool

1

0.31253456

0.31253456

2.23

0.1415

tension

2

2.17616902

1.08808451

7.78

0.0012

wool*tension

2

0.91314956

0.45657478

3.26

0.0469



Parameter

Estimate


Standard Error

t Value

Pr > |t|

Intercept

3.309326659

B

0.12

26.55

<.0001

wool A

-0.192577085

B

0.18

-1.09

0.2801

tension H

-0.405175006

B

0.18

-2.30

0.0260

tension L

-0.026948213

B

0.18

-0.15

0.8792

wool*tension A H

0.406048035

B

0.25

1.63

0.1100

wool*tension A L

0.628143922

B

0.25

2.52

0.0151


Predictions agree with R: 3.309 for woolB-Medium tension, 3.718 for woolA-Lowtension, etc.





Diagnostics look good. No trend in plots of residuals by fits. There is a strong interaction effect, so we need to interpret wool type and tension jointly, not one at a time. WoolB-Hightension has lowest mean log(breaks), estimated as 2.90 with SE = .125


    1. Using appropriate weights, which could come from the R models we ran last semester.

    Provide suitable diagnostic plots for both.

From R, the varPower function returned an estimate of alpha = 1.35, so weights are predictions to the -2.70 = -2 alpha.



Source

DF

Sum of Squares

Mean Square

F Value

Pr > F

Model

5

0.26719782

0.05343956

4.28

0.0027

Error

48

0.59985368

0.01249695

 

 

Corrected Total

53

0.86705150

 

 

 


R-Square

Coeff Var

Root MSE

breaks Mean

0.308168

0.466108

0.111790

23.98366

Root MSE of 0.112 is slightly different from R's estimate of 0.113


Source

DF

Type I SS

Mean Square

F Value

Pr > F

wool

1

0.02588201

0.02588201

2.07

0.1566

tension

2

0.15130903

0.07565451

6.05

0.0045

wool*tension

2

0.09000679

0.04500339

3.60

0.0349

F tests agree except in 3rd significant digit.


Parameter

Estimate

 

Standard Error

t Value

Pr > |t|

Intercept

28.77777778

B

3.47542629

8.28

<.0001

wool A

-4.77777778

B

4.41326272

-1.08

0.2844

tension H

-10.00000000

B

3.98657315

-2.51

0.0156

tension L

-0.55555556

B

4.85158544

-0.11

0.91

wool*tension A H

10.55555556

B

5.58219545

1.89

0.0647

wool*tension A L

21.11111111

B

8.38180476

2.52

0.0152


Again, predictions agree between SAS and R.





Diagnostics look fine. Normally distributed with constant variance (in 2nd plot).





SAS Code 1

PROC IMPORT
  datafile="~/Projects/stat506/HW2/plantBiomass.csv" 
  out=biomass   dbms=dlm    replace;
  delimiter=',' ;
  getnames=yes;
run;
* proc contents;
run;
Proc Sort data = biomass;
  by species;
run;

proc sgpanel data= biomass;
  panelby species;
  vbox biomass / category= water;
run;

PROC Freq data = biomass;
   table species;
run;
DATA biomass;
  set biomass;
  sqrtMass = sqrt(biomass);
run;
PROC GLM data = biomass plot = diagnostics;
  class species;
  model sqrtMass = species | water/ solution ss1 ;
run;

SAS Code 2

PROC IMPORT
  datafile="~/Projects/stat506/HW2/warpbreaks.csv" 
  out=warpbreaks   dbms=dlm    replace;
  delimiter=',' ;
  getnames=yes;
run;
* proc contents;
run;
 
proc boxplot data = warpbreaks ;
  plot breaks * tension;
  plot breaks * wool;
run;
data warpbreaks;
  set warpbreaks;
  lbreaks = log(breaks);
  run;
  
   title "Transformed Response";
proc boxplot data = warpbreaks ;
  plot breaks * tension;
  plot breaks * wool;
run;
data warpbreaks;
  set warpbreaks;
  lbreaks = log(breaks);
  run;
  
proc glm data = warpbreaks plot = diagnostics;
  class wool tension;
  model lbreaks = wool | tension/solution;
run;
  
  title "first run";
proc glm data = warpbreaks noprint;
  class wool tension;
  model breaks = wool |tension;
  output out = breaks2 p=yhat;
run;
data breaks2;
  set breaks2;
  wts = yhat ** -2.7;
run;

title "second run";

proc glm data = breaks2 noprint;
  class wool tension;
    weight wts;
  model breaks = wool |tension;
  output out = breaks2 p=yhat2;
run;
data breaks2;
  set breaks2;
  wts = yhat2 ** -2.7;
run;

title "third run";
proc glm data = breaks2 plots = diagnostics;
  class wool tension;
  weight wts;
  model breaks = wool |tension;
  *output out = break2 p=yhat;
run;


Author: Jim Robison-Cox
Last Updated: