Refer back to HW2 of Stat 505.
Reanalyze the same data in SAS.
Researchers in Jordan are
interested in plants useable for animal fodder which require little
moisture. They tested four plant species in a greenhouse experiment
varying the daily watering from 50 to 650 mm in 100 mm increments.
Within each species, water amounts were allocated at random. The
response is dry biomass. Read in the data
from here .
Plot the data in a manner which allows us to easily compare median biomass for each species as a function of water.

Fit a model with an intercept and
a slope for each plant.
yi = β0 + β1
xi + α 0j[i] + α 1j[i] xi
+ εi
In this notation, i is the row number, j[i] is
the plant species of the plant in row i (j = 1 to 4), β's are
overall effects, and α's are adjustments for each plant species.
Fit this model in SAS and show the estimated coefficients.
|
Parameter |
Estimate |
|
Standard Error |
t Value |
Pr > |t| |
|---|---|---|---|---|---|
|
Intercept |
3.995279483 |
B |
0.35384655 |
11.29 |
<.0001 |
|
species 1 |
2.389633552 |
B |
0.50041460 |
4.78 |
<.0001 |
|
species 2 |
2.337465373 |
B |
0.50041460 |
4.67 |
<.0001 |
|
species 3 |
0.901910823 |
B |
0.50041460 |
1.80 |
0.0744 |
|
water |
0.005930727 |
B |
0.00087779 |
6.76 |
<.0001 |
|
water*species 1 |
0.002272476 |
B |
0.00124138 |
1.83 |
0.0700 |
|
water*species 2 |
0.001774422 |
B |
0.00124138 |
1.43 |
0.1559 |
|
water*species 3 |
0.000286433 |
B |
0.00124138 |
0.23 |
0.8180 |
|
Note: |
The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable. |
What combination of Greek letters is estimated by each coefficient shown?
|
Parameter |
Estimate |
|
β0 + α 04 |
3.995 |
|
α 01 - α 04 |
2.39 |
|
α 02 - α 04 |
2.34 |
|
α 03 - α 04 |
0.90 |
|
β1 + α 14 |
0.0059 |
|
α 11 - α 14 |
0.0023 |
|
α 12 - α 14 |
0.0018 |
|
α 13 - α 14 |
0.00029 |
Provide either the Type I or Type III output table, and explain why you think this table is preferred.
|
Source |
DF |
Type I SS |
Mean Square |
F Value |
Pr > F |
|
species |
3 |
199.8131803 |
66.6043934 |
77.18 |
<.0001 |
|
water |
1 |
220.4026955 |
220.4026955 |
255.40 |
<.0001 |
|
water*species |
3 |
4.1443638 |
1.3814546 |
1.60 |
0.1937 |
We need to use Tyoe I sums of squares because it makes no sense to look at the need for a main effect when the model includes an interaction of that main effect and some other predictor.
Plot the four lines.

Slopes appear to be quite similar, which makes sense since the interaction term has a large p-value.
Provide the default diagnostic plots.

No problem with the diagnostic plots. These data look quite “normal” and seem to have constant spread.
2. Refer back to HW4 of Stat 505. Reanalyze the warpbreaks data in SAS.
Provide estimates of the effects of tension and wool type.

Note: SAS does not sort the observations, so the first 3 boxes are for wool type A, second three for wool type B. If you want to combine over wool type, you have to run PROC SORT first.
Using a transformation on the response.
|
R-Square |
Coeff Var |
Root MSE |
lbreaks Mean |
|
0.336295 |
11.53818 |
0.373994 |
3.241362 |
Rsquared and residual std error agree with R.
|
Source |
DF |
Type I SS |
Mean Square |
F Value |
Pr > F |
|
wool |
1 |
0.31253456 |
0.31253456 |
2.23 |
0.1415 |
|
tension |
2 |
2.17616902 |
1.08808451 |
7.78 |
0.0012 |
|
wool*tension |
2 |
0.91314956 |
0.45657478 |
3.26 |
0.0469 |
|
Parameter |
Estimate |
|
Standard Error |
t Value |
Pr > |t| |
|
Intercept |
3.309326659 |
B |
0.12 |
26.55 |
<.0001 |
|
wool A |
-0.192577085 |
B |
0.18 |
-1.09 |
0.2801 |
|
tension H |
-0.405175006 |
B |
0.18 |
-2.30 |
0.0260 |
|
tension L |
-0.026948213 |
B |
0.18 |
-0.15 |
0.8792 |
|
wool*tension A H |
0.406048035 |
B |
0.25 |
1.63 |
0.1100 |
|
wool*tension A L |
0.628143922 |
B |
0.25 |
2.52 |
0.0151 |
Predictions agree with R: 3.309 for woolB-Medium tension, 3.718 for woolA-Lowtension, etc.

Diagnostics look good. No trend in plots of residuals by fits. There is a strong interaction effect, so we need to interpret wool type and tension jointly, not one at a time. WoolB-Hightension has lowest mean log(breaks), estimated as 2.90 with SE = .125
Using appropriate weights, which could come from the R models we ran last semester.
Provide suitable diagnostic plots for both.
From R, the varPower function returned an estimate of alpha = 1.35, so weights are predictions to the -2.70 = -2 alpha.
|
Source |
DF |
Sum of Squares |
Mean Square |
F Value |
Pr > F |
|---|---|---|---|---|---|
|
Model |
5 |
0.26719782 |
0.05343956 |
4.28 |
0.0027 |
|
Error |
48 |
0.59985368 |
0.01249695 |
|
|
|
Corrected Total |
53 |
0.86705150 |
|
|
|
|
R-Square |
Coeff Var |
Root MSE |
breaks Mean |
|---|---|---|---|
|
0.308168 |
0.466108 |
0.111790 |
23.98366 |
Root MSE of 0.112 is slightly different from R's estimate of 0.113
|
Source |
DF |
Type I SS |
Mean Square |
F Value |
Pr > F |
|---|---|---|---|---|---|
|
wool |
1 |
0.02588201 |
0.02588201 |
2.07 |
0.1566 |
|
tension |
2 |
0.15130903 |
0.07565451 |
6.05 |
0.0045 |
|
wool*tension |
2 |
0.09000679 |
0.04500339 |
3.60 |
0.0349 |
F tests agree except in 3rd significant digit.
|
Parameter |
Estimate |
|
Standard Error |
t Value |
Pr > |t| |
|---|---|---|---|---|---|
|
Intercept |
28.77777778 |
B |
3.47542629 |
8.28 |
<.0001 |
|
wool A |
-4.77777778 |
B |
4.41326272 |
-1.08 |
0.2844 |
|
tension H |
-10.00000000 |
B |
3.98657315 |
-2.51 |
0.0156 |
|
tension L |
-0.55555556 |
B |
4.85158544 |
-0.11 |
0.91 |
|
wool*tension A H |
10.55555556 |
B |
5.58219545 |
1.89 |
0.0647 |
|
wool*tension A L |
21.11111111 |
B |
8.38180476 |
2.52 |
0.0152 |
Again, predictions agree between SAS and R.

Diagnostics look fine. Normally distributed with constant variance (in 2nd plot).
PROC IMPORT datafile="~/Projects/stat506/HW2/plantBiomass.csv" out=biomass dbms=dlm replace; delimiter=',' ; getnames=yes; run; * proc contents; run; Proc Sort data = biomass; by species; run; proc sgpanel data= biomass; panelby species; vbox biomass / category= water; run; PROC Freq data = biomass; table species; run; DATA biomass; set biomass; sqrtMass = sqrt(biomass); run; PROC GLM data = biomass plot = diagnostics; class species; model sqrtMass = species | water/ solution ss1 ; run;
PROC IMPORT
datafile="~/Projects/stat506/HW2/warpbreaks.csv"
out=warpbreaks dbms=dlm replace;
delimiter=',' ;
getnames=yes;
run;
* proc contents;
run;
proc boxplot data = warpbreaks ;
plot breaks * tension;
plot breaks * wool;
run;
data warpbreaks;
set warpbreaks;
lbreaks = log(breaks);
run;
title "Transformed Response";
proc boxplot data = warpbreaks ;
plot breaks * tension;
plot breaks * wool;
run;
data warpbreaks;
set warpbreaks;
lbreaks = log(breaks);
run;
proc glm data = warpbreaks plot = diagnostics;
class wool tension;
model lbreaks = wool | tension/solution;
run;
title "first run";
proc glm data = warpbreaks noprint;
class wool tension;
model breaks = wool |tension;
output out = breaks2 p=yhat;
run;
data breaks2;
set breaks2;
wts = yhat ** -2.7;
run;
title "second run";
proc glm data = breaks2 noprint;
class wool tension;
weight wts;
model breaks = wool |tension;
output out = breaks2 p=yhat2;
run;
data breaks2;
set breaks2;
wts = yhat2 ** -2.7;
run;
title "third run";
proc glm data = breaks2 plots = diagnostics;
class wool tension;
weight wts;
model breaks = wool |tension;
*output out = break2 p=yhat;
run;