Stat 408 Exploring Data in SAS

Using example data from the UCLA SAS starter kit on regression.

Create a directory for Stat 408 and change into it by typing these commands in a terminal window:

mkdir stat408
cd stat408
wget http://www.math.montana.edu/~jimrc/classes/stat408/data/elemapi.csv
sas
Alternatively, instead of using wget, open the URL above in a web browser and save the file to your stat408 folder.

Cleaning Data

Our goal is to build a multiple regression model for api00 (a measure of how well a school did on standardized tests in 2000) based on meals (percentage of kids getting free lunch -- a proxy for poverty), full (proportion of teachers with full credentials), and acs_k3 (average class size in kindergarten to 3rd grade).
Start with a look at the data:
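/* Set the line size and page size for listing output */
options ls=72 ps=66;
/* Clear the Log and Output windows before each run */
dm "log;clear;out;clear;";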

PROC Import out=Elemapi datafile="elemapi.csv" dbms=csv replace;
  getnames=yes;
  datarow=2;
run;
PROC contents data=WORK.Elemapi ;
run;
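Before summarizing individual variables, it can help to glance at a few raw rows. The following is a minimal sketch; the choice of 10 observations is arbitrary, and the variable list simply collects the names used elsewhere on this page.

```sas
/* Print the first 10 observations of the variables used in the model */
PROC print data=Elemapi (obs=10);
  var snum dnum api00 meals full acs_k3;
run;
```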
Let's look at each variable:
PROC Univariate data=WORK.Elemapi ;
  var acs_k3;
run;

PROC freq data=Elemapi;
  tables acs_k3;
run;
There is a problem here: some values of acs_k3 are negative, which is impossible for a class size. Which schools (snum) and districts (dnum) are involved?
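/* SAS treats a missing value as smaller than any number, */
/* so this WHERE also selects rows with missing acs_k3.   */
PROC print data=Elemapi;
  where (acs_k3 < 0);
  var snum dnum acs_k3;
run;

/* Exclude the missing values to see only the negative class sizes */
PROC print data=Elemapi;
  where (acs_k3 < 0) and (acs_k3 ^= .);
  var snum dnum acs_k3;
run;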


PROC print data=Elemapi;
  where (dnum = 140);
  var snum dnum acs_k3;
run;
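One way to check how the negative values are spread across districts is to tabulate dnum for just those rows. This is only a sketch that reuses the WHERE condition from the second PROC print above.

```sas
/* Count the negative (non-missing) class sizes by district */
PROC freq data=Elemapi;
  where (acs_k3 < 0) and (acs_k3 ^= .);
  tables dnum;
run;
```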
Fix the problem:
DATA elemapi;
  set Elemapi;
  acs_k3  = abs(acs_k3);
run;
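A quick check that the fix worked, sketched with PROC means: after taking absolute values the minimum should no longer be negative, and the number of missing values should be unchanged because abs() of a missing value is still missing.

```sas
/* Verify the repair: the minimum of acs_k3 should now be non-negative */
PROC means data=elemapi n nmiss min max;
  var acs_k3;
run;
```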
Next issue: what's wrong with FULL?
PROC univariate data=elemapi plot;
  var full;
run;

PROC freq data=elemapi ;
  tables full;
run;

PROC freq data=elemapi ;
  where (full <= 1);
  tables dnum;
run;
Fix it!
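One possible repair is sketched below. It assumes that your frequency tables show some schools recorded full as a proportion (values of 1 or less) while the rest recorded a percentage, and that you choose to put everything on the percentage scale. Check that assumption against your own output first; dividing the larger values by 100 instead would work equally well.

```sas
DATA elemapi;
  set elemapi;
  /* Assumption: values of 1 or less are proportions; convert them to percentages */
  if (full ^= .) and (full <= 1) then full = 100 * full;
run;
```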
Now look at scatterplots:
ods html;

PROC sgscatter data=elemapi ;
 matrix api00 acs_k3 meals full;
run;

ods html close;
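As a numerical companion to the scatterplot matrix, a table of pairwise correlations among the same four variables may be useful. This is a minimal sketch using PROC corr with its default Pearson correlations.

```sas
/* Pairwise Pearson correlations for the response and the three predictors */
PROC corr data=elemapi;
  var api00 acs_k3 meals full;
run;
```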
Now fit a multiple regression and look at the diagnostics.
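ods html;
/* ODS Graphics must be enabled for the diagnostic plots to be produced */
ods graphics on;

PROC reg data=elemapi plots=diagnostics;
  model api00 = acs_k3 meals full;
run;
/* PROC reg is interactive; quit ends the procedure */
quit;

ods graphics off;
ods html close;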

ods graphics on;

PROC GLM data=elemapi plots=diagnostics;
  model api00 = acs_k3 meals full;
run;
/* PROC GLM is interactive; quit ends the procedure */
quit;

ods graphics off;



Author: Jim Robison-Cox