SAS Intro

STAT 408

October 26, 2017

SAS Language Basics

  • SAS Programs
  • SAS Statements
  • SAS Data Sets
  • DATA and PROC Steps

SAS Programs

A SAS program is made up of SAS statements that form DATA steps and PROC steps. SAS programs are structured more formally than R programs.

The general workflow for a SAS program is to create or modify a data set in a DATA step, then analyze or display it in a PROC step:

[Figure: Work Flow]

In [1]:
data class;
    set sashelp.class;
    if Sex = 'F';
run;
Out[1]:

13 data class;
14 set sashelp.class;
15 if Sex = 'F';
16 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 9 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

In [2]:
Proc print data=class;
run;
Out[2]:
SAS Output

The SAS System

Obs Name Sex Age Height Weight
1 Alice F 13 56.5 84.0
2 Barbara F 13 65.3 98.0
3 Carol F 14 62.8 102.5
4 Jane F 12 59.8 84.5
5 Janet F 15 62.5 112.5
6 Joyce F 11 51.3 50.5
7 Judy F 14 64.3 90.0
8 Louise F 12 56.3 77.0
9 Mary F 15 66.5 112.0

SAS Statements

SAS statements are the instructions that make up a SAS program.

  • The number one rule is that each statement ends with a semicolon.
  • SAS keywords and names can be written in upper or lowercase (quoted character values, such as 'F', are still case sensitive).
  • A SAS statement can continue onto the next line (provided that no word is split across two lines).
  • A SAS statement can be on the same line as other statements.
  • There are two ways to comment SAS code:
    • Start a comment with * and end it with a semicolon
    • Start a comment with /* and end with */
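A minimal sketch of these rules, reusing SASHELP.CLASS from the examples in this section: mixed-case keywords, one statement split across lines, two steps on one line, and both comment styles. The data set name teens is just an illustrative choice.

```sas
* An asterisk comment ends at the first semicolon;
/* A slash-asterisk comment can span several lines
   and may contain semicolons; it ends here */
DATA teens;
    Set sashelp.class;             /* keywords are not case sensitive */
    if Age >= 13                   /* one statement can continue...   */
       and Sex = 'F';              /* ...onto the next line           */
RUN; proc print data=teens; run;   /* two steps on one line */
```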

SAS Data Sets

  • As with R data frames, each row is an observation and each column is a variable associated with that observation.
  • There are only two data types in SAS:
    • Numeric
    • Character
  • Missing numeric data is represented with a period (.); missing character data is represented with a blank.
  • General rules for variable names in SAS:
    • Names must be 32 characters or fewer in length
    • Names must start with a letter or underscore
    • Names can contain letters (upper or lowercase), numerals, or underscores.
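A small sketch of the two data types and numeric missing values; the names and scores are made-up values for illustration only. The $ after Name makes it a character variable, Score is numeric, and the period on Bob's line is read as a missing value.

```sas
DATA types_demo;
    INPUT Name $ Score;         /* Name is character ($), Score is numeric */
    Valid_Name_2 = Score * 2;   /* legal name: starts with a letter, 32 characters or fewer */
    DATALINES;
Ann 90
Bob .
;
RUN;

PROC PRINT DATA=types_demo;     /* Bob's missing Score (and Valid_Name_2) print as a period */
RUN;
```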

DATA and PROC Steps

SAS programs are constructed from statements grouped into two building blocks: DATA steps and PROC steps.

DATA Steps

  • DATA Steps read and modify data.
  • DATA Steps start with the DATA statement followed by the name for the created SAS data set.
  • DATA Steps can:
    • read external data files
    • include DO loops, IF-THEN/ELSE logic, and built-in functions
    • combine or merge data sets.
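As a sketch of the last point, the steps below split SASHELP.CLASS and put it back together two ways: listing two data sets on one SET statement stacks them, while MERGE with a BY statement matches rows side by side. The data set names (girls, boys, heights, weights, and so on) are illustrative choices, not part of the course material.

```sas
/* Split SASHELP.CLASS into two data sets */
DATA girls;
    SET sashelp.class;
    IF Sex = 'F';
RUN;

DATA boys;
    SET sashelp.class;
    IF Sex = 'M';
RUN;

/* Combine: two data sets on one SET statement are stacked (all girls, then all boys) */
DATA everyone;
    SET girls boys;
RUN;

/* Merge: match rows on a common BY variable; inputs must be in BY order */
PROC SORT DATA=sashelp.class OUT=by_name;
    BY Name;
RUN;

DATA heights;
    SET by_name(KEEP=Name Height);
RUN;

DATA weights;
    SET by_name(KEEP=Name Weight);
RUN;

DATA merged;
    MERGE heights weights;
    BY Name;
RUN;
```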
In [3]:
* This creates a new data set with only females;
data class;
    set sashelp.class;
    if Sex = 'F';
run;

/* 
Do not print this time 
Proc print data=class;
run;
*/
Out[3]:

29 * This creates a new data set with only females;
30 data class;
31 set sashelp.class;
32 if Sex = 'F';
33 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 9 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

34
35 /*
36 Do not print this time
37 Proc print data=class;
38 run;
39 */
40
In [4]:
data fish;
    set sashelp.fish;
run;
Out[4]:

46 data fish;
47 set sashelp.fish;
48 run;
NOTE: There were 159 observations read from the data set SASHELP.FISH.
NOTE: The data set WORK.FISH has 159 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

PROC Steps

  • Procedures start with a PROC statement
  • The keyword "PROC" is followed by the procedure name (for example, PRINT)
  • Most procedures have a handful of options, similar to arguments in R functions.
  • A PROC step usually ends with a RUN statement (run;); a few procedures, such as PROC SQL, end with QUIT; instead.
  • SAS procedures produce results or output.
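A short sketch of procedure options, using the same SASHELP.FISH data printed below: the (OBS=5) data set option limits the input to the first five rows, NOOBS is a PROC PRINT option that suppresses the Obs column, and a VAR statement chooses which columns are printed.

```sas
PROC PRINT DATA=sashelp.fish(OBS=5) NOOBS;   /* only 5 rows, no Obs column */
    VAR Species Weight Length1;              /* print these columns, in this order */
RUN;
```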
In [5]:
proc print data=fish;
run;
Out[5]:
SAS Output

The SAS System

Obs Species Weight Length1 Length2 Length3 Height Width
1 Bream 242.0 23.2 25.4 30.0 11.5200 4.0200
2 Bream 290.0 24.0 26.3 31.2 12.4800 4.3056
3 Bream 340.0 23.9 26.5 31.1 12.3778 4.6961
4 Bream 363.0 26.3 29.0 33.5 12.7300 4.4555
5 Bream 430.0 26.5 29.0 34.0 12.4440 5.1340
6 Bream 450.0 26.8 29.7 34.7 13.6024 4.9274
7 Bream 500.0 26.8 29.7 34.5 14.1795 5.2785
8 Bream 390.0 27.6 30.0 35.0 12.6700 4.6900
9 Bream 450.0 27.6 30.0 35.1 14.0049 4.8438
10 Bream 500.0 28.5 30.7 36.2 14.2266 4.9594
11 Bream 475.0 28.4 31.0 36.2 14.2628 5.1042
12 Bream 500.0 28.7 31.0 36.2 14.3714 4.8146
13 Bream 500.0 29.1 31.5 36.4 13.7592 4.3680
14 Bream . 29.5 32.0 37.3 13.9129 5.0728
15 Bream 600.0 29.4 32.0 37.2 14.9544 5.1708
16 Bream 600.0 29.4 32.0 37.2 15.4380 5.5800
17 Bream 700.0 30.4 33.0 38.3 14.8604 5.2854
18 Bream 700.0 30.4 33.0 38.5 14.9380 5.1975
19 Bream 610.0 30.9 33.5 38.6 15.6330 5.1338
20 Bream 650.0 31.0 33.5 38.7 14.4738 5.7276
21 Bream 575.0 31.3 34.0 39.5 15.1285 5.5695
22 Bream 685.0 31.4 34.0 39.2 15.9936 5.3704
23 Bream 620.0 31.5 34.5 39.7 15.5227 5.2801
24 Bream 680.0 31.8 35.0 40.6 15.4686 6.1306
25 Bream 700.0 31.9 35.0 40.5 16.2405 5.5890
26 Bream 725.0 31.8 35.0 40.9 16.3600 6.0532
27 Bream 720.0 32.0 35.0 40.6 16.3618 6.0900
28 Bream 714.0 32.7 36.0 41.5 16.5170 5.8515
29 Bream 850.0 32.8 36.0 41.6 16.8896 6.1984
30 Bream 1000.0 33.5 37.0 42.6 18.9570 6.6030
31 Bream 920.0 35.0 38.5 44.1 18.0369 6.3063
32 Bream 955.0 35.0 38.5 44.0 18.0840 6.2920
33 Bream 925.0 36.2 39.5 45.3 18.7542 6.7497
34 Bream 975.0 37.4 41.0 45.9 18.6354 6.7473
35 Bream 950.0 38.0 41.0 46.5 17.6235 6.3705
36 Roach 40.0 12.9 14.1 16.2 4.1472 2.2680
37 Roach 69.0 16.5 18.2 20.3 5.2983 2.8217
38 Roach 78.0 17.5 18.8 21.2 5.5756 2.9044
39 Roach 87.0 18.2 19.8 22.2 5.6166 3.1746
40 Roach 120.0 18.6 20.0 22.2 6.2160 3.5742
41 Roach 0.0 19.0 20.5 22.8 6.4752 3.3516
42 Roach 110.0 19.1 20.8 23.1 6.1677 3.3957
43 Roach 120.0 19.4 21.0 23.7 6.1146 3.2943
44 Roach 150.0 20.4 22.0 24.7 5.8045 3.7544
45 Roach 145.0 20.5 22.0 24.3 6.6339 3.5478
46 Roach 160.0 20.5 22.5 25.3 7.0334 3.8203
47 Roach 140.0 21.0 22.5 25.0 6.5500 3.3250
48 Roach 160.0 21.1 22.5 25.0 6.4000 3.8000
49 Roach 169.0 22.0 24.0 27.2 7.5344 3.8352
50 Roach 161.0 22.0 23.4 26.7 6.9153 3.6312
51 Roach 200.0 22.1 23.5 26.8 7.3968 4.1272
52 Roach 180.0 23.6 25.2 27.9 7.0866 3.9060
53 Roach 290.0 24.0 26.0 29.2 8.8768 4.4968
54 Roach 272.0 25.0 27.0 30.6 8.5680 4.7736
55 Roach 390.0 29.5 31.7 35.0 9.4850 5.3550
56 Whitefish 270.0 23.6 26.0 28.7 8.3804 4.2476
57 Whitefish 270.0 24.1 26.5 29.3 8.1454 4.2485
58 Whitefish 306.0 25.6 28.0 30.8 8.7780 4.6816
59 Whitefish 540.0 28.5 31.0 34.0 10.7440 6.5620
60 Whitefish 800.0 33.7 36.4 39.6 11.7612 6.5736
61 Whitefish 1000.0 37.3 40.0 43.5 12.3540 6.5250
62 Parkki 55.0 13.5 14.7 16.5 6.8475 2.3265
63 Parkki 60.0 14.3 15.5 17.4 6.5772 2.3142
64 Parkki 90.0 16.3 17.7 19.8 7.4052 2.6730
65 Parkki 120.0 17.5 19.0 21.3 8.3922 2.9181
66 Parkki 150.0 18.4 20.0 22.4 8.8928 3.2928
67 Parkki 140.0 19.0 20.7 23.2 8.5376 3.2944
68 Parkki 170.0 19.0 20.7 23.2 9.3960 3.4104
69 Parkki 145.0 19.8 21.5 24.1 9.7364 3.1571
70 Parkki 200.0 21.2 23.0 25.8 10.3458 3.6636
71 Parkki 273.0 23.0 25.0 28.0 11.0880 4.1440
72 Parkki 300.0 24.0 26.0 29.0 11.3680 4.2340
73 Perch 5.9 7.5 8.4 8.8 2.1120 1.4080
74 Perch 32.0 12.5 13.7 14.7 3.5280 1.9992
75 Perch 40.0 13.8 15.0 16.0 3.8240 2.4320
76 Perch 51.5 15.0 16.2 17.2 4.5924 2.6316
77 Perch 70.0 15.7 17.4 18.5 4.5880 2.9415
78 Perch 100.0 16.2 18.0 19.2 5.2224 3.3216
79 Perch 78.0 16.8 18.7 19.4 5.1992 3.1234
80 Perch 80.0 17.2 19.0 20.2 5.6358 3.0502
81 Perch 85.0 17.8 19.6 20.8 5.1376 3.0368
82 Perch 85.0 18.2 20.0 21.0 5.0820 2.7720
83 Perch 110.0 19.0 21.0 22.5 5.6925 3.5550
84 Perch 115.0 19.0 21.0 22.5 5.9175 3.3075
85 Perch 125.0 19.0 21.0 22.5 5.6925 3.6675
86 Perch 130.0 19.3 21.3 22.8 6.3840 3.5340
87 Perch 120.0 20.0 22.0 23.5 6.1100 3.4075
88 Perch 120.0 20.0 22.0 23.5 5.6400 3.5250
89 Perch 130.0 20.0 22.0 23.5 6.1100 3.5250
90 Perch 135.0 20.0 22.0 23.5 5.8750 3.5250
91 Perch 110.0 20.0 22.0 23.5 5.5225 3.9950
92 Perch 130.0 20.5 22.5 24.0 5.8560 3.6240
93 Perch 150.0 20.5 22.5 24.0 6.7920 3.6240
94 Perch 145.0 20.7 22.7 24.2 5.9532 3.6300
95 Perch 150.0 21.0 23.0 24.5 5.2185 3.6260
96 Perch 170.0 21.5 23.5 25.0 6.2750 3.7250
97 Perch 225.0 22.0 24.0 25.5 7.2930 3.7230
98 Perch 145.0 22.0 24.0 25.5 6.3750 3.8250
99 Perch 188.0 22.6 24.6 26.2 6.7334 4.1658
100 Perch 180.0 23.0 25.0 26.5 6.4395 3.6835
101 Perch 197.0 23.5 25.6 27.0 6.5610 4.2390
102 Perch 218.0 25.0 26.5 28.0 7.1680 4.1440
103 Perch 300.0 25.2 27.3 28.7 8.3230 5.1373
104 Perch 260.0 25.4 27.5 28.9 7.1672 4.3350
105 Perch 265.0 25.4 27.5 28.9 7.0516 4.3350
106 Perch 250.0 25.4 27.5 28.9 7.2828 4.5662
107 Perch 250.0 25.9 28.0 29.4 7.8204 4.2042
108 Perch 300.0 26.9 28.7 30.1 7.5852 4.6354
109 Perch 320.0 27.8 30.0 31.6 7.6156 4.7716
110 Perch 514.0 30.5 32.8 34.0 10.0300 6.0180
111 Perch 556.0 32.0 34.5 36.5 10.2565 6.3875
112 Perch 840.0 32.5 35.0 37.3 11.4884 7.7957
113 Perch 685.0 34.0 36.5 39.0 10.8810 6.8640
114 Perch 700.0 34.0 36.0 38.3 10.6091 6.7408
115 Perch 700.0 34.5 37.0 39.4 10.8350 6.2646
116 Perch 690.0 34.6 37.0 39.3 10.5717 6.3666
117 Perch 900.0 36.5 39.0 41.4 11.1366 7.4934
118 Perch 650.0 36.5 39.0 41.4 11.1366 6.0030
119 Perch 820.0 36.6 39.0 41.3 12.4313 7.3514
120 Perch 850.0 36.9 40.0 42.3 11.9286 7.1064
121 Perch 900.0 37.0 40.0 42.5 11.7300 7.2250
122 Perch 1015.0 37.0 40.0 42.4 12.3808 7.4624
123 Perch 820.0 37.1 40.0 42.5 11.1350 6.6300
124 Perch 1100.0 39.0 42.0 44.6 12.8002 6.8684
125 Perch 1000.0 39.8 43.0 45.2 11.9328 7.2772
126 Perch 1100.0 40.1 43.0 45.5 12.5125 7.4165
127 Perch 1000.0 40.2 43.5 46.0 12.6040 8.1420
128 Perch 1000.0 41.1 44.0 46.6 12.4888 7.5958
129 Pike 200.0 30.0 32.3 34.8 5.5680 3.3756
130 Pike 300.0 31.7 34.0 37.8 5.7078 4.1580
131 Pike 300.0 32.7 35.0 38.8 5.9364 4.3844
132 Pike 300.0 34.8 37.3 39.8 6.2884 4.0198
133 Pike 430.0 35.5 38.0 40.5 7.2900 4.5765
134 Pike 345.0 36.0 38.5 41.0 6.3960 3.9770
135 Pike 456.0 40.0 42.5 45.5 7.2800 4.3225
136 Pike 510.0 40.0 42.5 45.5 6.8250 4.4590
137 Pike 540.0 40.1 43.0 45.8 7.7860 5.1296
138 Pike 500.0 42.0 45.0 48.0 6.9600 4.8960
139 Pike 567.0 43.2 46.0 48.7 7.7920 4.8700
140 Pike 770.0 44.8 48.0 51.2 7.6800 5.3760
141 Pike 950.0 48.3 51.7 55.1 8.9262 6.1712
142 Pike 1250.0 52.0 56.0 59.7 10.6863 6.9849
143 Pike 1600.0 56.0 60.0 64.0 9.6000 6.1440
144 Pike 1550.0 56.0 60.0 64.0 9.6000 6.1440
145 Pike 1650.0 59.0 63.4 68.0 10.8120 7.4800
146 Smelt 6.7 9.3 9.8 10.8 1.7388 1.0476
147 Smelt 7.5 10.0 10.5 11.6 1.9720 1.1600
148 Smelt 7.0 10.1 10.6 11.6 1.7284 1.1484
149 Smelt 9.7 10.4 11.0 12.0 2.1960 1.3800
150 Smelt 9.8 10.7 11.2 12.4 2.0832 1.2772
151 Smelt 8.7 10.8 11.3 12.6 1.9782 1.2852
152 Smelt 10.0 11.3 11.8 13.1 2.2139 1.2838
153 Smelt 9.9 11.3 11.8 13.1 2.2139 1.1659
154 Smelt 9.8 11.4 12.0 13.2 2.2044 1.1484
155 Smelt 12.2 11.5 12.2 13.4 2.0904 1.3936
156 Smelt 13.4 11.7 12.4 13.5 2.4300 1.2690
157 Smelt 12.2 12.1 13.0 13.8 2.2770 1.2558
158 Smelt 19.7 13.2 14.3 15.2 2.8728 2.0672
159 Smelt 19.9 13.8 15.0 16.2 2.9322 1.8792

Getting Data into SAS

There are four common ways to import data into SAS:

  1. Enter data directly into a SAS data file
  2. Create a SAS data file from a raw data file
  3. Convert other software's files into SAS data sets
  4. Read other software's files directly

We will talk about method 1 and then focus on Libname statements within SAS OnDemand.
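For reference, a rough sketch of methods 2 and 4. Both file paths and file names below are hypothetical placeholders (assumed to be a space-delimited text file and a CSV file with a header row); substitute the location of your own files.

```sas
/* Method 2: read a raw text file with INFILE + INPUT.                   */
/* Hypothetical file: skicounts.txt holds lines such as "28 SAT 5,200".  */
DATA skicounts_raw;
    INFILE '/folders/myfolders/skicounts.txt';
    INPUT NewSnow Day $ NumSkiers :COMMA5.;   /* colon modifier: list input with an informat */
RUN;

/* Method 4: let PROC IMPORT read another program's file directly. */
/* Hypothetical file: skicounts.csv with variable names in row 1.  */
PROC IMPORT DATAFILE='/folders/myfolders/skicounts.csv'
            OUT=skicounts_csv
            DBMS=CSV
            REPLACE;
    GETNAMES=YES;
RUN;
```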

Creating SAS Data Files from raw data

Similar to constructing an R data frame by hand, a SAS data set can be created from raw data typed directly into the program after a DATALINES statement.

In [6]:
* Read raw data into SAS data file called skicounts;
DATA skicounts;
    INPUT NewSnow $ Day $ NumSkiers $;
    DATALINES;
28 SAT 5,200
13 SUN 6,300
4 MON 3,400
0 TUE 2,150
;
RUN;
Out[6]:

61 * Read raw data into SAS data file called skicounts;
62 DATA skicounts;
63 INPUT NewSnow $ Day $ NumSkiers $;
64 DATALINES;
NOTE: The data set WORK.SKICOUNTS has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

69 ;
70 RUN;
In [7]:
PROC PRINT DATA=skicounts;
    TITLE 'Number of Skiers';
RUN;
Out[7]:
SAS Output

Number of Skiers

Obs NewSnow Day NumSkiers
1 28 SAT 5,200
2 13 SUN 6,300
3 4 MON 3,400
4 0 TUE 2,150

Libname Statements

In SAS OnDemand we will get most of our data sets from a library that gives access to data sets uploaded by the course instructor.

A libname statement lets SAS users access permanent SAS data sets stored on a computer or, in our case, on a server.

The syntax will look like:

LIBNAME STAT408 "/courses/d716b355ba27fe300";
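Once the libref is assigned, data sets in the library are referenced with a two-level name (libref.dataset). A minimal sketch, assuming the course library contains the titanic data set used in the exercise below: PROC CONTENTS with DATA=STAT408._ALL_ and the NODS option lists the members of the library without describing each one in detail.

```sas
LIBNAME STAT408 "/courses/d716b355ba27fe300";

/* List every data set stored in the STAT408 library */
PROC CONTENTS DATA=STAT408._ALL_ NODS;
RUN;

/* Two-level name: libref.dataset; (OBS=10) prints only the first 10 rows */
PROC PRINT DATA=STAT408.titanic(OBS=10);
RUN;
```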

SAS OnDemand

SAS OnDemand is a free SAS product for academic institutions.

  • SAS OnDemand is somewhat limited relative to a desktop version, but the desktop version can be very pricey.
  • SAS OnDemand is accessed using a web browser, whereas the desktop version must be installed locally and SAS University (another free product) requires installing VirtualBox virtualization software.

SAS OnDemand Exercise

  1. Open SAS OnDemand in your web browser
  2. Include a libname statement: LIBNAME STAT408 "/courses/d716b355ba27fe300";
  3. Using the existing data set STAT408.titanic, create a new data set that includes only females.
  4. Print the resulting data set.

SAS OnDemand Solution

LIBNAME STAT408 "/courses/d716b355ba27fe300";

Data titanic;
    set STAT408.titanic;
    if Sex = 'female';
RUN;

Proc Print data = titanic;
RUN;

SAS Informats and Formats

Informats

  • Informats are instructions that tell SAS how to read raw data values
  • This allows users to read data of different types (for example, a value with commas such as 5,200 can be read as a numeric value rather than a character string)
  • Informats can be supplied by SAS or created by the user. We will focus exclusively on SAS-defined informats.
  • Character informats have the form $informatw., where $ designates a character informat and w specifies the width. Examples include:
    • $16. (character field of width 16; trims leading blanks)
    • $CHAR7. (character field of width 7; does not trim leading blanks)
    • $UPCASE9. (character field of width 9; converts characters to uppercase)
  • Numeric and date-time informats do not start with $. Examples include:
    • best32. (generic numeric value of width 32)
    • COMMA9. (numeric value containing commas, width of 9)
    • DATEw. (reads dates in the form ddmmmyy or ddmmmyyyy, e.g. 26OCT2017)
    • DDMMYYw. (reads dates in the form ddmmyy or ddmmyyyy)
  • SAS stores dates as the number of days since January 1, 1960.
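A minimal sketch of informats with list input: the colon modifier (:) lets each value be read up to the next blank while still applying an informat. The dates are made-up values chosen to fall on a Saturday and a Sunday. Because no format is attached yet, PROC PRINT shows SkiDate as a count of days since January 1, 1960.

```sas
DATA informat_demo;
    INPUT Day :$UPCASE3. NumSkiers :COMMA5. SkiDate :DATE9.;
    DATALINES;
sat 5,200 28OCT2017
sun 6,300 29OCT2017
;
RUN;

PROC PRINT DATA=informat_demo;   /* Day is uppercase, NumSkiers is numeric, SkiDate is a day count */
RUN;
```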

Formats

  • Formats are similar to informats, but rather than controlling how data are read, formats control how data are displayed.
  • Formats are instructions to SAS on writing the data.
  • It may be efficient to store a binary variable as 1 or 0, but when printing the data we'd prefer to see 'Passenger Survived' or 'Passenger did not Survive'. A format maps each stored value to a display label without changing the stored value.
  • PROC IMPORT automatically assigns informats and formats.
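The survival example can be sketched with a user-defined format from PROC FORMAT. The format name survfmt and the variable name Survived are hypothetical choices for illustration (the titanic data set may name or code this variable differently). The stored values stay 1 and 0; only the printed values change.

```sas
PROC FORMAT;
    VALUE survfmt 1 = 'Passenger Survived'
                  0 = 'Passenger did not Survive';
RUN;

DATA surv_demo;
    INPUT Survived;
    DATALINES;
1
0
;
RUN;

PROC PRINT DATA=surv_demo;
    FORMAT Survived survfmt.;   /* display the labels instead of 1/0 */
RUN;
```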
In [10]:
DATA skicounts2;
    INPUT NewSnow $ Day $ NumSkiers $;
    DATALINES;
28 SAT 5,200
13 SUN 6,300
4 MON 3,400
0 TUE 2,150
;
RUN;
Out[10]:

131 DATA skicounts2;
132 INPUT NewSnow $ Day $ NumSkiers $;
133 DATALINES;
NOTE: The data set WORK.SKICOUNTS2 has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

138 ;
139 RUN;
In [11]:
* Note that the most recently set TITLE stays in effect until it is replaced or cleared;
Proc Print Data=skicounts2;
run;
Out[11]:
SAS Output

Information about passengers on Titanic

Obs NewSnow Day NumSkiers
1 28 SAT 5,200
2 13 SUN 6,300
3 4 MON 3,400
4 0 TUE 2,150
In [12]:
PROC PRINT DATA = skicounts2;
TITLE;
RUN;
Out[12]:
SAS Output
Obs NewSnow Day NumSkiers
1 28 SAT 5,200
2 13 SUN 6,300
3 4 MON 3,400
4 0 TUE 2,150
In [13]:
Proc Contents data = skicounts2;
run;
Out[13]:
SAS Output

The CONTENTS Procedure

Data Set Name WORK.SKICOUNTS2 Observations 4
Member Type DATA Variables 3
Engine V9 Indexes 0
Created 10/17/2017 20:36:34 Observation Length 24
Last Modified 10/17/2017 20:36:34 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding utf-8 Unicode (UTF-8)    
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 2714
Obs in First Data Page 4
Number of Data Set Repairs 0
Filename /tmp/SAS_work1C8900001930_hplc329.coe.montana.edu/skicounts2.sas7bdat
Release Created 9.0401M4
Host Created Linux
Inode Number 271771
Access Permission rw-r--r--
Owner Name sasdemo
File Size 128KB
File Size (bytes) 131072
Alphabetic List of Variables and Attributes
# Variable Type Len
2 Day Char 8
1 NewSnow Char 8
3 NumSkiers Char 8
In [14]:
DATA skicounts2;
    INPUT NewSnow  Day $ NumSkiers comma5.;
    DATALINES;
28 SAT 5,200
13 SUN 6,300
4 MON 3,400
0 TUE 2,150
;
RUN;
Out[14]:

170 DATA skicounts2;
171 INPUT NewSnow Day $ NumSkiers comma5.;
172 DATALINES;
NOTE: The data set WORK.SKICOUNTS2 has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

177 ;
178 RUN;
In [15]:
proc print data=skicounts2;
run;
Out[15]:
SAS Output
Obs NewSnow Day NumSkiers
1 28 SAT 5200
2 13 SUN 6300
3 4 MON 3400
4 0 TUE 2150
In [16]:
Proc Contents data = skicounts2;
run;
Out[16]:
SAS Output

The CONTENTS Procedure

Data Set Name WORK.SKICOUNTS2 Observations 4
Member Type DATA Variables 3
Engine V9 Indexes 0
Created 10/17/2017 20:36:42 Observation Length 24
Last Modified 10/17/2017 20:36:42 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding utf-8 Unicode (UTF-8)    
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 2714
Obs in First Data Page 4
Number of Data Set Repairs 0
Filename /tmp/SAS_work1C8900001930_hplc329.coe.montana.edu/skicounts2.sas7bdat
Release Created 9.0401M4
Host Created Linux
Inode Number 271772
Access Permission rw-r--r--
Owner Name sasdemo
File Size 128KB
File Size (bytes) 131072
Alphabetic List of Variables and Attributes
# Variable Type Len
2 Day Char 8
1 NewSnow Num 8
3 NumSkiers Num 8

Libname Statements and Temporary vs. Permanent Data

  • In R, a data frame you create is not saved to your working directory. However, R data frames can be saved with save() and loaded back into R with load().
  • SAS is similar: by default, data sets are temporary. They are stored in the WORK library, which is deleted when the SAS session ends.
  • As in R, data sets can be saved permanently, but this requires assigning a library with a Libname statement and using a two-level name (libref.dataset).

Libname statements are specified by the following recipe: LIBNAME libref 'location of SAS data library';
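A sketch of the temporary vs. permanent distinction. The libref mylib and the folder path are placeholders for a folder you can write to; they are assumptions, not course settings. A one-level name such as girls is shorthand for work.girls, while a two-level name writes a .sas7bdat file into the folder assigned to the libref.

```sas
/* One-level name: stored in the temporary WORK library */
DATA girls;              /* same as DATA work.girls; */
    SET sashelp.class;
    IF Sex = 'F';
RUN;

/* Two-level name: stored permanently in the folder assigned to the libref */
LIBNAME mylib '/folders/myfolders/';   /* placeholder path */

DATA mylib.girls;        /* written to the folder as girls.sas7bdat */
    SET work.girls;
RUN;
```

In a later session, running the same LIBNAME statement again makes mylib.girls available without recreating it.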

In [17]:
* This is slightly different using SAS University due to the requirement of virtual box;
libname mydata '/folders/myfolders/';

* Note this will create a permanent SAS file in the above directory;
* SAS data files have the following extension .sas7bdat ;
DATA mydata.Housing;
    SET titanic;
RUN;
Out[17]:

198 * This is slightly different using SAS University due to the requirement of virtual box;
199 libname mydata '/folders/myfolders/';
NOTE: Libref MYDATA was successfully assigned as follows:
Engine: V9
Physical Name: /folders/myfolders
200
201 * Note this will create a permanent SAS file in the above directory;
202 * SAS data files have the following extension .sas7bdat ;
203 DATA mydata.Housing;
204 SET titanic;
205 RUN;
NOTE: There were 2000 observations read from the data set WORK.TITANIC.
NOTE: The data set MYDATA.HOUSING has 2000 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

Working with Your Data

Variables can be created or redefined in a DATA step with assignment statements of the form variable = expression;.

In [18]:
DATA ski;
    set skicounts2;
    Mountain = 'Bridger Bowl';
    peracre = NumSkiers / 2000;
Run;
Out[18]:

211 DATA ski;
212 set skicounts2;
213 Mountain = 'Bridger Bowl';
214 peracre = NumSkiers / 2000;
215 Run;
NOTE: There were 4 observations read from the data set WORK.SKICOUNTS2.
NOTE: The data set WORK.SKI has 4 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

In [19]:
PROC PRINT DATA=ski;
    TITLE 'Bridger Bowl Ski Facts';
RUN;
Out[19]:
SAS Output

Bridger Bowl Ski Facts

Obs NewSnow Day NumSkiers Mountain peracre
1 28 SAT 5200 Bridger Bowl 2.600
2 13 SUN 6300 Bridger Bowl 3.150
3 4 MON 3400 Bridger Bowl 1.700
4 0 TUE 2150 Bridger Bowl 1.075
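The example above only creates new variables. A minimal sketch of redefining an existing one, assuming Height in SASHELP.CLASS is recorded in inches: assigning to Height overwrites its value, and a comparison in an assignment creates a 1/0 indicator.

```sas
DATA class_metric;
    SET sashelp.class;
    Height = Height * 2.54;   /* redefine an existing variable: inches to centimeters */
    IsTeen = (Age >= 13);     /* new variable: a comparison evaluates to 1 (true) or 0 (false) */
RUN;

PROC PRINT DATA=class_metric;
RUN;
```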

Using SAS Functions

  • SAS has many built-in functions for character, date-time, and numeric data.
  • A few character functions include:
    • LEFT() left-aligns a character expression
    • LENGTH() returns the length of an argument, not counting trailing blanks
    • SUBSTR(arg,position,n) extracts a substring from an argument, starting at position and continuing for n characters.
  • A few numeric functions include:
    • MAX()
    • MIN()
    • ROUND(arg, round-off-unit)
    • SUM()
  • A few date functions include:
    • DAY(date) - returns the day of the month
    • WEEKDAY(date) - returns the day of the week (1 = Sunday)
    • MDY(month,day,year) - returns a SAS date value from month, day, and year values.
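A sketch that calls each listed function once in a DATA step with no SET statement (so it creates a single observation). The string and the date are illustrative values; the comments give the value each variable should hold.

```sas
DATA functions_demo;
    Name     = '  Bridger Bowl';
    LeftName = left(Name);           /* 'Bridger Bowl' with the blanks moved to the end */
    NameLen  = length(LeftName);     /* 12: trailing blanks are not counted */
    Big      = max(28, 13, 4, 0);    /* 28 */
    Small    = min(28, 13, 4, 0);    /* 0 */
    Total    = sum(28, 13, ., 0);    /* 41: SUM ignores missing values */
    Rounded  = round(1.075, 0.1);    /* 1.1 */
    SkiDate  = mdy(10, 28, 2017);    /* SAS date value for October 28, 2017 */
    DayOfMon = day(SkiDate);         /* 28 */
    DayOfWk  = weekday(SkiDate);     /* 7 = Saturday, since 1 = Sunday */
    FORMAT SkiDate date9.;
RUN;

PROC PRINT DATA=functions_demo;
RUN;
```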
In [20]:
DATA ski;
    set ski;
    DayAbbr = substr(DAY,1,2);
Run;
Out[20]:

229 DATA ski;
230 set ski;
231 DayAbbr = substr(DAY,1,2);
232 Run;
NOTE: There were 4 observations read from the data set WORK.SKI.
NOTE: The data set WORK.SKI has 4 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

In [21]:
PROC PRINT DATA=ski;
run;
Out[21]:
SAS Output

Bridger Bowl Ski Facts

Obs NewSnow Day NumSkiers Mountain peracre DayAbbr
1 28 SAT 5200 Bridger Bowl 2.600 SA
2 13 SUN 6300 Bridger Bowl 3.150 SU
3 4 MON 3400 Bridger Bowl 1.700 MO
4 0 TUE 2150 Bridger Bowl 1.075 TU

SAS Exercise 2:

  1. Create a dataset containing the following variables:
    • First Name, Middle Name, Last Name
  2. Populate this data set with yourself and your family, friends, or roommates
  3. Then use the substr function to create a second dataset that also contains a new variable that has each person's initials.
In [1]:
DATA names;
    INPUT FirstName $9. MiddleName $ LastName $ ;
    DATALINES;
Andrew    Blake   Hoegh
Eleanor   Larson  Hoegh
Georgiana Otelia  Hoegh
;
RUN;

Data namesUpdated;
    SET names;
    initials = CAT(SUBSTR(FirstName,1,1),SUBSTR(MiddleName,1,1),SUBSTR(LastName,1,1));
RUN;

PROC PRINT DATA=namesUpdated;
RUN;
Out[1]:
SAS Output

The SAS System

Obs FirstName MiddleName LastName initials
1 Andrew Blake Hoegh ABH
2 Eleanor Larson Hoegh ELH
3 Georgiana Otelia Hoegh GOH

Using IF-THEN Statements

Assigning a value to certain observations but not others is called conditional logic. We have done this in R with if statements and the ifelse() function.

In SAS the following syntax is used:

IF condition THEN action;

Conditional statements use the following operators:

Symbolic Mnemonic Meaning
= EQ equals
^= or ~= NE not equal
> GT greater than
< LT less than
>= GE greater than or equal
<= LE less than or equal
& AND all comparisons must be true
| or ! OR at least one comparison must be true

Similar to %in% in R, the IN operator tests whether a value is in a list of values, for example Day in ('SAT', 'SUN').
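A small sketch mixing symbolic and mnemonic operators with IN, on SASHELP.CLASS. The variable name Group and its labels are illustrative. The LENGTH statement fixes the variable's length up front so that later, longer values are not truncated to the length of the first assignment.

```sas
DATA class_groups;
    SET sashelp.class;
    LENGTH Group $ 9;
    IF Sex EQ 'F' & Age >= 13 THEN Group = 'Teen girl';   /* mnemonic and symbolic operators can be mixed */
    IF Age IN (11, 12) THEN Group = 'Preteen';             /* IN works like %in% in R */
RUN;

PROC PRINT DATA=class_groups;
RUN;
```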

In [22]:
Data ski;
    SET Ski;
    If NewSnow > 0 THEN DayType ='Powder';
RUN;
Proc Print data= ski;
RUN;
Out[22]:
SAS Output

Bridger Bowl Ski Facts

Obs NewSnow Day NumSkiers Mountain peracre DayAbbr DayType
1 28 SAT 5200 Bridger Bowl 2.600 SA Powder
2 13 SUN 6300 Bridger Bowl 3.150 SU Powder
3 4 MON 3400 Bridger Bowl 1.700 MO Powder
4 0 TUE 2150 Bridger Bowl 1.075 TU  

Executing multiple statements for one condition requires a DO statement with a corresponding END statement.

In [23]:
Data ski;
    SET Ski;
    If NewSnow > 0 THEN DO;
        DayType ='Powder';
        Crowds = 'Busy';
    END;
RUN;
Proc Print data= ski;
RUN;
Out[23]:
SAS Output

Bridger Bowl Ski Facts

Obs NewSnow Day NumSkiers Mountain peracre DayAbbr DayType Crowds
1 28 SAT 5200 Bridger Bowl 2.600 SA Powder Busy
2 13 SUN 6300 Bridger Bowl 3.150 SU Powder Busy
3 4 MON 3400 Bridger Bowl 1.700 MO Powder Busy
4 0 TUE 2150 Bridger Bowl 1.075 TU    

IF-THEN/ELSE Statements

Similar to R, SAS can also handle if/else statements.

The format is:

IF condition THEN action;
    ELSE IF condition THEN action;
    ELSE action;
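The cell below uses a single ELSE. A sketch of a full ELSE IF chain on SASHELP.CLASS, with AgeGroup as an illustrative variable name: conditions are checked in order and only the first true branch runs. The LENGTH statement matters here, because without it AgeGroup would take the length of 'Older' (5) and 'Preteen' would be truncated.

```sas
DATA class_levels;
    SET sashelp.class;
    LENGTH AgeGroup $ 8;
    IF Age >= 15 THEN AgeGroup = 'Older';
        ELSE IF Age >= 13 THEN AgeGroup = 'Teen';
        ELSE AgeGroup = 'Preteen';
RUN;

PROC PRINT DATA=class_levels;
RUN;
```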
In [24]:
DATA ski;
    SET Ski;
    If DAY in ('SAT', 'SUN') THEN Weekend ='YES';
        ELSE Weekend = 'NO';
RUN;
Proc Print data= ski;
RUN;
Out[24]:
SAS Output

Bridger Bowl Ski Facts

Obs NewSnow Day NumSkiers Mountain peracre DayAbbr DayType Crowds Weekend
1 28 SAT 5200 Bridger Bowl 2.600 SA Powder Busy YES
2 13 SUN 6300 Bridger Bowl 3.150 SU Powder Busy YES
3 4 MON 3400 Bridger Bowl 1.700 MO Powder Busy NO
4 0 TUE 2150 Bridger Bowl 1.075 TU     NO

Subsetting Your Data

IF statements can also be used to subset your data, in one of two ways: IF expression; keeps only the observations that satisfy the expression, while IF expression THEN DELETE; deletes the observations that satisfy the expression.
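A sketch showing that the two forms can produce the same subset of SASHELP.CLASS: keeping the females with a subsetting IF is equivalent to deleting the males with IF-THEN DELETE. Both output data sets should have 9 observations, matching the log from the first example.

```sas
DATA girls_keep;            /* subsetting IF: keep rows where the expression is true */
    SET sashelp.class;
    IF Sex = 'F';
RUN;

DATA girls_delete;          /* IF-THEN DELETE: drop rows where the expression is true */
    SET sashelp.class;
    IF Sex = 'M' THEN DELETE;
RUN;
```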

In [25]:
DATA ski_wkend;
    SET Ski;
    If Weekend ='YES';
RUN;
Proc Print data= ski_wkend;
RUN;
Out[25]:
SAS Output

Bridger Bowl Ski Facts

Obs NewSnow Day NumSkiers Mountain peracre DayAbbr DayType Crowds Weekend
1 28 SAT 5200 Bridger Bowl 2.60 SA Powder Busy YES
2 13 SUN 6300 Bridger Bowl 3.15 SU Powder Busy YES

Sorting, Printing, Summarizing Data Sets

SAS also permits WHERE statements inside a PROC step; the WHERE statement subsets the data used by that procedure without creating a new data set.

In [26]:
PROC PRINT data= ski;
    WHERE Crowds ='Busy';
RUN;
Out[26]:
SAS Output

Bridger Bowl Ski Facts

Obs NewSnow Day NumSkiers Mountain peracre DayAbbr DayType Crowds Weekend
1 28 SAT 5200 Bridger Bowl 2.60 SA Powder Busy YES
2 13 SUN 6300 Bridger Bowl 3.15 SU Powder Busy YES
3 4 MON 3400 Bridger Bowl 1.70 MO Powder Busy NO

PROC SORT

The syntax for PROC SORT is as follows:

PROC SORT DATA = DATA_IN OUT=DATA_OUT;
    BY <DESCENDING> VARIABLE1 <DESCENDING> VARIABLE2; * default is ascending order, and DESCENDING reverses only the variable that follows it;
RUN;
In [27]:
PROC SORT DATA=SKI OUT=SKI_SORT;
    BY NewSnow peracre;
RUN;

PROC PRINT DATA=SKI_SORT;
RUN;
Out[27]:
SAS Output

Bridger Bowl Ski Facts

Obs NewSnow Day NumSkiers Mountain peracre DayAbbr DayType Crowds Weekend
1 0 TUE 2150 Bridger Bowl 1.075 TU     NO
2 4 MON 3400 Bridger Bowl 1.700 MO Powder Busy NO
3 13 SUN 6300 Bridger Bowl 3.150 SU Powder Busy YES
4 28 SAT 5200 Bridger Bowl 2.600 SA Powder Busy YES
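A sketch of DESCENDING, which applies only to the variable right after it: the sort below orders SASHELP.CLASS by Sex ascending and then by Height from tallest to shortest within each sex. OUT= is needed here because, without it, PROC SORT replaces the input data set, and SASHELP data sets are read-only.

```sas
PROC SORT DATA=sashelp.class OUT=class_sorted;
    BY Sex DESCENDING Height;
RUN;

PROC PRINT DATA=class_sorted;
RUN;
```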

More on PROC PRINT

We have seen several instances of PROC PRINT; for the complete list of options for this procedure, see the PROC PRINT page in the SAS documentation. Note there is a link in the first slides for searching SAS procedures.

Some options include:

  • BY VARIABLE-LIST starts a new section in the output for each level of the variable. The data must be pre-sorted by that variable.
  • VAR VARIABLE-LIST specifies which variables to print and the order.
In [28]:
PROC SORT DATA= SKI OUT=SKI_SORT;
    BY Weekend;
RUN;

PROC PRINT DATA=SKI_SORT;
    BY Weekend;
    VAR NewSnow NumSkiers Weekend;
RUN;
Out[28]:
SAS Output

Bridger Bowl Ski Facts

Weekend=NO

Obs NewSnow NumSkiers Weekend
1 4 3400 NO
2 0 2150 NO

Weekend=YES

Obs NewSnow NumSkiers Weekend
3 28 5200 YES
4 13 6300 YES

Formats can also be specified in a PROC PRINT step with a FORMAT statement; the format changes only how the values are displayed, not the stored data.

In [29]:
PROC PRINT DATA=SKI_SORT;
    BY Weekend;
    VAR NewSnow NumSkiers Weekend;
    FORMAT NumSkiers COMMA5.;
RUN;
Out[29]:
SAS Output

Bridger Bowl Ski Facts

Weekend=NO

Obs NewSnow NumSkiers Weekend
1 4 3,400 NO
2 0 2,150 NO

Weekend=YES

Obs NewSnow NumSkiers Weekend
3 28 5,200 YES
4 13 6,300 YES

Summarizing DATA with PROC MEANS

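PROC MEANS has several options, many of which we will touch on next week.

By default, the procedure prints the number of nonmissing values (N), mean, standard deviation, minimum, and maximum of each variable listed in the VAR statement.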
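A sketch of two common PROC MEANS options on SASHELP.CLASS: statistic keywords (N, MEAN, MEDIAN) on the PROC statement replace the default list, and MAXDEC= limits the decimal places shown. A CLASS statement produces grouped summaries similar to the BY statement below, but does not require the data to be sorted first.

```sas
PROC MEANS DATA=sashelp.class N MEAN MEDIAN MAXDEC=2;
    CLASS Sex;          /* one row of summaries per level of Sex, no PROC SORT needed */
    VAR Height Weight;
RUN;
```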

In [30]:
PROC MEANS DATA=SKI_SORT;
    BY Weekend;
    VAR NewSnow NumSkiers;
RUN;
Out[30]:
SAS Output

Bridger Bowl Ski Facts

The MEANS Procedure

Weekend=NO

Variable N Mean Std Dev Minimum Maximum
NewSnow 2 2.0000000 2.8284271 0 4.0000000
NumSkiers 2 2775.00 883.8834765 2150.00 3400.00

Weekend=YES

Variable N Mean Std Dev Minimum Maximum
NewSnow 2 20.5000000 10.6066017 13.0000000 28.0000000
NumSkiers 2 5750.00 777.8174593 5200.00 6300.00