Stat 408 Frequency Tables

Regression is used to look for a relationship between two quantitative variables.
ANOVA is used to examine effects of a categorical predictor on a continuous response.
When we have two categorical variables, we can look for association with a mosaic plot. How do we quantify such relationships?
PROC FREQ builds tables for categorical variables and runs tests of association between them. See the SAS help pages for PROC FREQ.
Example from The Little SAS Book, section 4.11, p. 120.
Summarize the relationship between coffee order (Kona drip, espresso, cappuccino, or iced coffee) and ordering method (drive-up or walk-in).
DATA orders;
   INPUT Coffee $ Window $ @@;
datalines;
esp w cap d cap w kon w ice w kon d esp d kon w ice d esp d
cap w esp d cap d Kon d .   d kon w esp d cap w ice w kon w
kon w kon w ice d esp d kon w esp d esp w kon w cap w kon w
;
PROC FREQ DATA = orders;    * Print tables for Window and Window by Coffee;
   TABLES Window;   
   TABLES  Window * Coffee/ CHISQ CELLCHI2;  
RUN;
                                The FREQ Procedure
                            Table of Window by Coffee
 Window          Coffee
 Frequency      |
 Cell Chi-Square|
 Percent        |
 Row Pct        |
 Col Pct        |Kon     |cap     |esp     |ice     |kon     |  Total
 ---------------+--------+--------+--------+--------+--------+
 d              |      1 |      2 |      6 |      2 |      1 |     12
                | 0.8305 | 0.0939 | 2.1853 | 0.0718 | 2.3796 |
                |   3.45 |   6.90 |  20.69 |   6.90 |   3.45 |  41.38
                |   8.33 |  16.67 |  50.00 |  16.67 |   8.33 |
                | 100.00 |  33.33 |  75.00 |  50.00 |  10.00 |
 ---------------+--------+--------+--------+--------+--------+
 w              |      0 |      4 |      2 |      2 |      9 |     17
                | 0.5862 | 0.0663 | 1.5426 | 0.0507 | 1.6797 |
                |   0.00 |  13.79 |   6.90 |   6.90 |  31.03 |  58.62
                |   0.00 |  23.53 |  11.76 |  11.76 |  52.94 |
                |   0.00 |  66.67 |  25.00 |  50.00 |  90.00 |
 ---------------+--------+--------+--------+--------+--------+
 Total                 1        6        8        4       10       29
                    3.45    20.69    27.59    13.79    34.48   100.00
                               Frequency Missing = 1
                  Statistics for Table of Window by Coffee
              Statistic                     DF       Value      Prob
             ------------------------------------------------------
             Chi-Square                     4      9.4866    0.0500
             Likelihood Ratio Chi-Square    4     10.6538    0.0307
             Mantel-Haenszel Chi-Square     1      3.8624    0.0494
             Phi Coefficient                       0.5719
             Contingency Coefficient               0.4965
             Cramer's V                            0.5719
               WARNING: 90% of the cells have expected counts less
                       than 5. Chi-Square may not be a valid test.
                           Effective Sample Size = 29
                              Frequency Missing = 1
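
The association measures in the statistics table are simple functions of the Pearson chi-square. The R sketch below re-enters the counts from the table above by hand (the object name orders5 and the other variable names are made up for illustration) and should reproduce the Phi, Contingency Coefficient, and Cramer's V lines, along with the share of cells behind the WARNING.

```r
# Counts copied from the SAS table above (before "Kon" is merged into "kon").
# The object name `orders5` is illustrative only.
orders5 <- matrix(c(1, 2, 6, 2, 1,
                    0, 4, 2, 2, 9),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Window = c("d", "w"),
                                  Coffee = c("Kon", "cap", "esp", "ice", "kon")))

# chisq.test() warns about small expected counts, much as SAS does.
fit <- suppressWarnings(chisq.test(orders5))
x2  <- unname(fit$statistic)   # Pearson chi-square
n   <- sum(orders5)            # effective sample size
k   <- min(dim(orders5)) - 1   # min(rows - 1, columns - 1)

phi      <- sqrt(x2 / n)
contin   <- sqrt(x2 / (x2 + n))
cramersV <- sqrt(x2 / (n * k))

round(c(ChiSq = x2, Phi = phi, Contingency = contin, CramersV = cramersV), 4)
mean(fit$expected < 5)         # proportion of cells with expected count below 5
```

With only two rows, k is 1, which is why Phi and Cramer's V agree in this table.
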
*************************************************************************
************   Fix a mistake and ask for the EXACT test   ***************;

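* Recode the mistyped "Kon" so it is counted with "kon";
DATA orders;
   SET orders;
   IF Coffee = 'Kon' THEN Coffee = 'kon';
RUN;
ods graphics on / IMAGEFMT = png  IMAGENAME = "CoffeeOrders" height =4in width=5in;
PROC FREQ data = orders;  
     tables  Window * Coffee/ CELLCHI2 Fisher plots=freqplot(type=dot scale=percent);  
RUN;
ods graphics off;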

      Window          Coffee
      Frequency      |
      Cell Chi-Square|
      Percent        |
      Row Pct        |
      Col Pct        |cap     |esp     |ice     |kon     |  Total
      ---------------+--------+--------+--------+--------+
      d              |      2 |      6 |      2 |      2 |     12
                     | 0.0939 | 2.1853 | 0.0718 | 1.4305 |
                     |   6.90 |  20.69 |   6.90 |   6.90 |  41.38
                     |  16.67 |  50.00 |  16.67 |  16.67 |
                     |  33.33 |  75.00 |  50.00 |  18.18 |
      ---------------+--------+--------+--------+--------+
      w              |      4 |      2 |      2 |      9 |     17
                     | 0.0663 | 1.5426 | 0.0507 | 1.0098 |
                     |  13.79 |   6.90 |   6.90 |  31.03 |  58.62
                     |  23.53 |  11.76 |  11.76 |  52.94 |
                     |  66.67 |  25.00 |  50.00 |  81.82 |
      ---------------+--------+--------+--------+--------+
      Total                 6        8        4       11       29
                        20.69    27.59    13.79    37.93   100.00
                         Fisher's Exact Test
                  ----------------------------------
                  Table Probability (P)       0.0027
                  Pr <= P                     0.0962
                       Effective Sample Size = 29
                         Frequency Missing = 1

What did we test? What do we conclude? Which cells are "unusual"?

In R:

temp <- scan(what="A")
esp w cap d cap w kon w ice w kon d esp d kon w ice d esp d
cap w esp d cap d Kon d .   d kon w esp d cap w ice w kon w
kon w kon w ice d esp d kon w esp d esp w kon w cap w kon w

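nn <- length(temp)
temp[temp=="."] <- NA          # "." is the SAS missing-value code
temp[temp=="Kon"] <- "kon"     # fix the capitalization mistake
# Entries alternate: odd positions are coffee orders, even positions are windows
coffee <- data.frame( order = temp[seq(1, nn, 2)],
                      station = temp[seq(2, nn, 2)])
coffeeTable <- with(coffee, table( station, order))   # the NA order is dropped
dimnames(coffeeTable) <- list( c("Driveup","Walkin"),
                               c("cappuccino","espresso","iced","kona"))
plot(coffeeTable)              # mosaic plot
summary(coffeeTable)           # chi-square test of independence
fisher.test(coffeeTable)       # Fisher's exact test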
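
To see which cells are "unusual", one option is to look at Pearson residuals, whose squares are the cell chi-square values that CELLCHI2 prints. The sketch below is self-contained: it re-enters the corrected counts from the SAS table by hand, and orders4 is an illustrative name rather than an object defined above.

```r
# Corrected counts (after "Kon" is recoded to "kon"), copied from the SAS table.
# `orders4` is an illustrative name, not an object defined earlier.
orders4 <- matrix(c(2, 6, 2, 2,
                    4, 2, 2, 9),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Window = c("Driveup", "Walkin"),
                                  Coffee = c("cappuccino", "espresso", "iced", "kona")))

# Suppress the small-expected-count warning; only the pieces of the fit are used.
fit <- suppressWarnings(chisq.test(orders4))
round(fit$expected, 2)       # expected counts under independence
round(fit$residuals, 2)      # Pearson residuals: (observed - expected) / sqrt(expected)
round(fit$residuals^2, 4)    # cell chi-square contributions, as CELLCHI2 reports
```

A positive residual marks a cell with more orders than independence predicts, and a negative one marks fewer. The squared residuals can be compared directly with the Cell Chi-Square rows in the SAS output.
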

In a double-blind experiment, children were given either Echinacea purpurea or a placebo. When the child next had an upper respiratory infection (URI), a parent rated it as mild, moderate, or severe. Did Echinacea decrease severity?
 
DATA uri;
     input assess$ treat$ count;
     datalines;
mild echin 153
mild placebo 170
moderate echin 128
moderate placebo 157
severe echin 48
severe placebo 40
;
ods graphics on / IMAGEFMT = png  IMAGENAME = "echinacea" height =4in width=4in;
PROC freq data = uri;
     weight  count;
     table assess * treat/ cellchi2 expected chisq nopercent
                        plots=freqplot(type=dot scale=percent);
run;
ods graphics off;

                              The FREQ Procedure
                           Table of assess by treat

                  assess          treat

                  Frequency      |
                  Expected       |
                  Cell Chi-Square|echin   |placebo |  Total
                  ---------------+--------+--------+
                  mild           |    153 |    170 |    323
                                 | 152.68 | 170.32 |
                                 | 0.0007 | 0.0006 |
                  ---------------+--------+--------+
                  moderate       |    128 |    157 |    285
                                 | 134.72 | 150.28 |
                                 | 0.3352 | 0.3005 |
                  ---------------+--------+--------+
                  severe         |     48 |     40 |     88
                                 | 41.598 | 46.402 |
                                 | 0.9854 | 0.8833 |
                  ---------------+--------+--------+
                  Total               329      367      696


                   Statistics for Table of assess by treat

            Statistic                     DF       Value      Prob
            ------------------------------------------------------
            Chi-Square                     2      2.5056    0.2857

What did we test? What do we conclude?
Do the same in R:
echinacea <- as.table(matrix(c(153, 170, 128, 157, 48, 40), 3, 2, byrow=TRUE))
dimnames(echinacea) <- list(c("mild","moderate","severe"),
	                   c("echinacea","placebo"))
summary(echinacea)
plot(t(echinacea))
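
As an alternative sketch, the counts can be kept in the same long layout as the SAS DATA step and tabulated with xtabs(), which plays roughly the role of the WEIGHT statement. The names uri, uriTable, and fit are chosen here for illustration. The expected counts and cell chi-square values from chisq.test() can then be checked against the SAS output.

```r
# Long-format counts, mirroring the SAS DATA step; `uri` is an illustrative name.
uri <- data.frame(
  assess = rep(c("mild", "moderate", "severe"), each = 2),
  treat  = rep(c("echin", "placebo"), times = 3),
  count  = c(153, 170, 128, 157, 48, 40)
)

# A count on the left-hand side of the formula weights each row by that count.
uriTable <- xtabs(count ~ assess + treat, data = uri)

fit <- chisq.test(uriTable)  # continuity correction only applies to 2 x 2 tables
round(fit$expected, 2)       # compare with the "Expected" rows in the SAS output
round(fit$residuals^2, 4)    # compare with the "Cell Chi-Square" rows
fit                          # Pearson chi-square, df, and p-value
```
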

Author: Jim Robison-Cox