Base R Graphics

In Lab 1, we learned a few functions for plotting data in base R ("base R" is the term we use for R when no additional packages have been loaded):

plot
hist
boxplot
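
As a quick refresher on how each function is called, here is a minimal sketch. It uses R's built-in mtcars data set purely as a stand-in so that it runs without downloading anything (that choice is ours, not part of this lab's data), and the object names h and b are illustrative.

```r
# Refresher on the three base R plotting functions from Lab 1.
# Assumption: the built-in mtcars data set is used as a stand-in.

# plot(): scatterplot of two quantitative variables
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Scatterplot with plot()")

# hist(): histogram of one quantitative variable
h <- hist(mtcars$mpg,
          xlab = "Miles per gallon",
          main = "Histogram with hist()")

# boxplot(): side-by-side boxplots of a quantitative variable by group,
# written with the formula interface: response ~ group
b <- boxplot(mpg ~ cyl, data = mtcars,
             xlab = "Number of cylinders", ylab = "Miles per gallon",
             main = "Boxplots with boxplot()")
```

Both hist() and boxplot() also return the numbers behind the picture (bin counts and five-number summaries, respectively), which can be handy when a plot needs to be backed up with values.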

Let's review these functions here using Current Population Survey (CPS) data. These particular data consist of a random sample of 534 people from the CPS in 1985, with information on wages and other characteristics of the workers, including sex, number of years of education, years of work experience, occupational status, region of residence and union membership. Variables in the data set are described below.

Variable   Description
--------   -----------
educ       Number of years of education
south      Indicator variable for living in a southern region:
           S = lives in south, NS = does not live in south
sex        Gender: M = male, F = female
exper      Number of years of work experience (inferred from age and education)
union      Indicator variable for union membership: Union or Not
wage       Wage (dollars per hour)
age        Age (years)
race       Race: W = white, NW = not white
sector     Sector of the economy: clerical, const (construction), management,
           manufacturing, professional, sales, service, other
married    Marital status: Married or Single

Load these data into your R session by running the following command.

CPS <- read.csv("http://math.montana.edu/shancock/data/cps.csv")

Exercise

Practice the base R graphics functions by answering the following questions:

  1. Is there an association between number of years of education and wage?
  2. Is there an association between age and union membership?
  3. Do men make more than women?

Use both plots and summary statistics to investigate these questions.
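
One way to pair a plot with a summary statistic is sketched below. So as not to give away the exercise, it uses the built-in mtcars data set rather than CPS (our assumption), and the object names r and group_means are illustrative. For two quantitative variables, a scatterplot goes naturally with a correlation; for a quantitative variable compared across groups, side-by-side boxplots go naturally with group means.

```r
# Assumption: built-in mtcars data stand in for the CPS data.

# Two quantitative variables: scatterplot plus correlation
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
r <- cor(mtcars$wt, mtcars$mpg)  # measures linear association only
r

# Quantitative variable by group: boxplots plus group means
# (in mtcars, am is coded 0 = automatic, 1 = manual)
boxplot(mpg ~ am, data = mtcars,
        names = c("Automatic", "Manual"),
        xlab = "Transmission", ylab = "Miles per gallon")
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
group_means
```

For the CPS questions, the same pattern should carry over with the corresponding columns of CPS substituted in.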

Data Visualization with ggplot2

R has numerous packages (libraries) for data visualization and graphics beyond what is available in base R. One of the more popular is ggplot2. Since excellent tutorials on ggplot2 already exist, we will outsource this portion of the lab to Garrett Grolemund and Hadley Wickham: work through their tutorial on data visualization (which is Chapter 3 of their book, R for Data Science).