Stat 505 Assignment 1

Due Sept 9, 2011
Use any font for comments and explanations, but use courier (or other fixed-width) font for computer input and output. Insert plots near your discussion, not all at the end of the assignment. Don't let R print significance stars.
  1. Show R code and output for this exercise. Create and print a sequence
    1. of integers from 128 to 143. Use the length function to count how many there are.
    2. of integers from 100 to 80. Again provide the vector length.
    3. of nine equally spaced real numbers from 0 to π. Note that pi is predefined in R. Which of these have a sine of zero? Which have a cosine of zero? Computers have trouble representing very large numbers and numbers very close to zero exactly. You might want to use zapsmall(cos(my.sequence)) to ignore machine imprecision near 0.
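A minimal sketch of the three sequences in exercise 1 is given here for orientation, not as a model answer. The object names a, b and s are arbitrary choices; only seq(), length(), zapsmall() and pi come from base R.

```r
# Integers counting up and counting down; the colon operator handles both.
a <- 128:143
b <- 100:80
length(a)
length(b)

# Nine equally spaced values from 0 to pi (length.out fixes the count).
s <- seq(0, pi, length.out = 9)

# zapsmall() rounds away floating-point noise near zero, so exact
# comparisons with 0 behave as expected.
sin_zero <- which(zapsmall(sin(s)) == 0)
cos_zero <- which(zapsmall(cos(s)) == 0)
s[sin_zero]
s[cos_zero]
```

Without zapsmall(), sin(pi) prints as a tiny nonzero number, which is the machine imprecision the exercise refers to.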
  2. Read in the data set of kids' foot sizes from http://www.amstat.org/publications/jse/datasets/kidsfeet.dat. Another page describes the variables.
    1. Assign names to the columns and print the summary of the dataframe.
    2. The author wonders whether the relationship between length and width is the same for boys and girls. Provide two scatterplots, one for boys and one for girls, or a single plot with different plotting characters (pch) or colors to distinguish the sexes. Discuss the relationships.
    3. Fit an appropriate linear model and obtain a 95% CI for the difference in slope between males and females. Do you prefer a model with separate slopes or a single slope?
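One possible way to approach the slope comparison in exercise 2 is an interaction model. The sketch below runs on simulated data, not the kidsfeet file: the data frame feet, its column names (length, width, sex) and the level codes "B" and "G" are assumptions made for illustration, so adapt them to whatever names you assign.

```r
# Suppress significance stars, as the assignment requests.
options(show.signif.stars = FALSE)

# Simulated stand-in data (NOT the real kidsfeet measurements).
set.seed(1)
n <- 40
feet <- data.frame(
  sex    = factor(rep(c("B", "G"), each = n / 2)),
  length = rnorm(n, mean = 24.5, sd = 1.3)
)
feet$width <- 3 + 0.24 * feet$length + rnorm(n, sd = 0.4)

# One plot, with plotting character and color chosen by sex.
plot(width ~ length, data = feet,
     pch = as.integer(feet$sex), col = as.integer(feet$sex))

# Separate-slopes model: the length:sex term is the difference in slope.
fit <- lm(width ~ length * sex, data = feet)
ci <- confint(fit, "length:sexG", level = 0.95)
ci

# Single-slope model for comparison, with an F test between the two.
fit0 <- lm(width ~ length + sex, data = feet)
anova(fit0, fit)
```

If the interval for the interaction coefficient comfortably covers zero, that is one argument for the simpler single-slope model, though the choice should also rest on the plots.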
  3. Another dataset at the JSE site is in .xls format. Download it, open it in a spreadsheet, and save it as comma separated values. Descriptive info is here.
    1. Provide a plot which shows how the distribution of Price varies with Type.
    2. To what degree is Price affected by Mileage? Provide a plot and discuss.
    3. This is artificial data in that the prices are not what anyone paid; they are what Kelley Blue Book predicted as the selling price. Compare proportions of cars with Leather among the 5 Types with a plot. Do you see any anomaly?
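The plots asked for in exercise 3 could be sketched along the following lines. The data frame kbb and its six rows are made up purely so the code runs; the column names Price, Type and Leather follow the exercise text, and the 0/1 coding of Leather is an assumption to check against the real file.

```r
# Tiny made-up stand-in for the car data (NOT the real file).
kbb <- data.frame(
  Type    = c("Sedan", "Sedan", "Coupe", "Coupe", "Wagon", "Wagon"),
  Price   = c(18000, 22000, 25000, 31000, 20000, 24000),
  Mileage = c(21000, 15000, 18000, 9000, 30000, 12000),
  Leather = c(1, 0, 1, 1, 0, 0)
)

# Distribution of Price within each Type.
boxplot(Price ~ Type, data = kbb)

# Price against Mileage.
plot(Price ~ Mileage, data = kbb)

# Proportion of cars with leather in each Type: the mean of a 0/1 column.
leather_prop <- tapply(kbb$Leather, kbb$Type, mean)
barplot(leather_prop, ylab = "Proportion with leather")
```

With the real file, read.csv() on the saved comma-separated copy would replace the data.frame() call.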
  4. Super Bowl XLVI is scheduled to be played next February 5 in Indianapolis. I've pulled together some data about previous Super Bowls so that you can impress friends with your recall. However, the data file needs some work before you can read it into R.
    1. Examine this file and figure out what problems prevent R from reading it in directly. Explain what changes you make so that it is readable. (Reminder: do not edit it in Word. Use a text editor like Wordpad or emacs.)
    2. Read the data into an R dataframe.
    3. Create more columns to show the total score and the winning margin.
    4. Include a summary of the data frame.
    5. Is there a trend to the total score over time?
    6. Point spread is of interest to odds makers. Plot and summarize its distribution.
    7. Is there a trend to the point spread over time?
    8. The most common scores in American football are 7, 6, and 3. Make a table (and a plot) of the scores modulo seven (the remainder after dividing by 7; in R that's score %% 7). Discuss your findings.
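The derived columns and the modulo table in exercise 4 might look roughly like this. The data frame sb, its column names (winner_pts, loser_pts) and the four rows of scores are hypothetical values invented for illustration; they are not taken from the assignment's data file.

```r
# Hypothetical scores (NOT real game results).
sb <- data.frame(
  winner_pts = c(35, 27, 21, 14),
  loser_pts  = c(10, 17, 31, 3)
)

# New columns: total points and winning margin.
sb$total  <- sb$winner_pts + sb$loser_pts
sb$margin <- sb$winner_pts - sb$loser_pts
summary(sb)

# Pool both teams' scores and take the remainder after dividing by 7.
scores <- c(sb$winner_pts, sb$loser_pts)
mod7 <- table(scores %% 7)
mod7
barplot(mod7, xlab = "score %% 7", ylab = "Count")
```

A negative margin, as in the third made-up row, would flag a row where the winner and loser columns were swapped, which is worth checking for in the real file.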