Stat 505 Assignment 1
Due Sept 9, 2011
Use any font for comments and explanations, but use
Courier (or another fixed-width) font for computer input and output.
Insert plots near your discussion, not all at the end of the
assignment. Don't let R print significance stars.
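Significance stars are controlled by the base R option `show.signif.stars`. This minimal sketch turns them off for the current session. Putting the same line in an `.Rprofile` file would apply it to every session.

```r
# Stop printCoefmat() (used by summary() on lm/glm fits) from
# appending significance stars to coefficient tables in this session.
options(show.signif.stars = FALSE)
```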
- Show R code and output for this exercise. Create and print a sequence
  - of integers from 128 to 143. Use the length function to
    count how many there are.
  - of integers from 100 down to 80. Again, report the vector's
    length.
  - of nine equally spaced real numbers from 0 to π.
    Note that pi is predefined in R. Which of these have a sine of
    zero? Which have a cosine of zero? Floating-point arithmetic cannot
    represent π exactly, so a value that should be exactly zero, such as
    sin(pi), comes out as a tiny number (about 1.2e-16). You might
    want to use zapsmall(cos(my.sequence)) to round away this
    machine imprecision near 0.
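Here is a minimal sketch of problem 1 in R. It keeps the name `my.sequence` from the hint above; the other variable names are only illustrative. The integer sequences use the colon operator, the evenly spaced values use `seq()` with `length.out`, and `zapsmall()` is applied before testing for exact zeros.

```r
# (a) integers from 128 to 143; ':' builds an integer sequence
seq.a <- 128:143
print(seq.a)
length(seq.a)                 # 16

# (b) integers from 100 down to 80; ':' counts down when from > to
seq.b <- 100:80
print(seq.b)
length(seq.b)                 # 21

# (c) nine equally spaced values from 0 to pi (step pi/8)
my.sequence <- seq(0, pi, length.out = 9)
print(my.sequence)

# Raw sin(pi) is about 1.2e-16, not 0; zapsmall() rounds such noise to 0
sin.vals <- zapsmall(sin(my.sequence))
cos.vals <- zapsmall(cos(my.sequence))
my.sequence[sin.vals == 0]    # 0 and pi
my.sequence[cos.vals == 0]    # pi/2
```

Testing with `== 0` is only safe after `zapsmall()`. On the raw values, `sin(pi) == 0` is `FALSE`.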
- Read in the data set of kids' feet sizes from
  http://www.amstat.org/publications/jse/datasets/kidsfeet.dat.
  Another page describes the variables.
  - Assign names to the columns and print the summary of the
    data frame.
  - The author wonders whether the relationship between foot length
    and width is the same for boys and girls. Provide two
    scatterplots, one for boys and one for girls, or one plot with
    different plotting characters (pch) or colors to distinguish the
    sexes. Discuss the relationships.
  - Fit an appropriate linear model and obtain a 95% CI for the
    difference in slope between boys and girls. Do you prefer a
    model with separate slopes or a single slope?
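One possible approach to problem 2 is written as functions, so nothing is downloaded until you call them. This is a sketch, and several parts of it are assumptions:

- The names and their order in `feet.names` are assumed. Check them against the variable-description page.
- The file is assumed to have no header row.
- `sex` is assumed to read in as a two-level factor.

The slope difference is the interaction coefficient in `lm(width ~ length * sex)`. Comparing that fit with the parallel-lines model `width ~ length + sex` is one reasonable way, not the only one, to choose between separate slopes and a single slope.

```r
# ASSUMED column names and order; confirm against the JSE description page.
feet.names <- c("name", "birthmonth", "birthyear", "length", "width",
                "sex", "biggerfoot", "domhand")

# Read whitespace-delimited data (local path or URL) and name the columns.
read_kidsfeet <- function(path, header = FALSE, col.names = feet.names) {
  read.table(path, header = header, col.names = col.names,
             stringsAsFactors = TRUE)
}

# One scatterplot of width against length; plotting character and
# color are both keyed to sex.
plot_feet <- function(feet) {
  grp <- as.integer(feet$sex)
  plot(feet$length, feet$width, pch = grp, col = grp,
       xlab = "Foot length", ylab = "Foot width")
  lv <- levels(feet$sex)
  legend("topleft", legend = lv, pch = seq_along(lv), col = seq_along(lv))
}

# Fit separate-slope and common-slope models. Returns the CI for the slope
# difference (second level of sex minus the first) and an F test of the
# common-slope model against the separate-slope model.
slope_difference_ci <- function(feet, level = 0.95) {
  separate <- lm(width ~ length * sex, data = feet)
  common   <- lm(width ~ length + sex, data = feet)
  term <- grep("^length:sex", names(coef(separate)), value = TRUE)
  list(ci = confint(separate, parm = term, level = level),
       test = anova(common, separate))
}
```

The functions would be called in this order:

1. `feet <- read_kidsfeet("http://www.amstat.org/publications/jse/datasets/kidsfeet.dat")`
2. `summary(feet)`
3. `plot_feet(feet)`
4. `slope_difference_ci(feet)`

If the file does have a header row, pass `header = TRUE`.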
- Another dataset at the JSE site is in .xls format. Download it,
  open it in a spreadsheet, and save it as comma-separated
  values (CSV). Descriptive info
  is here.
  - Provide a plot that shows how the distribution of Price
    varies with Type.
  - To what degree is Price affected by Mileage? Provide a
    plot and discuss.
  - These data are artificial in that the prices are not
    what anyone paid; they are what Kelley Blue Book predicted as the
    selling price. Compare the proportions of cars with Leather among the five Types
    with a plot. Do you see any anomaly?
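Here is a sketch for problem 3. It assumes two things about the saved CSV:

- It keeps a header row with the column names used in the questions above (Price, Mileage, Type, Leather).
- Leather is coded 0/1.

The function names are illustrative. Side-by-side boxplots show Price by Type. A scatterplot with a least-squares line addresses Mileage. A row-wise `prop.table()` gives the share of cars with Leather within each Type.

```r
# ASSUMPTIONS: header row with Price, Mileage, Type and Leather columns;
# Leather coded 0/1. Check these against the descriptive info.
read_autos <- function(path) {
  autos <- read.csv(path, stringsAsFactors = TRUE)
  autos$Type <- factor(autos$Type)   # make sure Type is a factor
  autos
}

# Side-by-side boxplots: distribution of Price within each Type.
plot_price_by_type <- function(autos) {
  boxplot(Price ~ Type, data = autos, ylab = "Price")
}

# Price against Mileage with a least-squares line; returns the fit
# so the slope (dollars per mile) can be summarized.
price_vs_mileage <- function(autos) {
  fit <- lm(Price ~ Mileage, data = autos)
  plot(autos$Mileage, autos$Price, xlab = "Mileage", ylab = "Price")
  abline(fit)
  fit
}

# Within each Type, the proportion of cars at each Leather value,
# drawn as stacked bars (one bar per Type).
leather_by_type <- function(autos) {
  props <- prop.table(table(Type = autos$Type, Leather = autos$Leather),
                      margin = 1)
  barplot(t(props), legend.text = TRUE, xlab = "Type",
          ylab = "Proportion")
  props
}
```

For example, run `autos <- read_autos("cars.csv")`, where the file name is hypothetical, then call each function. `summary(price_vs_mileage(autos))` reports the fitted slope.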
- Super Bowl XLVI is scheduled to be played next February 5 in
  Indianapolis. I've pulled together some data about previous
  Super Bowls so that you can
  impress friends with your recall. However, the data file needs some
  work before you can read it into R.
  - Examine this file and
    figure out what problems prevent R from reading it in directly.
    Explain what changes you make so that it is readable.
    (Reminder: do not edit it in Word. Use a text editor like
    WordPad or Emacs.)
  - Read the data into an R data frame.
  - Create more columns to show the total score and the winning
    margin.
  - Include a summary of the data frame.
  - Is there a trend in the total score over time?
  - Point spread is of interest to oddsmakers. Plot and
    summarize its distribution.
  - Is there a trend in the point spread over time?
  - The most common point values for a scoring play in American football are 7, 6, and
    3. Make a table (and a plot) of the scores modulo seven (the
    remainder after dividing by 7; in R that's score %% 7).
    Discuss your findings.
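This is a partial sketch for problem 4. The column names of the cleaned file are not given here, so `year`, `win.score` and `lose.score` are hypothetical placeholders to replace with the real names. Reading the file is left out because the fixes depend on what you find in it. A straight-line fit is only one simple way to look for a trend. Building the factor with `levels = 0:6` keeps remainders that never occur, with a count of zero.

```r
# HYPOTHETICAL column names: replace 'year', 'win.score' and 'lose.score'
# with whatever the cleaned file actually uses.

# Add total points and winning margin for each game.
add_score_columns <- function(sb, win = "win.score", lose = "lose.score") {
  sb$total  <- sb[[win]] + sb[[lose]]
  sb$margin <- sb[[win]] - sb[[lose]]
  sb
}

# Plot a column against time and overlay a fitted straight line; the
# returned lm fit's slope estimates the change per unit of time.
trend_over_time <- function(sb, response, time = "year") {
  fit <- lm(reformulate(time, response = response), data = sb)
  plot(sb[[time]], sb[[response]], xlab = time, ylab = response)
  abline(fit)
  fit
}

# Histogram plus summary() statistics (range, quartiles, mean) of a
# numeric vector such as the point spread.
describe_spread <- function(spread) {
  hist(spread, main = "Point spread", xlab = "Points")
  summary(spread)
}

# Count how often each remainder 0-6 occurs when scores are divided by 7.
score_mod7_table <- function(scores) {
  tab <- table(factor(scores %% 7, levels = 0:6))
  barplot(tab, xlab = "Score modulo 7", ylab = "Number of scores")
  tab
}
```

For example, `score_mod7_table(c(sb$win.score, sb$lose.score))` tabulates both teams' scores together.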