Stat 408 Assignment 1-3

Due January 30, 2013
Show R code and output for each exercise. Use any font for comments and explanations, but use Courier (or another fixed-width) font for computer input and output.
  1. Getting Help.
    1. What are the commonly used sections of a help page whose headings appear in bold type? List the object or function help pages (at least 3) that you examined to create your list. (Note: help pages for datasets look different from help pages for functions.)
    2. View the help page for warpbreaks.
      1. Run the example plots and insert them here. Explain what the response is and give your evaluation of which wool type and which tension level gives, on average, the lowest number of breaks.
      2. The coplot help page also uses this data as an example for two plots. Run the second one and insert it here. How does this view of the data differ?
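A minimal sketch for the warpbreaks questions, assuming only base R and the built-in warpbreaks dataset (columns breaks, wool, tension). Calling example(warpbreaks) reruns the plots from the help page; the code below instead tabulates the mean number of breaks in each wool-by-tension cell and draws an interaction plot, which is one possible way (not necessarily the intended one) to back up your evaluation.

```r
# warpbreaks ships with R (datasets package): 54 observations, 9 per wool/tension cell
data(warpbreaks)
str(warpbreaks)

# Mean number of breaks for each wool-by-tension combination (2 x 3 table)
cell_means <- with(warpbreaks,
                   tapply(breaks, list(wool = wool, tension = tension), mean))
print(round(cell_means, 2))

# Row/column position of the smallest cell mean
idx <- which(cell_means == min(cell_means), arr.ind = TRUE)
cat("Lowest mean breaks: wool", rownames(cell_means)[idx[1, "row"]],
    "at tension", colnames(cell_means)[idx[1, "col"]], "\n")

# Interaction plot: one line per wool type across the tension levels
with(warpbreaks,
     interaction.plot(tension, wool, breaks, ylab = "Mean number of breaks"))
```

Because the design is balanced, the average of the six cell means equals the overall mean of breaks, which is a quick sanity check on the table.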
    3. Download the R reference card. Look at the second category -- Extensions -- and find three ways to search for help.
      1. Use the first to search for the word "box". Show what items are returned. What does the first function actually do? Provide an example (a plot and your code) which is slightly different from the example on the help page.
      2. Use the second to search for functions which contain the word "cook" in their help page. Which package(s) and function(s) did it find?
      3. Use the third to search for the word "Bonferroni". How many Function help pages and how many Vignettes contain that word? Also use the Gmane search to see how often it occurs in emails on the gmane.comp.lang.r.general list.
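As a hedged illustration of searching for help, the sketch below uses apropos(), which lists objects on the current search path whose names match a pattern. Two other search functions are help.search("cook") (shorthand ??cook), which searches the help pages of installed packages, and RSiteSearch(), which queries an online search service; both are left out of the runnable code. Whether these are the three ways listed on the reference card is an assumption you should check. The plot is one possible variation on a boxplot (for instance, if boxplot is the first function returned), drawn from the built-in warpbreaks data.

```r
# Object names on the current search path that contain "box"
box_hits <- apropos("box")
print(box_hits)

# A boxplot variation: horizontal boxes of breaks for each wool/tension combination
b <- boxplot(breaks ~ wool + tension, data = warpbreaks,
             horizontal = TRUE, las = 1,
             col = c("lightblue", "lightpink"),
             main = "Warp breaks by wool and tension")

# boxplot() invisibly returns the statistics behind the plot
print(b$names)
print(b$stats)  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
```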
    4. Just below the R reference card link on the R Resources page is a link to "Quick R". Go to their website and explore their "Data Management" link. Find a function you were not already familiar with and explain what it does. (Look at its help page as well as what Quick R says about it.)
  2. Use the cars04 data from JSE, which is formatted in aligned columns as described here. Look up help on the read.fwf function (fixed-width format). You will need to use the arguments strip.white and na.strings (read.fwf passes both on to read.table). By trial and error, I figured out what works for the widths vector, shown below.
     cars04 <-
       read.fwf("URL.goes.here", strip.white = ___,
                widths = c(45,2,2,2,2,2,2,2,7,7,4,4,4,3,3,4,4,4,4), na.strings = __,
                col.names = c("name","sporty","suv","wagon","minivan","pickup",
                              "AllWD","RearWD","MSRPrice","Invoice","disp","cyl","horsepwr",
                              "mpgCity","mpgHiway","weight","whlBase","length","width"))
    
    1. Note that columns 2 through 6 are indicator variables (0 or 1) that tell us the style. If they are all 0, the vehicle is a sedan; otherwise, the single column containing a 1 indicates which type of vehicle it is.
      Use subsetting or the ifelse function to create a factor with the six types as levels. Provide a summary of the types. Remove columns 2 through 6 using negative subscripting [Hint: -(2:6) means all but those indices]. Alternatively, you can set those columns to NULL and they will disappear. Show your code.
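A minimal sketch of the ifelse approach on a hypothetical six-row stand-in for cars04 that uses the same indicator column names; the vehicle names and prices are made up, the real data have many more rows and columns, and the level order chosen here is arbitrary.

```r
# Hypothetical stand-in: one vehicle of each style, same indicator names as cars04
toy <- data.frame(
  name     = c("car1", "car2", "car3", "car4", "car5", "car6"),
  sporty   = c(0, 1, 0, 0, 0, 0),
  suv      = c(0, 0, 1, 0, 0, 0),
  wagon    = c(0, 0, 0, 1, 0, 0),
  minivan  = c(0, 0, 0, 0, 1, 0),
  pickup   = c(0, 0, 0, 0, 0, 1),
  MSRPrice = c(21000, 35000, 42000, 27000, 29000, 24000)
)

# Nested ifelse: the first indicator equal to 1 names the type; all zeros = sedan
type <- with(toy,
  ifelse(sporty == 1, "sporty",
  ifelse(suv == 1, "suv",
  ifelse(wagon == 1, "wagon",
  ifelse(minivan == 1, "minivan",
  ifelse(pickup == 1, "pickup", "sedan"))))))
toy$type <- factor(type,
                   levels = c("sedan", "sporty", "suv", "wagon", "minivan", "pickup"))
summary(toy$type)

# Drop the indicator columns (positions 2 through 6) with negative subscripts
toy <- toy[, -(2:6)]
names(toy)
```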
    2. Compare the distribution of MSRPrice for the six types with boxplots.
      1. Which cars cost over $100,000? How are they similar? (I would use subset to list the cars that meet the condition.)
      2. Which group has the smallest median? Which has the smallest spread?
      3. Within each group, prices are quite skewed. Compare square root price and log price to see which is most symmetric. Load the lattice package and obtain juxtaposed quantile plots of whichever function of price you found to be most symmetric, with a separate panel for each type of car. Comment: how well do they fit a normality assumption? Which type has the largest and which the smallest median? Which has the largest and which the smallest spread?
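The sketch below runs the same steps on simulated prices, since the real cars04 file is not reproduced here: the data frame sim, its 20 vehicles per type, and the log-normal prices are all hypothetical. It draws side-by-side boxplots, lists vehicles above $100,000 with subset(), tabulates median and IQR by type, compares a rough skewness ratio (mean minus median, divided by the standard deviation) for raw, square-root, and log price, and draws juxtaposed normal quantile plots with lattice's qqmath(). Log price is plotted only for illustration; use whichever transformation your own check favors. The skewness ratio is a quick numeric companion to the plots, not a formal test.

```r
library(lattice)

# Hypothetical stand-in for cars04: 20 vehicles per type, log-normal prices
set.seed(408)
types <- c("sedan", "sporty", "suv", "wagon", "minivan", "pickup")
sim <- data.frame(
  name     = paste("car", 1:120),
  type     = factor(rep(types, each = 20), levels = types),
  MSRPrice = round(rlnorm(120, meanlog = log(30000), sdlog = 0.5))
)

# Side-by-side boxplots of price for the six types
boxplot(MSRPrice ~ type, data = sim, ylab = "MSRP (dollars)")

# Vehicles priced above $100,000 (may be empty for simulated data)
pricey <- subset(sim, MSRPrice > 100000)
print(pricey)

# Median and interquartile range of price within each type
med <- tapply(sim$MSRPrice, sim$type, median)
spread <- tapply(sim$MSRPrice, sim$type, IQR)
print(cbind(median = med, IQR = spread))

# Rough symmetry check: (mean - median) / sd is near 0 for symmetric data
skew_ratio <- function(x) (mean(x) - median(x)) / sd(x)
skew <- sapply(split(sim$MSRPrice, sim$type), function(p)
  c(raw = skew_ratio(p), sqrt = skew_ratio(sqrt(p)), log = skew_ratio(log(p))))
print(round(skew, 3))

# Juxtaposed normal quantile plots of log price, one panel per type,
# each with a reference line through the quartiles
print(qqmath(~ log(MSRPrice) | type, data = sim, layout = c(6, 1),
             prepanel = prepanel.qqmathline,
             panel = function(x, ...) {
               panel.qqmathline(x, ...)
               panel.qqmath(x, ...)
             }))
```

Lattice functions return plot objects, so inside a script they need print() to appear.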
    3. We want to look at the relationship between weight and mpgHiway.
      1. Use the xyplot function in the lattice package to plot mpgHiway as a function of weight, with one panel per type and a smoother drawn through each data cloud. There are some outliers. Which vehicles are they? What is the general pattern?
      2. In Europe, instead of tracking the distance traveled per unit of fuel (as miles per gallon does), fuel economy is reported as the number of liters needed to drive 100 km. Create another column giving the number of gallons needed to drive 100 miles (100/mpgHiway). Plot it against weight with a separate panel for each type of car and a smoother or regression line (whichever gives a better indication of the trend). Where are those outliers now? Comment on how the relationship changes when you use the reciprocal.
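A sketch of both lattice plots on simulated data. The data frame sim, its weights, and its mileage values are hypothetical, generated under the assumption that fuel used per mile rises roughly linearly with weight, and gp100m is a made-up column name for gallons per 100 miles. In xyplot, type = c("p", "smooth") adds a loess smoother to each panel and type = c("p", "r") adds a least-squares regression line.

```r
library(lattice)

# Hypothetical stand-in for cars04 (weights in pounds)
set.seed(2013)
types <- c("sedan", "sporty", "suv", "wagon", "minivan", "pickup")
n <- 120
sim <- data.frame(
  type   = factor(rep(types, each = n / 6), levels = types),
  weight = round(runif(n, 2500, 6000))
)
# Assumed model: gallons per 100 miles grows linearly with weight, plus noise
sim$mpgHiway <- round(100 / (0.0009 * sim$weight + rnorm(n, sd = 0.3)))

# mpgHiway against weight, one panel per type, loess smoother in each panel
print(xyplot(mpgHiway ~ weight | type, data = sim,
             type = c("p", "smooth"), layout = c(3, 2)))

# Reciprocal scale: gallons needed to drive 100 miles
sim$gp100m <- 100 / sim$mpgHiway
print(xyplot(gp100m ~ weight | type, data = sim,
             type = c("p", "r"), layout = c(3, 2),
             ylab = "Gallons per 100 miles"))
```

Because the simulated data were built to be linear on the gallons-per-100-miles scale, the second plot shows straight trends by construction; with the real data, compare the smoother and the regression line before choosing.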