Use the cars04 data from JSE, which is formatted in aligned
columns as described here. Look up help on the read.fwf function
(fixed width format). You will need to use the arguments
strip.white and na.strings. By trial and error, I figured out
what works for the widths vector.
cars04 <- read.fwf("URL.goes.here", strip.white = ___,
    widths = c(45,2,2,2,2,2,2,2,7,7,4,4,4,3,3,4,4,4,4),
    na.strings = __,
    col.names = c("name","sporty","suv","wagon","minivan","pickup",
        "AllWD","RearWD","MSRPrice","Invoice","disp","cyl","horsepwr",
        "mpgCity","mpgHiway","weight","whlBase","length","width"))
- Note that columns 2 through 6 are indicator variables (0 or
1) to tell us the style. If they are all 0, then the vehicle is
a sedan. Otherwise, the one column with a one in it indicates
which type of car it is.
Use subsetting or the ifelse function to create a
factor with the six types as levels. Provide a summary of the
types.
Remove columns 2 through 6
using negative subscripting [Hint: -(2:6) means all but those
indices]. Alternatively, you can set those columns
to NULL and they will disappear. Show your code.
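The recoding step can be sketched on a small invented data frame that has the same indicator layout as columns 2 through 6. The rows, the price column, and the helper names styles and type are hypothetical; this is one possible approach using subsetting, not the only one.

```r
# Invented toy data with the same indicator layout as columns 2 to 6.
toy <- data.frame(
  name    = c("car A", "car B", "car C", "car D"),
  sporty  = c(0, 1, 0, 0),
  suv     = c(0, 0, 1, 0),
  wagon   = c(0, 0, 0, 0),
  minivan = c(0, 0, 0, 0),
  pickup  = c(0, 0, 0, 1),
  price   = c(20000, 45000, 38000, 27000)
)

styles <- names(toy)[2:6]
type <- rep("sedan", nrow(toy))      # all-zero rows stay "sedan"
for (s in styles) {
  type[toy[[s]] == 1] <- s           # logical subsetting, one style at a time
}
toy$type <- factor(type, levels = c("sedan", styles))
summary(toy$type)                    # counts per type

toy <- toy[, -(2:6)]                 # negative subscripting drops the indicators
names(toy)
```

Listing the levels explicitly keeps all six types in the summary even when a type has no rows, as with wagon and minivan in this toy data.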
- Compare the distribution of MSRPrice for the six
types with boxplots.
- Which cars cost over $100,000? How are they similar? (I
would use subset to list cars that fit the condition).
- Which group has the smallest median? The smallest spread?
- In each group, we have quite a lot of skewness. Check
square root price and log price to see which is more
symmetric. Load the lattice package and obtain juxtaposed
quantile plots of the function of price you found to be more
symmetric, with a separate panel for each type of car.
Comment: how well do they fit a normality assumption? Which
type has the largest and which the smallest median? The largest
and smallest spreads?
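The plotting and summary calls for this part can be sketched on simulated data. The three types, the lognormal prices, and the object names below are assumptions made up for illustration; they say nothing about what the real cars04 data will show.

```r
library(lattice)

# Simulated, right-skewed prices for three hypothetical types.
set.seed(42)
toy <- data.frame(
  type  = factor(rep(c("sedan", "suv", "sporty"), each = 40)),
  price = c(rlnorm(40, log(25000), 0.4),
            rlnorm(40, log(35000), 0.3),
            rlnorm(40, log(50000), 0.6))
)

box_plot <- bwplot(price ~ type, data = toy)   # print(box_plot) draws it
pricey   <- subset(toy, price > 100000)        # rows above a threshold

# Median and interquartile range per type, on the raw and log scales.
med_raw <- tapply(toy$price, toy$type, median)
iqr_raw <- tapply(toy$price, toy$type, IQR)
iqr_log <- tapply(log(toy$price), toy$type, IQR)

# Normal quantile plot of log price, one panel per type, with a reference line.
qq_plot <- qqmath(~ log(price) | type, data = toy,
                  panel = function(x, ...) {
                    panel.qqmathline(x, ...)
                    panel.qqmath(x, ...)
                  })
```

Lattice functions return objects, so the plots appear only when an object is printed, for example by typing qq_plot at the console. Replacing log(price) with sqrt(price) gives the other candidate transformation for comparison.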
- We want to look at the relationship between weight and mpgHiway.
- Use the xyplot function in the lattice package to plot
mpgHiway as a function of weight, with a panel for each type
and a smoother drawn through the data cloud. There are some
outliers. Which vehicles are they? What is the general
pattern?
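A conditioned scatterplot with a smoother might look like the following sketch. The data are simulated, with two hypothetical types and an invented weight-to-mileage relationship, and flagging points by large loess residuals is only one assumed way to list candidates for outliers.

```r
library(lattice)

# Simulated weights and highway mileage for two hypothetical types.
set.seed(7)
n <- 60
toy <- data.frame(
  type   = factor(rep(c("sedan", "suv"), each = n / 2)),
  weight = round(runif(n, 2000, 6000))
)
toy$mpgHiway <- 100 / (1 + toy$weight / 1500) + rnorm(n, sd = 1.5)

# Points plus a loess smoother in each panel; print(mpg_plot) draws it.
mpg_plot <- xyplot(mpgHiway ~ weight | type, data = toy,
                   type = c("p", "smooth"))

# One possible way to list unusual points: large residuals from a loess fit.
fit <- loess(mpgHiway ~ weight, data = toy)
res <- residuals(fit)
far <- toy[abs(res) > 2 * sd(res), ]
```

With the real data, the name column in rows like those in far would identify the vehicles.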
- In Europe, instead of keeping track of the distance traveled
per unit of fuel (as miles per gallon does), they compute the
number of liters needed to drive 100 km. Create another column
holding the number of gallons needed to drive 100 miles
(100/mpgHiway). Plot it against weight with a separate panel for
each type of car and a smoother or regression line (whichever
gives a better indication of the trend). Where are those
outliers now? Comment on how the relationship has changed using
the reciprocal.
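As a quick check on the arithmetic, a car that gets 25 miles per gallon needs 100/25 = 4 gallons to drive 100 miles. The sketch below adds such a column to simulated data and plots it with a regression line; the column name gal100mi and the toy data are assumptions, and the toy relationship is linear by construction, so it does not predict what the real data will show.

```r
library(lattice)

# Simulated weights and highway mileage for two hypothetical types.
set.seed(7)
n <- 60
toy <- data.frame(
  type   = factor(rep(c("sedan", "suv"), each = n / 2)),
  weight = round(runif(n, 2000, 6000))
)
toy$mpgHiway <- 100 / (1 + toy$weight / 1500) + rnorm(n, sd = 1.5)

# Gallons needed to drive 100 miles: the reciprocal of mpg, scaled by 100.
toy$gal100mi <- 100 / toy$mpgHiway

# Points plus a least-squares line in each panel; print(gal_plot) draws it.
gal_plot <- xyplot(gal100mi ~ weight | type, data = toy,
                   type = c("p", "r"))
```

Swapping "r" for "smooth" in the type argument gives a loess smoother instead, which makes it easy to compare the two and keep whichever shows the trend better.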