Stat 408 Assignment 1-4

Due February 6, 2013
This CSV file contains records on 7,439 Rainbow and 4,399 Brown trout caught by FWP personnel in the Ruby River at four locations (Canyon, Greenhorn, Vigilante, ThreeForks) from 1994 to 2007. We need to tabulate fish counts by species, year, site, length class in 25mm bins (50-74, 75-99, 100-124, ...), and capture status: captured in first pass, captured in second pass and unmarked, or captured in second pass and marked.
  1. Cleaning the data:
    1. Plot histograms of length and (separately) of weight by year and species.
      require(lattice)
      histogram(~length|factor(year)*species, rubyFish) 
      
    2. Use tapply or by to obtain mean and SD for length and for weight at each year/species combination.
      round(with(rubyFish, tapply(length, list(species, year), mean)),0)
      
    3. What changes in data collection occurred over the course of the study? Fix them so the data are all comparable. (Hint: use ifelse so you don't have to split the data into "OK" and "not OK")
    4. Redo the plots and summaries and comment on distributions across years and species.
    5. Plot weight as a function of length separately for each species. Lattice xyplot would do this nicely, but then we wouldn't be able to identify unusual fish, so do it in two separate plots and click on each outlying point to see which row of data it comes from. Click the middle mouse button when you're done.
      par(mfrow=c(1,2))
      plot(weight ~ length, data=rubyFish, subset=species=="Brn")
      oddballBrns <- with(subset(rubyFish, species=="Brn"),
                          identify(x=length, y=weight))
      subset(rubyFish, species=="Brn")[oddballBrns, ]
      
      Repeat the last three lines for the rainbows.
    6. What do you think we should do with the unusual fish you identified? Discuss pluses and minuses of at least 2 options. Can you tell whether some were recorded incorrectly? If so, fix the probable error; if not, remove them.
    7. Use an ftable command to see how many fish were captured at each site/species/year combination.
      with(rubyFish, ftable(year, species, site))
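
The ifelse hint in step 3 can be illustrated on made-up numbers. This is only a sketch: the toy data frame, the column name lengthMM, and the supposed switch from inches to mm after 1995 are all hypothetical and say nothing about what actually changed in the Ruby River records.

```r
# Toy data, NOT the Ruby River records. Hypothetical assumption: lengths
# were recorded in inches through 1995 and in mm afterwards.
toy <- data.frame(year   = c(1994, 1995, 1996, 1997),
                  length = c(10, 12, 250, 300))

# ifelse converts only the rows that need it and leaves the others alone,
# so the data never has to be split into "OK" and "not OK" pieces.
toy$lengthMM <- ifelse(toy$year <= 1995, toy$length * 25.4, toy$length)
toy
```

The same pattern works for any row-wise correction: the first argument flags the rows to change, the second gives the corrected value, and the third passes the remaining values through.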
      
  2. Subset the data to extract Rainbow trout. We will work only with them. How many fish are in this dataset? How many site/years of data are there?
  3. We want to estimate population size and capture probability for each subpopulation (by length class) at each site in each year. Build a new column using the cut function which indicates which 25mm bin each fish falls into.
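    # right=FALSE gives bins [50,75), [75,100), ... to match classes 50-74, 75-99, ...
    rubyRBT$lengthClass <- cut(rubyRBT$length, seq(50,475,25), right=FALSE)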
    
  4. The web applet we saw in class used the Lincoln-Petersen estimator: n1*n2/m, where n1 is the number caught and marked in the first pass, n2 is the number caught in the second (recapture) pass, and m is the number of those n2 that were already marked. We will use table to get these numbers for each site/year/length combo, but first we need a way to separate out the first only, second only, and both passes fish to count them. Use ifelse statements to build a new column (I'll call it type) which tells us if a given fish was caught in first pass (and possibly second), second pass with no mark, or second pass with a mark.
  5. Obtain a table of type by combinations of site, year, and length class.
    with(rubyRBT, table(interaction(site, year, lengthClass, drop=TRUE), type))
    
  6. Use strsplit on the rownames to create a site variable, a year column, and a length column. (interaction joins the levels with ".", so split with fixed=TRUE.)
  7. Build another column to show the total number of fish caught in the second pass. Plot the numbers caught in the first pass versus the numbers caught in the second pass. What do you see?
  8. Compute the Lincoln-Petersen estimate of abundance for each group. Sum them for each site/year and plot the sum over years using a different symbol and color for each site.
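
The chain from steps 5 to 8 (table, strsplit, Lincoln-Petersen) can be sketched on a tiny made-up example. Everything here is an assumption for illustration: the toy records, the type labels "first", "secondMarked" and "secondUnmarked", and the idea that each row is one capture event. Length class is left out to keep the example short.

```r
# Toy capture events, NOT the Ruby River data. Hypothetical type labels.
toy <- data.frame(
  site = rep(c("Canyon", "Vigilante"), times = c(7, 6)),
  year = 1994,
  type = c(rep(c("first", "secondMarked", "secondUnmarked"), times = c(4, 2, 1)),
           rep(c("first", "secondMarked", "secondUnmarked"), times = c(3, 1, 2)))
)

# Counts of each type per site/year group; rownames look like "Canyon.1994"
counts <- with(toy, table(interaction(site, year, drop = TRUE), type))

# Split the rownames on the literal "." (fixed = TRUE avoids regex matching)
parts <- do.call(rbind, strsplit(rownames(counts), ".", fixed = TRUE))

est <- data.frame(site = parts[, 1],
                  year = as.numeric(parts[, 2]),
                  n1   = as.vector(counts[, "first"]),          # marked in first pass
                  m    = as.vector(counts[, "secondMarked"]))   # recaptures
est$n2   <- est$m + as.vector(counts[, "secondUnmarked"])      # all second-pass fish
est$Nhat <- est$n1 * est$n2 / est$m   # Lincoln-Petersen; Inf or NaN when m is 0
est
```

With these toy counts the Canyon group has n1 = 4, n2 = 3, m = 2 and the Vigilante group has n1 = 3, n2 = 3, m = 1. Groups with no marked recaptures need separate handling, since the estimate is undefined when m is 0.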