Warning: Programming is Frustrating

  • All statistical analysis requires the use of a computer
  • Computers do exactly what we tell them to do, not what we're thinking they should do
  • Lots of finicky little conventions must be memorized
  • Fun part is to get it to do new and beautiful things. There is a reward in the end. Computers are fast and accurate.

Computing Basics

  • Stay organized. Create a folder for STAT 408, subfolders as needed for notes, homework, …
  • We will work with .R, .sas, .Rmd code files
  • Data files: .csv, .txt, .sas7bdat
  • Know where files reside
  • Need to back up your work: Google Drive, Dropbox, montana.box.com
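
As a small illustration of "know where files reside", the R sketch below checks the working directory, builds a path to a course folder, and writes and reads a .csv file. The folder and file names (STAT408/notes, example.csv) and the data values are made up for illustration, and the folder is created under tempdir() only so that the example cleans up after itself; in practice you would point to your own course folder.

```r
# Where is R currently looking for files?
getwd()

# Build a path piece by piece; these folder names are hypothetical
course_dir <- file.path(tempdir(), "STAT408", "notes")
dir.create(course_dir, recursive = TRUE, showWarnings = FALSE)

# Write a tiny made-up data set to a .csv file
csv_path <- file.path(course_dir, "example.csv")
write.csv(data.frame(x = 1:3, y = c(2, 4, 6)), csv_path, row.names = FALSE)

# Confirm the file is where we think it is, then read it back into R
file.exists(csv_path)
example <- read.csv(csv_path)
```

Using file.path() rather than pasting strings together keeps the path valid on Windows, macOS, and Linux.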

Programming in General

  • Plan ahead: "top-down" programming
  • Programming is an iterative process
  • Reproducibility: code should make sense a year from now. Avoid programming with graphical interfaces, or save the code they run in the background.
    • We use command line interface
    • Use comments in code to explain what you are doing
    • Include code and comments in one big file
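
One way these habits might look in practice is sketched below: the plan is written first as comments (top-down), and each step is then filled in with commented code in a single file. The data values and object names are invented for illustration only.

```r
# Top-down plan, written as comments before any code:
#   1. Build (or read) the data
#   2. Summarize each group
#   3. Report the result

# Step 1: a small made-up data set (hypothetical values)
scores <- data.frame(
  group = c("A", "A", "B", "B"),
  score = c(80, 90, 70, 75)
)

# Step 2: one mean per group, labeled by group name
group_means <- tapply(scores$score, scores$group, mean)

# Step 3: print the summary so the result can be regenerated from this file
print(group_means)
```

Because every step lives in the script, rerunning the file a year from now reproduces the same summary without relying on memory of which menus were clicked.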

R is:

  • A programming environment
  • A way to run stat analyses
  • Built of functions and objects
  • Great at making complex plots (not necessarily easy)
  • A project involving work from hundreds of people
  • Rapidly expanding.
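
To make "built of functions and objects" concrete, here is a minimal sketch. The vector heights and the helper cm_to_m are hypothetical names with made-up values; class() and mean() are built-in R functions.

```r
# An object: a numeric vector stored under a name (made-up values)
heights <- c(150, 160, 170)

# A function is also an object; this hypothetical helper converts cm to m
cm_to_m <- function(x) {
  x / 100
}

# Apply the function to the object and store the result as a new object
heights_m <- cm_to_m(heights)

class(heights)   # the kind of object: "numeric"
class(cm_to_m)   # functions have a class too: "function"
mean(heights_m)  # built-in functions are called the same way
```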

R is not:

  • A spreadsheet.
  • A database.
  • A place to enter data from the field directly.
  • A point-and-click environment.
  • A commercial product with professional support staff.

More about this course

This course provides an overview of statistical computation and graphical analysis. In particular, R and SAS will be introduced in this course.

Details

Course Objectives

At the completion of this course, students will:

  • become literate in statistical programming using R and SAS,
  • learn to effectively communicate through visual presentations of data, and
  • understand and imitate good programming practices.

Prereqs and Textbooks:

Prerequisite: One of STAT 217Q, STAT 332, STAT 401, or equivalent.

Textbooks (all free or optional):

  • ModernDive: An Introduction to Statistical and Data Sciences via R, by Chester Ismay and Albert Kim. Free at http://moderndive.com.
  • R for Data Science, by Hadley Wickham and Garrett Grolemund. Free at http://r4ds.had.co.nz.
  • Visualize This: The FlowingData Guide to Design, Visualization, and Statistics, by Nathan Yau, 2011.
  • Art of R Programming: A Tour of Statistical Software Design, by Norman Matloff, 2011.
  • The Little SAS Book: A Primer, by Lora Delwiche and Susan Slaughter.

Additional Resources

Course Outline

The course will be taught from a partially flipped perspective. Tuesdays will be group labs which focus on implementing the programming concepts covered during the week. Video lectures focused on computing techniques will be watched outside of class.

The course outline is as follows:

  • (5 weeks) R: Intro to R, RStudio, and R Markdown.
  • (6 weeks) Data visualization principles and advanced R: ggplot2 and R Shiny.
  • (4 weeks) SAS: data storage, manipulation, SAS procedures, and SAS macros.

Quizzes

Quizzes will be worth 15% of the final grade.

  • There is no formal attendance policy for this class, but there will be weekly quizzes on Thursdays.
  • There will be no makeup quizzes, but the lowest quiz score will be excluded from final grades.

Homework

Homework will be worth 20% of the final grade.

  • Weekly homework will accompany course material. Some of the computational elements of the course will be presented as video lectures. Homework will typically be qualitative questions or short programming exercises.
  • Homework will be due prior to class on the assigned days. Homework will typically be collected and evaluated online through D2L.

Labs

Labs will be worth 25% of the final grade.

  • Labs will be in-class group assignments conducted every Thursday. The labs will have a large computational element.
  • The labs will be designed to be completed in 75 minutes; however, there may be times that groups need to finish labs outside of class time.

Midterm Exam

The midterm exam will be worth 20% of the final grade.

  • The midterm exam will have two parts: an in-class exam on March 1st and a take-home exam due on March 6th.

Final Exam

The final exam will be worth 20% of the final grade.

  • The final exam will also have two parts, with the take-home portion due on the day of the final exam period, May 1.

Introductions

I'll expect you to know all of your classmates' names by the end of the year.

  • Name
  • Major/Minor
  • Why are you taking this course?
  • What was the best thing about your winter break?

Homework for Tuesday

Homework #1 is available on the course webpage.

  • Create a folder on your primary computer to store STAT 408 materials.

  • Install R and RStudio (videos available on course webpage).

  • Create an R Markdown document and answer the story problem specified on the course webpage. Turn in your .html output to D2L prior to class on Tuesday.