---
title: |
  | STAT 408
  | Data Scraping and SQL
date: "March 8, 2018"
output:
  beamer_presentation:
    theme: "PaloAlto"
    fonttheme: "structuresmallcapsserif"
---

```{r setup, include=FALSE}
library(dplyr)
library(knitr)
library(rvest)
library(stringr)
knitr::opts_chunk$set(echo = TRUE)
# Chunk hook: when a chunk sets `mysize`, emit its `size` option before the chunk
knitr::knit_hooks$set(mysize = function(before, options, envir) {
  if (before) return(options$size)
})
```

# Data Scraping

## Data Scraping
Data scraping is the use of a computer to extract information, typically from human-readable websites. We could spend multiple weeks on this topic, so this will be a basic introduction that will allow you to:

- extract text and numbers from webpages and
- extract tables from webpages.

## A bit about HTML
HTML elements are written with a start tag, an end tag, and the content in between: `<tagname>content</tagname>`. The tags typically contain the textual content we wish to scrape. Some tags include:

- $<h1>$, $<h2>$, …, $<h6>$: for headings
- $<p>$: Paragraph elements
- $