--- title: "STAT 436 / 536 - Lecture 6" date: September 19, 2018 output: pdf_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) knitr::opts_chunk$set(warning = FALSE) knitr::opts_chunk$set(message = FALSE) knitr::opts_chunk$set(fig.align= 'center') knitr::opts_chunk$set(fig.height= 3) knitr::opts_chunk$set(fig.width = 5) ``` ### Forecasting Strategies - Exponential Smoothing Often with time series data, our objective is to make future predictions of a time series. Formally we are interested in a response at time $t+k$, $x_{t+k}$, given that we have observed the time series up to time point $t$, $\{x_1, \dots, x_t\}$. \vfill - Assume our process contains no systematic trend or seasonal effects, or that they have been removed. \vfill - \vfill \vfill - This type of model can be displayed ```{r, fig.align= 'center'} set.seed(09142018) mu.evol.sd <- .5 sigma <- 1 time.pts <- 100 mu <- rep(0,time.pts) x <- rep(0, time.pts) for (t in 2:time.pts){ mu[t] <- mu[t-1] + rnorm(1,mean=0, sd=mu.evol.sd) x[t] <- mu[t] + rnorm(1, mean=0, sd=sigma) } ts.df <- data.frame(mu=mu, x=x, time = 1:time.pts) library(ggplot2) ggplot(data=ts.df) + geom_line(aes(x = time, y = mu), color = "red") + geom_point(aes(x=time, y=x)) + ylab('') + ggtitle('Simulated Time Series') + labs(caption = "latent mean shown in red and observed points in black") ``` \newpage - Explore the impact of changing `mu.evol.sd` and `sigma` in the code above. What are the impacts of these two terms? \vfill - How might these two impact our idea on the next prediction? \vfill ########### Forecasting - \vfill - Interpret this estimator, what do we make of $\alpha$? \vfill - Does this estimator seem reasonable, yes or no? \vfill - How would $\alpha$ be influenced by the ratio of `mu.evol.sd` and `sigma`? \vfill - Given that there are no seasonal effects or trends, what is the prediction for $\hat{x}_{t+k} = \hat{\mu}_{t+k}$? \vfill - We can rewrite the model in a recursive manner, so that: $$\hat{\mu}_t = \alpha (x_t - \hat{\mu}_{t-1}) + \hat{\mu_t}_t-1$$ and then $$\hat{\mu}_t = \alpha x_t + \alpha(1-\alpha)x_{t-1} + \alpha(1-\alpha)^2 x_{t-2} \dots$$ - Typically \vfill - \vfill - So we still need to select $\alpha$, how should we do this? - what would be the optimal value of $\alpha$ if $\mu_{t+1} = \mu_t \forall t$ 1/t \vfill - what about if the change in $\mu$ from time to time is much larger than the variance associated with $\omega_t$? Then $\alpha$ should be closer to 1 \vfill \newpage - The book suggest a default value of $\alpha =$ \vfill \vfill - Let $e_t$ be the one step ahead prediction error, $e_t = x_t - \hat{x}_t = x_t - \hat{x}_t$. Then $\alpha$ can be estimated by minimizing the sum of squared one step ahead prediction error (SS1PE) in a similar fashion to a regression estimate. \vfill - \vfill \vfill ##### Air Quality - Recall the air quality measurements taken in Bozeman ```{r echo=F} library(rvest) library(dplyr) scrape_BZNPM_AUG <- function(days){ # Scrapes hourly PM2.5 readings in Bozeman for August 2018 # inputs: sequence of days # outputs: data frame that contains day, hour, and hourly average PM2.5 concentration. smoke.df <- data.frame(day = NULL, hour = NULL, conc = NULL) for (d in days){ bzn_aq <- read_html(paste("http://svc.mt.gov/deq/todaysair/AirDataDisplay.aspx?siteAcronym=BH&targetDate=8/",d,"/2018", sep='')) daily.smoke <- cbind(rep(d,24),html_table(bzn_aq)[[1]][,1:2]) colnames(daily.smoke) <- c('day','hour','conc') daily.smoke$conc[daily.smoke$conc == 'DU'] <- 'NA' daily.smoke$conc <- as.numeric(daily.smoke$conc) smoke.df <- bind_rows(smoke.df,daily.smoke) } return(smoke.df) } ``` \vfill ```{r} air.quality <- scrape_BZNPM_AUG(1:15) aq <- ts(air.quality$conc, freq = 24, start = 1) library(ggfortify) autoplot(aq) + ggtitle(label = 'PM2.5 Measurements in Bozeman, MT for August 1:15, 2018', subtitle = 'Source: MT DEQ') + ylab('µg/m3') + xlab('Day') ``` \vfill \newpage - There is not clear evidence of cycles or trends in this data set, so we will fit a model without them. ```{r} aq.hw <- HoltWinters(aq, beta=FALSE, gamma = FALSE, seasonal = 'additive') alpha.est <- aq.hw$alpha aq.pred <- predict(aq.hw, n.ahead = 24, prediction.interval = T) plot(aq.hw, aq.pred) ``` \vfill - The $\alpha$ term is estimated to be `r round(alpha.est,3)`. \vfill #### Holt-Winters Method \vfill - More generally the Holt-Winters method \vfill - Furthermore, these models can be expressed from an additive or multiplicative perspective. \vfill - To express this in an additive framework (3.21 in text) \vfill \vfill \vfill \vfill - In this framework the forecasting equation can be written as $$\hat{x}_{n+k|n} = a_n + k b_n + s_{n+k-p}$$ where we are making predictions at time $n$ for time $n+k$ and $p$ is the length of the seasonal cycle. \vfill \newpage - Consider the airline passenger data and a multiplicative decomposition. ```{r} data("AirPassengers") AP.hw <- HoltWinters(AirPassengers, seasonal = 'mult') AP.pred <- predict(AP.hw, n.ahead = 48, prediction.interval = T) plot(AP.hw,AP.pred) ``` \vfill - How do we feel about the prediction here? \vfill - What about if this model was used to predict all the way to today? \vfill