--- title: "STAT 436 / 536 - Lecture 16" output: pdf_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) knitr::opts_chunk$set(warning = FALSE) knitr::opts_chunk$set(message = FALSE) knitr::opts_chunk$set(fig.align= 'center') knitr::opts_chunk$set(fig.height= 4) knitr::opts_chunk$set(fig.width = 4) library(tidyverse) library(gridExtra) library(readr) library(ggfortify) ``` ### Modeling Non-Stationary Time Series - Many time series models are non-stationary. Recall a time series is stationary if \vfill - One of our current strategies for non-stationary time series models is to \vfill - When using a regression model, the interest shifts to the residuals. In particular, the residuals should satisfy stationarity. \vfill - We also have seen that differencing a random walk results in a stationary series. A random walk can be written as $$x_t = x_{t-1} + w_t$$ and then the differenced series \vfill \vfill So why is stationarity important? \vfill - Stationarity is a \vfill \vfill - When stationarity is not present, differencing (as in the random walk), is used to try to obtain a resulting series that is stationary. \vfill \vfill #### Diagnosing a non-stationary series - Sometimes a non-stationary series can be diagnosed visually: ```{r, echo = F} air.passengers <- data.frame(year = rep(c(1949:1960), each = 12), month = rep(1:12, 12), passengers = c(AirPassengers[-c(133:144)], rep(NA,12))) air.passengers$strDates <- paste(air.passengers$month, '/15/',air.passengers$year, sep='') air.passengers$date <- as.Date(air.passengers$strDates, "%m/%d/%Y") label.Dates <- paste('Jan', 1949:1960, sep='') break.dates <- as.Date(paste(1, '/15/',1949:1960, sep=''), "%m/%d/%Y") ggplot(data=air.passengers, aes( x=date, y =passengers)) + geom_line() + geom_point() + ylim(0,700) + labs(title="Monthly Airline Passenger Count", y="Number of Passengers(thousands)") + scale_x_date(labels = label.Dates, breaks = break.dates) + theme(axis.text.x = element_text(angle = 90)) ``` \vfill - The characteristic equation can be used to determine if a series is stationary, assuming the parameter values are known. \begin{eqnarray*} x_t &=& x_{t-1} - \frac{1}{4}x_{t-2} + w_t\\ \left( \frac{1}{4}B^2 - B + 1 \right)x_t &=& w_t \end{eqnarray*} ```{r} polyroot(c(1, -1, .25)) ``` \begin{eqnarray*} x_t &=& x_{t-1} + w_t\\ \left( B - 1 \right)x_t &=& w_t \end{eqnarray*} ```{r} polyroot(c(-1, 1)) ``` \begin{eqnarray*} x_t &=& \frac{1}{2}x_{t-1} + \frac{1}{2}x_{t-2} + w_t\\ \left( -\frac{1}{2}B^2 - \frac{1}{2}B + 1 \right)x_t &=& w_t \end{eqnarray*} ```{r} #polyroot() ``` \vfill - Another way to diagnose stationarity is to use a unit root test. This is closely related to the idea of a random walk as a unit root corresponds to the solution of the polynomial equation of an AR 1 model. \vfill - The `arima.sim()` function will return an error if you try to simulate a non-stationary model. ```{r, error = T} arima.sim(n = 100, list(ar = c(1))) ``` \vfill - The `Arima` and `auto.arima` can be used to assess for stationary with the a fitted model; however, there are restrictions in the model that generally result in non-stationary models being fitted (with the integrated piece). ```{r, fig.height=4, fig.width=6} library(forecast) #autoplot(WWWusage) plot(auto.arima(WWWusage, stationary = T)) ``` \vfill - Furthermore, the `auto.arima` package will also select models that "integrate" the data to create a differenced series that will be stationary. ```{r} auto.arima(WWWusage) ``` \vfill ### Integrated Model \vfill - A model is 'integrated' with order d, denoted I(d), When $d =1$ this is a random walk. ```{r} rw <- arima.sim(n=500, list( order = c(0,1,0))) autoplot(rw) auto.arima(rw) ``` \vfill - The integrated component can also be combined with ARMA models to form an ARIMA. $$\theta_p(B)(1-B)^d x_t = \phi_q(B) w_t$$ is an ARIMA(p, d, q) model. \vfill - For instance, \begin{eqnarray*} x_t &=& \alpha x_{t-1} + x_{t-1} - \alpha x_{t-2} + w_t + \beta w_{t-1}\\ \end{eqnarray*} \vfill \vfill \vfill - Recall the taxi data set. Run the code below and discuss the results. ```{r, eval = F} taxi.rides <- read_csv('http://math.montana.edu/ahoegh/teaching/timeseries/data/taxi.csv') taxirides.diff <- taxi.rides %>% arrange(year, month, day) %>% slice(-c(1:4)) %>% mutate(week.numb = rep(1:234, each = 7)) %>% group_by(week.numb) %>% summarize(total.rides = sum(n)) %>% select(total.rides) %>% pull() %>% diff() auto.arima(taxirides.diff) ``` ```{r, eval = F} taxirides.summary <- taxi.rides %>% arrange(year, month, day) %>% slice(-c(1:4)) %>% mutate(week.numb = rep(1:234, each = 7)) %>% group_by(week.numb) %>% summarize(total.rides = sum(n)) %>% select(total.rides) %>% pull() auto.arima(taxirides.summary) ``` \vfill - The `arima` function can also be used to fit a specific order of an ARIMA model. \vfill - As we saw before with ARMA models, AR, ARI, IMA, and ARMA models are all special cases of the ARIMA framework. \vfill #### Seasonal Arima - ARIMA models can also have a seasonal component, where the lag corresponds to the seasonal frequency. For example, consider the following model for a time series with weekly seasonal frequency: \vfill \vfill ```{r} library(forecast) bakery.sales <- read_csv('http://math.montana.edu/ahoegh/teaching/timeseries/data/BreadBasket.csv') pastry.count <- bakery.sales %>% filter(Item %in% c('Pastry','Scandinavian','Medialuna','Muffin','Scone')) %>% group_by(Date) %>% tally() %>% rename(num.pastry = n) %>% select(num.pastry) %>% pull() %>% ts(frequency =7) ggtsdisplay(pastry.count) auto.arima(pastry.count) ```