---
title: "STAT 436 / 536 - Lecture 16"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(fig.align= 'center')
knitr::opts_chunk$set(fig.height= 4)
knitr::opts_chunk$set(fig.width = 4)
library(tidyverse)
library(gridExtra)
library(readr)
library(ggfortify)
```


### Modeling Non-Stationary Time Series

- Many time series models are non-stationary. Recall a time series is stationary if  
\vfill
- One of our current strategies for non-stationary time series models is to 
\vfill
- When using a regression model, the interest shifts to the residuals. In particular, the residuals should satisfy stationarity.
\vfill
- We also have seen that differencing a random walk results in a stationary series. A random walk can be written as
$$x_t = x_{t-1} + w_t$$
and then the differenced series 
\vfill
\vfill

So why is stationarity important?
\vfill
- Stationarity is a 
\vfill
\vfill
- When stationarity is not present, differencing (as in the random walk), is used to try to obtain a resulting series that is stationary.
\vfill
\vfill

#### Diagnosing a non-stationary series

- Sometimes a non-stationary series can be diagnosed visually:

```{r, echo = F}
air.passengers <- data.frame(year = rep(c(1949:1960), each = 12), month = rep(1:12, 12), passengers = c(AirPassengers[-c(133:144)], rep(NA,12)))
air.passengers$strDates <- paste(air.passengers$month, '/15/',air.passengers$year, sep='')
air.passengers$date <- as.Date(air.passengers$strDates, "%m/%d/%Y")

label.Dates <- paste('Jan', 1949:1960, sep='')
break.dates <- as.Date(paste(1, '/15/',1949:1960, sep=''), "%m/%d/%Y")

ggplot(data=air.passengers, aes( x=date, y =passengers)) + geom_line() + geom_point() + ylim(0,700) + labs(title="Monthly Airline Passenger Count", y="Number of Passengers(thousands)") + scale_x_date(labels = label.Dates, breaks = break.dates) + theme(axis.text.x = element_text(angle = 90))
```

\vfill

- The characteristic equation can be used to determine if a series is stationary, assuming the parameter values are known.

\begin{eqnarray*}
x_t &=& x_{t-1} - \frac{1}{4}x_{t-2} + w_t\\
\left( \frac{1}{4}B^2 - B + 1 \right)x_t &=& w_t
\end{eqnarray*}
```{r}
polyroot(c(1, -1, .25))
```


\begin{eqnarray*}
x_t &=& x_{t-1} + w_t\\
\left(  B - 1 \right)x_t &=& w_t
\end{eqnarray*}
```{r}
polyroot(c(-1, 1))
```


\begin{eqnarray*}
x_t &=& \frac{1}{2}x_{t-1} + \frac{1}{2}x_{t-2} + w_t\\
\left( -\frac{1}{2}B^2 - \frac{1}{2}B + 1 \right)x_t &=& w_t
\end{eqnarray*}
```{r}
#polyroot()
```


\vfill

- Another way to diagnose stationarity is to use a unit root test. This is closely related to the idea of a random walk as a unit root corresponds to the solution of the polynomial equation of an AR 1 model.

\vfill
- The `arima.sim()` function will return an error if you try to simulate a non-stationary model.
```{r, error = T}
arima.sim(n = 100, list(ar = c(1)))
```

\vfill

- The `Arima` and `auto.arima` can be used to assess for stationary with the a fitted model; however, there are restrictions in the model that generally  result in non-stationary models being fitted (with the integrated piece).
```{r, fig.height=4, fig.width=6}
library(forecast)
#autoplot(WWWusage)
plot(auto.arima(WWWusage, stationary = T))
```

\vfill
- Furthermore, the `auto.arima` package will also select models that "integrate" the data to create a differenced series that will be stationary.
```{r}
auto.arima(WWWusage)
```
\vfill

### Integrated Model


\vfill

- A model is 'integrated' with order d, denoted I(d), 


When $d =1$ this is a random walk.
```{r}
rw <- arima.sim(n=500, list( order = c(0,1,0)))
autoplot(rw)
auto.arima(rw)
```
\vfill

- The integrated component can also be combined with ARMA models to form an ARIMA.
$$\theta_p(B)(1-B)^d x_t = \phi_q(B) w_t$$
is an ARIMA(p, d, q) model.
\vfill
- For instance,
\begin{eqnarray*}
x_t &=& \alpha x_{t-1} + x_{t-1} - \alpha x_{t-2} + w_t + \beta w_{t-1}\\
\end{eqnarray*}
\vfill
\vfill
\vfill

 - Recall the taxi data set. Run the code below and discuss the results.

```{r, eval = F}
taxi.rides <- read_csv('http://math.montana.edu/ahoegh/teaching/timeseries/data/taxi.csv')

taxirides.diff <- taxi.rides %>% arrange(year, month, day) %>% slice(-c(1:4)) %>% 
  mutate(week.numb = rep(1:234, each = 7)) %>% group_by(week.numb) %>% 
  summarize(total.rides = sum(n)) %>% select(total.rides) %>% pull() %>% diff()

auto.arima(taxirides.diff)
```

```{r, eval = F}
taxirides.summary <- taxi.rides %>% arrange(year, month, day) %>% slice(-c(1:4)) %>% 
  mutate(week.numb = rep(1:234, each = 7)) %>% group_by(week.numb) %>% 
  summarize(total.rides = sum(n)) %>% select(total.rides) %>% pull()  
auto.arima(taxirides.summary)
```
\vfill
- The `arima` function can also be used to fit a specific order of an ARIMA model.
\vfill
- As we saw before with ARMA models, AR, ARI, IMA, and ARMA models are all special cases of the ARIMA framework.
\vfill

#### Seasonal Arima

- ARIMA models can also have a seasonal component, where the lag corresponds to the seasonal frequency.  For example, consider the following model for a time series with weekly seasonal frequency:

\vfill
\vfill

```{r}
library(forecast)
bakery.sales <- read_csv('http://math.montana.edu/ahoegh/teaching/timeseries/data/BreadBasket.csv')
pastry.count <- bakery.sales %>% filter(Item %in% c('Pastry','Scandinavian','Medialuna','Muffin','Scone')) %>% group_by(Date) %>% tally() %>% rename(num.pastry = n) %>% select(num.pastry) %>% pull() %>% ts(frequency =7)
ggtsdisplay(pastry.count)
auto.arima(pastry.count)
```