--- title: "STAT 436 / 536 - Lecture 10" date: October 5, 2018 output: pdf_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) knitr::opts_chunk$set(warning = FALSE) knitr::opts_chunk$set(message = FALSE) knitr::opts_chunk$set(fig.align= 'center') knitr::opts_chunk$set(fig.height= 3) knitr::opts_chunk$set(fig.width = 5) library(tidyverse) library(gridExtra) ``` ## Regression with Seasonality - We have seen the presence of seasonality in several of the datasets we have considered. This section focuses on regression techniques for seasonality. \vfill ##### Indicator variables for seasonality - One approach for seasonality \vfill - Assume we are looking at monthly data (e.g. airline passengers), \vfill - Write out a linear model for seasonality with no trend. $$x_t = s_t + z_t$$ \vfill \vfill \vfill - This model can be fit in R (for the airline passengers) using the following commands ```{r} data("AirPassengers") AirPassengers.df <- data.frame(season = as.factor(as.numeric(cycle(AirPassengers))), count = as.numeric(AirPassengers)) lm(count ~ season - 1, data = AirPassengers.df) ``` \vfill \newpage - Now consider the same framework with a trend too \vfill - *We can formulate this in an additive framework as* \vfill \vfill \vfill - This model can be fit in R (for the airline passengers) using the following commands ```{r} data("AirPassengers") AirPassengers.df <- data.frame(season = as.factor(as.numeric(cycle(AirPassengers))), count = as.numeric(AirPassengers), time = 1:length(AirPassengers)) lm(count ~ time + season - 1, data = AirPassengers.df) ``` \vfill - Or \vfill \vfill \vfill - We will discuss fitting this model shortly, but note that it can be acheived with a log transform. \newpage ##### Harmonic seasonal models - The indicator variables cause a stair-step like seasonal pattern where each season has a step function or a separate intercept. An alternative for a smooth seasonal pattern is to use sine and cosine functions. \vfill - A sine wave with frequency $f$, phase shift $\phi$, and amplitude $A$ can be written as $$A \sin(2 \pi ft + \phi)$$ ```{r, echo=T} A <- 3 f <- 4 phi <- 1 t <- seq(0,2, by=.01) f.t <- A * sin(2 * pi * f * t + phi ) plot(t, f.t, type='l') ``` \vfill - The representation above is not linear, as $\phi$ is in the sine function; however, this can be re-expressed as $$A \sin(2 \pi ft + \phi)$$ where $\alpha_s = A \cos(\phi)$ and $\alpha_c = A \sin(\phi).$ This representation is linear in the parameters $\alpha_s$ and $\alpha_c$ and standard techniques (OLS) can be used to estimate these parameters. \vfill - Using this framework for harmonics with a time series $\{x_t\}$ with $s$ seasons results in $[s/2]$ possible cycles, where $[\;\;]$ retains the integer. \vfill - This model can be written as $$x_t = m_t +\sum_{i=1}^{[s/2]}\left(s_i \sin(2\pi i t / s) + c_i \cos(2 \pi i t / s) \right) + z_t$$ where \vfill \vfill \newpage - For instance consider the two curves below ```{r, fig.width=7, fig.height=5, echo = F} par(mfcol=c(1,2)) t <- seq(1,12, length.out = 1000) plot(t, sin(2 * pi * t / 12), type='l', ylab='') plot(t, sin(2 * pi * t / 12) + 0.2 * sin(2 * pi * 2 * t / 12) + 0.1 * sin(2 * pi * 4 * t / 12) + 0.1 * cos(2 * pi * 4 * t / 12), type='l', ylab='') ``` \vfill - The first curve is defined by $x_t =\sin(2 \pi t / 12)$. \vfill - The second curve has harmonic terms with frequencies at $\frac{1}{12}$, $\frac{2}{12}$ and $\frac{4}{12}$ and can be written as: \vfill \vfill \vfill \vfill - Revisiting the equation above, this would equate to: -$s_1 =$ -$c_1 =$ -$s_2 =$ -$c_2 =$ -$s_3 =$ -$c_3 =$ -$s_4 =$ -$c_4 =$ \vfill - Create a figure that contains the four separate harmonic curves (`par(mfcol=c(4,1))` is one way to do this) \newpage - Now we are going to build on the model from above such that \begin{eqnarray*} x_t &=& .05* t \\ &+&\sin(2 \pi (t) / 12) \\ &+& 0.2 \sin(2 \pi (2 t) /12) \\ &+& 0.1 \sin(2 \pi (4 t) /12) \\ &+& 0.1 \cos(2 \pi (4 t) / 12) \\ &+& w_t \end{eqnarray*} where $w_t \sim N(\mu=0,\sigma^2=1^2)$. Simulate a time series from this model with a total fo 240 time points. \vfill \vfill \newpage - The final step is to fit a time series model. Our goal here is learn the values for $s_i$ and $c_i$. To do so, we need to construct the sine and cosine components. These serve the same purpose as covariates typically denoted as $X$ in a simple regression model. ```{r, echo=F} t <- 1:240 mean.t <- .1 * t + sin(2 * pi * t / 12) + 0.2 * sin(2 * pi * 2 * t / 12) + 0.1 * sin(2 * pi * 4 * t / 12) + 0.1 * cos(2 * pi * 4 * t / 12) x = mean.t+rnorm(240,0,1) ``` ```{r} SIN <- COS <- matrix(nrow = length(t), ncol=4) # restricting to 4 components for (i in 1:4){ COS[,i] <- cos(2 * pi * i * t / 12) SIN[,i] <- sin(2 * pi * i * t / 12) } lm.harm <- lm(x ~ t + COS + SIN) summary(lm.harm) ```