---
title: "STAT 436 / 536 - Lecture 9"
date: October 3, 2018
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(fig.align = 'center')
knitr::opts_chunk$set(fig.height = 3)
knitr::opts_chunk$set(fig.width = 5)
library(tidyverse)
library(gridExtra)
```

## Linear Models and Regression

- With time series models, trends and seasonal patterns can be
\vfill
- *Deterministic*
\vfill
- *Stochastic*
\vfill
- **Q:** how would forecasting differ based on stochastic and deterministic patterns?
\vfill
- As we have seen, time series analysis requires careful consideration due to the serial correlation present in the random error. The same idea holds for time series regression.
\vfill
\newpage

##### Linear Models

- A regression model for time series data $\{x_t, \dots, x_1 \}$ is a linear model if it can be written as
\vfill
where $u_{i,t}$ is the value of the $i^{th}$ explanatory variable (or predictor or covariate) at time $t$, $z_t$ is the error term at time $t$, and $\alpha_0, \dots, \alpha_m$ are the model parameters (or regression coefficients).
\vfill
- **Q:** are the following equations linear models?
$$x_t=\alpha_0 +\alpha_1 \log(u_{1,t}) +\alpha_2 u_{2,t} + z_t$$
$$x_t=\alpha_0 +\alpha_1 t +\alpha_2 t^2 + z_t$$
$$x_t=\alpha_0 +\alpha_1^2 u_{1,t} + \alpha_2^4 u_{2,t} + z_t$$
\vfill
- The
\vfill
- The typical approach for fitting linear models is to use least squares.
\vfill
- Consider the following linear model
$$x_t = \alpha_0 + \alpha_1 t + z_t$$
where $z_t$ has mean zero and follows an AR(1) process. Code for simulating from this model, and from one with independent errors, is included next.
\vfill

```{r}
set.seed(09242018)
time.pts <- 50
z <- rep(0, time.pts)
t <- 1:time.pts
alpha.0 <- 0; alpha.1 <- 1; sigma.z <- 3; ar.coef <- .95
z[1] <- rnorm(1, 0, sd = sigma.z)
for (t.val in 2:time.pts){
  z[t.val] <- ar.coef * z[t.val - 1] + rnorm(1, 0, sd = sigma.z)
}
x.corr <- alpha.0 + alpha.1 * t + z
x.ind <- alpha.0 + alpha.1 * t + rnorm(time.pts, 0, sd = sigma.z)
df.corr <- data.frame(x = x.corr, t = t)
df.ind <- data.frame(x = x.ind, t = t)
```

\newpage

```{r, echo=F, fig.height=5}
grid.arrange(
  ggplot(data = df.corr, aes(y = x, x = t)) + geom_point() +
    geom_smooth(method = 'lm') + ggtitle('Serially Correlated Error') +
    geom_line(alpha = .2),
  ggplot(data = df.ind, aes(y = x, x = t)) + geom_point() +
    geom_smooth(method = 'lm') + ggtitle('Independent Error') +
    geom_line(alpha = .2))
```

- The `lm()` function in R can be used to fit regression models.

```{r}
lm.ind <- lm(x.ind ~ t)
summary(lm.ind)$coefficients
```

- For the independent errors,
\vfill

```{r}
lm.corr <- lm(x.corr ~ t)
summary(lm.corr)$coefficients
```

- With the correlated errors,
\vfill
\newpage

- Diagnostics are an important part of the modeling process, used to verify assumptions and make sure the models are "useful". Look at the ACF and PACF plots for the residuals in each setting.
\vfill

```{r}
par(mfcol = c(1, 2))
acf(resid(lm.ind)); pacf(resid(lm.ind))
```

\vfill
- **Q:** what do we make of the ACF and PACF figures in this case? How does that compare with the "known model"?
\vfill

```{r}
par(mfcol = c(1, 2))
acf(resid(lm.corr)); pacf(resid(lm.corr))
```

\vfill
- **Q:** what do we make of the ACF and PACF figures in this case? How does that compare with the "known model"?
\vfill
\newpage

- To illustrate the effect of autocorrelation, consider the simple case of estimating a sample mean from a set of observations with $E[X_t] = \mu$ and $Var(X_t) = \sigma^2$.
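The practical consequence can be sketched with a short Monte Carlo simulation (an illustrative aside, not part of the original notes; the object names and the reuse of `sigma.z` and `ar.coef` values are choices made here): under positively autocorrelated observations, the sample mean is far more variable than the usual $\sigma^2/n$ formula suggests.

```{r}
# Illustrative sketch: variance of the sample mean under independent
# errors versus AR(1)-correlated errors with the same marginal sd
set.seed(1)
n <- 50; sigma.z <- 3; ar.coef <- .95; n.sims <- 1000
mean.ind <- mean.corr <- rep(0, n.sims)
for (i in 1:n.sims){
  mean.ind[i] <- mean(rnorm(n, 0, sd = sigma.z))
  mean.corr[i] <- mean(arima.sim(list(ar = ar.coef), n = n, sd = sigma.z))
}
var(mean.ind)   # roughly sigma.z^2 / n
var(mean.corr)  # substantially larger when the AR coefficient is near 1
```

The empirical variance of the correlated-series mean is many times larger, which previews the formulas in the next bullets.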
\vfill
- When the $X_i$'s are independent, $Var(\bar{x}) =$
\vfill
- When $Cor(x_t, x_{t+k}) = \rho_k$, then $Var(\bar{x}) =$
\vfill
- Now reconsider the $CO_2$ dataset.
\vfill

```{r}
co2.annual.mean <- aggregate(co2, FUN = 'mean')
library(ggfortify)
autoplot(co2.annual.mean) + ylim(0, max(co2.annual.mean)) +
  ggtitle('Annual Average CO2 Concentration at Mauna Loa') + ylab('ppm')
```

\vfill
- Fit and assess a model for this dataset.
\vfill
\vfill
\vfill
\vfill
\newpage

- There is clear evidence of autocorrelation in the residuals.
\vfill
- So we cannot trust the standard errors on the model coefficients,
\vfill
- but what is the solution...
\vfill

##### Generalised Least Squares

- *Generalised least squares*
\vfill
- Recall the data we simulated earlier.

```{r}
summary(lm.corr)
```

\newpage

```{r}
library(nlme)
x.corr.gls <- gls(x ~ t, data = df.corr, correlation = corAR1(ar.coef))
summary(x.corr.gls)
```

- Now use the GLS procedure to refit the model for $CO_2$.
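One possible sketch of that refit is below (an assumption-laden starting point, not the definitive answer to the exercise: the quadratic trend and the AR(1) error structure are modeling choices made here, and `co2.df` is a name introduced for illustration):

```{r}
# Sketch: GLS fit of the annual CO2 means with a quadratic time trend
# and AR(1)-correlated errors (both are assumed forms, to be assessed)
library(nlme)
co2.annual.mean <- aggregate(co2, FUN = 'mean')
co2.df <- data.frame(x = as.numeric(co2.annual.mean),
                     t = as.numeric(time(co2.annual.mean)))
co2.gls <- gls(x ~ t + I(t^2), data = co2.df, correlation = corAR1())
summary(co2.gls)
# residual diagnostics, as with lm(): the ACF should look closer to
# white noise if the AR(1) structure is adequate
acf(resid(co2.gls, type = 'normalized'))
```

Compare the GLS standard errors with those from the corresponding `lm()` fit; with positive autocorrelation, the `lm()` standard errors are typically too small.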