---
title: "STAT 436 / 536 - Lecture 4"
date: September 10, 2018
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(message = FALSE)
```

### 1. Correlation Structure and Motivation

- We have seen how to decompose a time series to remove the trend and seasonal components. So what remains?

- \vfill

- \vfill

- \vfill

### 2. Expectation, Variance, and Autocorrelation

- The expected value, or expectation, of a random variable is defined as:

\vfill

- In the context of annual measurements of Nile River flows, what is an interpretation of the expectation?

\vfill

- A time series model is stationary in the mean if:

\vfill

- What is $E[(x-\mu_x)(y-\mu_y)]$?

- \vfill

- \vfill

\newpage

##### Sample Based Moments

- Sample-based calculations can be made in R: the mean using `mean(x)`, the variance using `var(x)`, the covariance using `cov(x,y)`, and the correlation using `cor(x,y)`.

\vfill

- Using a dataset containing information on housing sales in King County, WA [http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv](http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv), compute the following quantities:

```{r}
# Read the King County housing sales data directly from the course website
Seattle <- read.csv('http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv')
```

\vfill

- mean sales price

\vfill

- standard deviation of sales price

\vfill

- correlation between sales price and square footage (`sqft_living`)

- Interpret the three quantities above.

\vfill

\newpage

### 3. Autocorrelation and the Correlogram

- In addition to the mean and variance, the serial correlation (or autocorrelation) is an important element in time series modeling.
\vfill

- Autocovariance is defined as:

\vfill

- A time series model is second-order stationary if

\vfill

- The autocorrelation function is defined as

\vfill

- Similar to variance calculations, the sample autocovariance and autocorrelation functions can be computed:
    - sample acvf:

    \vfill

    - sample acf:

\vfill

- Note that these properties require a stationary process; hence trends and cyclical patterns need to be removed when considering correlated random noise.

\vfill

##### Simulating Correlated Time Series Data

- As was mentioned earlier in class, we can think of time series modeling as similar to mixed models. In particular, each type of model defines a specific correlation structure.

\vfill

\newpage

- First construct a covariance matrix between all of the observations.

```{r}
set.seed(09062018)
time.pts <- 200   # number of time points to simulate
auto.corr <- 0.9  # correlation between adjacent time points
evolution.matrix <- diag(time.pts)
# Entry (i, j) is auto.corr^|i - j|, so dependence decays with the time gap
for (column in 1:time.pts){
  evolution.matrix[,column] <- auto.corr ^ abs((1:time.pts) - column)
}
library(knitr) # for kable
kable(evolution.matrix[1:5,1:5],caption = "Covariance matrix for first 5 time points")
```

\vfill

- Simulate a vector of correlated normal random variables.

```{r}
library(mnormt) # for rmnorm
# One draw from a mean-zero multivariate normal, stored as a time series
Y <- as.ts(rmnorm(n=1, mean=rep(0, time.pts), varcov=evolution.matrix))
```

\vfill

- Create a time series figure.

```{r, fig.align='center',fig.width=6, fig.height=4}
library(ggfortify) # for autoplot
autoplot(Y) + ggtitle(expression(paste('Simulated Time Series where ', rho[1], '= 0.9')))
```

\newpage

##### Exercises:

1. Is the simulated time series stationary in the mean? Why or why not?

\vfill

2. What is $\gamma_k$?

\vfill

3. What is $\rho_k$?

\vfill

4. Change the `auto.corr` variable, rerun the simulation, and describe how your figures differ.
    a. `auto.corr = 0`\vfill
    b. `auto.corr = .5`\vfill
    c. `auto.corr = -.9`\vfill
5. (536) Adapt the code to include a trend and seasonal cycle in addition to the serially correlated random innovations.
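The sample acvf and sample acf mentioned before the simulation can also be computed by hand. The following is only a sketch, assuming the usual definitions $c_k = \frac{1}{n}\sum_{t=1}^{n-k}(x_t - \bar{x})(x_{t+k} - \bar{x})$ and $r_k = c_k / c_0$ (the divisor is $n$, which matches what `acf()` reports). The helper `sample.acvf` is a hypothetical name introduced here for illustration, and the built-in `Nile` dataset is used only as a convenient example series.

```{r}
# Hypothetical helper (not part of base R): sample autocovariance at lag k,
# assuming the divisor n rather than n - k
sample.acvf <- function(x, k) {
  n <- length(x)
  x.bar <- mean(x)
  sum((x[1:(n - k)] - x.bar) * (x[(1 + k):n] - x.bar)) / n
}

x <- as.numeric(Nile)   # annual Nile River flows from the datasets package
lags <- 0:5

c.k <- sapply(lags, function(k) sample.acvf(x, k))  # sample acvf at lags 0 to 5
r.k <- c.k / c.k[1]                                 # sample acf: divide by c_0

# Built-in sample acf over the same lags, for comparison
r.builtin <- as.numeric(acf(x, lag.max = 5, plot = FALSE)$acf)

cbind(lag = lags, by.hand = r.k, acf = r.builtin)
```

If the assumed definitions are the ones used in class, the two columns should agree up to floating point error.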
\vfill

\newpage

- A useful tool for identifying autocorrelation structure in a time series dataset is the correlogram. The command for this in R is `acf()`.

```{r, fig.align='center',fig.height=4, fig.width=6}
acf(Y)
```

\vfill

- Correlograms have the following properties:
    - The

    \vfill

    - The lag 0 autocorrelation is always 1 and is included for comparison purposes.

    \vfill

    - If $\rho_k = 0$, then the sampling distribution of $r_k$ is (approximately) normal with mean $-1/n$ and variance $1/n$.

    \vfill

    - It is important to have a stationary time series that does not include deterministic signals, such as a trend or cycle.

\vfill

##### Airline Passenger Example (Section 2.3.2)

Load Data and Decompose Time Series

```{r}
data("AirPassengers")
# Additive decomposition into trend, seasonal, and random components
AP.decomp <- decompose(AirPassengers, 'additive')
str(AP.decomp)
```

\newpage

ACF Plots with Air Passengers Data (Autocovariance and Autocorrelation)

```{r, fig.align='center',fig.height=4, fig.width=6}
# Side-by-side panels: autocovariance (left) and autocorrelation (right)
par(mfcol = c(1,2))
acf(AirPassengers, type = 'covariance')
acf(AirPassengers)
```

\vfill

ACF Plots on Decomposed Random Component (Autocovariance and Autocorrelation)

```{r, fig.align='center',fig.height=4, fig.width=6}
# Exclude the NAs that decompose() leaves at both ends of the random component
random.AP <- AP.decomp$random[!is.na(AP.decomp$random)]
par(mfcol = c(1,2))
acf(random.AP, type = 'covariance')
acf(random.AP)
```
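The correlogram property about the sampling distribution of $r_k$ when $\rho_k = 0$ can be checked informally by simulation. The sketch below is an illustration rather than part of the original example: the seed, the series length `n`, the number of replications `n.sims`, and the choice of lag 1 are all assumptions made here. It repeatedly simulates independent normal noise, records the lag 1 sample autocorrelation, and compares the simulated mean and variance with $-1/n$ and $1/n$.

```{r}
set.seed(436)    # arbitrary seed chosen for this sketch
n <- 200         # length of each simulated series (assumed value)
n.sims <- 2000   # number of simulated series (assumed value)
lag.k <- 1       # lag at which r_k is recorded

# Lag-k sample autocorrelation of independent N(0,1) noise, repeated n.sims times;
# element lag.k + 1 of $acf is r_k because the first element is lag 0
r.k <- replicate(n.sims,
                 acf(rnorm(n), lag.max = lag.k, plot = FALSE)$acf[lag.k + 1])

# Compare simulated moments with the approximate theoretical values
c(simulated.mean = mean(r.k), theoretical.mean = -1 / n)
c(simulated.var = var(r.k), theoretical.var = 1 / n)

# Proportion of r_k values falling inside the approximate band -1/n +/- 2/sqrt(n)
coverage <- mean(abs(r.k + 1 / n) < 2 / sqrt(n))
coverage
```

Under these assumptions, roughly 95% of the simulated $r_k$ values should fall inside the band, which is why isolated values just outside the dashed lines on a correlogram are not strong evidence of autocorrelation.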