---
title: "time_series"
author: "Kathleen Durant"
date: "October 8, 2018"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## R Markdown

```{r loadlib}
library(tidyverse) # data manipulation and visualization
library(lubridate) # easily work with dates and times
library(fpp2)      # working with time series data
library(zoo)       # working with time series data
```

## Moving Average

Choose a number of nearby points and average them to estimate the trend. When calculating a simple moving average, it is beneficial to use an odd number of points so that the window is symmetric around the point being estimated. See the documentation of the `rollmean()` function (`?zoo::rollmean`) for details.

```{r simple}
# psavert is monthly, so k = 12 * years + 1 gives an odd, centered window
savings <- economics %>%
  select(date, srate = psavert) %>%
  mutate(srate_ma01 = rollmean(srate, k = 13, fill = NA),
         srate_ma02 = rollmean(srate, k = 25, fill = NA),
         srate_ma03 = rollmean(srate, k = 37, fill = NA),
         srate_ma05 = rollmean(srate, k = 61, fill = NA),
         srate_ma10 = rollmean(srate, k = 121, fill = NA))
```

```{r plot_data}
# Reshape to long format so each series becomes its own colored line
savings %>%
  gather(metric, value, srate:srate_ma10) %>%
  ggplot(aes(date, value, color = metric)) +
  geom_line()
```

As the number of points used for the average increases, the curve becomes smoother. Choosing a value for k is a balance between eliminating noise and still capturing the data's true structure.

```{r}
# Compare each smoothed series with the observed savings rate
savings %>%
  gather(metric, value, srate_ma01:srate_ma10) %>%
  group_by(metric) %>%
  summarise(MSE = mean((srate - value)^2, na.rm = TRUE),
            MAPE = mean(abs((srate - value) / srate), na.rm = TRUE))
```

As the number of points increases, so does the error relative to the observed series. It is important to get a sense of the error associated with your chosen representation.

What if you wanted to make a prediction? The centered moving average cannot be used, because it places each month in the middle of the window and therefore needs values from after the month being estimated.
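To see why a centered window falls short at the end of a series, here is a minimal sketch on a toy vector. The vector `x` and the window size `k = 3` are made up for illustration and are not part of the savings data.

```{r centered_gap}
library(zoo)

x <- c(2, 4, 6, 8, 10)  # hypothetical toy series

# Centered window of 3: each estimate averages one neighbour on each side
centered <- rollmean(x, k = 3, fill = NA)
centered

# The most recent point has no estimate, because its window would need
# an observation after the end of the series
is.na(centered[length(x)])
```

With a larger k, more of the most recent observations are left without an estimate, which is exactly where a forecast would need one.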
For forecasting, use a trailing moving average instead: the window of k periods is placed over the most recent k available values of the series, so only historical values are averaged. Set `align = "right"` in `rollmean()` to get a trailing window.

```{r}
# Trailing 12-month average: each value uses the current and previous 11 months
savings_tma <- economics %>%
  select(date, srate = psavert) %>%
  mutate(srate_tma = rollmean(srate, k = 12, fill = NA, align = "right"))
```

```{r}
savings_tma %>%
  gather(metric, value, -date) %>%
  ggplot(aes(date, value, color = metric)) +
  geom_line()
```

A weighted moving average requires that the weights sum to one. With an even order and `centre = TRUE`, `ma()` computes a 2x12-MA: a 12-month moving average followed by a 2-term moving average of the result, which keeps the estimate centered on a month.

```{r}
# Print the smoothed values
ma(AirPassengers, order = 12, centre = TRUE)

# Overlay the 2x12-MA on the original series
autoplot(AirPassengers, series = "Data") +
  autolayer(ma(AirPassengers, order = 12, centre = TRUE), series = "2x12-MA") +
  ggtitle("Monthly Airline Passengers (1949-60)") +
  labs(x = NULL, y = "Passengers")
```
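The requirement that the weights sum to one can be checked by hand. The sketch below writes out the 13 weights of a 2x12-MA (half weight on the two end points, full weight on the 11 interior points) and applies them with `stats::filter()`. The names `w`, `manual`, and `auto` are illustrative, and `stats::filter()` is called with its namespace because `dplyr::filter()` masks it once the tidyverse is loaded.

```{r ma_weights}
library(forecast)  # provides ma(); also attached by fpp2

# Weights of a 2x12-MA: 1/24 at both ends, 1/12 for the 11 months in between
w <- c(1/24, rep(1/12, 11), 1/24)
all.equal(sum(w), 1)

# Apply the weights as a centered (two-sided) filter and compare with ma()
manual <- stats::filter(AirPassengers, filter = w, sides = 2)
auto <- ma(AirPassengers, order = 12, centre = TRUE)
all.equal(as.numeric(manual), as.numeric(auto))
```

If both checks return `TRUE`, the hand-built filter reproduces `ma()`, which suggests the 2x12-MA is a weighted moving average with weights that sum to one.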