Time Series Forecasting Practice

Today I explore time series forecasting using AFL data from the fitzRoy package.

Key concepts

A forecast is the mean of simulated future sample paths.

  • Making a ts object
  • Trend, Seasonality, Cyclicity
  • Plotting lags and ACF
  • White noise
  • Creating forecast objects
  • Checking residuals
  • Mean Absolute Scaled Error
  • Cross Validation
  • Exponential Smoothing

Things to do

# TODO - figure out why full ts doesn't show info from the 1900s
# Packages
library(tidyverse) # For everything
## Warning: package 'dplyr' was built under R version 3.5.1
library(here) # For project-oriented workflow
library(ggthemes) # For nice plot themes
library(forecast) # For forecasting models and plotting methods
library(devtools) # For non-CRAN packages
library(fitzRoy) # Data source
library(padr) # Dates padding

Create a time series

# First load results info  
d <- get_match_results()

Create a data frame with one value per variable per date (taking the mean where more than one game was played on the same day).

# Then prepare a subset for ts analysis 
d_prep_ts <- d %>%
  group_by(Date) %>%
  summarize(Margin = abs(mean(Margin, na.rm = TRUE)),
            Total_Points = mean((Home.Points + Away.Points), na.rm = TRUE),
            Conversion = mean((Home.Goals + Away.Goals) / (Home.Goals + Away.Goals + Home.Behinds + Away.Behinds)))

The padr package looks very helpful for filling in missing days.

# Fill timeseries with missing dates
d_prep_ts_full <- d_prep_ts %>% pad
## pad applied on the interval: day
# TODO - how to add missing weeks for full year coverage?
# Create ts object
d_ts <- ts(d_prep_ts_full[, 2:4], start = c(1897, 128), frequency = 365.25) # Start at day 128 of 1897 (1897-05-08); arbitrarily screwing with the frequency here

I’m having issues visualizing the data pre-1960. I suspect it’s something to do with games only being played once a week.

# TODO - why is data before 1960 not showing?
autoplot(d_ts, facets = TRUE)

The points are visible in a standard ggplot…

ggplot(data = d_prep_ts_full, aes(x = Date, y = Margin)) +
  geom_point()
## Warning: Removed 39549 rows containing missing values (geom_point).

ggplot(data = d_prep_ts_full, aes(x = Date, y = Total_Points)) +
  geom_point()
## Warning: Removed 39549 rows containing missing values (geom_point).

ggplot(data = d_prep_ts_full, aes(x = Date, y = Conversion)) +
  geom_point()
## Warning: Removed 39549 rows containing missing values (geom_point).

Subset 1900 - 1950.

nineteen_hundreds <- window(d_ts, 1900, 1950)

This makes no sense…

# TODO - why is data before 1960 not showing?
autoplot(nineteen_hundreds, facets = TRUE)

Subset from 1980 onwards; this seems to work.

my_life_afl <- window(d_ts, 1980)
autoplot(my_life_afl, facets = TRUE)

1980 only

birth_year <- window(d_ts, 1980, 1981)

Lag plots

Create a yearly time series.

d_yearly <- d %>%
  filter(Date > "1980-01-01" & Date < "1989-12-31")
d_yearly <- d_yearly %>%
  separate(Date, c("Year", "Month", "Day"), sep = "-")
d_yearly <- d_yearly %>%
  group_by(Year) %>%
  summarize(Margin = abs(mean(Margin, na.rm = TRUE)),
            Total_Points = mean((Home.Points + Away.Points), na.rm = TRUE),
            Conversion = mean((Home.Goals + Away.Goals) / (Home.Goals + Away.Goals + Home.Behinds + Away.Behinds)))
d_yearly_ts <- ts(d_yearly[, 2:4], start = c(1980, 1), frequency = 1)
# TODO - save image and call directly
gglagplot(d_yearly_ts)
## geom_path: Each group consists of only one observation. Do you need to
## adjust the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to
## adjust the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to
## adjust the group aesthetic?

# Select only conversion
d_yearly_conversion <- d_yearly %>%
  select(Year, Conversion)

d_yearly_conversion_ts <- ts(d_yearly_conversion[, 2], start = c(1980, 1), frequency = 1)
ggAcf(d_yearly_conversion_ts)

# Select only Margin
d_yearly_margin <- d_yearly %>%
  select(Year, Margin)

d_yearly_margin_ts <- ts(d_yearly_margin[, 2], start = c(1980, 1), frequency = 1)
ggAcf(d_yearly_margin_ts)

White Noise

Use the Ljung-Box test to check for any significant autocorrelations, then look more closely at the ACF of individual variables.

# Test whether the series is indistinguishable from white noise
Box.test(d_yearly_conversion_ts, lag = 9, type = "Ljung")
## 
##  Box-Ljung test
## 
## data:  d_yearly_conversion_ts
## X-squared = 11.557, df = 9, p-value = 0.2395
# Large p value confirms visual inspection

Naive Forecast

The naive method uses the previous value to predict the future. A seasonal naive forecast uses the corresponding value from the previous season (a quick sketch follows the forecasts below).

fcd_yearly_margin_ts <- naive(d_yearly_margin_ts, 5)
autoplot(fcd_yearly_margin_ts)

summary(fcd_yearly_margin_ts)
## 
## Forecast method: Naive method
## 
## Model Information:
## Call: naive(y = d_yearly_margin_ts, h = 5) 
## 
## Residual sd: 3.5897 
## 
## Error measures:
##                   ME     RMSE      MAE      MPE     MAPE MASE        ACF1
## Training set 1.17032 3.589699 3.145682 8.064365 31.63073    1 -0.09954788
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80      Lo 95    Hi 95
## 1990       13.29375 8.693365 17.89413  6.2580684 20.32943
## 1991       13.29375 6.787823 19.79968  3.3437936 23.24371
## 1992       13.29375 5.325650 21.26185  1.1075920 25.47991
## 1993       13.29375 4.092980 22.49452 -0.7776132 27.36511
## 1994       13.29375 3.006977 23.58052 -2.4385124 29.02601
fcd_yearly_conversion_ts <- naive(d_yearly_conversion_ts, 5)
autoplot(fcd_yearly_conversion_ts)

summary(fcd_yearly_conversion_ts)
## 
## Forecast method: Naive method
## 
## Model Information:
## Call: naive(y = d_yearly_conversion_ts, h = 5) 
## 
## Residual sd: 0.0145 
## 
## Error measures:
##                         ME       RMSE        MAE        MPE     MAPE MASE
## Training set -0.0007522101 0.01452133 0.01217279 -0.1894464 2.377682    1
##                    ACF1
## Training set -0.3481954
## 
## Forecasts:
##      Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## 1990      0.5013331 0.4827220 0.5199443 0.4728699 0.5297964
## 1991      0.5013331 0.4750131 0.5276532 0.4610801 0.5415862
## 1992      0.5013331 0.4690978 0.5335685 0.4520334 0.5506329
## 1993      0.5013331 0.4641109 0.5385554 0.4444067 0.5582596
## 1994      0.5013331 0.4597175 0.5429488 0.4376874 0.5649789
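
The seasonal naive variant (snaive()) doesn’t apply to this yearly series, so here’s a minimal sketch on a built-in monthly dataset instead.

# Seasonal naive: repeat the value from the same month of the previous year
autoplot(snaive(AirPassengers, h = 24))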

Residuals

checkresiduals(fcd_yearly_conversion_ts)

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 1.561, df = 3, p-value = 0.6683
## 
## Model df: 0.   Total lags used: 3

Training and test

I’m going to try to forecast average total points per game based on data from 1897 until now. This is purely of theoretical interest.

d_points <- d %>%
  filter(Date > "1897-01-01" & Date < "2018-07-01")

d_points <- d_points %>%
  separate(Date, c("Year", "Month", "Day"), sep = "-")
d_points <- d_points %>%
  group_by(Year) %>%
  summarize(Mean_Total_Points = mean((Home.Points + Away.Points), na.rm = TRUE))
# Make ts object
d_points_ts <- ts(d_points[, 2], start = c(1897,1), frequency = 1)

str(d_points_ts)
##  Time-Series [1:122, 1] from 1897 to 2018: 77.9 78.4 75.9 81.7 88.5 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr "Mean_Total_Points"

First a look at the data.

autoplot(d_points_ts)

summary(d_points_ts)
##  Mean_Total_Points
##  Min.   : 75.91   
##  1st Qu.:140.71   
##  Median :170.99   
##  Mean   :162.24   
##  3rd Qu.:188.40   
##  Max.   :224.13

White noise test.

Box.test(d_points_ts, lag = 100, type = "Ljung") # low p value
## 
##  Box-Ljung test
## 
## data:  d_points_ts
## X-squared = 2447.2, df = 100, p-value < 2.2e-16

Now split into training and test sets.

train <- window(d_points_ts, end = 1995) #Use approx 80% of years as training
test <- window(d_points_ts, start = 1996) 

Create forecasts on the training set.

# Naive forecast
naive_fc <- naive(train, h = 22)
# Forecasts the mean! A good comparison point
mean_fc <- meanf(train, h = 22)

Check residuals

checkresiduals(naive_fc) # Looks like there is a pattern in the residuals, and they are not normally distributed

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 16.792, df = 10, p-value = 0.07909
## 
## Model df: 0.   Total lags used: 10
checkresiduals(mean_fc) # A different pattern

## 
##  Ljung-Box test
## 
## data:  Residuals from Mean
## Q* = 600.33, df = 9, p-value < 2.2e-16
## 
## Model df: 1.   Total lags used: 10

Plot with actual test results overlaid.

autoplot(naive_fc) +
  autolayer(test, series = "Test data")

autoplot(mean_fc) +
  autolayer(test, series = "Test data")

Compare accuracy

accuracy(naive_fc, test)
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set  1.128567 8.704984 6.820716  0.7496927 4.369219 1.0000000
## Test set     -2.574893 7.892782 5.978824 -1.5449914 3.247869 0.8765684
##                    ACF1 Theil's U
## Training set -0.2580396        NA
## Test set      0.5730870  1.234388
accuracy(mean_fc, test) # Error values all much bigger than the naive method
##                        ME     RMSE      MAE       MPE     MAPE     MASE
## Training set 6.726294e-15 36.68483 30.39358 -6.926255 22.50356 4.456069
## Test set     2.900186e+01 29.94619 29.00186 15.463707 15.46371 4.252026
##                   ACF1 Theil's U
## Training set 0.9449538        NA
## Test set     0.5730870  4.448763

Apply to future

# Apply naive forecast to future

future <- naive(d_points_ts, h = 10)
autoplot(future)

summary(future) # Highs get higher and lows get lower the further into the future we go
## 
## Forecast method: Naive method
## 
## Model Information:
## Call: naive(y = d_points_ts, h = 10) 
## 
## Residual sd: 8.4205 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE MASE
## Training set 0.7198763 8.420486 6.568497 0.4835138 4.102777    1
##                    ACF1
## Training set -0.2343468
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2019       165.0244 154.2331 175.8157 148.5205 181.5282
## 2020       165.0244 149.7632 180.2856 141.6844 188.3644
## 2021       165.0244 146.3333 183.7155 136.4389 193.6099
## 2022       165.0244 143.4418 186.6070 132.0167 198.0321
## 2023       165.0244 140.8943 189.1544 128.1206 201.9281
## 2024       165.0244 138.5912 191.4575 124.5984 205.4504
## 2025       165.0244 136.4733 193.5755 121.3593 208.6895
## 2026       165.0244 134.5020 195.5468 118.3444 211.7043
## 2027       165.0244 132.6505 197.3983 115.5128 214.5360
## 2028       165.0244 130.8993 199.1494 112.8346 217.2142

Cross validation

Cross-validate over 8 forecast horizons using tsCV(). Useful for smaller samples where a single train/test split is less reliable.

# TODO - troubleshoot this
# Make an empty matrix
#e <- matrix(NA_real_, nrow = 122, ncol = 8)
# Fill empty matrix with forecast based on tsCV function
#for (h in 1:8)
  #e[, h] <- tsCV(d_points_ts, naive, h = h)
#d_points_ts
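
A hedged sketch of one way to get this working: tsCV() called with h = 8 already returns a matrix of errors with one column per horizon, which is probably why filling a matrix column by column fails for h > 1.

# tsCV() with h = 8 returns a matrix of forecast errors, one column per horizon
e <- tsCV(d_points_ts, naive, h = 8)
# Mean squared error at each forecast horizon
colMeans(e^2, na.rm = TRUE)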

Simple Exponential Smoothing

There are 9 possible exponential smoothing methods (combinations of trend and seasonality) and 2 error types, giving 18 possible models.

SES method on the points time series.

# SES forecast
ses_fc <- ses(train, h = 22)
summary(ses_fc)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = train, h = 22) 
## 
##   Smoothing parameters:
##     alpha = 0.7679 
## 
##   Initial states:
##     l = 77.9965 
## 
##   sigma:  8.5051
## 
##      AIC     AICc      BIC 
## 882.7488 883.0014 890.5341 
## 
## Error measures:
##                   ME     RMSE     MAE       MPE     MAPE      MASE
## Training set 1.46999 8.418774 6.54524 0.9906061 4.206935 0.9596118
##                     ACF1
## Training set -0.04329736
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1996       189.7415 178.8418 200.6413 173.0718 206.4113
## 1997       189.7415 175.9992 203.4839 168.7245 210.7586
## 1998       189.7415 173.6512 205.8319 165.1335 214.3496
## 1999       189.7415 171.6047 207.8784 162.0036 217.4795
## 2000       189.7415 169.7667 209.7164 159.1927 220.2904
## 2001       189.7415 168.0842 211.3989 156.6195 222.8636
## 2002       189.7415 166.5233 212.9598 154.2323 225.2508
## 2003       189.7415 165.0609 214.4222 151.9957 227.4873
## 2004       189.7415 163.6804 215.8027 149.8845 229.5986
## 2005       189.7415 162.3695 217.1136 147.8796 231.6035
## 2006       189.7415 161.1185 218.3645 145.9664 233.5166
## 2007       189.7415 159.9200 219.5631 144.1335 235.3496
## 2008       189.7415 158.7678 220.7152 142.3713 237.1117
## 2009       189.7415 157.6570 221.8261 140.6725 238.8106
## 2010       189.7415 156.5834 222.8997 139.0305 240.4526
## 2011       189.7415 155.5434 223.9397 137.4400 242.0431
## 2012       189.7415 154.5342 224.9489 135.8965 243.5866
## 2013       189.7415 153.5531 225.9300 134.3960 245.0870
## 2014       189.7415 152.5979 226.8852 132.9352 246.5479
## 2015       189.7415 151.6666 227.8165 131.5110 247.9721
## 2016       189.7415 150.7576 228.7255 130.1208 249.3623
## 2017       189.7415 149.8693 229.6138 128.7622 250.7208
accuracy(ses_fc, test) # Not as good as naive in this case!
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set  1.469990 8.418774 6.545240  0.9906061 4.206935 0.9596118
## Test set     -3.797513 8.371800 6.496814 -2.2035511 3.543547 0.9525119
##                     ACF1 Theil's U
## Training set -0.04329736        NA
## Test set      0.57308702   1.31419
checkresiduals(ses_fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 7.6096, df = 8, p-value = 0.4725
## 
## Model df: 2.   Total lags used: 10
# TODO - full accuracy assessment
autoplot(ses_fc) +
  autolayer(fitted(ses_fc))

Adding trends.

SES is OK if there is no trend or seasonality.

Holt’s method adds a linear trend. A damped trend levels off over time (a sketch of the damped variant follows the Holt forecast below).

fc_holt <- holt(train, h = 22)
summary(fc_holt)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = train, h = 22) 
## 
##   Smoothing parameters:
##     alpha = 0.7243 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 77.118 
##     b = 1.1788 
## 
##   sigma:  8.4529
## 
##      AIC     AICc      BIC 
## 883.4667 884.1119 896.4423 
## 
## Error measures:
##                       ME     RMSE     MAE         MPE     MAPE      MASE
## Training set -0.04834667 8.280375 6.31519 -0.03092285 4.048276 0.9258837
##                       ACF1
## Training set -0.0008172378
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1996       191.8177 180.9848 202.6505 175.2503 208.3850
## 1997       192.9960 179.6196 206.3724 172.5386 213.4534
## 1998       194.1744 178.6656 209.6831 170.4557 217.8930
## 1999       195.3527 177.9707 212.7347 168.7692 221.9362
## 2000       196.5310 177.4585 215.6036 167.3621 225.7000
## 2001       197.7094 177.0840 218.3348 166.1655 229.2533
## 2002       198.8877 176.8181 220.9574 165.1351 232.6404
## 2003       200.0661 176.6407 223.4915 164.2400 235.8922
## 2004       201.2444 176.5372 225.9517 163.4580 239.0309
## 2005       202.4228 176.4967 228.3489 162.7723 242.0733
## 2006       203.6011 176.5107 230.6916 162.1699 245.0324
## 2007       204.7795 176.5724 232.9866 161.6405 247.9185
## 2008       205.9578 176.6764 235.2393 161.1757 250.7400
## 2009       207.1362 176.8181 237.4543 160.7687 253.5037
## 2010       208.3145 176.9939 239.6352 160.4137 256.2154
## 2011       209.4929 177.2005 241.7853 160.1059 258.8799
## 2012       210.6712 177.4352 243.9073 159.8411 261.5013
## 2013       211.8496 177.6958 246.0034 159.6158 264.0833
## 2014       213.0279 177.9801 248.0757 159.4269 266.6289
## 2015       214.2063 178.2864 250.1261 159.2716 269.1409
## 2016       215.3846 178.6132 252.1560 159.1476 271.6216
## 2017       216.5630 178.9591 254.1669 159.0528 274.0732
accuracy(fc_holt, test)
##                        ME      RMSE      MAE          MPE      MAPE
## Training set  -0.04834667  8.280375  6.31519  -0.03092285  4.048276
## Test set     -18.24628731 22.644762 19.20784 -10.08581801 10.550093
##                   MASE          ACF1 Theil's U
## Training set 0.9258837 -0.0008172378        NA
## Test set     2.8161028  0.7906714861  3.586766
checkresiduals(fc_holt)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 7.2797, df = 6, p-value = 0.2958
## 
## Model df: 4.   Total lags used: 10
autoplot(fc_holt) +
  autolayer(test, series = "Test Series")
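
The damped variant mentioned earlier can be tried by setting damped = TRUE; a quick sketch, not tuned or evaluated here.

# Holt's method with a damped trend
fc_holt_damped <- holt(train, h = 22, damped = TRUE)
autoplot(fc_holt_damped) +
  autolayer(test, series = "Test Series")
accuracy(fc_holt_damped, test)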

Holt-Winters can account for both trend and seasonality.
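
The yearly AFL series has no seasonal component to model, so here’s a minimal sketch of hw() on a built-in monthly dataset.

# Holt-Winters with multiplicative seasonality
hw_example <- hw(AirPassengers, seasonal = "multiplicative", h = 24)
autoplot(hw_example)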

State Space Models

Create a model with ets(), which automatically picks the best-fitting ETS specification, then pass it to forecast().

The model state needs to be defined in terms of Error, Trend and Seasonality (ETS).

Great for getting quick results.

# Create ETS model (default)
fit_train <- ets(train)
checkresiduals(fit_train)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,A,N)
## Q* = 10.269, df = 6, p-value = 0.1138
## 
## Model df: 4.   Total lags used: 10
autoplot(forecast(fit_train))

summary(forecast(fit_train))
## 
## Forecast method: ETS(M,A,N)
## 
## Model Information:
## ETS(M,A,N) 
## 
## Call:
##  ets(y = train) 
## 
##   Smoothing parameters:
##     alpha = 0.7577 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 75.7469 
##     b = 1.7866 
## 
##   sigma:  0.0527
## 
##      AIC     AICc      BIC 
## 873.8085 874.4536 886.7841 
## 
## Error measures:
##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.8280051 8.325919 6.315682 -0.5523087 4.051449 0.9259558
##                     ACF1
## Training set -0.03413297
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1996       192.1878 179.2088 205.1667 172.3381 212.0374
## 1997       193.9661 177.5772 210.3550 168.9015 219.0308
## 1998       195.7445 176.4896 214.9993 166.2967 225.1923
## 1999       197.5228 175.7298 219.3159 164.1932 230.8524
## 2000       199.3012 175.1936 223.4088 162.4317 236.1706
## 2001       201.0796 174.8214 227.3378 160.9211 241.2380
## 2002       202.8579 174.5754 231.1405 159.6035 246.1123
## 2003       204.6363 174.4299 234.8427 158.4396 250.8329
## 2004       206.4146 174.3665 238.4628 157.4012 255.4281
## 2005       208.1930 174.3715 242.0146 156.4674 259.9186
accuracy(forecast(fit_train), test)
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set  -0.8280051  8.325919  6.315682 -0.5523087 4.051449 0.9259558
## Test set     -10.0352441 12.834377 11.596626 -5.3900465 6.143944 1.7002065
##                     ACF1 Theil's U
## Training set -0.03413297        NA
## Test set      0.43889707  1.733963

Transformations

Estimate the Box-Cox transformation parameter (lambda).

BoxCox.lambda(d_points_ts)
## [1] 0.0981573
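
The estimated lambda can then be passed to BoxCox() to transform the series; a quick sketch.

# Apply the estimated Box-Cox transformation to stabilise the variance
lambda <- BoxCox.lambda(d_points_ts)
autoplot(BoxCox(d_points_ts, lambda = lambda))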

For non-seasonal differencing, use diff(); it can also take a seasonal difference (via the lag argument) as well as the default lag-1 difference.

autoplot(d_points_ts)

autoplot(diff(d_points_ts))

ggAcf(diff(d_points_ts))
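
The points series is yearly, so there’s no seasonal difference to take here; as a sketch of the lag argument on a built-in monthly series.

# Seasonal (lag-12) difference of a monthly series
autoplot(diff(AirPassengers, lag = 12))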

ARIMA

AutoRegressive Integrated Moving Average models.

AR - multiple regression with lagged observations as predictors.

MA - multiple regression with lagged errors as predictors.

ARIMA can only work with stationary data (so difference the series if it is not already stationary).

ARIMA(p, d, q)

  • p = number of ordinary AR (observation) lags
  • d = number of lag-1 differences
  • q = number of ordinary MA (error) lags

AICc can’t be compared between ARIMA and ETS models; use tsCV() to do this.
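
A sketch of that comparison (not run here; it refits both models at every forecast origin, so it can be slow).

# Wrap each model so tsCV() can refit it at every forecast origin
fets <- function(x, h) forecast(ets(x), h = h)
farima <- function(x, h) forecast(auto.arima(x), h = h)
# Cross-validated one-step-ahead errors
e_ets <- tsCV(d_points_ts, fets, h = 1)
e_arima <- tsCV(d_points_ts, farima, h = 1)
# Compare mean squared error
mean(e_ets^2, na.rm = TRUE)
mean(e_arima^2, na.rm = TRUE)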

# Create ARIMA model (NOTE can also set lambda)
fit_arima_train <- auto.arima(train)
checkresiduals(fit_arima_train)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1) with drift
## Q* = 7.3163, df = 8, p-value = 0.5029
## 
## Model df: 2.   Total lags used: 10
summary(fit_arima_train)
## Series: train 
## ARIMA(0,1,1) with drift 
## 
## Coefficients:
##           ma1   drift
##       -0.2716  1.1531
## s.e.   0.0958  0.6148
## 
## sigma^2 estimated as 70.69:  log likelihood=-346.74
## AIC=699.48   AICc=699.74   BIC=707.24
## 
## Training set error measures:
##                       ME     RMSE      MAE         MPE     MAPE      MASE
## Training set -0.00348672 8.279417 6.312479 0.002746314 4.043272 0.9254863
##                      ACF1
## Training set -0.005044699
# Create forecast from ARIMA model

fc_arima <- forecast(fit_arima_train, h = 22)
# Check accuracy
accuracy(fc_arima, test)
##                        ME      RMSE       MAE          MPE      MAPE
## Training set  -0.00348672  8.279417  6.312479  0.002746314  4.043272
## Test set     -17.89178649 22.274277 18.870653 -9.892736886 10.365373
##                   MASE         ACF1 Theil's U
## Training set 0.9254863 -0.005044699        NA
## Test set     2.7666673  0.788712465  3.528378
# Show performance against test
autoplot(fc_arima) +
  autolayer(test, series = "Test Series")

# Apply forecast to future 10 years with AUTO ARIMA
fit_arima_future <- auto.arima(d_points_ts)

fc_arima_future <- forecast(fit_arima_future, h = 10)

autoplot(fc_arima_future) +
  autolayer(test, series = "Test Series")

summary(fc_arima_future)
## 
## Forecast method: ARIMA(0,1,1)
## 
## Model Information:
## Series: d_points_ts 
## ARIMA(0,1,1) 
## 
## Coefficients:
##           ma1
##       -0.2373
## s.e.   0.0868
## 
## sigma^2 estimated as 67.64:  log likelihood=-426.17
## AIC=856.35   AICc=856.45   BIC=861.94
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.9691201 8.156458 6.363843 0.6647838 3.982952 0.9688431
##                     ACF1
## Training set -0.02000658
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2019       168.0519 157.5123 178.5916 151.9329 184.1710
## 2020       168.0519 154.7963 181.3075 147.7793 188.3246
## 2021       168.0519 152.5491 183.5547 144.3425 191.7614
## 2022       168.0519 150.5887 185.5151 141.3443 194.7595
## 2023       168.0519 148.8272 187.2766 138.6503 197.4535
## 2024       168.0519 147.2141 188.8897 136.1832 199.9206
## 2025       168.0519 145.7172 190.3867 133.8939 202.2099
## 2026       168.0519 144.3145 191.7894 131.7486 204.3552
## 2027       168.0519 142.9902 193.1137 129.7233 206.3806
## 2028       168.0519 141.7324 194.3715 127.7997 208.3042
# Try tweaking the ARIMA (include drift)
d_points_ts %>% 
  Arima(order = c(0,1,1), include.constant = TRUE) %>% 
  forecast(h = 10) %>%
  autoplot()

# TODO - create a seasonal ts and try an ARIMA model on it

Dynamic Regression

A matrix of predictor variables can be added to an auto.arima() model via the xreg argument.

It might be interesting to look at forecasting total points informed by kicks.
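
A hedged sketch of what that could look like; the kicks predictor below is simulated as a placeholder, not real data.

# Placeholder predictor: simulated mean kicks per game, aligned with d_points_ts
set.seed(42)
xreg_kicks <- cbind(kicks = rnorm(length(d_points_ts), mean = 200, sd = 10))
fit_dyn <- auto.arima(d_points_ts, xreg = xreg_kicks)
# Forecasting needs assumed future values of the predictor
future_kicks <- cbind(kicks = rep(mean(xreg_kicks), 10))
autoplot(forecast(fit_dyn, xreg = future_kicks))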

Dynamic harmonic regression incorporates seasonality with Fourier terms.
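
Again a sketch on a built-in monthly series, since Fourier terms need a seasonal period to work with.

# Fourier terms as regressors with ARMA errors (seasonal = FALSE in the ARIMA part)
harmonics <- fourier(AirPassengers, K = 2)
fit_dhr <- auto.arima(AirPassengers, xreg = harmonics, seasonal = FALSE)
autoplot(forecast(fit_dhr, xreg = fourier(AirPassengers, K = 2, h = 24)))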

TBATS

  • Trigonometric terms for seasonality
  • Box-Cox transformations
  • ARMA errors
  • Trend
  • Seasonal components

Useful but dangerous!!
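
A minimal sketch on a built-in monthly series.

# Automatic TBATS model and forecast
fit_tbats <- tbats(AirPassengers)
autoplot(forecast(fit_tbats, h = 24))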