Dynamic Harmonic Regression for Time Series Data
계절 주기가 길 때
, 계절성 ARIMA보다 선호되는 모델이 Dynamic Harmonic Regression (DHR)
이다.다중 계절성
을 가진 시계열 데이터에도 DHR은 유용하다.계절성을 설명하는 퓨리에 항(Fourier Terms)
을 가진 회귀 모형이다.\[ \begin{aligned} Y_t = \beta_{0} + \sum^{K}_{i=1} \left[\alpha_{i,t} sin(2πit/m ) + \gamma_{i,t} cos(2πit/m ) \right] + \eta_t, \end{aligned} \]
pacman::p_load("forecast", "dplyr", "ggplot2", "xts")
# In Mac
# guess_encoding("Amtrak.csv")
# Amtrak.data <- read.csv("Amtrak.csv", fileEncoding="EUC-KR")
Amtrak.data <- data.table::fread(paste(getwd(),"Amtrak.csv", sep="/"))
ridership.ts <- ts(Amtrak.data$Ridership, start=c(1991,1), end=c(2004,3), freq=12)
forecast
package에 있는 fourier
함수를 이용한다.fourierf(x, K, h)
DHR.fit <- auto.arima(train.ts,
xreg = fourier(train.ts, K=2), # K : sine, cosine 쌍의 개수/시계열 데이터의 계절 주기가 2개 이상일 때, K는 계절 주기 수만큼 필요NA= FALSE)
DHR.fit
Series: train.ts
Regression with ARIMA(5,1,2) errors
Coefficients:
ar1 ar2 ar3 ar4 ar5 ma1 ma2
-1.2069 -1.1238 -0.7371 -0.1625 0.3954 0.5535 0.3476
s.e. 0.1406 0.2021 0.2245 0.1960 0.1114 0.1599 0.0981
drift S1-12 C1-12 S2-12 C2-12
0.4116 -52.3107 -107.3608 -35.2247 23.7494
s.e. 3.3551 8.9147 8.8901 5.8742 5.8703
sigma^2 estimated as 6143: log likelihood=-701.65
AIC=1429.29 AICc=1432.66 BIC=1465.75
\[ \begin{aligned} y_t &= 0.4116 -52.3107 sin(2\pi t/12 ) -107.3608 cos(2\pi t/12 ) -35.2247 sin(4\pi t/12) +23.7494 cos(4\pi t/12) + \eta_t, \\ \eta_t &= -1.2069\eta_{t-1} -1.1238\eta_{t-2}-0.7371\eta_{t-3}-0.1625\eta_{t-4} + 0.3954\eta_{t-5} + \epsilon_{t} +0.5535\epsilon_{t-1} + 0.3476\epsilon_{t-2},\\ \epsilon_{t} &\sim WN(0, 6143) \end{aligned} \]
checkresiduals(DHR.fit)
Ljung-Box test
data: Residuals from Regression with ARIMA(5,1,2) errors
Q* = 42.733, df = 12, p-value = 2.506e-05
Model df: 12. Total lags used: 24
DHR.forecast <- forecast(DHR.fit, xreg = fourier(train.ts, K=2, h=n.test) )
DHR.forecast$mean
Jan Feb Mar Apr May Jun Jul
2001 1990.590 2002.360 1939.015 2129.012
2002 1715.501 1701.942 1967.000 1997.213 2008.163 1980.212 2108.721
2003 1741.872 1731.747 1940.771 1999.742 2017.334 2010.164 2095.394
2004 1762.418 1754.637 1924.085
Aug Sep Oct Nov Dec
2001 2098.128 1834.731 1910.820 1949.104 1955.958
2002 2073.729 1877.231 1916.445 1951.071 1930.820
2003 2057.557 1908.100 1924.888 1951.064 1914.886
2004
plot(DHR.forecast)
accuracy(test.ts, DHR.forecast$mean)
ME RMSE MAE MPE MAPE ACF1
Test set -55.08739 97.74112 79.85223 -2.835221 4.127865 0.3131728
Theil's U
Test set 0.8378493
\[ \begin{aligned} Y_t = \beta_{0} + \sum_{i=1}^{j} \beta_{j}x_{i,t} + \sum^{K}_{i=1} \left[\alpha_{i,t} sin(2πit/m ) + \gamma_{i,t} cos(2πit/m ) \right] + \eta_t, \end{aligned} \]
위의 예제에 예측 변수가 있는 DHR을 적용해보았다.
## Month 변수 생성생성
xts(ridership.ts, order = as.Date(ridership.ts))
[,1]
1991-01-01 1708.917
1991-02-01 1620.586
1991-03-01 1972.715
1991-04-01 1811.665
1991-05-01 1974.964
1991-06-01 1862.356
1991-07-01 1939.860
1991-08-01 2013.264
1991-09-01 1595.657
1991-10-01 1724.924
1991-11-01 1675.667
1991-12-01 1813.863
1992-01-01 1614.827
1992-02-01 1557.088
1992-03-01 1891.223
1992-04-01 1955.981
1992-05-01 1884.714
1992-06-01 1623.042
1992-07-01 1903.309
1992-08-01 1996.712
1992-09-01 1703.897
1992-10-01 1810.000
1992-11-01 1861.601
1992-12-01 1875.122
1993-01-01 1705.259
1993-02-01 1618.535
1993-03-01 1836.709
1993-04-01 1957.043
1993-05-01 1917.185
1993-06-01 1882.398
1993-07-01 1933.009
1993-08-01 1996.167
1993-09-01 1672.841
1993-10-01 1752.827
1993-11-01 1720.377
1993-12-01 1734.292
1994-01-01 1563.365
1994-02-01 1573.959
1994-03-01 1902.639
1994-04-01 1833.888
1994-05-01 1831.049
1994-06-01 1775.755
1994-07-01 1867.508
1994-08-01 1906.608
1994-09-01 1685.632
1994-10-01 1778.546
1994-11-01 1775.995
1994-12-01 1783.350
1995-01-01 1548.415
1995-02-01 1496.925
1995-03-01 1798.316
1995-04-01 1732.895
1995-05-01 1772.345
1995-06-01 1761.207
1995-07-01 1791.655
1995-08-01 1874.820
1995-09-01 1571.309
1995-10-01 1646.948
1995-11-01 1672.631
1995-12-01 1656.845
1996-01-01 1381.758
1996-02-01 1360.852
1996-03-01 1558.575
1996-04-01 1608.420
1996-05-01 1696.696
1996-06-01 1693.183
1996-07-01 1835.516
1996-08-01 1942.573
1996-09-01 1551.401
1996-10-01 1686.508
1996-11-01 1576.204
1996-12-01 1700.433
1997-01-01 1396.588
1997-02-01 1371.690
1997-03-01 1707.522
1997-04-01 1654.604
1997-05-01 1762.903
1997-06-01 1775.800
1997-07-01 1934.219
1997-08-01 2008.055
1997-09-01 1615.924
1997-10-01 1773.910
1997-11-01 1732.368
1997-12-01 1796.626
1998-01-01 1570.330
1998-02-01 1412.691
1998-03-01 1754.641
1998-04-01 1824.932
1998-05-01 1843.289
1998-06-01 1825.964
1998-07-01 1968.172
1998-08-01 1921.645
1998-09-01 1669.597
1998-10-01 1791.474
1998-11-01 1816.714
1998-12-01 1846.754
1999-01-01 1599.427
1999-02-01 1548.804
1999-03-01 1832.333
1999-04-01 1839.720
1999-05-01 1846.498
1999-06-01 1864.852
1999-07-01 1965.743
1999-08-01 1949.002
1999-09-01 1607.373
1999-10-01 1803.664
1999-11-01 1850.309
1999-12-01 1836.435
2000-01-01 1541.660
2000-02-01 1616.928
2000-03-01 1919.538
2000-04-01 1971.493
2000-05-01 1992.301
2000-06-01 2009.763
2000-07-01 2053.996
2000-08-01 2097.471
2000-09-01 1823.706
2000-10-01 1976.997
2000-11-01 1981.408
2000-12-01 2000.153
2001-01-01 1683.148
2001-02-01 1663.404
2001-03-01 2007.928
2001-04-01 2023.792
2001-05-01 2047.008
2001-06-01 2072.913
2001-07-01 2126.717
2001-08-01 2202.638
2001-09-01 1707.693
2001-10-01 1950.716
2001-11-01 1973.614
2001-12-01 1984.729
2002-01-01 1759.629
2002-02-01 1770.595
2002-03-01 2019.912
2002-04-01 2048.398
2002-05-01 2068.763
2002-06-01 1994.267
2002-07-01 2075.258
2002-08-01 2026.560
2002-09-01 1734.155
2002-10-01 1916.771
2002-11-01 1858.345
2002-12-01 1996.352
2003-01-01 1778.033
2003-02-01 1749.489
2003-03-01 2066.466
2003-04-01 2098.899
2003-05-01 2104.911
2003-06-01 2129.671
2003-07-01 2223.349
2003-08-01 2174.360
2003-09-01 1931.406
2003-10-01 2121.470
2003-11-01 2076.054
2003-12-01 2140.677
2004-01-01 1831.508
2004-02-01 1838.006
2004-03-01 2132.446
DHR.fit2 <- auto.arima(train.ts,
xreg = Train.Xreg,
seasonal = FALSE)
DHR.fit2
Series: train.ts
Regression with ARIMA(2,1,2) errors
Coefficients:
ar1 ar2 ma1 ma2 Month S1-12 C1-12
-1.1998 -0.4967 0.2032 -0.4963 43.3345 109.4259 -150.3196
s.e. 0.1349 0.1294 0.1091 0.1353 9.2838 35.5583 12.9651
S2-12 C2-12
40.0290 -18.4092
s.e. 18.2969 12.3742
sigma^2 estimated as 9891: log likelihood=-730.38
AIC=1480.76 AICc=1482.74 BIC=1508.8
\[ \begin{aligned} y_t &= 43.3345x_{Month, t} +109.4259 sin(2\pi t/12 ) -150.3196 cos(2\pi t/12 ) + 40.0290 sin(4\pi t/12) -18.4092 cos(4\pi t/12) + \eta_t, \\ \eta_t &= -1.1998\eta_{t-1} + 0.4967\eta_{t-2} + \epsilon_{t} + 0.2032\epsilon_{t-1} - 0.4963\epsilon_{t-2},\\ \epsilon_{t} &\sim WN(0, 9891) \end{aligned} \]
checkresiduals(DHR.fit2)
Ljung-Box test
data: Residuals from Regression with ARIMA(2,1,2) errors
Q* = 95.067, df = 15, p-value = 1.115e-13
Model df: 9. Total lags used: 24
DHR.forecast2 <- forecast(DHR.fit2, xreg = Test.Xreg )
DHR.forecast2$mean
Jan Feb Mar Apr May Jun Jul
2001 1957.800 1980.577 2090.017 2020.626
2002 1651.827 1801.555 1912.521 1972.079 2010.833 2046.623 2057.661
2003 1647.319 1804.050 1911.767 1971.744 2011.609 2045.859 2058.193
2004 1647.258 1804.075 1911.768
Aug Sep Oct Nov Dec
2001 2048.011 1943.821 1891.313 1907.054 1999.249
2002 2025.131 1952.878 1891.812 1901.957 2005.117
2003 2024.872 1952.924 1891.885 1901.846 2005.213
2004
plot(DHR.forecast2)
accuracy(DHR.forecast2$mean, test.ts)
ME RMSE MAE MPE MAPE ACF1
Test set 58.24237 118.5707 97.03389 2.697112 4.885403 0.02801658
Theil's U
Test set 0.6815818
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".