Problem 2.15, Tsay (2010): The quarterly gross domestic product (GDP) implicit price deflator is often used as a measure of inflation. The file q-gdpdef.txt contains the data for the United States from the first quarter of 1947 to the last quarter of 2008. The data format is year, month, day, and deflator. The data are seasonally adjusted and equal 100 for the year 2000. Build an ARIMA model for the series and check the validity of the fitted model. Use the fitted model to predict inflation for each quarter of 2009. The data are obtained from the Federal Reserve Bank of St. Louis. (Note: here we instead download the up-to-date quarterly GDP implicit price deflator series from the Federal Reserve Bank of St. Louis, fit the ARIMA models with data up to the fourth quarter of 2018, and use 2019.I to 2020.III (7 quarters) for forecast comparisons.)
data_original <- read.csv("/home/thaline/Documentos/Doutorado/Disciplinas/Econometria 3/data_gross_domestic_price.csv")
data <- ts(data_original[,2], frequency = 4, start = c(1947, 2), end = c(2020, 3))
data_fit <- ts(data_original[1:287,2], frequency = 4, start = c(1947, 2), end = c(2018, 4)) #data for fitting the model
plot(data)
The quarterly gross domestic product implicit price deflator is plotted in the figure above. The series does not appear to be a stationary process. The ACF and PACF of the series can help us identify a model.
acf(data_fit)
pacf(data_fit)
Note that the ACF does not decay exponentially, and many autocorrelation values lie above the confidence bounds, suggesting that the series is not stationary.
Let’s apply the Augmented Dickey-Fuller (ADF) test to verify if the series has a unit root.
library("tseries")
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
adf.test(data_fit)
##
## Augmented Dickey-Fuller Test
##
## data: data_fit
## Dickey-Fuller = -3.5832, Lag order = 6, p-value = 0.03513
## alternative hypothesis: stationary
adf.test(data_fit, k = 7)
##
## Augmented Dickey-Fuller Test
##
## data: data_fit
## Dickey-Fuller = -2.8255, Lag order = 7, p-value = 0.2283
## alternative hypothesis: stationary
Note that the test result is quite sensitive to the choice of lag order. We prefer to be cautious and fail to reject the null hypothesis that the series has a unit root.
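To see this sensitivity directly, we can run the test over a range of lag orders (a small sketch; `suppressWarnings` hides the "p-value smaller than printed p-value" messages that `adf.test` emits for very small p-values):

```r
# Sensitivity of the ADF test to the lag order k:
# print the test statistic and p-value for k = 4, ..., 10.
for (k in 4:10) {
  adf <- suppressWarnings(adf.test(data_fit, k = k))
  cat("k =", k, " statistic =", round(adf$statistic, 3),
      " p-value =", round(adf$p.value, 4), "\n")
}
```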
Let’s examine the first difference of the series.
data_gr <- diff(data_fit)
plot(data_gr)
adf.test(data_gr)
## Warning in adf.test(data_gr): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: data_gr
## Dickey-Fuller = -9.8494, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
adf.test(data_gr, k=10)
## Warning in adf.test(data_gr, k = 10): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: data_gr
## Dickey-Fuller = -7.6057, Lag order = 10, p-value = 0.01
## alternative hypothesis: stationary
The graph of the differenced data appears to be a stationary process. The ADF test rejects the null hypothesis of a unit root, so we can accept the alternative hypothesis of stationarity.
acf(data_gr)
pacf(data_gr)
Inspecting the sample ACF and PACF, it seems that the ACF cuts off at lag 1 while the PACF tails off. Alternatively, the PACF could be read as cutting off at lag 2 with the ACF tailing off. Therefore, two candidate models for the differenced data are MA(1) or AR(2) (i.e., ARIMA(0,1,1) or ARIMA(2,1,0) for the original data).
Let’s use the sarima function from the astsa package to fit the models.
library("astsa")
#MA(1) for the differenced data, i.e., ARIMA(0,1,1) for the original series
sarima1 <- sarima(data_gr, p=0, d=0, q=1)
## initial value 0.558851
## iter 2 value 0.494966
## iter 3 value 0.491277
## iter 4 value 0.490952
## iter 5 value 0.490951
## iter 5 value 0.490951
## iter 5 value 0.490951
## final value 0.490951
## converged
## initial value 0.491023
## iter 2 value 0.491023
## iter 2 value 0.491023
## iter 2 value 0.491023
## final value 0.491023
## converged
sarima1
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 xmean
## -0.4034 -0.0152
## s.e. 0.0590 0.0578
##
## sigma^2 estimated as 2.668: log likelihood = -546.25, aic = 1098.5
##
## $degrees_of_freedom
## [1] 284
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.4034 0.0590 -6.8318 0.0000
## xmean -0.0152 0.0578 -0.2631 0.7926
##
## $AIC
## [1] 3.840902
##
## $AICc
## [1] 3.84105
##
## $BIC
## [1] 3.879251
#AR(2) for the differenced data, i.e., ARIMA(2,1,0) for the original series
sarima2 <- sarima(data_gr, p=2, d=0, q=0)
## initial value 0.555324
## iter 2 value 0.489652
## iter 3 value 0.482655
## iter 4 value 0.482193
## iter 5 value 0.482193
## iter 5 value 0.482193
## iter 5 value 0.482193
## final value 0.482193
## converged
## initial value 0.488395
## iter 2 value 0.488353
## iter 3 value 0.488351
## iter 4 value 0.488351
## iter 4 value 0.488351
## iter 4 value 0.488351
## final value 0.488351
## converged
sarima2
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 xmean
## -0.3740 -0.1959 -0.0161
## s.e. 0.0581 0.0584 0.0615
##
## sigma^2 estimated as 2.654: log likelihood = -545.48, aic = 1098.97
##
## $degrees_of_freedom
## [1] 283
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.3740 0.0581 -6.4361 0.0000
## ar2 -0.1959 0.0584 -3.3515 0.0009
## xmean -0.0161 0.0615 -0.2616 0.7938
##
## $AIC
## [1] 3.842552
##
## $AICc
## [1] 3.84285
##
## $BIC
## [1] 3.893685
Note that the AIC, AICc and BIC all indicate that the MA(1) model fits the differenced data better than the AR(2) model.
“If the model fits well, the standardized residuals should behave as an iid sequence with mean zero and variance one” (Shumway and Stoffer (2001), page 140). The MA(1) model seems to fit well, since the ACF of the residuals shows no significant correlation (apart from a few outliers).
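As a complementary formal check (beyond the diagnostic plots that sarima already produces), we can run a Ljung-Box test on the residuals explicitly; a small sketch:

```r
# Ljung-Box test on the MA(1) residuals: a large p-value means we
# cannot reject the null of no residual autocorrelation.
res1 <- resid(sarima1$fit)
Box.test(res1, lag = 12, type = "Ljung-Box", fitdf = 1)  # fitdf = 1 fitted MA coefficient
```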
We can also use the auto.arima function from the forecast package to select an ARIMA model automatically.
library("forecast")
##
## Attaching package: 'forecast'
## The following object is masked from 'package:astsa':
##
## gas
arima3 = auto.arima(data_fit, seasonal = FALSE, approximation = FALSE, stepwise = FALSE)
arima3
## Series: data_fit
## ARIMA(1,1,4)
##
## Coefficients:
## ar1 ma1 ma2 ma3 ma4
## 0.6241 -1.0685 0.2008 0.1084 -0.1563
## s.e. 0.1207 0.1252 0.0983 0.0810 0.0607
##
## sigma^2 estimated as 2.547: log likelihood=-537.31
## AIC=1086.62 AICc=1086.93 BIC=1108.56
acf(arima3$residuals)
Note that the information criteria (AIC, AICc, BIC) actually favor ARIMA(1,1,4) over ARIMA(0,1,1): sarima reports criteria divided by the sample size, and on the raw scale the AIC of ARIMA(0,1,1) is 1098.5 (see the fit output above) against 1086.62 for ARIMA(1,1,4). Let’s use the two models to make the predictions and compare them.
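One caveat when comparing the two fits: sarima (astsa) reports AIC/AICc/BIC divided by the effective sample size, while auto.arima (forecast) reports them on the raw -2·logLik scale. A small sketch that puts both on the raw scale before comparing:

```r
# Rescale the sarima AIC (which is divided by n) to the raw scale
# used by auto.arima, so the two values are directly comparable.
n <- length(data_gr)  # number of differenced observations
c(arima011 = sarima1$AIC * n, arima114 = arima3$aic)
```

The model with the smaller rescaled value is preferred by AIC.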
Forecast: ARIMA(0,1,1)
#ARIMA(0,1,1)
sarima.for(data_fit, n.ahead = 7, p=0, d=1, q=1)
## $pred
## Qtr1 Qtr2 Qtr3 Qtr4
## 2019 1.988731 1.973530 1.958329 1.943128
## 2020 1.927927 1.912726 1.897525
##
## $se
## Qtr1 Qtr2 Qtr3 Qtr4
## 2019 1.633479 1.902127 2.137269 2.348989
## 2020 2.543144 2.723493 2.892619
lines(data, lty = 2, col=1)
Forecast: ARIMA(1,1,4)
#ARIMA(1,1,4)
forecast(arima3, h=7)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2019 Q1 2.005479 -0.03977385 4.050731 -1.122465 5.133423
## 2019 Q2 1.713667 -0.62609654 4.053431 -1.864693 5.292027
## 2019 Q3 1.735408 -0.80126826 4.272084 -2.144103 5.614919
## 2019 Q4 1.764136 -1.00232381 4.530596 -2.466799 5.995072
## 2020 Q1 1.782067 -1.11544061 4.679574 -2.649288 6.213422
## 2020 Q2 1.793258 -1.19003600 4.776552 -2.769296 6.355812
## 2020 Q3 1.800243 -1.24595787 4.846443 -2.858519 6.459004
plot(forecast(arima3, h=7))
lines(data, col = 1, lty = 2)
The forecasts from the two models are very similar. Let’s compute the mean squared forecast error (MSE) of each model and compare them.
#Real and Forecast values
real = c(1.0, 2.6, 1.4, 1.5, 1.7, -2.1, 3.7)
sarima1_for = c(1.988731, 1.973530, 1.958329, 1.943128, 1.927927, 1.912726, 1.897525)
arima3_for = c(2.005479, 1.713667, 1.735408, 1.764136, 1.782067, 1.793258, 1.800243)
#MSE
mse_sarima1 = mean((real-sarima1_for)^2)
mse_arima3 = mean((real-arima3_for)^2)
mse_sarima1
## [1] 3.040141
mse_arima3
## [1] 2.964587
Considering the MSE, the ARIMA(1,1,4) model predicted the series slightly better than the ARIMA(0,1,1) model, although the two are very close to each other.
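The MSE is dominated by the large errors in the pandemic quarters (2020.II–III); the mean absolute error (MAE) gives a complementary view that weights those quarters less heavily. A small sketch using the vectors defined above:

```r
# Mean absolute error: less sensitive to the large pandemic-quarter
# forecast errors than the squared loss used in the MSE.
mae_sarima1 <- mean(abs(real - sarima1_for))
mae_arima3  <- mean(abs(real - arima3_for))
c(arima011 = mae_sarima1, arima114 = mae_arima3)
```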