Problem 1

Problem 2.15 from Tsay's (2010) book: The quarterly gross domestic product implicit price deflator is often used as a measure of inflation. The file q-gdpdef.txt contains the data for the United States from the first quarter of 1947 to the last quarter of 2008. The data format is year, month, day, and deflator. The data are seasonally adjusted and equal to 100 for the year 2000. Build an ARIMA model for the series and check the validity of the fitted model. Use the fitted model to predict the inflation for each quarter of 2009. The data are obtained from the Federal Reserve Bank of St. Louis. (Note: instead, download the up-to-date quarterly gross domestic product implicit price deflator series from the Federal Reserve Bank of St. Louis, fit the ARIMA models with data up to the 4th quarter of 2018, and use 2019.I to 2020.III (7 quarters) for forecasting comparisons.)

Step 1: Import the data and create the time series objects used to fit the model

data_original <- read.csv("/home/thaline/Documentos/Doutorado/Disciplinas/Econometria 3/data_gross_domestic_price.csv")

data <- ts(data_original[,2], frequency = 4, start = c(1947, 2), end = c(2020, 3))

data_fit <- ts(data_original[1:287,2], frequency = 4, start = c(1947, 2), end = c(2018, 4)) #data for fitting the model
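
As a less error-prone alternative to hardcoding the row index 287, the estimation sample can also be carved out of the full series with window(); a minimal equivalent sketch:

#Equivalent to data_fit above: keep everything up to 2018 Q4
data_fit_alt <- window(data, end = c(2018, 4))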

Step 2: Identifying the model

plot(data)

The quarterly gross domestic product implicit price deflator series is plotted in the figure above. The series does not appear to be a stationary process. The ACF and PACF of the series can help us identify an appropriate model.

acf(data_fit)

pacf(data_fit)

Note that the ACF does not decay exponentially, and many autocorrelation values lie above the confidence bounds, suggesting that the series is not stationary.

Let’s apply the Augmented Dickey-Fuller (ADF) test to verify whether the series has a unit root.

library("tseries")
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
adf.test(data_fit)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  data_fit
## Dickey-Fuller = -3.5832, Lag order = 6, p-value = 0.03513
## alternative hypothesis: stationary
adf.test(data_fit, k=7)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  data_fit
## Dickey-Fuller = -2.8255, Lag order = 7, p-value = 0.2283
## alternative hypothesis: stationary

Note that the test is quite sensitive to the choice of lag order. We prefer to be cautious and fail to reject the null hypothesis that the series has a unit root.
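
The sensitivity is easy to see by sweeping the lag order; a small sketch (the range 1 to 10 is an arbitrary illustrative choice):

#p-values of the ADF test for lag orders 1 through 10
suppressWarnings(sapply(1:10, function(k) adf.test(data_fit, k = k)$p.value))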

Step 3: Transforming the data

Let’s take the first difference of the series.

data_gr = diff(data_fit)
plot(data_gr)

adf.test(data_gr)
## Warning in adf.test(data_gr): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  data_gr
## Dickey-Fuller = -9.8494, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
adf.test(data_gr, k=10)
## Warning in adf.test(data_gr, k = 10): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  data_gr
## Dickey-Fuller = -7.6057, Lag order = 10, p-value = 0.01
## alternative hypothesis: stationary

The graph of the transformed data appears to show a stationary process. The ADF test rejects the null hypothesis of a unit root in favor of the stationary alternative.
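
As a complementary check (not part of the original exercise), the KPSS test from the same tseries package reverses the hypotheses, taking stationarity as the null; failing to reject it would reinforce the ADF conclusion:

#KPSS test: the null hypothesis is level stationarity
kpss.test(data_gr)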

Step 4: Identifying the model for the transformed data

acf(data_gr)

pacf(data_gr)

Inspecting the sample ACF and PACF, it seems that the ACF cuts off at lag 1 while the PACF tails off. Alternatively, they could be read as the PACF cutting off at lag 2 with the ACF tailing off. Therefore, we can suggest two possible models for the transformed data: MA(1) or AR(2) (that is, ARIMA(0,1,1) or ARIMA(2,1,0) for the original data).
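
Tsay (2010) also suggests the extended ACF (EACF) for identifying p and q jointly; a quick cross-check, assuming the TSA package is installed:

library(TSA) #provides eacf()
#the vertex of the triangle of o's in the printed table suggests the (p,q) order
eacf(data_gr, ar.max = 6, ma.max = 6)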

Step 5: Fitting the models

Let’s use the sarima function from the astsa package to fit both candidate models.

library("astsa")

#MA(1) for the differenced series, i.e. ARIMA(0,1,1) for the original data
sarima1 <- sarima(data_gr, p=0, d=0, q=1)
## initial  value 0.558851 
## iter   2 value 0.494966
## iter   3 value 0.491277
## iter   4 value 0.490952
## iter   5 value 0.490951
## iter   5 value 0.490951
## iter   5 value 0.490951
## final  value 0.490951 
## converged
## initial  value 0.491023 
## iter   2 value 0.491023
## iter   2 value 0.491023
## iter   2 value 0.491023
## final  value 0.491023 
## converged

sarima1
## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1    xmean
##       -0.4034  -0.0152
## s.e.   0.0590   0.0578
## 
## sigma^2 estimated as 2.668:  log likelihood = -546.25,  aic = 1098.5
## 
## $degrees_of_freedom
## [1] 284
## 
## $ttable
##       Estimate     SE t.value p.value
## ma1    -0.4034 0.0590 -6.8318  0.0000
## xmean  -0.0152 0.0578 -0.2631  0.7926
## 
## $AIC
## [1] 3.840902
## 
## $AICc
## [1] 3.84105
## 
## $BIC
## [1] 3.879251

#AR(2) for the differenced series, i.e. ARIMA(2,1,0) for the original data
sarima2 <- sarima(data_gr, p=2, d=0, q=0)
## initial  value 0.555324 
## iter   2 value 0.489652
## iter   3 value 0.482655
## iter   4 value 0.482193
## iter   5 value 0.482193
## iter   5 value 0.482193
## iter   5 value 0.482193
## final  value 0.482193 
## converged
## initial  value 0.488395 
## iter   2 value 0.488353
## iter   3 value 0.488351
## iter   4 value 0.488351
## iter   4 value 0.488351
## iter   4 value 0.488351
## final  value 0.488351 
## converged

sarima2
## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ar1      ar2    xmean
##       -0.3740  -0.1959  -0.0161
## s.e.   0.0581   0.0584   0.0615
## 
## sigma^2 estimated as 2.654:  log likelihood = -545.48,  aic = 1098.97
## 
## $degrees_of_freedom
## [1] 283
## 
## $ttable
##       Estimate     SE t.value p.value
## ar1    -0.3740 0.0581 -6.4361  0.0000
## ar2    -0.1959 0.0584 -3.3515  0.0009
## xmean  -0.0161 0.0615 -0.2616  0.7938
## 
## $AIC
## [1] 3.842552
## 
## $AICc
## [1] 3.84285
## 
## $BIC
## [1] 3.893685

Note that the AIC, AICc, and BIC all suggest that the MA(1) model for the differenced data fits better than the AR(2) model.
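
These criteria are reported by sarima scaled by the sample size (in this version of astsa); dividing the raw stats::arima AIC by the number of observations reproduces the value above:

#raw AIC from the underlying arima fit, rescaled: ~3.8409, matching $AIC above
sarima1$fit$aic / length(data_gr)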

Step 6: Diagnostics

“If the model fits well, the standardized residuals should behave as an iid sequence with mean zero and variance one” (Shumway and Stoffer (2001), page 140). The MA(1) model seems to fit well, since the ACF of the residuals in the sarima diagnostic plots shows no significant correlation (if we set aside the outliers).
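
The Ljung-Box statistic shown in the sarima diagnostic plots can also be computed directly; a minimal sketch (lag = 12 is an illustrative choice, and fitdf = 1 accounts for the one estimated MA coefficient):

#Ljung-Box test on the MA(1) residuals; a large p-value supports white noise
Box.test(residuals(sarima1$fit), lag = 12, type = "Ljung-Box", fitdf = 1)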

Alternative model

We can also use the auto.arima function from the forecast package to select an ARIMA model automatically.

library("forecast")
## 
## Attaching package: 'forecast'
## The following object is masked from 'package:astsa':
## 
##     gas
arima3 = auto.arima(data_fit, seasonal = FALSE, approximation = FALSE, stepwise = FALSE)
arima3
## Series: data_fit 
## ARIMA(1,1,4) 
## 
## Coefficients:
##          ar1      ma1     ma2     ma3      ma4
##       0.6241  -1.0685  0.2008  0.1084  -0.1563
## s.e.  0.1207   0.1252  0.0983  0.0810   0.0607
## 
## sigma^2 estimated as 2.547:  log likelihood=-537.31
## AIC=1086.62   AICc=1086.93   BIC=1108.56
acf(arima3$residuals)

Note that, as seen above, the information criteria reported by sarima are scaled by the sample size, so they cannot be compared directly with the raw values printed by auto.arima. On the raw scale, ARIMA(0,1,1) has AIC = 1098.5 and BIC ≈ 1109.5, against AIC = 1086.62 and BIC = 1108.56 for ARIMA(1,1,4), so the ARIMA(1,1,4) model actually fits slightly better in-sample.
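
A like-for-like comparison can be made with the generic AIC and BIC functions on the two fitted objects (both fits use the same differenced sample, so the raw values are comparable):

#raw information criteria: ARIMA(0,1,1) vs ARIMA(1,1,4)
AIC(sarima1$fit); AIC(arima3)
BIC(sarima1$fit); BIC(arima3)

Let’s use the two models to make the predictions and compare them.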

Step 7: Forecasting

Forecast: ARIMA(0,1,1)

#ARIMA(0,1,1)
sarima.for(data_fit, n.ahead = 7, p=0, d=1, q=1)
## $pred
##          Qtr1     Qtr2     Qtr3     Qtr4
## 2019 1.988731 1.973530 1.958329 1.943128
## 2020 1.927927 1.912726 1.897525         
## 
## $se
##          Qtr1     Qtr2     Qtr3     Qtr4
## 2019 1.633479 1.902127 2.137269 2.348989
## 2020 2.543144 2.723493 2.892619
lines(data, lty = 2, col=1)
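
Since sarima.for returns the point forecasts and their standard errors, approximate 95% intervals can be formed with a normal approximation; a short sketch (this re-runs the forecast to capture the returned list):

#capture the forecast output and build approximate 95% intervals
fc <- sarima.for(data_fit, n.ahead = 7, p=0, d=1, q=1)
cbind(pred = fc$pred, lower = fc$pred - 1.96*fc$se, upper = fc$pred + 1.96*fc$se)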

Forecast: ARIMA(1,1,4)

#ARIMA(1,1,4)
forecast(arima3, h=7)
##         Point Forecast       Lo 80    Hi 80     Lo 95    Hi 95
## 2019 Q1       2.005479 -0.03977385 4.050731 -1.122465 5.133423
## 2019 Q2       1.713667 -0.62609654 4.053431 -1.864693 5.292027
## 2019 Q3       1.735408 -0.80126826 4.272084 -2.144103 5.614919
## 2019 Q4       1.764136 -1.00232381 4.530596 -2.466799 5.995072
## 2020 Q1       1.782067 -1.11544061 4.679574 -2.649288 6.213422
## 2020 Q2       1.793258 -1.19003600 4.776552 -2.769296 6.355812
## 2020 Q3       1.800243 -1.24595787 4.846443 -2.858519 6.459004
plot(forecast(arima3, h=7))
lines(data, col=1, lty=2)

Forecast comparison

The forecasts from the two models are very similar. Let’s calculate the forecast mean squared error (MSE) of each model and compare them.

#Realized values for 2019.I-2020.III and the two sets of point forecasts
real = c(1.0, 2.6, 1.4, 1.5, 1.7, -2.1, 3.7)
sarima1_for = c(1.988731, 1.973530, 1.958329, 1.943128, 1.927927, 1.912726, 1.897525)
arima3_for = c(2.005479, 1.713667, 1.735408, 1.764136, 1.782067, 1.793258, 1.800243)

#MSE
mse_sarima1 = mean((real-sarima1_for)^2)
mse_arima3 = mean((real-arima3_for)^2)

mse_sarima1
## [1] 3.040141
mse_arima3
## [1] 2.964587

Considering the MSE, the ARIMA(1,1,4) model predicted the series slightly better than the ARIMA(0,1,1) model, although the two results are very close to each other.
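
Finally, rather than hardcoding the realized values, they can be extracted from the full series built in Step 1; a sketch, assuming the hardcoded numbers were read (and rounded) from that same series:

#realized values for 2019 Q1 through 2020 Q3, taken from the full series
real_check <- window(data, start = c(2019, 1))
mean((real_check - sarima1_for)^2) #should reproduce mse_sarima1 up to rounding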