The variable to predict is the monthly growth rate of US industrial production. The dataset contains 130 candidate predictors: monthly macroeconomic indicators such as measures of output, income, consumption, orders, surveys, labor-market variables, house prices, consumer and producer prices, money, credit, and asset prices. The sample runs from February 1960 to December 2014, and all series have been transformed to stationarity, following Stock and Watson.
\(y\): the monthly growth rate of US industrial production.
\(X\): monthly macroeconomic indicators, such as measures of output, income, consumption, orders, surveys, labor market variables, house prices, consumer and producer prices, money, credit and asset prices.
Period: February 1960 to December 2014 (659 monthly observations)
Full description of X: see Appendix B of Stock and Watson (2002b), pages 157–161.
References:
Stock and Watson (2002a) Forecasting Using Principal Components from a Large Number of Predictors, JASA, 97, 147–162. https://scholar.harvard.edu/files/stock/files/forecasting_using_principal_components_from_a_large_number_of_predictors.pdf
Stock and Watson (2002b) Macroeconomic Forecasting Using Diffusion Indexes, JBES, 20, 147–162. https://scholar.harvard.edu/files/stock/files/macroeconomic_forecasting_using_diffusion_indexes.pdf
Giannone, Lenza and Primiceri (2020) Economic predictions with big data: the illusion of sparsity. https://faculty.wcas.northwestern.edu/~gep575/illusion4-2.pdf
Fava and Lopes (2020) The illusion of the illusion of sparsity: an exercise in prior sensitivity. https://arxiv.org/abs/2009.14296
library("bayeslm")
filename = "https://hedibert.org/wp-content/uploads/2021/03/stockwatson2002-data.txt"
macrodata = read.table(filename,header=FALSE)
k = ncol(macrodata)-1
y = macrodata[,1]
X = as.matrix(macrodata[,2:(k+1)])
n = nrow(X)
dim(X)
## [1] 659 130
fit.ols = lm(y~X-1)
summary(fit.ols)
##
## Call:
## lm(formula = y ~ X - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9224 -0.3948 0.0186 0.4279 2.9102
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## XV2 0.0003296 0.0784303 0.004 0.996649
## XV3 -0.0748243 0.0698419 -1.071 0.284506
## XV4 0.0510148 0.0522478 0.976 0.329312
## XV5 0.1303911 0.0621847 2.097 0.036482 *
## XV6 -0.0752049 0.0556389 -1.352 0.177061
## XV7 0.4439833 0.5729826 0.775 0.438767
## XV8 0.0824967 0.4051686 0.204 0.838736
## XV9 -0.1714310 0.3041942 -0.564 0.573294
## XV10 0.7845590 0.4465061 1.757 0.079478 .
## XV11 -0.5097401 0.2846165 -1.791 0.073869 .
## XV12 -0.5692420 0.2576484 -2.209 0.027576 *
## XV13 0.0590565 0.1192957 0.495 0.620775
## XV14 -0.2261347 0.3378677 -0.669 0.503596
## XV15 0.0672032 0.1357625 0.495 0.620802
## XV16 0.0170097 0.0780816 0.218 0.827634
## XV17 0.8933346 0.4306541 2.074 0.038528 *
## XV18 -0.0016022 0.0555055 -0.029 0.976983
## XV19 0.0300147 0.0354725 0.846 0.397857
## XV20 0.0291581 0.1920536 0.152 0.879385
## XV21 -1.1541357 0.3158563 -3.654 0.000284 ***
## XV22 -0.0463663 0.0638730 -0.726 0.468213
## XV23 -0.0165637 0.0862872 -0.192 0.847847
## XV24 0.0166249 0.2345343 0.071 0.943516
## XV25 -0.0025336 0.2609214 -0.010 0.992256
## XV26 0.0555815 0.1420061 0.391 0.695658
## XV27 0.0787404 0.0512527 1.536 0.125058
## XV28 -0.0792625 0.0669311 -1.184 0.236850
## XV29 -0.0526238 0.0514123 -1.024 0.306508
## XV30 0.0931659 0.2165381 0.430 0.667188
## XV31 -0.1223388 0.1531938 -0.799 0.424888
## XV32 -0.0737371 0.1359880 -0.542 0.587887
## XV33 -0.0772511 0.0413481 -1.868 0.062271 .
## XV34 -0.3257963 0.4611338 -0.707 0.480181
## XV35 0.0197387 0.4427482 0.045 0.964457
## XV36 0.1245017 0.0734396 1.695 0.090608 .
## XV37 0.1122901 0.1753559 0.640 0.522219
## XV38 -4.5120836 2.9797545 -1.514 0.130560
## XV39 3.6768907 2.4856744 1.479 0.139673
## XV40 1.4575165 0.8106561 1.798 0.072756 .
## XV41 0.1604419 0.2963569 0.541 0.588473
## XV42 0.0745660 0.1014828 0.735 0.462808
## XV43 0.0597908 0.0600656 0.995 0.319985
## XV44 0.0697782 0.0713253 0.978 0.328368
## XV45 -0.0822017 0.0529358 -1.553 0.121056
## XV46 0.0091953 0.0496071 0.185 0.853016
## XV47 0.1914305 0.1919833 0.997 0.319162
## XV48 -0.0213217 0.0390319 -0.546 0.585115
## XV49 -0.4595203 0.2090294 -2.198 0.028356 *
## XV50 -0.1541241 0.1467067 -1.051 0.293941
## XV51 2.2246672 1.3427179 1.657 0.098145 .
## XV52 -0.4533709 0.2265101 -2.002 0.045844 *
## XV53 -0.6410294 0.3334704 -1.922 0.055105 .
## XV54 -0.7578373 0.6043409 -1.254 0.210400
## XV55 -0.7237723 0.4070520 -1.778 0.075965 .
## XV56 -3.0493687 1.3518344 -2.256 0.024495 *
## XV57 0.5961090 0.2435195 2.448 0.014693 *
## XV58 0.6488050 0.3056621 2.123 0.034249 *
## XV59 1.2935089 0.6247248 2.071 0.038888 *
## XV60 1.0719533 0.4575892 2.343 0.019519 *
## XV61 0.4360295 0.5511585 0.791 0.429232
## XV62 0.0804641 0.2107074 0.382 0.702707
## XV63 0.0514561 0.1520367 0.338 0.735162
## XV64 -0.1490610 0.0934711 -1.595 0.111370
## XV65 -0.0430592 0.0470526 -0.915 0.360540
## XV66 0.1086395 0.0555799 1.955 0.051150 .
## XV67 -0.0676248 0.0604207 -1.119 0.263549
## XV68 0.1235487 0.0740774 1.668 0.095941 .
## XV69 0.0098264 0.0431925 0.228 0.820121
## XV70 0.0279949 0.0602745 0.464 0.642512
## XV71 0.0495261 0.0567116 0.873 0.382897
## XV72 -0.0046692 0.0440370 -0.106 0.915599
## XV73 0.1189118 0.0473395 2.512 0.012305 *
## XV74 0.0372065 0.0326995 1.138 0.255706
## XV75 0.0200959 0.0355078 0.566 0.571662
## XV76 -0.0690243 0.0346182 -1.994 0.046679 *
## XV77 0.1229494 0.0477078 2.577 0.010232 *
## XV78 -0.1592904 0.0585432 -2.721 0.006725 **
## XV79 0.1691765 0.2781863 0.608 0.543355
## XV80 -0.0491224 0.2650672 -0.185 0.853049
## XV81 0.1262854 0.0996144 1.268 0.205448
## XV82 -0.0401429 0.0643365 -0.624 0.532928
## XV83 0.0076629 0.0766363 0.100 0.920389
## XV84 0.1658422 0.1104483 1.502 0.133813
## XV85 -0.4816826 0.1649020 -2.921 0.003638 **
## XV86 0.4785546 0.2515630 1.902 0.057671 .
## XV87 -0.0681240 0.2091598 -0.326 0.744778
## XV88 -0.1753167 0.1764665 -0.993 0.320928
## XV89 0.0604099 0.1559507 0.387 0.698641
## XV90 0.2970371 0.1039912 2.856 0.004454 **
## XV91 -0.2511460 0.0847305 -2.964 0.003173 **
## XV92 -0.1265381 0.0884063 -1.431 0.152928
## XV93 0.5029510 0.2072896 2.426 0.015586 *
## XV94 -0.2978482 0.3421181 -0.871 0.384367
## XV95 0.1573286 0.2422786 0.649 0.516381
## XV96 -0.4917361 0.4441839 -1.107 0.268773
## XV97 0.1713850 0.5031460 0.341 0.733520
## XV98 0.6657157 0.4637054 1.436 0.151695
## XV99 -0.3338755 0.3530717 -0.946 0.344769
## XV100 0.0583770 0.0486044 1.201 0.230264
## XV101 -0.0342445 0.0407658 -0.840 0.401271
## XV102 0.0196541 0.0454268 0.433 0.665443
## XV103 -0.0313258 0.0402908 -0.777 0.437215
## XV104 0.0215974 0.1185297 0.182 0.855487
## XV105 0.0309690 0.1193782 0.259 0.795413
## XV106 0.0269397 0.0551817 0.488 0.625611
## XV107 -0.0521872 0.0432823 -1.206 0.228456
## XV108 0.0107983 0.0361107 0.299 0.765033
## XV109 -0.0084769 0.0354586 -0.239 0.811148
## XV110 -0.0850127 0.0622626 -1.365 0.172711
## XV111 0.0942480 0.0804058 1.172 0.241663
## XV112 0.0208755 0.0377386 0.553 0.580387
## XV113 0.0298770 0.0831262 0.359 0.719426
## XV114 0.0388840 0.0352849 1.102 0.270962
## XV115 -0.0512062 0.1106006 -0.463 0.643567
## XV116 0.0236428 0.0383706 0.616 0.538048
## XV117 -0.0134025 0.0421644 -0.318 0.750715
## XV118 -0.0472513 0.0626232 -0.755 0.450864
## XV119 -0.0154219 0.0599635 -0.257 0.797134
## XV120 -0.0332338 0.0795316 -0.418 0.676212
## XV121 0.0506601 0.2591295 0.196 0.845076
## XV122 0.0282014 0.0676167 0.417 0.676791
## XV123 -0.0008363 0.2586277 -0.003 0.997421
## XV124 0.0409017 0.1365702 0.299 0.764682
## XV125 -0.0675549 0.0753642 -0.896 0.370458
## XV126 0.1880013 0.0518512 3.626 0.000316 ***
## XV127 0.1199962 0.0701268 1.711 0.087643 .
## XV128 -0.1192568 0.0575490 -2.072 0.038724 *
## XV129 -0.0078032 0.0372667 -0.209 0.834226
## XV130 -0.0169749 0.0436969 -0.388 0.697825
## XV131 0.0255971 0.0341872 0.749 0.454349
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7642 on 529 degrees of freedom
## Multiple R-squared: 0.5305, Adjusted R-squared: 0.4151
## F-statistic: 4.597 on 130 and 529 DF, p-value: < 2.2e-16
beta.ols = fit.ols$coef
sig.ols = summary(fit.ols)$sigma
se = sig.ols*sqrt(diag(solve(t(X)%*%X)))
L = beta.ols+qnorm(0.025)*se
U = beta.ols+qnorm(0.975)*se
plot(beta.ols,pch=16,xlab="Regressor",ylab="Coefficient",main="OLS estimation",ylim=range(L,U),cex=0.5)
vars.ols = NULL
for (i in 1:k){
if (L[i]<0 & U[i]>0){
segments(i,L[i],i,U[i],col=2)
}else{
segments(i,L[i],i,U[i],lwd=2)
text(i,5,i,cex=0.5)
vars.ols = c(vars.ols,i)
}
}
abline(h=0,lty=2)
vars.ols
## [1] 4 11 16 20 48 51 55 56 57 58 59 72 75 76 77 84 89 90 92
## [20] 125 127
fit.bayes = bayeslm(y,X,prior="laplace",icept=FALSE,N=5000,burnin=1000,verb=FALSE)
## laplace prior
## fixed running time 0.0160671
## sampling time 0.679016
qbeta = t(apply(fit.bayes$beta,2,quantile,c(0.025,0.5,0.975)))
plot(qbeta[,2],pch=16,xlab="Regressor",ylab="Coefficient",main="",ylim=range(qbeta),cex=0.5)
title("Bayesian estimation\n Laplace prior")
vars.bayes = NULL
for (i in 1:k){
if (qbeta[i,1]<0 & qbeta[i,3]>0){
segments(i,qbeta[i,1],i,qbeta[i,3],col=2)
}else{
segments(i,qbeta[i,1],i,qbeta[i,3],lwd=2)
text(i,5,i,cex=0.5)
vars.bayes = c(vars.bayes,i)
}
}
abline(h=0,lty=2)
vars.bayes
## [1] 39 61 109 125
yhat.ols = X%*%beta.ols
yhat.bayes = X%*%qbeta[,2]
MSE.ols = mean((y-yhat.ols)^2)
MSE.bayes = mean((y-yhat.bayes)^2)
MAE.ols = mean(abs(y-yhat.ols))
MAE.bayes = mean(abs(y-yhat.bayes))
tab = rbind(c(MSE.ols,MSE.bayes),c(MAE.ols,MAE.bayes))
rownames(tab) = c("MSE","MAE")
colnames(tab) = c("OLS","BAYES")
tab
## OLS BAYES
## MSE 0.4688303 0.5849677
## MAE 0.5139756 0.5488302
train = sort(sample(1:n,size=n/2))
Xtrain = X[train,]
Xtest = X[-train,]
ytrain = y[train]
ytest = y[-train]
fit.ols = lm(ytrain~Xtrain-1)
beta.ols = fit.ols$coef
fit.bayes = bayeslm(ytrain,Xtrain,prior="laplace",icept=FALSE,N=5000,burnin=1000,verb=FALSE)
beta.bayes = apply(fit.bayes$beta,2,median)
yhat.ols = Xtest%*%beta.ols
yhat.bayes = Xtest%*%beta.bayes
MSE.ols = mean((ytest-yhat.ols)^2)
MSE.bayes = mean((ytest-yhat.bayes)^2)
MAE.ols = mean(abs(ytest-yhat.ols))
MAE.bayes = mean(abs(ytest-yhat.bayes))
tab = rbind(c(MSE.ols,MSE.bayes),c(MAE.ols,MAE.bayes))
rownames(tab) = c("MSE","MAE")
colnames(tab) = c("OLS","BAYES")
tab
## OLS BAYES
## MSE 1.0577053 0.5496248
## MAE 0.7545287 0.5502646
Repeat the above out-of-sample exercise for 100 replications. In addition, consider the reduced models obtained by retaining only the significant variables from the OLS fit and from the Bayesian fit, i.e.
OLS variables: 4 11 16 20 48 51 55 56 57 58 59 72 75 76 77 84 89 90 92 125 127
Bayes variables: 32 39 61 109 125
Notice that you have 4 models to compare:
Full model, OLS fit
Full model, Bayesian fit
Reduced model, variables chosen via OLS
Reduced model, variables chosen via Bayes
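A possible sketch of this exercise is below. It assumes the objects `y`, `X`, `n`, `vars.ols`, and `vars.bayes` from the code above are in the workspace, reuses the same `bayeslm` settings, and compares the four models by out-of-sample MSE across random train/test splits (MAE can be collected analogously).

```r
library("bayeslm")

R = 100  # number of train/test replications
mse = matrix(NA, R, 4,
             dimnames = list(NULL, c("OLS-full","BAYES-full","OLS-red","BAYES-red")))
set.seed(12345)  # arbitrary seed, for reproducibility of the splits
for (r in 1:R){
  # random 50/50 train/test split, as in the single-split exercise above
  train = sort(sample(1:n, size = n/2))
  ytr = y[train]; yte = y[-train]
  # 1) Full model, OLS fit (no intercept, as before)
  b1 = lm(ytr ~ X[train,] - 1)$coef
  # 2) Full model, Bayesian fit: posterior medians under the Laplace prior
  fb = bayeslm(ytr, X[train,], prior = "laplace", icept = FALSE,
               N = 5000, burnin = 1000, verb = FALSE)
  b2 = apply(fb$beta, 2, median)
  # 3) Reduced model with the variables selected by OLS
  b3 = lm(ytr ~ X[train, vars.ols] - 1)$coef
  # 4) Reduced model with the variables selected by the Bayesian fit
  fb4 = bayeslm(ytr, X[train, vars.bayes], prior = "laplace", icept = FALSE,
                N = 5000, burnin = 1000, verb = FALSE)
  b4 = apply(fb4$beta, 2, median)
  # out-of-sample mean squared errors
  mse[r,1] = mean((yte - X[-train,] %*% b1)^2)
  mse[r,2] = mean((yte - X[-train,] %*% b2)^2)
  mse[r,3] = mean((yte - X[-train, vars.ols] %*% b3)^2)
  mse[r,4] = mean((yte - X[-train, vars.bayes] %*% b4)^2)
}
boxplot(mse, ylab = "Out-of-sample MSE", main = "100 train/test splits")
```

The boxplot makes the comparison across replications immediate; based on the single split above, one would expect the full-model OLS fit to show the largest and most variable out-of-sample errors.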