Let us revisit the physicist exercise and consider only the first physicist, whose prior for \(\theta\) (a physical quantity) is \(N(\theta_0,\tau_0^2)\).
theta0 = 900
tau0 = 20
Now, the goal of the physicist is to compare (or combine) two competing models for the measurement \(x\): \[\begin{eqnarray*} {\cal M}_0: \ x|\theta &\sim& N(\theta,\sigma_0^2)\\ {\cal M}_1: \ x|\theta &\sim& t_{\nu}(\theta,\sigma_1^2), \end{eqnarray*}\] where \(\sigma_0^2\), \(\sigma_1^2\) and \(\nu\) are all known quantities, i.e. only the location parameter \(\theta\) is unknown. Under model \({\cal M}_0\), \[ E(x|\theta) = \theta \ \ \ \mbox{and} \ \ \ V(x|\theta) = \sigma_0^2, \] while under model \({\cal M}_1\), \[ E(x|\theta) = \theta \ \ \ \mbox{and} \ \ \ V(x|\theta) = \frac{\nu}{\nu-2}\sigma_1^2 \qquad (\nu>2). \] The two sampling variances are the same when \(\sigma_1^2=\sigma_0^2(\nu-2)/\nu\).
sigma0 = 40
nu = 5
sigma1 = sigma0*sqrt((nu-2)/nu)
sigma1
## [1] 30.98387
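As a quick sanity check (not part of the class derivation), we can verify by simulation that the two sampling distributions indeed have approximately the same variance once \(\sigma_1\) is chosen this way:
# Simulation check: both models should have sampling variance sigma0^2 = 1600
set.seed(123)
n.sim = 1000000
x.m0 = rnorm(n.sim,0,sigma0)       # draws from N(0,sigma0^2)
x.m1 = sigma1*rt(n.sim,df=nu)      # draws from t_nu(0,sigma1^2)
c(var(x.m0),var(x.m1))             # both should be close to sigma0^2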
As in the class derivation, let us assume that \(x_{obs}=850\) is observed.
We are now ready to compare both prior predictive densities evaluated at \(x_{obs}\): \[
p(x_{obs}|{\cal M}_0) \ \ \ \mbox{and} \ \ \ p(x_{obs}|{\cal M}_1).
\] Recall that \[
p(x_{obs}|{\cal M}_i) = \int_{-\infty}^{\infty} p(x|\theta,{\cal M}_i)p(\theta|{\cal M}_i)d\theta.
\] Since we are using the same prior for both cases, we can derive the Bayes factor as \[
B_{01} = \frac{p(x_{obs}|{\cal M}_0)}{p(x_{obs}|{\cal M}_1)} =
\frac{\int_{-\infty}^{\infty} p(x|\theta,{\cal M}_0)p(\theta)d\theta}
{\int_{-\infty}^{\infty} p(x|\theta,{\cal M}_1)p(\theta)d\theta}.
\]
If \(\theta^{(1)},\ldots,\theta^{(M)}\) is a random sample from the (common) prior \(p(\theta)\), then a Monte Carlo approximation of \(B_{01}\) is \[ {\hat B}_{01} = \frac{{\hat p}(x_{obs}|{\cal M}_0)}{{\hat p}(x_{obs}|{\cal M}_1)} = \frac{\frac1M\sum_{i=1}^M p_N(x_{obs}|\theta^{(i)},\sigma_0^2)} {\frac1M\sum_{i=1}^M p_t(x_{obs}|\theta^{(i)},\sigma_1^2,\nu)}. \]
xobs = 850
M = 1000000
th.draw = rnorm(M,theta0,tau0)                         # draws from the common prior
pred0 = mean(dnorm(xobs,th.draw,sigma0))               # MC estimate of p(xobs|M0)
pred1 = mean(dt((xobs-th.draw)/sigma1,df=nu)/sigma1)   # MC estimate of p(xobs|M1), scaled t density
B01 = pred0/pred1
c(round(pred0,9),round(pred1,9),round(B01,9))
## [1] 0.00477345 0.00422621 1.12948718
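Since these are Monte Carlo estimates, it is worth gauging their numerical accuracy. A simple check (not in the original notes) uses the standard error of the sample mean of the integrand evaluations:
# Monte Carlo standard errors of the two prior predictive estimates
se0 = sd(dnorm(xobs,th.draw,sigma0))/sqrt(M)
se1 = sd(dt((xobs-th.draw)/sigma1,df=nu)/sigma1)/sqrt(M)
c(se0,se1)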
In fact, the numerator is available in closed form, since the normal prior is conjugate to the normal likelihood: \[ p(x_{obs}|{\cal M}_0) = p_N(x_{obs}|\theta_0,\sigma_0^2+\tau_0^2). \]
pred0.exact = dnorm(xobs,theta0,sqrt(sigma0^2+tau0^2))
B01.moreprecise = pred0.exact/pred1
c(round(pred0.exact,9),round(B01.moreprecise,9))
## [1] 0.004774864 1.129821783
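The denominator has no closed form, but since \(\theta\) is one-dimensional it can also be approximated by deterministic numerical integration, which provides an independent check of the Monte Carlo estimate. A sketch (not part of the original notes):
# p(xobs|M1) via numerical integration of the Student-t likelihood times the normal prior
integrand = function(theta){
  dt((xobs-theta)/sigma1,df=nu)/sigma1*dnorm(theta,theta0,tau0)
}
pred1.num = integrate(integrand,lower=-Inf,upper=Inf)$value
pred1.num          # compare with the MC estimate pred1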
Kass and Raftery (1995) argue that one can use \(B_{10}\) to grade the strength of the evidence against \({\cal M}_0\). Their JASA paper can be found here: https://sites.stat.washington.edu/raftery/Research/PDF/kass1995.pdf
\(B_{10} \in (1,3):\) Not worth more than a bare mention
\(B_{10} \in (3,20):\) Positive evidence against \({\cal M}_0\)
\(B_{10} \in (20,150):\) Strong evidence against \({\cal M}_0\)
\(B_{10} > 150:\) Very strong evidence against \({\cal M}_0\)
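For convenience, the scale above can be wrapped into a small helper function (purely illustrative; the function name is ours, the cutpoints follow the table above, and values of \(B_{10}\) below 1 simply favor \({\cal M}_0\)):
# Map a Bayes factor B10 into the Kass-Raftery qualitative categories
kass.raftery = function(B10){
  cut(B10,breaks=c(0,1,3,20,150,Inf),
      labels=c("Evidence favors M0","Not worth more than a bare mention",
               "Positive","Strong","Very strong"))
}
kass.raftery(1/B01)    # B10 = 1/B01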
Let us assume that the prior model probabilities for the two models are \(\pi_0\) and \(\pi_1 = 1-\pi_0\), i.e. \[ Pr({\cal M}_0) = \pi_0 \ \ \ \mbox{and} \ \ \ Pr({\cal M}_1) = \pi_1 = 1 - \pi_0. \] Therefore, we can climb one step up this ladder and compute \[ \frac{Pr({\cal M}_0|x_{obs})}{Pr({\cal M}_1|x_{obs})} = \frac{\frac{Pr({\cal M}_0)p(x_{obs}|{\cal M}_0)}{p(x_{obs})}} {\frac{Pr({\cal M}_1)p(x_{obs}|{\cal M}_1)}{p(x_{obs})}} = \frac{Pr({\cal M}_0)p(x_{obs}|{\cal M}_0)} {Pr({\cal M}_1)p(x_{obs}|{\cal M}_1)} = \frac{\pi_0}{1-\pi_0}\times\frac{p(x_{obs}|{\cal M}_0)}{p(x_{obs}|{\cal M}_1)} = \frac{\pi_0}{1-\pi_0}B_{01}. \] Since \(Pr({\cal M}_1|x_{obs})=1-Pr({\cal M}_0|x_{obs})\), we can rewrite the above ratio as \[ Pr({\cal M}_0|x_{obs}) = \frac{\frac{\pi_0}{1-\pi_0}B_{01}}{1+\frac{\pi_0}{1-\pi_0}B_{01}}= \frac{1}{1+\frac{1-\pi_0}{\pi_0}B_{10}}. \] If \(\pi_0=\pi_1\), then \[ Pr({\cal M}_0|x_{obs}) = \frac{1}{1+B_{10}} \longrightarrow 1 \ \ \mbox{when} \ \ B_{10} \longrightarrow 0 \ \ (p(x_{obs}|{\cal M}_1) \ll p(x_{obs}|{\cal M}_0)). \]
1/(1+1/B01)
## [1] 0.5304034
1/(1+1/B01.moreprecise)
## [1] 0.5304771
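The posterior model probability above assumes \(\pi_0=\pi_1=1/2\). A quick sensitivity check (added here as an illustration) computes \(Pr({\cal M}_0|x_{obs})\) over a grid of prior probabilities \(\pi_0\), using the formula derived above:
# Pr(M0|xobs) as a function of the prior model probability pi0
pi0 = seq(0.1,0.9,by=0.1)
PrM0.pi0 = 1/(1+(1-pi0)/pi0*(1/B01.moreprecise))
round(cbind(pi0,PrM0.pi0),3)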
In many practical cases, the decision maker could not care less about which model is best. What she/he might be more interested in are posterior summaries of a given quantity, such as \(E(\theta|x_{obs})\) or \(Pr(\theta>900|x_{obs})\). Both models above produce estimates of these quantities: \[ E(\theta|{\cal M}_i,x_{obs}) \ \ \ \mbox{and} \ \ \ Pr(\theta>900|{\cal M}_i,x_{obs}), \qquad i=0,1. \]
For \({\cal M}_0\) these quantities are easily obtained in closed form. For instance, \[ E(\theta|{\cal M}_0,x_{obs}) = \left(\frac{\sigma_0^2}{\sigma_0^2+\tau_0^2}\right)\theta_0+ \left(\frac{\tau_0^2}{\sigma_0^2+\tau_0^2}\right)x_{obs} \ \ \ \mbox{and} \ \ \ V(\theta|{\cal M}_0,x_{obs}) = \frac{\sigma_0^2\tau_0^2}{\sigma_0^2+\tau_0^2}. \]
Etheta0 = (sigma0^2)/(sigma0^2+tau0^2)*theta0 + (tau0^2)/(sigma0^2+tau0^2)*xobs
SDtheta0 = sqrt(tau0^2*(sigma0^2)/(sigma0^2+tau0^2))
tail0 = 1-pnorm(900,Etheta0,SDtheta0)    # Pr(theta>900|M0,xobs)
The simplest way to obtain draws from the posterior of \(\theta\) under the Student’s \(t\) model is the sampling importance resampling (SIR) scheme (as discussed in class). More specifically:
Sample \(\{{\tilde \theta}^{(1)},\ldots,{\tilde \theta}^{(M)}\}\) from the prior \(N(\theta_0,\tau_0^2)\),
Compute resampling unnormalized weights \({\tilde w}^{(i)} = p_t(x_{obs}|{\tilde \theta}^{(i)},\sigma_1^2,\nu), \ \ \ i=1,\ldots,M,\)
Compute normalized weights \(w^{(i)}={\tilde w}^{(i)}/\sum_{j=1}^M {\tilde w}^{(j)}, \ \ \ i=1,\ldots,M\),
Sample with replacement \(N\) draws (equal to \(M\) or a fraction of \(M\), say 10%) from \(\{{\tilde \theta}^{(1)},\ldots,{\tilde \theta}^{(M)}\}\), with weights \(\{w^{(1)},\ldots,w^{(M)}\}\),
For large \(M\) and \(N\), the resampled draws, which we will denote \(\{\theta^{(1)},\ldots,\theta^{(N)}\}\), behave as draws from the posterior distribution \(p(\theta|{\cal M}_1,x_{obs})\).
We are now ready to approximate, via Monte Carlo integration, the above two integrals as \[ \frac1N \sum_{i=1}^N \theta^{(i)} \ \ \ \mbox{and} \ \ \ \frac1N \sum_{i=1}^N 1(\theta^{(i)}>900). \]
M = 1000000
N = 0.1*M
theta.draw = rnorm(M,theta0,tau0)                # proposal draws from the prior
w = dt((xobs-theta.draw)/sigma1,df=nu)/sigma1    # unnormalized weights: Student-t likelihood at xobs
w = w/sum(w)                                     # normalized weights
theta1.draw = theta.draw[sample(1:M,size=N,replace=TRUE,prob=w)]   # resampling step
Etheta1 = mean(theta1.draw)
tail1 = mean(theta1.draw>900)
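Before trusting the resampled draws, it is good practice (though not required by the scheme above) to check whether the importance weights are well behaved, e.g. via the effective sample size \(M_{\mbox{eff}} = 1/\sum_{i=1}^M (w^{(i)})^2\) of the normalized weights:
# Effective sample size of the normalized importance weights
ESS = 1/sum(w^2)
c(ESS,ESS/M)      # absolute and relative effective sample size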
We can now compare these quantities from both models:
c(Etheta0,Etheta1)
## [1] 890.0000 886.9616
c(tail0,tail1)
## [1] 0.0003981151 0.2333700000
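A visual comparison of the two posteriors is also informative. The sketch below (not part of the original notes) overlays the exact normal posterior under \({\cal M}_0\) with a kernel density estimate of the resampled draws under \({\cal M}_1\):
# Posterior of theta under M0 (exact) and M1 (SIR draws)
theta.grid = seq(800,960,length=500)
plot(theta.grid,dnorm(theta.grid,Etheta0,SDtheta0),type="l",
     xlab=expression(theta),ylab="posterior density")
lines(density(theta1.draw),lty=2)
legend("topright",legend=c("M0 (normal)","M1 (Student-t)"),lty=1:2,bty="n")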
We can combine both models via Bayesian model averaging, i.e. by weighting the model-specific answers by the posterior model probabilities \(Pr({\cal M}_0|x_{obs})\) and \(Pr({\cal M}_1|x_{obs})\): \[ E(\theta|x_{obs}) = Pr({\cal M}_0|x_{obs})E(\theta|x_{obs},{\cal M}_0)+Pr({\cal M}_1|x_{obs})E(\theta|x_{obs},{\cal M}_1), \] and \[ Pr(\theta>900|x_{obs}) = Pr({\cal M}_0|x_{obs})Pr(\theta>900|x_{obs},{\cal M}_0)+Pr({\cal M}_1|x_{obs})Pr(\theta>900|x_{obs},{\cal M}_1). \]
PrM0 = 1/(1+1/B01.moreprecise)
PrM1 = 1-PrM0
Etheta = Etheta0*PrM0+Etheta1*PrM1
tail = tail0*PrM0+tail1*PrM1
Etheta
## [1] 888.5734
tail
## [1] 0.1097837
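Instead of averaging only summaries, we can also produce draws from the model-averaged posterior \(p(\theta|x_{obs}) = Pr({\cal M}_0|x_{obs})p(\theta|{\cal M}_0,x_{obs}) + Pr({\cal M}_1|x_{obs})p(\theta|{\cal M}_1,x_{obs})\), by first drawing the model indicator and then drawing \(\theta\) from the corresponding posterior. A sketch (not in the original notes):
# Draws from the model-averaged posterior of theta
set.seed(321)
N.bma = length(theta1.draw)
model = rbinom(N.bma,1,PrM1)                                  # 1 = M1, 0 = M0
theta.bma = ifelse(model==1,
                   sample(theta1.draw,N.bma,replace=TRUE),    # M1: SIR draws
                   rnorm(N.bma,Etheta0,SDtheta0))             # M0: exact normal posterior
c(mean(theta.bma),mean(theta.bma>900))                        # compare with Etheta and tail above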
Kass and Raftery (1995). Bayes Factors. Journal of the American Statistical Association, 90(430), 773-795. 15809 citations. https://sites.stat.washington.edu/raftery/Research/PDF/kass1995.pdf
Hoeting, Madigan, Raftery and Volinsky (1999). Bayesian Model Averaging: A Tutorial. Statistical Science, 14(4), 382-417. 4964 citations. https://www.stat.colostate.edu/~jah/papers/statsci.pdf
Fragoso, Bertoli and Louzada (2017). Bayesian Model Averaging: A Systematic Review and Conceptual Classification. International Statistical Review, 86(1), 1-28. doi:10.1111/insr.12243
Steel (2020). Model Averaging and Its Use in Economics. Journal of Economic Literature, 58(3), 644-719. https://www.aeaweb.org/articles/pdf/doi/10.1257/jel.20191385. ArXiv version: https://arxiv.org/pdf/1709.08221.pdf
BMA: R package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression). https://cran.r-project.org/web/packages/BMA/BMA.pdf
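As a pointer to practice, the snippet below sketches how the BMA package is typically used for variable selection in a linear regression (illustrative only; it assumes the package is installed and uses simulated data):
# Illustrative use of BMA::bicreg on simulated regression data
library(BMA)
set.seed(42)
n = 100
X = matrix(rnorm(n*4),n,4)
colnames(X) = paste0("x",1:4)
y = 1 + 2*X[,1] - X[,3] + rnorm(n)
fit = bicreg(X,y)
summary(fit)    # posterior model probabilities and posterior inclusion probabilities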