Let us revisit the physicist exercise and consider only the first physicist, whose prior for \(\theta\) (a physical quantity) is \(N(\theta_0,\tau_0^2)\).
theta0 = 900
tau0 = 20
Now, the goal of the physicist is to compare (or combine) two competing models for the measurement \(x\): \[\begin{eqnarray*} {\cal M}_0: \ x|\theta &\sim& N(\theta,\sigma_0^2)\\ {\cal M}_1: \ x|\theta &\sim& t_{\nu}(\theta,\sigma_1^2), \end{eqnarray*}\] where \(\sigma_0^2\), \(\sigma_1^2\) and \(\nu\) are all known quantities, i.e. only the location parameter \(\theta\) is unknown. Under model \({\cal M}_0\), \[ E(x|\theta) = \theta \ \ \ \mbox{and} \ \ \ V(x|\theta) = \sigma_0^2, \] while under model \({\cal M}_1\), \[ E(x|\theta) = \theta \ \ \ \mbox{and} \ \ \ V(x|\theta) = \frac{\nu}{\nu-2}\sigma_1^2 \qquad (\nu>2). \] The two sampling variances are the same when \(\sigma_1^2=\sigma_0^2(\nu-2)/\nu\).
sigma0 = 40
nu = 5
sigma1 = sigma0*sqrt((nu-2)/nu)
sigma1
## [1] 30.98387
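As a quick sanity check (not part of the class derivation), we can verify by simulation that the two sampling distributions indeed have approximately the same variance once \(\sigma_1\) is chosen this way:
# Simulation check: both models should have sampling variance sigma0^2 = 1600
set.seed(123)
n.sim = 1000000
x.m0 = rnorm(n.sim,0,sigma0)       # draws from N(0,sigma0^2)
x.m1 = sigma1*rt(n.sim,df=nu)      # draws from t_nu(0,sigma1^2)
c(var(x.m0),var(x.m1))             # both should be close to sigma0^2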
As in the class derivation, let us assume that \(x_{obs}=850\) is observed.
We are now ready to compare both prior predictive densities evaluated at \(x_{obs}\): \[
p(x_{obs}|{\cal M}_0) \ \ \ \mbox{and} \ \ \ p(x_{obs}|{\cal M}_1).
\] Recall that \[
p(x_{obs}|{\cal M}_i) = \int_{-\infty}^{\infty} p(x|\theta,{\cal M}_i)p(\theta|{\cal M}_i)d\theta.
\] Since we are using the same prior for both cases, we can derive the Bayes factor as \[
B_{01} = \frac{p(x_{obs}|{\cal M}_0)}{p(x_{obs}|{\cal M}_1)} =
\frac{\int_{-\infty}^{\infty} p(x|\theta,{\cal M}_0)p(\theta)d\theta}
{\int_{-\infty}^{\infty} p(x|\theta,{\cal M}_1)p(\theta)d\theta}.
\]
If \(\theta^{(1)},\ldots,\theta^{(M)}\) is a random sample from the (common) prior \(p(\theta)\), then a Monte Carlo approximation of \(B_{01}\) is \[ {\hat B}_{01} = \frac{{\hat p}(x_{obs}|{\cal M}_0)}{{\hat p}(x_{obs}|{\cal M}_1)} = \frac{\frac1M\sum_{i=1}^M p_N(x_{obs}|\theta^{(i)},\sigma_0^2)} {\frac1M\sum_{i=1}^M p_t(x_{obs}|\theta^{(i)},\sigma_1^2,\nu)}. \]
xobs = 850
M = 1000000
th.draw = rnorm(M,theta0,tau0)                         # draws from the common prior
pred0 = mean(dnorm(xobs,th.draw,sigma0))               # MC estimate of p(xobs|M0)
pred1 = mean(dt((xobs-th.draw)/sigma1,df=nu)/sigma1)   # MC estimate of p(xobs|M1), scaled t density
B01 = pred0/pred1
c(round(pred0,9),round(pred1,9),round(B01,9))
## [1] 0.00477345 0.00422621 1.12948718
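Since these are Monte Carlo estimates, it is worth gauging their numerical accuracy. A simple check (not in the original notes) uses the standard error of the sample mean of the integrand evaluations:
# Monte Carlo standard errors of the two prior predictive estimates
se0 = sd(dnorm(xobs,th.draw,sigma0))/sqrt(M)
se1 = sd(dt((xobs-th.draw)/sigma1,df=nu)/sigma1)/sqrt(M)
c(se0,se1)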
In fact, the numerator is available in closed form, since the normal prior is conjugate to the normal likelihood: \[ p(x_{obs}|{\cal M}_0) = p_N(x_{obs}|\theta_0,\sigma_0^2+\tau_0^2). \]
pred0.exact = dnorm(xobs,theta0,sqrt(sigma0^2+tau0^2))
B01.moreprecise = pred0.exact/pred1
c(round(pred0.exact,9),round(B01.moreprecise,9))
## [1] 0.004774864 1.129821783
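The denominator has no closed form, but since \(\theta\) is one-dimensional it can also be approximated by deterministic numerical integration, which provides an independent check of the Monte Carlo estimate. A sketch (not part of the original notes):
# p(xobs|M1) via numerical integration of the Student-t likelihood times the normal prior
integrand = function(theta){
  dt((xobs-theta)/sigma1,df=nu)/sigma1*dnorm(theta,theta0,tau0)
}
pred1.num = integrate(integrand,lower=-Inf,upper=Inf)$value
pred1.num          # compare with the MC estimate pred1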
Kass and Raftery (1995) argue that one can use \(B_{10}\) to grade the strength of the evidence against \({\cal M}_0\). Their JASA paper can be found here: https://sites.stat.washington.edu/raftery/Research/PDF/kass1995.pdf
\(B_{10} \in (1,3):\) Not worth more than a bare mention
\(B_{10} \in (3,20):\) Positive evidence against \({\cal M}_0\)
\(B_{10} \in (20,150):\) Strong evidence against \({\cal M}_0\)
\(B_{10} > 150:\) Very strong evidence against \({\cal M}_0\)
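For convenience, the scale above can be wrapped into a small helper function (purely illustrative; the function name is ours, the cutpoints follow the table above, and values of \(B_{10}\) below 1 simply favor \({\cal M}_0\)):
# Map a Bayes factor B10 into the Kass-Raftery qualitative categories
kass.raftery = function(B10){
  cut(B10,breaks=c(0,1,3,20,150,Inf),
      labels=c("Evidence favors M0","Not worth more than a bare mention",
               "Positive","Strong","Very strong"))
}
kass.raftery(1/B01)    # B10 = 1/B01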
Let us assume that the prior model probabilities for the two models are \(\pi_0\) and \(\pi_1 = 1-\pi_0\), i.e. \[ Pr({\cal M}_0) = \pi_0 \ \ \ \mbox{and} \ \ \ Pr({\cal M}_1) = \pi_1 = 1 - \pi_0. \] Therefore, we can climb one step up this ladder and compute \[ \frac{Pr({\cal M}_0|x_{obs})}{Pr({\cal M}_1|x_{obs})} = \frac{\frac{Pr({\cal M}_0)p(x_{obs}|{\cal M}_0)}{p(x_{obs})}} {\frac{Pr({\cal M}_1)p(x_{obs}|{\cal M}_1)}{p(x_{obs})}} = \frac{Pr({\cal M}_0)p(x_{obs}|{\cal M}_0)} {Pr({\cal M}_1)p(x_{obs}|{\cal M}_1)} = \frac{\pi_0}{1-\pi_0}\times\frac{p(x_{obs}|{\cal M}_0)}{p(x_{obs}|{\cal M}_1)} = \frac{\pi_0}{1-\pi_0}B_{01}. \] Since \(Pr({\cal M}_1|x_{obs})=1-Pr({\cal M}_0|x_{obs})\), we can rewrite the above ratio as \[ Pr({\cal M}_0|x_{obs}) = \frac{\frac{\pi_0}{1-\pi_0}B_{01}}{1+\frac{\pi_0}{1-\pi_0}B_{01}}= \frac{1}{1+\frac{1-\pi_0}{\pi_0}B_{10}}. \] If \(\pi_0=\pi_1\), then \[ Pr({\cal M}_0|x_{obs}) = \frac{1}{1+B_{10}} \longrightarrow 1 \ \ \mbox{when} \ \ B_{10} \longrightarrow 0 \ \ (p(x_{obs}|{\cal M}_1) \ll p(x_{obs}|{\cal M}_0)). \]
1/(1+1/B01)
## [1] 0.5304034
1/(1+1/B01.moreprecise)
## [1] 0.5304771
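The posterior model probability above assumes \(\pi_0=\pi_1=1/2\). A quick sensitivity check (added here as an illustration) computes \(Pr({\cal M}_0|x_{obs})\) over a grid of prior probabilities \(\pi_0\), using the formula derived above:
# Pr(M0|xobs) as a function of the prior model probability pi0
pi0 = seq(0.1,0.9,by=0.1)
PrM0.pi0 = 1/(1+(1-pi0)/pi0*(1/B01.moreprecise))
round(cbind(pi0,PrM0.pi0),3)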
In many practical cases, the decision maker could not care less about which model is best. What she/he might be more interested in are posterior summaries of a given quantity, such as \(E(\theta|x_{obs})\) or \(Pr(\theta>900|x_{obs})\). Both models above produce estimates of these quantities: \[ E(\theta|{\cal M}_i,x_{obs}) \ \ \ \mbox{and} \ \ \ Pr(\theta>900|{\cal M}_i,x_{obs}), \qquad i=0,1. \]
For \({\cal M}_0\) these quantities are easily obtained in closed form. For instance, \[ E(\theta|{\cal M}_0,x_{obs}) = \left(\frac{\sigma_0^2}{\sigma_0^2+\tau_0^2}\right)\theta_0+ \left(\frac{\tau_0^2}{\sigma_0^2+\tau_0^2}\right)x_{obs} \ \ \ \mbox{and} \ \ \ V(\theta|{\cal M}_0,x_{obs}) = \frac{\sigma_0^2\tau_0^2}{\sigma_0^2+\tau_0^2}. \]
Etheta0 = (sigma0^2)/(sigma0^2+tau0^2)*theta0 + (tau0^2)/(sigma0^2+tau0^2)*xobs
SDtheta0 = sqrt(tau0^2*(sigma0^2)/(sigma0^2+tau0^2))
tail0 = 1-pnorm(900,Etheta0,SDtheta0)    # Pr(theta>900|M0,xobs)
The simplest way to obtain draws from the posterior of \(\theta\) under the Student’s \(t\) model is the sampling importance resampling (SIR) scheme (as discussed in class). More specifically:
Sample \(\{{\tilde \theta}^{(1)},\ldots,{\tilde \theta}^{(M)}\}\) from the prior \(N(\theta_0,\tau_0^2)\),
Compute resampling unnormalized weights \({\tilde w}^{(i)} = p_t(x_{obs}|{\tilde \theta}^{(i)},\sigma_1^2,\nu), \ \ \ i=1,\ldots,M,\)
Compute normalized weights \(w^{(i)}={\tilde w}^{(i)}/\sum_{j=1}^M {\tilde w}^{(j)}, \ \ \ i=1,\ldots,M\),
Sample with replacement \(N\) draws (equal to \(M\) or a fraction of \(M\), say 10%) from \(\{{\tilde \theta}^{(1)},\ldots,{\tilde \theta}^{(M)}\}\), with weights \(\{w^{(1)},\ldots,w^{(M)}\}\),
For large \(M\) and \(N\), the resampled draws, which we will denote \(\{\theta^{(1)},\ldots,\theta^{(N)}\}\), behave as draws from the posterior distribution \(p(\theta|{\cal M}_1,x_{obs})\).
We are now ready to approximate, via Monte Carlo integration, the above two integrals as \[ \frac1N \sum_{i=1}^N \theta^{(i)} \ \ \ \mbox{and} \ \ \ \frac1N \sum_{i=1}^N 1(\theta^{(i)}>900). \]
M = 1000000
N = 0.1*M
theta.draw = rnorm(M,theta0,tau0)                # proposal draws from the prior
w = dt((xobs-theta.draw)/sigma1,df=nu)/sigma1    # unnormalized weights: Student-t likelihood at xobs
w = w/sum(w)                                     # normalized weights
theta1.draw = theta.draw[sample(1:M,size=N,replace=TRUE,prob=w)]   # resampling step
Etheta1 = mean(theta1.draw)
tail1 = mean(theta1.draw>900)
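Before trusting the resampled draws, it is good practice (though not required by the scheme above) to check whether the importance weights are well behaved, e.g. via the effective sample size \(M_{\mbox{eff}} = 1/\sum_{i=1}^M (w^{(i)})^2\) of the normalized weights:
# Effective sample size of the normalized importance weights
ESS = 1/sum(w^2)
c(ESS,ESS/M)      # absolute and relative effective sample size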
We can now compare these quantities from both models:
c(Etheta0,Etheta1)
## [1] 890.0000 886.9616
c(tail0,tail1)
## [1] 0.0003981151 0.2333700000
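A visual comparison of the two posteriors is also informative. The sketch below (not part of the original notes) overlays the exact normal posterior under \({\cal M}_0\) with a kernel density estimate of the resampled draws under \({\cal M}_1\):
# Posterior of theta under M0 (exact) and M1 (SIR draws)
theta.grid = seq(800,960,length=500)
plot(theta.grid,dnorm(theta.grid,Etheta0,SDtheta0),type="l",
     xlab=expression(theta),ylab="posterior density")
lines(density(theta1.draw),lty=2)
legend("topright",legend=c("M0 (normal)","M1 (Student-t)"),lty=1:2,bty="n")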
We can combine both models via Bayesian model averaging, i.e. by weighting the model-specific answers by the posterior model probabilities \(Pr({\cal M}_0|x_{obs})\) and \(Pr({\cal M}_1|x_{obs})\): \[ E(\theta|x_{obs}) = Pr({\cal M}_0|x_{obs})E(\theta|x_{obs},{\cal M}_0)+Pr({\cal M}_1|x_{obs})E(\theta|x_{obs},{\cal M}_1), \] and \[ Pr(\theta>900|x_{obs}) = Pr({\cal M}_0|x_{obs})Pr(\theta>900|x_{obs},{\cal M}_0)+Pr({\cal M}_1|x_{obs})Pr(\theta>900|x_{obs},{\cal M}_1). \]
PrM0 = 1/(1+1/B01.moreprecise)
PrM1 = 1-PrM0
Etheta = Etheta0*PrM0+Etheta1*PrM1
tail = tail0*PrM0+tail1*PrM1
Etheta
## [1] 888.5734
tail
## [1] 0.1097837
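Instead of averaging only summaries, we can also produce draws from the model-averaged posterior \(p(\theta|x_{obs}) = Pr({\cal M}_0|x_{obs})p(\theta|{\cal M}_0,x_{obs}) + Pr({\cal M}_1|x_{obs})p(\theta|{\cal M}_1,x_{obs})\), by first drawing the model indicator and then drawing \(\theta\) from the corresponding posterior. A sketch (not in the original notes):
# Draws from the model-averaged posterior of theta
set.seed(321)
N.bma = length(theta1.draw)
model = rbinom(N.bma,1,PrM1)                                  # 1 = M1, 0 = M0
theta.bma = ifelse(model==1,
                   sample(theta1.draw,N.bma,replace=TRUE),    # M1: SIR draws
                   rnorm(N.bma,Etheta0,SDtheta0))             # M0: exact normal posterior
c(mean(theta.bma),mean(theta.bma>900))                        # compare with Etheta and tail above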
Kass and Raftery (1995). Bayes Factors. Journal of the American Statistical Association, 90(430), 773-795. 15809 citations. https://sites.stat.washington.edu/raftery/Research/PDF/kass1995.pdf
Hoeting, Madigan, Raftery and Volinsky (1999). Bayesian Model Averaging: A Tutorial. Statistical Science, 14(4), 382-417. 4964 citations. https://www.stat.colostate.edu/~jah/papers/statsci.pdf
Fragoso, Bertoli and Louzada (2017). Bayesian Model Averaging: A Systematic Review and Conceptual Classification. International Statistical Review, 86(1), 1-28. doi:10.1111/insr.12243
Steel (2020). Model Averaging and Its Use in Economics. Journal of Economic Literature, 58(3), 644-719. https://www.aeaweb.org/articles/pdf/doi/10.1257/jel.20191385. ArXiv version: https://arxiv.org/pdf/1709.08221.pdf
BMA: R package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression). https://cran.r-project.org/web/packages/BMA/BMA.pdf
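As a pointer to practice, the snippet below sketches how the BMA package is typically used for variable selection in a linear regression (illustrative only; it assumes the package is installed and uses simulated data):
# Illustrative use of BMA::bicreg on simulated regression data
library(BMA)
set.seed(42)
n = 100
X = matrix(rnorm(n*4),n,4)
colnames(X) = paste0("x",1:4)
y = 1 + 2*X[,1] - X[,3] + rnorm(n)
fit = bicreg(X,y)
summary(fit)    # posterior model probabilities and posterior inclusion probabilities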