Introduction

Let us revisit the physicist exercise and consider only the first physicist, whose prior for \(\theta\) (a physical quantity) is \(N(\theta_0,\tau_0^2)\).

theta0 = 900   # prior mean
tau0   = 20    # prior standard deviation

Now, the goal of the physicist is to compare (or combine) two competing models for the measurement \(x\): \[\begin{eqnarray*} {\cal M}_0: \ x|\theta &\sim& N(\theta,\sigma_0^2)\\ {\cal M}_1: \ x|\theta &\sim& t_{\nu}(\theta,\sigma_1^2), \end{eqnarray*}\] where \(\sigma_0^2\), \(\sigma_1^2\) and \(\nu\) are all known quantities, i.e. only the location parameter \(\theta\) is unknown. Under model \({\cal M}_0\), \[ E(x|\theta) = \theta \ \ \ \mbox{and} \ \ \ V(x|\theta) = \sigma_0^2, \] while under model \({\cal M}_1\), \[ E(x|\theta) = \theta \ \ \ \mbox{and} \ \ \ V(x|\theta) = \frac{\nu}{\nu-2}\sigma_1^2. \] The two variances coincide when \(\sigma_1^2=\sigma_0^2(\nu-2)/\nu\).

sigma0 = 40                        # standard deviation under M0
nu     =  5                        # degrees of freedom under M1
sigma1 = sigma0*sqrt((nu-2)/nu)    # scale of M1 that matches the two variances
sigma1
## [1] 30.98387
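
As a quick sanity check (our addition, not part of the original exercise), we can simulate a large sample from each measurement model at a fixed \(\theta\) and verify that the two variances roughly match:

set.seed(1234)
theta = 900
x0 = rnorm(100000,theta,sigma0)        # draws from M0
x1 = theta + sigma1*rt(100000,df=nu)   # draws from M1: scaled Student's t
c(var(x0),var(x1),sigma0^2)            # all three should be close to 1600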

Predictive densities

As in the class derivation, let us assume that \(x_{obs}=850\) is observed.
We are now ready to compare both prior predictive densities evaluated at \(x_{obs}\): \[ p(x_{obs}|{\cal M}_0) \ \ \ \mbox{and} \ \ \ p(x_{obs}|{\cal M}_1). \] Recall that \[ p(x_{obs}|{\cal M}_i) = \int_{-\infty}^{\infty} p(x|\theta,{\cal M}_i)p(\theta|{\cal M}_i)d\theta. \] Since we are using the same prior for both cases, we can derive the Bayes factor as \[ B_{01} = \frac{p(x_{obs}|{\cal M}_0)}{p(x_{obs}|{\cal M}_1)} = \frac{\int_{-\infty}^{\infty} p(x|\theta,{\cal M}_0)p(\theta)d\theta} {\int_{-\infty}^{\infty} p(x|\theta,{\cal M}_1)p(\theta)d\theta}. \]

Computing \(B_{01}\) via Monte Carlo integration

If \(\theta^{(1)},\ldots,\theta^{(M)}\) is a random sample from the (common) prior, \(p(\theta)\), then a Monte Carlo approximation of \(B_{01}\) is \[ {\hat B}_{01} = \frac{{\hat p}(x_{obs}|{\cal M}_0)}{{\hat p}(x_{obs}|{\cal M}_1)} = \frac{\frac1M\sum_{i=1}^M p_N(x_{obs}|\theta^{(i)},\sigma_0^2)} {\frac1M\sum_{i=1}^M p_t(x_{obs}|\theta^{(i)},\sigma_1^2,\nu)}. \]

xobs    = 850
M       = 1000000
th.draw = rnorm(M,theta0,tau0)                           # M draws from the prior
pred0   = mean(dnorm(xobs,th.draw,sigma0))               # MC estimate of p(xobs|M0)
pred1   = mean(dt((xobs-th.draw)/sigma1,df=nu)/sigma1)   # MC estimate of p(xobs|M1)
B01     = pred0/pred1
c(round(pred0,9),round(pred1,9),round(B01,9))
## [1] 0.00477345 0.00422621 1.12948718
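
Since these estimates are Monte Carlo averages, their precision can be gauged by the usual standard errors (a diagnostic we add here, not part of the original notes):

se0 = sd(dnorm(xobs,th.draw,sigma0))/sqrt(M)               # MC standard error of pred0
se1 = sd(dt((xobs-th.draw)/sigma1,df=nu)/sigma1)/sqrt(M)   # MC standard error of pred1
c(se0,se1)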

In fact, the numerator could have been obtained analytically, since under \({\cal M}_0\) the prior predictive is Gaussian: \[ p(x_{obs}|{\cal M}_0) = p_N(x_{obs}|\theta_0,\sigma_0^2+\tau_0^2). \]

pred0.exact = dnorm(xobs,theta0,sqrt(sigma0^2+tau0^2))
B01.moreprecise = pred0.exact/pred1
c(round(pred0.exact,9),round(B01.moreprecise,9))
## [1] 0.004774864 1.129821783

How to read \(B_{10}\)?

Kass and Raftery (1995) argue that one can use \(B_{10}\) (or \(2\log B_{10}\)) to quantify the evidence against \({\cal M}_0\). Their JASA paper can be found here: https://sites.stat.washington.edu/raftery/Research/PDF/kass1995.pdf
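
For concreteness, here is a small helper (our sketch, based on the \(2\log B_{10}\) categories in their paper) that maps a Bayes factor onto their verbal scale:

# Kass-Raftery categories for 2*log(B10): (0,2] not worth more than a bare
# mention, (2,6] positive, (6,10] strong, (10,Inf) very strong evidence against M0
kass.raftery = function(B10) {
  cut(2*log(B10), breaks=c(-Inf,0,2,6,10,Inf),
      labels=c("evidence favors M0","not worth more than a bare mention",
               "positive","strong","very strong"))
}
kass.raftery(1/B01)   # here B10 = 1/B01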

Posterior model probabilities

Let us assume that the prior model probabilities for the two models are \(\pi_0\) and \(\pi_1 = 1-\pi_0\), i.e. \[ Pr({\cal M}_0) = \pi_0 \ \ \ \mbox{and} \ \ \ Pr({\cal M}_1) = \pi_1 = 1 - \pi_0. \] Therefore, we can climb one step up in this ladder and compute \[ \frac{Pr({\cal M}_0|x_{obs})}{Pr({\cal M}_1|x_{obs})} = \frac{\frac{Pr({\cal M}_0)p(x_{obs}|{\cal M}_0)}{p(x_{obs})}} {\frac{Pr({\cal M}_1)p(x_{obs}|{\cal M}_1)}{p(x_{obs})}} = \frac{Pr({\cal M}_0)p(x_{obs}|{\cal M}_0)} {Pr({\cal M}_1)p(x_{obs}|{\cal M}_1)} = \frac{\pi_0}{1-\pi_0}\times\frac{p(x_{obs}|{\cal M}_0)}{p(x_{obs}|{\cal M}_1)} = \frac{\pi_0}{1-\pi_0}B_{01}. \] Since \(Pr({\cal M}_1|x_{obs})=1-Pr({\cal M}_0|x_{obs})\), we can solve the above posterior odds for \[ Pr({\cal M}_0|x_{obs}) = \frac{\frac{\pi_0}{1-\pi_0}B_{01}}{1+\frac{\pi_0}{1-\pi_0}B_{01}}= \frac{1}{1+\frac{1-\pi_0}{\pi_0}B_{10}}. \] If \(\pi_0=\pi_1\), then \[ Pr({\cal M}_0|x_{obs}) = \frac{1}{1+B_{10}} \longrightarrow 1 \ \ \mbox{when} \ \ B_{10} \longrightarrow 0 \ \ (p(x_{obs}|{\cal M}_1) \ll p(x_{obs}|{\cal M}_0)). \]

1/(1+1/B01)
## [1] 0.5304034
1/(1+1/B01.moreprecise)
## [1] 0.5304771
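
The two computations above take \(\pi_0=\pi_1=1/2\). A minimal sketch for a general prior probability \(\pi_0\) (the function name and its use are our addition), implementing the formula derived above:

post.prob.M0 = function(pi0,B01) {
  # Pr(M0|xobs) = 1/(1 + ((1-pi0)/pi0)*B10), with B10 = 1/B01
  1/(1+((1-pi0)/pi0)/B01)
}
post.prob.M0(0.5,B01.moreprecise)   # recovers the value above
post.prob.M0(0.9,B01.moreprecise)   # a prior strongly favoring M0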

Posterior inference and Bayesian model averaging

In several practical cases, the decision maker could not care less about which model is best. What she/he might be more interested in is a posterior summary of a given quantity, such as \(E(\theta|x_{obs})\) or \(Pr(\theta>900|x_{obs})\). Both models above produce estimates of these quantities: \[ E(\theta|{\cal M}_i,x_{obs}) \ \ \ \mbox{and} \ \ \ Pr(\theta>900|{\cal M}_i,x_{obs}), \qquad i=0,1. \]

Gaussian model

For \({\cal M}_0\) these quantities are easily obtained in closed form. For instance, \[ E(\theta|{\cal M}_0,x_{obs}) = \left(\frac{\sigma_0^2}{\sigma_0^2+\tau_0^2}\right)\theta_0+ \left(\frac{\tau_0^2}{\sigma_0^2+\tau_0^2}\right)x_{obs} \]

Etheta0  = (sigma0^2)/(sigma0^2+tau0^2)*theta0 + (tau0^2)/(sigma0^2+tau0^2)*xobs   # posterior mean under M0
SDtheta0 = sqrt(tau0^2*(sigma0^2)/(sigma0^2+tau0^2))                               # posterior sd under M0
tail0    = 1-pnorm(900,Etheta0,SDtheta0)                                           # Pr(theta>900|M0,xobs)

Student’s \(t\) model

The simplest way to obtain draws from the posterior of \(\theta\) in the Student’s \(t\) model is by implementing the sampling importance resampling (SIR) scheme (as discussed in class). More specifically:

  • Sample \(\{{\tilde \theta}^{(1)},\ldots,{\tilde \theta}^{(M)}\}\) from the prior \(N(\theta_0,\tau_0^2)\),

  • Compute the unnormalized resampling weights \({\tilde w}^{(i)} = p_t(x_{obs}|{\tilde \theta}^{(i)},\sigma_1^2,\nu), \ \ \ i=1,\ldots,M,\)

  • Compute normalized weights \(w^{(i)}={\tilde w}^{(i)}/\sum_{j=1}^M {\tilde w}^{(j)}, \ \ \ i=1,\ldots,M\),

  • Sample with replacement \(N\) draws (equal to \(M\) or a fraction of \(M\); below we use 10%) from \(\{{\tilde \theta}^{(1)},\ldots,{\tilde \theta}^{(M)}\}\), with weights \(\{w^{(1)},\ldots,w^{(M)}\}\),

  • For large \(M\) and \(N\), the resampled draws, which we will denote \(\{\theta^{(1)},\ldots,\theta^{(N)}\}\), behave as draws from the posterior distribution \(p(\theta|{\cal M}_1,x_{obs})\).

We are now ready to approximate, via Monte Carlo integration, the above two integrals as \[ \frac1N \sum_{i=1}^N \theta^{(i)} \ \ \ \mbox{and} \ \ \ \frac1N \sum_{i=1}^N 1(\theta^{(i)}>900). \]

M = 1000000
N = 0.1*M
theta.draw = rnorm(M,theta0,tau0)               # proposal: M draws from the prior
w = dt((xobs-theta.draw)/sigma1,df=nu)/sigma1   # unnormalized weights: t likelihood at xobs
w = w/sum(w)                                    # normalized weights
theta1.draw = theta.draw[sample(1:M,size=N,replace=TRUE,prob=w)]

Etheta1 = mean(theta1.draw)        # E(theta|M1,xobs)
tail1   = mean(theta1.draw>900)    # Pr(theta>900|M1,xobs)
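
Since SIR can degenerate when a few weights dominate, a standard diagnostic (our addition) is the effective sample size of the normalized weights:

ESS = 1/sum(w^2)   # Kish effective sample size; values near M indicate stable weights
ESS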

We can now compare these quantities from both models:

c(Etheta0,Etheta1)
## [1] 890.0000 886.9616
c(tail0,tail1)
## [1] 0.2880751 0.2333700

Bayesian model averaging

We can combine the two models by weighting their model-specific estimates by the posterior model probabilities, \(Pr({\cal M}_0|x_{obs})\) and \(Pr({\cal M}_1|x_{obs})\): \[ E(\theta|x_{obs}) = Pr({\cal M}_0|x_{obs})E(\theta|x_{obs},{\cal M}_0)+Pr({\cal M}_1|x_{obs})E(\theta|x_{obs},{\cal M}_1), \] and \[ Pr(\theta>900|x_{obs}) = Pr({\cal M}_0|x_{obs})Pr(\theta>900|x_{obs},{\cal M}_0)+Pr({\cal M}_1|x_{obs})Pr(\theta>900|x_{obs},{\cal M}_1). \]

PrM0   = 1/(1+1/B01.moreprecise)     # Pr(M0|xobs)
PrM1   = 1-PrM0                      # Pr(M1|xobs)
Etheta = Etheta0*PrM0+Etheta1*PrM1   # model-averaged posterior mean
tail   = tail0*PrM0+tail1*PrM1       # model-averaged Pr(theta>900|xobs)
Etheta
## [1] 888.5734
tail
## [1] 0.2623898
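
Beyond point summaries, we can also sample directly from the model-averaged posterior by composition: draw a model indicator with probabilities \(Pr({\cal M}_0|x_{obs})\) and \(Pr({\cal M}_1|x_{obs})\), then draw \(\theta\) from the corresponding posterior. A minimal sketch reusing the objects above (this step is our addition):

ind       = rbinom(N,1,PrM1)                     # 1 = use M1's posterior
draws0    = rnorm(N,Etheta0,SDtheta0)            # exact posterior draws under M0
draws1    = sample(theta1.draw,N,replace=TRUE)   # SIR posterior draws under M1
theta.bma = ifelse(ind==1,draws1,draws0)         # draws from the BMA posterior
c(mean(theta.bma),mean(theta.bma>900))           # should approximate Etheta and tail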

References
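
Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773-795.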