Model

We will illustrate the use of Monte Carlo integration via a simple normal-normal example where the observations are i.i.d. normal, i.e. \[ y_1,\ldots,y_n|\theta \sim N(\theta,1), \] and the prior information regarding \(\theta\) is \[ \theta \sim N(0,1). \]

Simulating the dataset

set.seed(23235)
n = 10 
thetatrue = 1
y = rnorm(n,thetatrue,1)
ybar = mean(y)
ybar
## [1] 1.350405

In this simple case, all computations can be done in analytical form. More precisely, it is easy to see that the sampling distribution of the sample mean \({\bar y}_n=\frac{1}{n}\sum_{i=1}^n y_i\), given \(\theta\), is \[ {\bar y}_n|\theta \sim N(\theta,1/n), \] while the prior predictive of \({\bar y}_n\) is \[ {\bar y}_n \sim N\left(0,\frac{n+1}{n}\right). \] Similarly, the posterior distribution of \(\theta\) given \({\bar y}_n\) is also Gaussian, \[ \theta | {\bar y}_n \sim N\left(\frac{n}{n+1}{\bar y}_n,\frac{1}{n+1}\right), \] and the posterior predictive of \(y_{n+1}\) is \[ y_{n+1}|{\bar y}_n \sim N\left(\frac{n}{n+1}{\bar y}_n,\frac{n+2}{n+1}\right). \]
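For readers who want the intermediate step behind the posterior, a quick completing-the-square argument gives \[ \pi(\theta|{\bar y}_n) \propto \exp\left\{-\frac{n}{2}({\bar y}_n-\theta)^2\right\}\exp\left\{-\frac{1}{2}\theta^2\right\} \propto \exp\left\{-\frac{n+1}{2}\left(\theta-\frac{n}{n+1}{\bar y}_n\right)^2\right\}, \] which is the kernel of the \(N\left(\frac{n}{n+1}{\bar y}_n,\frac{1}{n+1}\right)\) density stated above.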

Prior predictive of \({\bar y}_n\)

par(mfrow=c(1,1))
yy = seq(-5,5,length=100)
plot(yy,dnorm(yy,0,sqrt((n+1)/n)),type="l",ylab="Density",xlab=expression(bar(y)))
legend("topleft",legend=c("Model",expression(y[i]~"|"~theta %~%N(theta,1)~i==list(1,...,n)),expression(bar(y)%~%N(0,1/n))))
legend("topright",legend=c("Prior",expression(theta %~%N(0,1))))
title(substitute(paste("Prior predictive of ",bar(y)," (n = ",n,")",sep=""),list(n=n)))
points(ybar,0,col=2,pch=16,cex=1)

Posterior of \(\theta\)

m = n/(n+1)*ybar
sd = sqrt(1/(n+1))
tt = seq(m-5*sd,m+5*sd,length=100)
plot(tt,dnorm(tt,m,sd),xlab=expression(theta),ylab="Density",type="l")
points(thetatrue,0,col=2,pch=16,cex=1)
title(expression('Posterior of ' ~ theta))

Prior and posterior predictives

m = n/(n+1)*ybar
sd = sqrt((n+2)/(n+1))
yy1 = seq(m-10*sd,m+10*sd,length=1000)
plot(yy1,dnorm(yy1,m,sd),xlab=expression(y[n+1]),ylab="Density",type="l")
lines(yy1,dnorm(yy1,0,sqrt(2)),col=2)   # prior predictive y[n+1] ~ N(0,2), so sd = sqrt(2)
legend("topleft",legend=c(expression(Pr(y[n+1])),expression(Pr(y[n+1]~"|"~list(y[1],...,y[n])))),col=2:1,lty=1)
legend("topright",legend=substitute(paste(bar(y)==yyy),list(yyy=round(ybar,3))))

Prior predictive evaluated at \({\bar y}_n\)

Here we use simple Monte Carlo integration to approximate the value of the prior predictive density evaluated at the observed value of \({\bar y}_n\). In this example the integral is available in closed form, \[ p({\bar y}_{n,obs}) = \int f_N({\bar y}_{n,obs};\theta,1/n)f_N(\theta;0,1)d\theta = f_N\left({\bar y}_{n,obs};0,\frac{n+1}{n}\right), \] where \(f_N(x;\mu,\sigma^2)\) denotes the density of the Gaussian (normal) distribution with mean \(\mu\) and variance \(\sigma^2\) evaluated at the point \(x\); this exact value serves as a benchmark for the Monte Carlo estimates below.

pybar = dnorm(ybar,0,sqrt((n+1)/n))
pybar
## [1] 0.1660445
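As a quick sanity check (not part of the Monte Carlo exercise itself), the same integral can also be approximated by deterministic quadrature with base R's integrate; the result should agree with the closed-form value pybar above.

# sketch: quadrature check of p(ybar) = int f_N(ybar; theta, 1/n) f_N(theta; 0, 1) dtheta
integrand = function(theta) dnorm(ybar,theta,sqrt(1/n))*dnorm(theta,0,1)
integrate(integrand,lower=-Inf,upper=Inf)$value   # should be close to pybar = 0.1660445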

Simple Monte Carlo estimation

A simple Monte Carlo (MC) estimate of \(p({\bar y}_{n,obs})\) is obtained by noticing that the above integral is nothing but the expectation of the sampling density \(p({\bar y}_{n,obs}|\theta)\) with respect to the prior: \[ p({\bar y}_{n,obs}) = E_{\theta}\{p({\bar y}_{n,obs}|\theta)\}. \] Therefore, when \(\{\theta^{(i)}\}_{i=1}^M\) is a random sample from the prior \(\pi(\theta)\), it follows from standard sampling/frequentist results (the law of large numbers and the central limit theorem) that \[ {\hat p}_{MC}({\bar y}_{n,obs}) = \frac{1}{M} \sum_{i=1}^M f_N({\bar y}_{n,obs};\theta^{(i)},1/n) \longrightarrow p({\bar y}_{n,obs}) \] as \(M \rightarrow \infty\).

set.seed(1235)
M = 1000
thetaprior = rnorm(M,0,1)
pybar.hat1 = cumsum(dnorm(ybar,thetaprior,sqrt(1/n)))/1:M
plot(1:M,pybar.hat1,xlab="MC size",ylab="Estimate",type="l",col=grey(0.75))
for (i in 1:100){
  thetaprior = rnorm(M,0,1)
  pybar.hat1 = cumsum(dnorm(ybar,thetaprior,sqrt(1/n)))/1:M
  lines(1:M,pybar.hat1,col=grey(0.75))
 }
abline(h=pybar,lty=2)
title(substitute(paste("Prior predictive evaluate at ",bar(y)==yyy),list(yyy=round(ybar,3))))

Evaluating the posterior predictive at a future value \(y_{n+1}=y_0\)

The posterior predictive can be obtained as \[\begin{eqnarray*} p(y_0|{\bar y}_n) &=& \int p(y_0|\theta)\pi(\theta|{\bar y}_n)d\theta\\ &=& \int p(y_0|\theta)\frac{p({\bar y}_n|\theta)\pi(\theta)}{p({\bar y}_n)}d\theta\\ &=& \frac{\int p(y_0|\theta)p({\bar y}_n|\theta)\pi(\theta)d\theta} {\int p({\bar y}_n|\theta)\pi(\theta)d\theta} \\ &=& \frac{E_\theta[p(y_0|\theta)p({\bar y}_n|\theta)]}{E_\theta[p({\bar y}_n|\theta)]}\\ &=& f_N\left(y_0;\frac{n}{n+1}{\bar y}_n,\frac{n+2}{n+1}\right), \end{eqnarray*}\]

where, again, \(f_N(x;\mu,\sigma^2)\) is the density of the Gaussian (normal) with mean \(\mu\) and variance \(\sigma^2\) when evaluated at the point \(x\).

y0 = 2
m = n/(n+1)*ybar
sd = sqrt((n+2)/(n+1))
py0 = dnorm(y0,m,sd)
py0
## [1] 0.2905853
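As with the prior predictive, the ratio-of-integrals representation above can be verified numerically; this is only a sanity check, independent of the Monte Carlo estimates that follow.

# sketch: quadrature check of the ratio-of-integrals form of p(y0 | ybar)
num = integrate(function(t) dnorm(y0,t,1)*dnorm(ybar,t,sqrt(1/n))*dnorm(t,0,1),-Inf,Inf)$value
den = integrate(function(t) dnorm(ybar,t,sqrt(1/n))*dnorm(t,0,1),-Inf,Inf)$value
num/den   # should be close to py0 = 0.2905853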

MC estimate via prior draws

As before, when \(\{\theta^{(i)}\}_{i=1}^M\) is a random sample from the prior \(\pi(\theta)\), it follows that \[ {\hat p}_{MC}^1(y_0|{\bar y}_n) = \frac{\frac{1}{M} \sum_{i=1}^M f_N(y_0;\theta^{(i)},1)f_N({\bar y}_n;\theta^{(i)},1/n)} {\frac{1}{M} \sum_{j=1}^M f_N({\bar y}_n;\theta^{(j)},1/n)} = \sum_{i=1}^M f_N(y_0;\theta^{(i)},1) w^{(i)}, \] where \[ w^{(i)} = \frac{f_N({\bar y}_n;\theta^{(i)},1/n)}{\sum_{j=1}^M f_N({\bar y}_n;\theta^{(j)},1/n)}. \] In words, the MC estimate of \(p(y_0|{\bar y}_n)\) is a weighted average of the likelihood values \(f_N(y_0;\theta^{(i)},1)\), with weights proportional to how well each prior draw \(\theta^{(i)}\) explains the observed \({\bar y}_n\).
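Before running the estimator, it can be useful to inspect the weights \(w^{(i)}\) themselves. A common diagnostic (borrowed from importance sampling, and not part of the original derivation) is the effective sample size \(1/\sum_i (w^{(i)})^2\): values much smaller than \(M\) indicate that a few prior draws dominate the weighted average. A minimal sketch:

# sketch: normalized weights and their effective sample size
thetaprior = rnorm(M,0,1)
w = dnorm(ybar,thetaprior,sqrt(1/n))
w = w/sum(w)
1/sum(w^2)   # effective sample size out of M prior draws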

set.seed(2135)
thetaprior = rnorm(M,0,1)
py0.hat1 = cumsum(dnorm(ybar,thetaprior,sqrt(1/n))*dnorm(y0,thetaprior,1))/
           cumsum(dnorm(ybar,thetaprior,sqrt(1/n)))
ts.plot(py0.hat1,ylim=c(0.25,0.35),col=grey(0.75),xlab="Monte Carlo size",ylab="Estimate")
for (i in 1:100){
  thetaprior = rnorm(M,0,1)  
  py0.hat1 = cumsum(dnorm(ybar,thetaprior,sqrt(1/n))*dnorm(y0,thetaprior,1))/
                    cumsum(dnorm(ybar,thetaprior,sqrt(1/n)))
  lines(py0.hat1,col=grey(0.75))
}
abline(h=py0,lty=2)
title(substitute(paste("Posterior predictive evaluate at ",y[0]==yyy),list(yyy=y0)))
legend("topright",legend="Based on prior draws")

MC estimate via posterior draws

A more precise MC approximation can be obtained by using the fact that \[ p(y_0|{\bar y}_n) = \int p(y_0|\theta)\pi(\theta|{\bar y}_n)d\theta, \] which can therefore be approximated by \[ {\hat p}_{MC}^2(y_0|{\bar y}_n) = \frac{1}{M} \sum_{i=1}^M f_N(y_0;\theta^{(i)},1) = \sum_{i=1}^M f_N(y_0;\theta^{(i)},1) w^{(i)}, \] where \(\{\theta^{(i)}\}_{i=1}^M\) are now draws from the posterior \(\pi(\theta|{\bar y}_n)\) and the weights are uniform, \(w^{(i)}=1/M\). In words, both estimates \({\hat p}_{MC}^1\) and \({\hat p}_{MC}^2\) average the likelihood of a new observation \(y_0\), \(f_N(y_0;\theta^{(i)},1)\), but they use draws from different distributions and, consequently, different weights.

m = n/(n+1)*ybar
sd = sqrt(1/(n+1))
set.seed(1234)
thetapost = rnorm(M,m,sd)
py0.hat2 = cumsum(dnorm(y0,thetapost,1))/1:M
ts.plot(py0.hat2,ylim=c(0.25,0.35),col=grey(0.75),xlab="Monte Carlo size",ylab="Estimate")
for (i in 1:100){
  thetapost = rnorm(M,m,sd)
  py0.hat2 = cumsum(dnorm(y0,thetapost,1))/1:M
  lines(py0.hat2,col=grey(0.75))
}
abline(h=py0,lty=2)
title(substitute(paste("Posterior predictive evaluate at ",y[0]==yyy),list(yyy=y0)))
legend("topright",legend="Based on posterior draws")

Conclusion

Both \({\hat p}_{MC}^1\) and \({\hat p}_{MC}^2\) are perfectly valid estimates of \(p(y_0|{\bar y}_n)\). What we show (only empirically here!) is that the variance of \({\hat p}_{MC}^1\) is greater than the variance of \({\hat p}_{MC}^2\). This happens because \({\hat p}_{MC}^2\) goes one step further in the Monte Carlo integration: it uses the analytically available posterior (or samples from it), and by doing so reduces the error of the MC approximation.
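A direct (if informal) way to back up this claim is to replicate both estimators many times with fresh draws and compare their sampling variances; a minimal sketch, where nrep denotes the number of replications and each replication uses M fresh draws:

# sketch: empirical Monte Carlo variances of the two estimators
nrep = 1000
p1.hat = p2.hat = rep(0,nrep)
for (r in 1:nrep){
  thetaprior = rnorm(M,0,1)
  w          = dnorm(ybar,thetaprior,sqrt(1/n))
  p1.hat[r]  = sum(w*dnorm(y0,thetaprior,1))/sum(w)
  thetapost  = rnorm(M,n/(n+1)*ybar,sqrt(1/(n+1)))
  p2.hat[r]  = mean(dnorm(y0,thetapost,1))
}
c(var.p1=var(p1.hat),var.p2=var(p2.hat))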