We will illustrate the use of Monte Carlo integration via a simple normal-normal example where, given \(\theta\), the observations are i.i.d. normal, i.e. \[ y_1,\ldots,y_n|\theta \sim N(\theta,1), \] and the prior information regarding \(\theta\) is \[ \theta \sim N(0,1). \]
set.seed(23235)
n = 10                       # sample size
thetatrue = 1                # true value of theta used to simulate the data
y = rnorm(n,thetatrue,1)     # simulate y_1,...,y_n from N(thetatrue,1)
ybar = mean(y)               # sample mean (the sufficient statistic)
ybar
## [1] 1.350405
In this simple case, all computations can be done in analytical form. More precisely, it is easy to see that the sampling distribution of the sample mean \({\bar y}_n=\frac{1}{n}\sum_{i=1}^n y_i\), given \(\theta\), is \[ {\bar y}_n|\theta \sim N(\theta,1/n) \] while the prior predictive of \({\bar y}_n\) is \[ {\bar y}_n \sim N\left(0,\frac{n+1}{n}\right) \] Similarly, the posterior distribution of \(\theta\) given \({\bar y}_n\) is also Gaussian: \[ \theta | {\bar y}_n \sim N\left(\frac{n}{n+1}{\bar y}_n,\frac{1}{n+1}\right) \] and the posterior predictive of \(y_{n+1}\) is \[ y_{n+1}|{\bar y}_n \sim N\left(\frac{n}{n+1}{\bar y}_n,\frac{n+2}{n+1}\right). \]
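For completeness, here is a quick sketch of where the posterior comes from: combining the likelihood of \({\bar y}_n\) with the prior and completing the square in \(\theta\) gives \[ \pi(\theta|{\bar y}_n) \propto \exp\left\{-\frac{n}{2}({\bar y}_n-\theta)^2\right\}\exp\left\{-\frac{1}{2}\theta^2\right\} \propto \exp\left\{-\frac{n+1}{2}\left(\theta-\frac{n}{n+1}{\bar y}_n\right)^2\right\}, \] which is the kernel of the \(N\left(\frac{n}{n+1}{\bar y}_n,\frac{1}{n+1}\right)\) density above. The posterior predictive then follows by writing \(y_{n+1}=\theta+\varepsilon\) with \(\varepsilon \sim N(0,1)\) independent of \(\theta|{\bar y}_n\), so its variance is \(1+\frac{1}{n+1}=\frac{n+2}{n+1}\).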
par(mfrow=c(1,1))
yy = seq(-5,5,length=100)
plot(yy,dnorm(yy,0,sqrt((n+1)/n)),type="l",ylab="Density",xlab=expression(bar(y)))
legend("topleft",legend=c("Model",expression(y[i]~"|"~theta %~%N(theta,1)~i==list(1,...,n)),expression(bar(y)%~%N(0,1/n))))
legend("topright",legend=c("Prior",expression(theta %~%N(0,1))))
title(substitute(paste("Prior predictive of ",bar(y)," (n = ",n,")",sep=""),list(n=n)))
points(ybar,0,col=2,pch=16,cex=1)
m = n/(n+1)*ybar
sd = sqrt(1/(n+1))
tt = seq(m-5*sd,m+5*sd,length=100)
plot(tt,dnorm(tt,m,sd),xlab=expression(theta),ylab="Density",type="l")
points(thetatrue,0,col=2,pch=16,cex=1)
title(expression('Posterior of ' ~ theta))
m = n/(n+1)*ybar
sd = sqrt((n+2)/(n+1))
yy1 = seq(m-10*sd,m+10*sd,length=1000)
plot(yy1,dnorm(yy1,m,sd),xlab=expression(y[n+1]),ylab="Density",type="l")
lines(yy1,dnorm(yy1,0,sqrt(2)),col=2)   # prior predictive of y[n+1] is N(0,2), so sd = sqrt(2)
legend("topleft",legend=c(expression(Pr(y[n+1])),expression(Pr(y[n+1]~"|"~list(y[1],...,y[n])))),col=2:1,lty=1)
legend("topright",legend=substitute(paste(bar(y)==yyy),list(yyy=round(ybar,3))))
Here we use simple Monte Carlo integration to obtain the value of the prior predictive density evaluated at the observed value of \({\bar y}_n\): \[ p({\bar y}_{n,obs}) = \int f_N({\bar y}_{n,obs};\theta,1/n)f_N(\theta;0,1)d\theta = f_N\left({\bar y}_{n,obs};0,\frac{n+1}{n}\right), \] where \(f_N(x;\mu,\sigma^2)\) is the density of the Gaussian (normal) with mean \(\mu\) and variance \(\sigma^2\) when evaluated at the point \(x\).
pybar = dnorm(ybar,0,sqrt((n+1)/n))
pybar
## [1] 0.1660445
A simple Monte Carlo (MC) estimate of \(p({\bar y}_{n,obs})\) is obtained by noticing that the above integral is nothing but the expectation of the Gaussian sampling density of \({\bar y}_{n,obs}\), viewed as a function of \(\theta\), with respect to the prior: \[ p({\bar y}_{n,obs}) = E_{\theta}\{p({\bar y}_{n,obs}|\theta)\}. \] Therefore, when \(\{\theta^{(i)}\}_{i=1}^M\) is a random sample from the prior \(\pi(\theta)\), it follows from standard sampling/frequentist inferential results (laws of large numbers and central limit theorems) that \[ {\hat p}_{MC}({\bar y}_{n,obs}) = \frac{1}{M} \sum_{i=1}^M f_N({\bar y}_{n,obs};\theta^{(i)},1/n) \longrightarrow p({\bar y}_{n,obs}) \] as \(M \rightarrow \infty\).
set.seed(1235)
M = 1000
thetaprior = rnorm(M,0,1)                                   # M draws from the prior N(0,1)
pybar.hat1 = cumsum(dnorm(ybar,thetaprior,sqrt(1/n)))/1:M   # running MC estimate of p(ybar) as the MC size grows
plot(1:M,pybar.hat1,xlab="MC size",ylab="Estimate",type="l",col=grey(0.75))
for (i in 1:100){
thetaprior = rnorm(M,0,1)
pybar.hat1 = cumsum(dnorm(ybar,thetaprior,sqrt(1/n)))/1:M
lines(1:M,pybar.hat1,col=grey(0.75))
}
abline(h=pybar,lty=2)
title(substitute(paste("Prior predictive evaluate at ",bar(y)==yyy),list(yyy=round(ybar,3))))
We now turn to the posterior predictive density of a new observation, evaluated at a value \(y_0\) (below we take \(y_0=2\)). In closed form, \[ p(y_0|{\bar y}_n) = \int f_N(y_0;\theta,1)\,\pi(\theta|{\bar y}_n)\,d\theta = \frac{\int f_N(y_0;\theta,1)\,f_N({\bar y}_n;\theta,1/n)\,f_N(\theta;0,1)\,d\theta}{\int f_N({\bar y}_n;\theta,1/n)\,f_N(\theta;0,1)\,d\theta} = f_N\left(y_0;\frac{n}{n+1}{\bar y}_n,\frac{n+2}{n+1}\right), \] where, again, \(f_N(x;\mu,\sigma^2)\) is the density of the Gaussian (normal) with mean \(\mu\) and variance \(\sigma^2\) when evaluated at the point \(x\).
y0 = 2
m = n/(n+1)*ybar
sd = sqrt((n+2)/(n+1))
py0 = dnorm(y0,m,sd)
py0
## [1] 0.2905853
As before, when \(\{\theta^{(i)}\}_{i=1}^M\) is a random sample from the prior \(\pi(\theta)\), the ratio of the two integrals above can be estimated by \[ {\hat p}_{MC}^1(y_0|{\bar y}_n) = \frac{\frac{1}{M} \sum_{i=1}^M f_N(y_0;\theta^{(i)},1)f_N({\bar y}_n;\theta^{(i)},1/n)} {\frac{1}{M} \sum_{j=1}^M f_N({\bar y}_n;\theta^{(j)},1/n)} = \sum_{i=1}^M f_N(y_0;\theta^{(i)},1)\, w^{(i)}, \] where \[ w^{(i)} = \frac{f_N({\bar y}_n;\theta^{(i)},1/n)}{\sum_{j=1}^M f_N({\bar y}_n;\theta^{(j)},1/n)}. \] In words, the MC estimate of \(p(y_0|{\bar y}_n)\) is a weighted average of the likelihood of the new value \(y_0\), with weights proportional to how well each prior draw \(\theta^{(i)}\) explains the observed \({\bar y}_n\).
set.seed(2135)
thetaprior = rnorm(M,0,1)
py0.hat1 = cumsum(dnorm(ybar,thetaprior,sqrt(1/n))*dnorm(y0,thetaprior,1))/
cumsum(dnorm(ybar,thetaprior,sqrt(1/n)))
ts.plot(py0.hat1,ylim=c(0.25,0.35),col=grey(0.75),xlab="Monte Carlo size",ylab="Estimate")
for (i in 1:100){
thetaprior = rnorm(M,0,1)
py0.hat1 = cumsum(dnorm(ybar,thetaprior,sqrt(1/n))*dnorm(y0,thetaprior,1))/
cumsum(dnorm(ybar,thetaprior,sqrt(1/n)))
lines(py0.hat1,col=grey(0.75))
}
abline(h=py0,lty=2)
title(substitute(paste("Posterior predictive evaluate at ",y[0]==yyy),list(yyy=y0)))
legend("topright",legend="Based on prior draws")
A more precise MC approximation can be obtained by using the fact that \[ p(y_0|{\bar y}_n) = \int p(y_0|\theta)\,\pi(\theta|{\bar y}_n)\,d\theta, \] which can therefore be approximated by \[ {\hat p}_{MC}^2(y_0|{\bar y}_n) = \frac{1}{M} \sum_{i=1}^M f_N(y_0;\theta^{(i)},1) = \sum_{i=1}^M f_N(y_0;\theta^{(i)},1)\, w^{(i)}, \] where \(\{\theta^{(i)}\}_{i=1}^M\) are now draws from the posterior \(\pi(\theta|{\bar y}_n)\) and the weights are uniform, \(w^{(i)}=1/M\). In words, both estimates \({\hat p}_{MC}^1\) and \({\hat p}_{MC}^2\) average the likelihood of a new observation \(y_0\), \(f_N(y_0;\theta^{(i)},1)\), but with draws from different distributions and, consequently, with different weights.
m = n/(n+1)*ybar
sd = sqrt(1/(n+1))
set.seed(1234)
thetapost = rnorm(M,m,sd)
py0.hat2 = cumsum(dnorm(y0,thetapost,1))/1:M
ts.plot(py0.hat2,ylim=c(0.25,0.35),col=grey(0.75),xlab="Monte Carlo size",ylab="Estimate")
for (i in 1:100){
thetapost = rnorm(M,m,sd)
py0.hat2 = cumsum(dnorm(y0,thetapost,1))/1:M
lines(py0.hat2,col=grey(0.75))
}
abline(h=py0,lty=2)
title(substitute(paste("Posterior predictive evaluate at ",y[0]==yyy),list(yyy=y0)))
legend("topright",legend="Based on posterior draws")
Both \({\hat p}_{MC}^1\) and \({\hat p}_{MC}^2\) are perfectly valid estimates of \(p(y_0|{\bar y}_n)\). What we show (only empirically here!) is that the variance of \({\hat p}_{MC}^1\) is greater than the variance of \({\hat p}_{MC}^2\). This happens because \({\hat p}_{MC}^2\) goes one step further in the Monte Carlo integration: it uses the analytically available posterior (or samples from it) instead of reweighting prior draws, and by doing so reduces the error of the MC approximation.
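To back up this claim a bit more directly than with the two spaghetti plots, one can repeat both estimators many times with the same Monte Carlo size \(M\) and compare the spread of the resulting estimates. A sketch along these lines (the number of replications, 1000, is arbitrary; m and sd are the posterior mean and standard deviation computed above):
R = 1000
hat1 = hat2 = rep(0,R)
for (r in 1:R){
  th.prior = rnorm(M,0,1)                                   # draws from the prior
  hat1[r] = sum(dnorm(ybar,th.prior,sqrt(1/n))*dnorm(y0,th.prior,1))/
            sum(dnorm(ybar,th.prior,sqrt(1/n)))
  th.post = rnorm(M,m,sd)                                    # draws from the posterior
  hat2[r] = mean(dnorm(y0,th.post,1))
}
c(sd.hat1=sd(hat1), sd.hat2=sd(hat2))                        # hat1 should show the larger spread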