Gaussian data and Gaussian prior

Let us consider an extremely simple and well-known problem in order to illustrate (once again!) that Monte Carlo assisted posterior inference (approximate posterior inference) is a computational accessory for computing integrals and/or sampling from distributions that are unavailable in closed form. That being said, use MC-based approximations only when necessary and at your own risk!

The problem

We want to learn about the posterior of \(\theta\) when a bunch of observations \(x_1,\ldots,x_n\) are assumed to come from \(N(\theta,1)\) and our initial information about \(\theta\) is represented by the prior \(\theta \sim N(0,10)\). Here is the observed data:

n = 5
x = c(3,3,4,5,5)

Exact posterior inference

It is easy to show (by completing the square in \(\theta\)) that \(\theta|x_1,\ldots,x_n \sim N(\theta_1,\tau_1^2)\), with posterior mean \(\theta_1 = \frac{10}{1/n+10}\,\bar{x}\) and posterior variance \(\tau_1^2 = \frac{10(1/n)}{1/n+10}\):

xbar   = mean(x)
theta1 = (10/(1/n+10))*xbar   # posterior mean
tau12  = 10*(1/n)/(1/n+10)    # posterior variance
curve(dnorm(x,0,sqrt(10)),from=-10,to=10,n=1000,col=2,lwd=2,
      ylab="Density",xlab=expression(theta),ylim=c(0,1))                  # prior
curve(dnorm(x,theta1,sqrt(tau12)),from=-3,to=7,n=1000,lwd=2,add=TRUE)     # posterior
curve(dnorm(x,xbar,sqrt(1/n)),from=-3,to=7,n=1000,col=4,lwd=2,add=TRUE)   # likelihood, as a function of theta
legend("topleft",legend=c("Prior","Likelihood","Posterior"),col=c(2,4,1),
       bty="n",lty=1,lwd=2)

Exact posterior quantities

The following posterior summaries can easily be computed analytically:

  • A. \(Pr(\theta>4.5|x_1,...,x_5)\)

  • B. \(E(\theta|x_1,...,x_5)\)

  • C. the quantile \(L\) such that \(Pr(\theta<L|x_1,...,x_5) = 0.025\)

  • D. the quantile \(U\) such that \(Pr(\theta<U|x_1,...,x_5) = 0.975\)

A = 1-pnorm(4.5,theta1,sqrt(tau12))   # exact tail probability
B = theta1                            # exact posterior mean
C = qnorm(0.025,theta1,sqrt(tau12))   # exact lower limit L
D = qnorm(0.975,theta1,sqrt(tau12))   # exact upper limit U

summary = c(A,B,C,D)
summary
## [1] 0.09572835 3.92156863 3.05368199 4.78945527

What if we were not able to derive \(N(\theta_1,\tau_1^2)\) analytically?

Below we consider two possible proposal distributions for the sampling importance resampling (SIR) algorithm: draw candidate values of \(\theta\) from a proposal density \(q\), weight each draw by likelihood \(\times\) prior\(/q\), and resample with probabilities proportional to the weights:

# Likelihood of theta, up to a multiplicative constant free of theta
likelihood = function(theta){
  exp(-0.5*(n*theta^2-2*theta*n*xbar))
}
prior = function(theta){
  dnorm(theta,0,sqrt(10))
}
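
Before turning to simulation, note that in a one-dimensional problem like this one the posterior summaries could also be approximated by brute-force numerical integration. A minimal sketch using Riemann sums on a grid (the grid limits -10 and 15 are our choice, wide enough to cover the posterior mass):

theta.grid  = seq(-10,15,length=10000)
post.kernel = likelihood(theta.grid)*prior(theta.grid)   # unnormalized posterior
sum(theta.grid*post.kernel)/sum(post.kernel)             # approximates E(theta|x1,...,xn)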

M = 10000   # number of proposal draws
N = 10000   # number of resampled draws

# SIR: Prior as proposal (the weights then reduce to the likelihood)
theta.draw1 = rnorm(M,0,sqrt(10))
w           = likelihood(theta.draw1)*prior(theta.draw1)/prior(theta.draw1)
theta.post1 = sample(theta.draw1,size=N,replace=TRUE,prob=w)

# SIR: Uniform proposal in the range (0,10)
theta.draw2 = runif(M,0,10)
w           = likelihood(theta.draw2)*prior(theta.draw2)/(1/10)
theta.post2 = sample(theta.draw2,size=N,replace=TRUE,prob=w)
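
As an aside, if one only needs posterior expectations, the resampling step can be skipped altogether: the self-normalized importance sampling estimate uses the weights directly. A minimal sketch with the uniform-proposal draws from above:

# Self-normalized importance sampling (no resampling step)
w2 = likelihood(theta.draw2)*prior(theta.draw2)/(1/10)
sum(w2*theta.draw2)/sum(w2)   # approximates E(theta|x1,...,xn)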
par(mfrow=c(2,2))
hist(theta.draw1,prob=TRUE,xlab="theta",main="Proposal draws\nProposal equal to prior",xlim=c(-20,20))
hist(theta.post1,prob=TRUE,xlab="theta",main="Posterior draws",xlim=c(0,10))
hist(theta.draw2,prob=TRUE,xlab="theta",main="Proposal draws\nProposal NOT equal to prior",xlim=c(-20,20))
hist(theta.post2,prob=TRUE,xlab="theta",main="Posterior draws",xlim=c(0,10))
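
As a quick visual check (not part of the original figure), one can overlay the exact posterior density on the resampled draws; the two should agree closely:

hist(theta.post1,prob=TRUE,xlab="theta",main="SIR draws vs exact posterior",xlim=c(0,10))
curve(dnorm(x,theta1,sqrt(tau12)),col=2,lwd=2,add=TRUE)   # exact N(theta1,tau1^2) density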

summary1 = c(mean(theta.post1>4.5),
             mean(theta.post1),
             quantile(theta.post1,0.025),
             quantile(theta.post1,0.975))

summary2 = c(mean(theta.post2>4.5),
             mean(theta.post2),
             quantile(theta.post2,0.025),
             quantile(theta.post2,0.975))

cbind(summary,summary1,summary2)
##          summary summary1 summary2
##       0.09572835 0.098900 0.093100
##       3.92156863 3.922143 3.909588
## 2.5%  3.05368199 3.037751 3.061838
## 97.5% 4.78945527 4.783933 4.779446

VIP (Very important point): The Bayesian inference is exact (first column of the above matrix) and no MC approximation is needed. More precisely, each of the above 4 quantities is obtained analytically/exactly (in machine precision, of course!). The main reason to emphasize this is that ALL the computation attached to the 2nd and 3rd columns of the above matrix (i.e. the MC approximations) ONLY exists because we were not able to obtain these quantities in closed form. This second step (obtaining quantities from the posterior) is purely computational and NEVER conceptual! The conceptual Bayesian machinery is always implemented the same way, invariably following the same set of rules.

As an example, we know that \[ E(\theta|x_1,\ldots,x_5) = \theta_1 = \frac{\int_{-\infty}^{\infty} \theta \left\{\prod_{i=1}^5 p_N(x_i|\theta,1)\right\}p_N(\theta|0,10)\,d\theta}{\int_{-\infty}^{\infty}\left\{\prod_{i=1}^5 p_N(x_i|\theta,1)\right\}p_N(\theta|0,10)\,d\theta} = \frac{10}{10+\frac{1}{n}} \times \frac{x_1+x_2+x_3+x_4+x_5}{5}, \] which is a function of \(n\) and the 5 values of \(x_i\).

Now, if the normal model, \(x_i|\theta \sim N(\theta,1)\), were replaced by a Student's \(t\) model, \(x_i|\theta \sim t_4(\theta,1)\), then \[ E(\theta|x_1,\ldots,x_5) = \frac{\int_{-\infty}^{\infty} \theta \left\{\prod_{i=1}^5 p_{t_4}(x_i|\theta,1)\right\}p_N(\theta|0,10)\,d\theta}{\int_{-\infty}^{\infty}\left\{\prod_{i=1}^5 p_{t_4}(x_i|\theta,1)\right\}p_N(\theta|0,10)\,d\theta}, \] which is a ratio of two integrals, neither of which can be solved analytically. Again, we could solve this ratio analytically in the Normal-Normal case but not in the Student's t-Normal case, and it is only because of that impossibility that we use Monte Carlo methods (Monte Carlo integration and simulation), such as the sampling importance resampling (SIR) algorithm.
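For concreteness, here is a minimal SIR sketch for that hypothetical Student's t case, assuming \(x_i|\theta \sim t_4(\theta,1)\), the same \(N(0,10)\) prior, and the prior as the proposal (the function t.likelihood below is ours, introduced only for this illustration):

t.likelihood = function(theta){
  sapply(theta, function(th) prod(dt(x-th,df=4)))   # product of t4 densities at x_i - theta
}
theta.draw.t = rnorm(M,0,sqrt(10))                  # prior as proposal
w.t          = t.likelihood(theta.draw.t)           # weights reduce to the likelihood
theta.post.t = sample(theta.draw.t,size=N,replace=TRUE,prob=w.t)
mean(theta.post.t)   # MC approximation to E(theta|x1,...,x5) under the t4 model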

What is the size of the MC approximation error?

Rep      = 100
summary1 = matrix(0,Rep,4)
summary2 = matrix(0,Rep,4)
for (rep in 1:Rep){
  theta.draw1 = rnorm(M,0,sqrt(10))
  w           = likelihood(theta.draw1)*prior(theta.draw1)/prior(theta.draw1)
  theta.post1 = sample(theta.draw1,size=N,replace=TRUE,prob=w)
  theta.draw2 = runif(M,0,10)
  w           = likelihood(theta.draw2)*prior(theta.draw2)/(1/10)
  theta.post2 = sample(theta.draw2,size=N,replace=TRUE,prob=w)
  summary1[rep,] = c(mean(theta.post1>4.5),
                     mean(theta.post1),
                     quantile(theta.post1,0.025),
                     quantile(theta.post1,0.975))
  summary2[rep,] = c(mean(theta.post2>4.5),
                     mean(theta.post2),
                     quantile(theta.post2,0.025),
                     quantile(theta.post2,0.975))
}

names = c("A. Pr(theta>4.5|x1,...,x5)","B. E(theta|x1,...,x5)",
          "C. Pr(theta<L|x1,...,x5) = 0.025","D. Pr(theta<U|x1,...,x5) = 0.975")
par(mfrow=c(2,2))
for (i in 1:4){
  boxplot(summary1[,i],summary2[,i],names=c("Proposal 1","Proposal 2"),outline=FALSE)
  title(names[i])
  abline(h=summary[i],col=2)
}
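
The boxplots can be complemented by the Monte Carlo standard deviations of each summary across the 100 replications; smaller values indicate the more efficient proposal:

# MC standard deviation of each summary (row 1: prior proposal, row 2: uniform proposal)
round(rbind(apply(summary1,2,sd),apply(summary2,2,sd)),5)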