We observe \(n\) pairs of observations \((y_i,x_i)\), such that \[ \mbox{data} = \{(y_1,x_1),\ldots,(y_n,x_n)\}. \] The model relating the \(y\)s to the \(x\)s is a homoscedastic Gaussian linear model: \[ y_i = \beta x_i + \varepsilon_i \qquad \varepsilon_i \sim N(0,\sigma^2), \] with \(E(\varepsilon_i\varepsilon_j)=0\) for all \(i \neq j\). The likelihood function can be written as \[\begin{eqnarray*} L(\beta,\sigma^2|\mbox{data}) &=& \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac12\frac{(y_i-\beta x_i)^2}{\sigma^2}\right\}\\ &\propto& (\sigma^2)^{-n/2}\exp\left\{-\frac12\frac{\sum_{i=1}^n (y_i-\beta x_i)^2}{\sigma^2}\right\} \end{eqnarray*}\]
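For concreteness, here is a minimal R helper that evaluates this log-likelihood at given values of \(\beta\) and \(\sigma^2\); the name loglik is ours and is not used in the code that follows.
# Log-likelihood of the model (illustrative helper; not used below)
loglik = function(beta, sigma2, y, x){
  sum(dnorm(y, mean = beta*x, sd = sqrt(sigma2), log = TRUE))
}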
In this model, the two unknown quantities are the slope parameter \(\beta\) and the residual variance, \(\sigma^2\). We will assume conditionally conjugate priors as follows: \[ \beta \sim N(b_0,V_0) \ \ \ \mbox{and} \ \ \ \sigma^2 \sim IG(c_0,d_0), \] such that \[ p(\beta,\sigma^2) = p(\beta)p(\sigma^2) = \left[\frac{1}{\sqrt{2\pi V_0}}\exp\left\{-0.5\frac{(\beta-b_0)^2}{V_0}\right\}\right] \left[ \frac{d_0^{c_0}}{\Gamma(c_0)}(\sigma^2)^{-(c_0+1)}\exp\left\{-\frac{d_0}{\sigma^2}\right\}\right]. \]
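The joint log-prior can be evaluated the same way, using the fact that \(\sigma^2 \sim IG(c_0,d_0)\) is equivalent to \(1/\sigma^2 \sim Gamma(c_0,d_0)\); the name logprior is again ours.
# Log of p(beta)p(sigma^2); the Jacobian term -2*log(sigma2) converts
# the Gamma density of 1/sigma^2 into the IG density of sigma^2
logprior = function(beta, sigma2, b0, V0, c0, d0){
  dnorm(beta, b0, sqrt(V0), log = TRUE) +
    dgamma(1/sigma2, c0, d0, log = TRUE) - 2*log(sigma2)
}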
The posterior distribution of \((\beta,\sigma^2)\), \[ p(\beta,\sigma^2|\mbox{data}) \propto p(\beta)p(\sigma^2)L(\beta,\sigma^2|\mbox{data}), \] does not belong to any known family of bivariate distributions and, consequently, there are no closed-form answers to questions such as the following:
What is \(E(\beta|\mbox{data})\)?
What is \(Pr(\sigma>1|\mbox{data})\)?
The Gibbs sampler is a Markov chain Monte Carlo (MCMC) scheme that samples from the distribution of interest (a.k.a. the target distribution), in our case the posterior \(p(\beta,\sigma^2|\mbox{data})\), by iteratively sampling from the full conditional distributions: \[ p(\beta|\sigma^2,\mbox{data}) \ \ \ \mbox{and} \ \ \ p(\sigma^2|\beta,\mbox{data}). \] The beauty of the Gibbs sampler, and of MCMC schemes more generally, is that it breaks down the complexity of a given model into the complexities of smaller models in which blocks of parameters are kept fixed.
In the full conditional distribution of \(\sigma^2\), the parameter \(\beta\) is treated as known. Therefore, we can define \(z_i = y_i-\beta x_i\) and the model becomes \[ z_1,\ldots,z_n \ \ iid \ \ N(0,\sigma^2). \] In this case, \[ p(\sigma^2|z_1,\ldots,z_n) \propto (\sigma^2)^{-(c_0+1)}\exp\left\{-\frac{d_0}{\sigma^2}\right\} (\sigma^2)^{-n/2}\exp\left\{-\frac{\sum_{i=1}^n z_i^2/2}{\sigma^2}\right\}, \] which is the kernel of an inverse gamma distribution with hyperparameters \[ c_1 = c_0+\frac{n}{2} \ \ \ \mbox{and} \ \ \ d_1 = d_0 + \frac{\sum_{i=1}^n z_i^2}{2} = d_0 + \frac{\sum_{i=1}^n (y_i-\beta x_i)^2}{2}, \] i.e., \(\sigma^2|\beta,\mbox{data} \sim IG(c_1,d_1)\).
# Draw sigma^2 from its full conditional, IG(c1, d1)
sample.sigma2 = function(beta,c0,d0,y,x){
  n  = length(y)
  c1 = c0 + n/2
  d1 = d0 + sum((y-x*beta)^2)/2
  return(1/rgamma(1,c1,d1))  # 1/Gamma(c1, rate = d1) is IG(c1, d1)
}
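A quick sanity check with made-up inputs of our own (not part of the example below): since the \(IG(c_1,d_1)\) mean is \(d_1/(c_1-1)\) for \(c_1>1\), repeated draws should average close to it.
# Sanity check with arbitrary values (ours, not used later)
y.test = rnorm(20); x.test = rnorm(20)
d1.test = 3 + sum((y.test-0.5*x.test)^2)/2   # using c0=2, d0=3, beta=0.5
mean(replicate(10000, sample.sigma2(0.5, 2, 3, y.test, x.test)))
d1.test/(2+20/2-1)                           # IG mean d1/(c1-1)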
Similarly, assuming that \(\sigma^2\) is known, we have a simpler model, i.e., a Gaussian regression model with an unknown slope, \(\beta\), but known residual variance, \(\sigma^2\). In this case, \[\begin{eqnarray*} p(\beta|\sigma^2,\mbox{data}) &\propto& \exp\{-0.5(\beta-b_0)^2/V_0\}\exp\left\{-0.5\sum_{i=1}^n (y_i-\beta x_i)^2/\sigma^2\right\}\\ &\propto& \exp\left\{-0.5\left[\beta^2/V_0-2\beta b_0/V_0+\beta^2s_x^2/\sigma^2 -2\beta s_{xy}/\sigma^2\right]\right\}\\ &\propto& \exp\left\{-0.5\left[\beta^2(1/V_0+s_x^2/\sigma^2)-2\beta(b_0/V_0+s_{xy}/\sigma^2) \right]\right\}, \end{eqnarray*}\] where \(s_x^2 = \sum_{i=1}^n x_i^2\) and \(s_{xy} = \sum_{i=1}^n y_i x_i\). Completing the square in \(\beta\), it is now easy to see that \[ \beta|\sigma^2,\mbox{data} \sim N(b_1,V_1), \] where \(V_1=1/(1/V_0+s_x^2/\sigma^2)\) and \[ b_1 = V_1(b_0/V_0+s_{xy}/\sigma^2). \]
# Draw beta from its full conditional, N(b1, V1)
sample.beta = function(sigma2,b0,V0,sx2,sxy){
  V1 = 1/(1/V0+sx2/sigma2)
  b1 = V1*(b0/V0+sxy/sigma2)
  return(rnorm(1,b1,sqrt(V1)))
}
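As a check (again with illustrative numbers of our own): when the prior is very diffuse, \(V_0 \rightarrow \infty\), \(b_1\) collapses to the least-squares slope \(s_{xy}/s_x^2\), so draws should center near it.
# With a nearly flat prior (huge V0), draws center on sxy/sx2
mean(replicate(10000, sample.beta(1, 0, 1e6, 50, 35)))  # approx 35/50 = 0.70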
# Simulate n = 50 observations with true slope 0.75 and true variance 1
n = 50
x = runif(n,0,3)
y = 0.75*x+rnorm(n)
plot(x,y)
# Sufficient statistics
sx2 = sum(x^2)
sxy = sum(x*y)
# Prior hyperparameters: beta ~ N(0, 9) and sigma^2 ~ IG(1/2, 1/2)
b0 = 0
V0 = 9
c0 = 1/2
d0 = 1/2
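To see what these choices imply a priori (the thresholds below are arbitrary, chosen by us for illustration), recall that \(\sigma^2 \sim IG(c_0,d_0)\) means \(1/\sigma^2 \sim Gamma(c_0,d_0)\), so prior tail probabilities are available in closed form.
# Implied prior probabilities (illustrative thresholds)
pgamma(1/10, c0, d0)       # Pr(sigma^2 > 10) = Pr(1/sigma^2 < 1/10)
2*pnorm(-3, b0, sqrt(V0))  # Pr(|beta - b0| > 3)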
# Plot the two prior densities
par(mfrow=c(1,2))
sig2 = seq(0.0001,10,length=1000)
# IG(c0,d0) density by change of variables: dgamma(1/s, c0, d0)/s^2
plot(sig2,dgamma(1/sig2,c0,d0)/(sig2^2),xlab=expression(sigma^2),ylab="Prior density",type="l")
beta = seq(-10,10,length=1000)
plot(beta,dnorm(beta,b0,sqrt(V0)),xlab=expression(beta),ylab="Prior density",type="l")
# Gibbs sampler setup
beta.initial = 0      # initial value of beta
n.burn  = 1000        # burn-in iterations to discard
n.draws = 1000        # posterior draws to keep
step    = 1           # thinning interval
niter = n.burn+n.draws*step
beta  = beta.initial
draws = matrix(0,niter,2)
for (iter in 1:niter){
  # One sweep: alternate draws from the two full conditionals
  sigma2 = sample.sigma2(beta,c0,d0,y,x)
  beta   = sample.beta(sigma2,b0,V0,sx2,sxy)
  draws[iter,] = c(beta,sqrt(sigma2))   # store (beta, sigma)
}
# Discard the burn-in and thin the chain
draws = draws[seq(n.burn+step,niter,by=step),]
# MCMC diagnostics: trace plots, autocorrelations, marginal histograms
par(mfrow=c(2,3))
ts.plot(draws[,1],ylab="",main=expression(beta),xlab="Iterations")
acf(draws[,1],main="")
hist(draws[,1],prob=TRUE,main="",xlab=expression(beta))
ts.plot(draws[,2],ylab="",main=expression(sigma),xlab="Iterations")
acf(draws[,2],main="")
hist(draws[,2],prob=TRUE,main="",xlab=expression(sigma))
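Numeric summaries of mixing complement the plots above; assuming the coda package is installed (it is not otherwise used in these notes), effective sample sizes are one line away.
# Effective sample sizes (requires the coda package)
library(coda)
effectiveSize(mcmc(draws))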
# Compare priors (red) and posteriors (black); blue lines mark the true values
beta = seq(-10,10,length=1000)
sig2 = seq(0.0001,10,length=1000)
par(mfrow=c(1,2))
plot(density(draws[,1]),xlab=expression(beta),main="",ylab="Density",lwd=2)
lines(beta,dnorm(beta,b0,sqrt(V0)),col=2,lwd=2)
abline(v=0.75,col=4,lwd=2)   # true slope
plot(density(draws[,2]^2),xlab=expression(sigma^2),main="",ylab="Density",xlim=c(0,4),lwd=2)
lines(sig2,dgamma(1/sig2,c0,d0)/(sig2^2),col=2,lwd=2)
abline(v=1,col=4,lwd=2)      # true residual variance
legend("topright",legend=c("Prior","Posterior","True value"),col=c(2,1,4),bty="n",lty=1,lwd=2)
# Monte Carlo answers to the two questions posed above
mean(draws[,1])     # estimate of E(beta|data)
## [1] 0.6707462
mean(draws[,2]>1)   # estimate of Pr(sigma > 1|data)
## [1] 0.707
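Other posterior summaries follow just as easily from the draws; for instance, equal-tail 95% credible intervals (an addition of ours):
# 95% equal-tail credible intervals for beta and sigma
quantile(draws[,1],c(0.025,0.975))
quantile(draws[,2],c(0.025,0.975))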