I have randomly picked 40 basketball players (20 from the WNBA and 20 from the NBA) from the 2014-2015 season; all players are centers (typically the tallest players). We recorded their heights (in inches) and weights (in pounds). The data are summarized below, where \(x\) is height, \(y\) is weight and \(z\) is a dummy variable such that \(z=0\) for NBA players and \(z=1\) for WNBA players. We standardized the data for convenience.
y = c(210,257,245,235,250,250,228,240,235,270,235,250,270,265,245,260,250,275,228,270,
198,202,210,200,198,175,215,145,205,203,182,185,230,210,199,210,205,180,250,215)
x = c(82,84,83,81,83,85,82,84,83,83,82,85,83,84,81,84,84,84,82,82,
77,75,78,78,76,71,77,77,76,77,76,76,76,77,80,76,76,76,76,80)
z = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
y = (y-mean(y))/sd(y)
x = (x-mean(x))/sd(x)
n = length(y)
Let us perform simple exploratory data analysis:
all = rbind(summary(y),
summary(x))
rownames(all) = c("Weight","Height")
round(all,1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## Weight -2.6 -0.7 0.1 0 0.8 1.6
## Height -2.4 -1.0 0.2 0 0.9 1.4
Now for the NBA players:
nba = rbind(summary(y[z==0]),
summary(x[z==0]))
rownames(nba) = c("Weight","Height")
round(nba,1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## Weight -0.5 0.3 0.8 0.8 1.2 1.6
## Height 0.3 0.6 0.9 0.9 1.2 1.4
Now for the WNBA players:
wnba = rbind(summary(y[z==1]),
summary(x[z==1]))
rownames(wnba) = c("Weight","Height")
round(wnba,1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## Weight -2.6 -1 -0.7 -0.8 -0.5 0.8
## Height -2.4 -1 -1.0 -0.9 -0.8 0.1
How about some plots?
plot(x,y,col=2+z,pch=16,xlab="Heights (standardized)",ylab="Weights (standardized)")
legend("topleft",legend=c("NBA","WNBA"),col=2:3,pch=16,bty="n")
abline(lm(y~x),lwd=3)
abline(lm(y[z==0]~x[z==0]),lwd=3,col=2)
abline(lm(y[z==1]~x[z==1]),lwd=3,col=3)
OLS regression:
summary(lm(y~x+z))
##
## Call:
## lm(formula = y ~ x + z)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.86650 -0.33088 -0.00557 0.25042 1.66045
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3952 0.2489 1.588 0.1209
## x 0.4214 0.2322 1.815 0.0777 .
## z -0.7905 0.4586 -1.724 0.0931 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.613 on 37 degrees of freedom
## Multiple R-squared: 0.6435, Adjusted R-squared: 0.6242
## F-statistic: 33.39 on 2 and 37 DF, p-value: 5.162e-09
We will assume that the data \(\{(y_i,x_i,z_i), i=1,\ldots,n\}\) form a random sample from the linear regression model \[ y_i|x_i,z_i,\mu,\beta,\gamma,\sigma \sim N(\mu+\beta x_i + \gamma z_i, \sigma^2), \] where \(z_i\) is a dummy variable taking values \(0\) or \(1\), so \(\gamma\) is the intercept shift when \(z_i=1\).
The likelihood function of \((\mu,\beta,\gamma,\sigma^2)\) is given by \[ L(\mu,\beta,\gamma,\sigma^2|\mbox{data}) \propto (\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\mu-\beta x_i-\gamma z_i)^2\right\}. \]
The prior for the vector \(\theta\) is such that \[ p(\mu,\beta,\gamma,\sigma^2) = p(\mu)p(\beta)p(\gamma)p(\sigma^2), \] where \(\mu,\beta,\gamma \in (-\infty,\infty)\) and \(\sigma^2>0\). For illustration, we will use the following marginal priors: \[\begin{eqnarray*} \mu &\sim& t_4(0,10)\\ \beta &\sim& Laplace(0,10)\\ \gamma &\sim& N(0,20)\\ \sigma^2 &\sim& Gamma(1/2,1/2) \end{eqnarray*}\] such that \[\begin{eqnarray*} p(\mu) &\propto& \left(1+\frac{\mu^2}{40}\right)^{-2.5}\\ p(\beta) &\propto& \exp\{-0.05|\beta|\}\\ p(\gamma) &\propto& \exp\{-0.025\gamma^2\}\\ p(\sigma^2) &\propto& (\sigma^2)^{-1/2}\exp\{-0.5 \sigma^2\}. \end{eqnarray*}\]
prior.mu = function(mu){(1+mu^2/40)^(-2.5)}
prior.beta = function(beta){exp(-0.05*abs(beta))}
prior.gamma = function(gamma){exp(-0.025*gamma^2)}
prior.sigma2 = function(sigma2){(sigma2)^(-1/2)*exp(-0.5*sigma2)}
par(mfrow=c(2,2))
curve(prior.mu,from=-25,to=25,n=1000,xlab="",ylab="Density",main="Prior of mu")
curve(prior.beta,from=-100,to=100,n=1000,xlab="",ylab="Density",main="Prior of beta")
curve(prior.gamma,from=-25,to=25,n=1000,xlab="",ylab="Density",main="Prior of gamma")
curve(prior.sigma2,from=0.001,to=1,n=1000,xlab="",ylab="Density",main="Prior of sigma2")
As usual, we Bayesians combine the various sources of information into the posterior distribution: \[ p(\mu,\beta,\gamma,\sigma^2|\mbox{data}) \propto p(\mu,\beta,\gamma,\sigma^2) L(\mu,\beta,\gamma,\sigma^2|\mbox{data}). \] This posterior has no known form, so Monte Carlo approximation will be our approach. Notice that the parameter vector has dimension \(p=4\), and the sampling importance resampling (SIR) algorithm might start to become difficult to implement because of the need for good proposal/candidate distributions. The good news is that, since the late 1980s and early 1990s, we can use Markov chain Monte Carlo (MCMC) algorithms such as the Gibbs sampler and the Metropolis-Hastings algorithm.
We will draw from the posterior \(p(\theta|\mbox{data})\) by iteratively sampling the components of \(\theta\) from their full conditional distributions. More specifically, we start from an initial value \((\mu,\beta,\gamma,\sigma^2)^{(0)}\) and, for \(j=1,\ldots,M\), sample from the four full conditionals (the order does not matter).
In the end, after a burn-in period, say \(M_0\) draws, then \[ \left\{(\mu,\beta,\gamma,\sigma^2)^{(M_0+1)},\ldots,(\mu,\beta,\gamma,\sigma^2)^{(M)}\right\} \sim p(\mu,\beta,\gamma,\sigma^2|\mbox{data}). \] From there we proceed exactly as we did with the SIR algorithm.
The conditional likelihood function for \(\mu\), conditional on \((\beta,\gamma,\sigma^2)\), is essentially the kernel of a Gaussian distribution with mean \({\bar y}^*_n\) and variance \(\sigma^2/n\), where \[ {\bar y}^*_n = \frac{1}{n} \sum_{i=1}^n (y_i-\beta x_i - \gamma z_i) = {\bar y}_n - \beta {\bar x}_n-\gamma {\bar z}_n. \] Therefore, \[ p(\mu|\beta,\gamma,\sigma^2,\mbox{data}) \propto \left(1+0.025\mu^2\right)^{-2.5}p_N(\mu;{\bar y}_n - \beta {\bar x}_n-\gamma {\bar z}_n,\sigma^2/n), \] where \(p_N(\cdot;m,v)\) denotes the Gaussian density with mean \(m\) and variance \(v\).
Similarly, \[ p(\beta|\mu,\gamma,\sigma^2,\mbox{data}) \propto \exp\{-0.05|\beta|\}p_N\left(\beta;\frac{s_{yx}-\mu n{\bar x}_n-\gamma s_{xz}}{s_x^2} ,\frac{\sigma^2}{s_x^2}\right), \] where \(s_{yx}=\sum_{i=1}^n y_ix_i\), \(s_{xz}=\sum_{i=1}^n x_iz_i\) and \(s_x^2=\sum_{i=1}^n x_i^2\).
Similarly, \[ p(\gamma|\mu,\beta,\sigma^2,\mbox{data}) \propto \exp\{-0.025\gamma^2\}p_N\left(\gamma;\frac{s_{yz}-\mu n{\bar z}_n-\beta s_{xz}}{s_z^2},\frac{\sigma^2}{s_z^2}\right), \] where \(s_{yz}=\sum_{i=1}^n y_iz_i\) and \(s_z^2=\sum_{i=1}^n z_i^2\).
The conditional likelihood function for \(\sigma^2\), conditional on \((\mu,\beta,\gamma)\), is essentially the kernel of an inverse gamma distribution: \[ L(\sigma^2|\mu,\beta,\gamma,\mbox{data}) \propto (\sigma^2)^{-n/2} \exp\left\{-0.5 \frac{\sum_{i=1}^n (y_i-\mu-\beta x_i-\gamma z_i)^2}{\sigma^2}\right\}. \] Therefore, \[ p(\sigma^2|\mu,\beta,\gamma,\mbox{data}) \propto (\sigma^2)^{-1/2}\exp\{-0.5 \sigma^2\}p_{IG}\left(\sigma^2;(n-2)/2,\sum_{i=1}^n (y_i-\mu-\beta x_i-\gamma z_i)^2/2\right). \]
When \(\sigma^2 \sim IG(a,b)\), its p.d.f. is given by \[ p(\sigma^2) = \frac{b^a}{\Gamma(a)} (\sigma^2)^{-(a+1)}\exp\{-b/\sigma^2\} \qquad \sigma^2,a,b>0. \] In addition, \(E(\sigma^2) = b/(a-1)\) if \(a>1\), Mode\((\sigma^2) = b/(a+1)\), and \(V(\sigma^2) = b^2/((a-1)^2(a-2))\) if \(a>2\).
When \(\phi \sim G(a,b)\), its p.d.f. is given by \[ p(\phi) = \frac{b^a}{\Gamma(a)} \phi^{a-1}\exp\{-b\phi\} \qquad \phi,a,b>0. \] In addition, \(E(\phi) = a/b\), Mode\((\phi) = (a-1)/b\) if \(a>1\), and \(V(\phi) = a/b^2\).
If \(\sigma^2 \sim IG(a,b)\), then \(\phi=1/\sigma^2 \sim G(a,b)\). It follows that \(1/E(\sigma^2)=(a-1)/b \neq a/b = E(1/\sigma^2)\). This is a general result in probability: In general, \(E(g(X)) \neq g(E(X))\).
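A quick Monte Carlo illustration of this fact (the values of \(a\) and \(b\) below are arbitrary choices used only for this check):
set.seed(1234)
# draws from sigma2 ~ IG(a,b), obtained as reciprocals of Gamma(a,b) draws
a = 3
b = 2
sigma2.draws = 1/rgamma(100000,a,b)
# E(sigma2) = b/(a-1) = 1, while E(1/sigma2) = a/b = 1.5
c(mean(sigma2.draws),b/(a-1))
c(mean(1/sigma2.draws),a/b)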
The Markov chain Monte Carlo algorithm runs as follows:
1) Initial value: \(\theta^{(0)} = (\mu^{(0)},\beta^{(0)},\gamma^{(0)},\sigma^{2(0)})\)
2) For \(j=1,\ldots,M_0+M\), cycle through the full conditionals:
2.a) Sample \(\mu^{(j)}\) from \(\pi(\mu) \equiv \left(1+0.025\mu^2\right)^{-2.5}p_N(\mu;{\bar y}_n - \beta^{(j-1)} {\bar x}_n-\gamma^{(j-1)} {\bar z}_n,\sigma^{2(j-1)}/n)\)
2.b) Sample \(\beta^{(j)}\) from \(\pi(\beta) \equiv \exp\{-0.05|\beta|\}p_N(\beta;(s_{yx}-\mu^{(j)} n{\bar x}_n-\gamma^{(j-1)} s_{xz})/s_x^2,\sigma^{2(j-1)}/s_x^2)\)
2.c) Sample \(\gamma^{(j)}\) from \(\pi(\gamma) \equiv \exp\{-0.025\gamma^2\}p_N(\gamma;(s_{yz}-\mu^{(j)} n{\bar z}_n-\beta^{(j)} s_{xz})/s_z^2,\sigma^{2(j-1)}/s_z^2)\)
2.d) Sample \(\sigma^{2(j)}\) from \(\pi(\sigma^2) \equiv (\sigma^2)^{-1/2}\exp\{-0.5 \sigma^2\}p_{IG}(\sigma^2;(n-2)/2,\sum_{i=1}^n (y_i-\mu^{(j)}-\beta^{(j)} x_i-\gamma^{(j)} z_i)^2/2)\)
If all full conditionals are of known form, we usually call this algorithm the Gibbs sampler; otherwise it is a generic Markov chain Monte Carlo (MCMC) algorithm. That would have been the case had the priors of \((\mu,\beta,\gamma)\) and \(\sigma^2\) been trivariate Gaussian and inverse gamma, respectively. The question now is: how do we sample from the full conditionals \(\pi(\mu)\), \(\pi(\beta)\), \(\pi(\gamma)\) and \(\pi(\sigma^2)\)?
A random-walk proposal for sampling from, say, \(\pi(\mu)\) draws \(\mu^*\) from a proposal density \(q\) that depends on the current value of the chain, \(\mu^{(j-1)}\), and the proposed value \(\mu^*\) only through their difference, i.e. \(q(\mu^*|\mu^{(j-1)}) \equiv q(|\mu^*-\mu^{(j-1)}|)\). Because this proposal is symmetric, the proposal densities cancel in the Metropolis-Hastings ratio and the acceptance probability reduces to \[ \alpha(\mu^{(j-1)},\mu^*) = \min\left\{1,\frac{\pi(\mu^*)}{\pi(\mu^{(j-1)})}\right\}. \] If \(\mu^*\) is accepted, then \(\mu^{(j)}=\mu^*\); otherwise \(\mu^{(j)}=\mu^{(j-1)}\). The same approach applies to \(\beta\) and \(\gamma\).
A common choice of random-walk proposal is the normal or Student’s \(t\) with variance \(\tau_\mu^2\) (the tuning variance), i.e. \[ \mu^*|\mu^{(j-1)} \sim N(\mu^{(j-1)},\tau_\mu^2) \ \ \ \ \mbox{or} \ \ \ \ \mu^*|\mu^{(j-1)} \sim t_\nu(\mu^{(j-1)},\tau_\mu^2), \] for some small value of \(\nu\), say \(\nu=1\) or \(\nu=2\). I tend to prefer the Student’s \(t\) distribution since it has heavier tails than the Gaussian and, consequently, proposes values of \(\mu^*\) over a wider range (this may be crucial to avoid the chain getting stuck in local modes). The choice of \(\tau_\mu\) determines how well the Markov chain mixes. If \(\tau_\mu\) is too small, most proposed moves will be tiny and almost always accepted, so the chain explores the posterior very slowly. If \(\tau_\mu\) is too large, most proposed moves will be large and rarely accepted, and the chain will get stuck. A rule of thumb is to monitor the acceptance rate of the MCMC scheme and check whether it lies roughly between 10% and 30%.
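To make the tuning trade-off concrete, here is a small self-contained illustration (not part of the NBA/WNBA analysis): a random-walk Metropolis sampler for a standard normal target, run with three arbitrary tuning values; the target and the values of tau are illustrative assumptions only.
set.seed(4321)
logtarget = function(x){dnorm(x,log=TRUE)}
rwmh.rate = function(tau,M=10000){
  x = 0
  accept = 0
  for (j in 1:M){
    x.star = rnorm(1,x,tau)
    # symmetric proposal: accept with probability min{1, target ratio}
    if (log(runif(1)) < logtarget(x.star)-logtarget(x)){
      x = x.star
      accept = accept+1
    }
  }
  accept/M
}
round(c(rwmh.rate(0.1),rwmh.rate(2.5),rwmh.rate(50)),2)
# tiny tau: nearly every move is accepted; huge tau: hardly any move is accepted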
Let \(q(\sigma^{2*}|\sigma^{2(j-1)})\) be a generic proposal for the variance parameter \(\sigma^2\). The above acceptance probability is now written as \[ \alpha(\sigma^{2(j-1)},\sigma^{2*}) = \min\left\{1,\frac{\pi(\sigma^{2*})}{\pi(\sigma^{2(j-1)})} \times \frac{q(\sigma^{2(j-1)}|\sigma^{2*})}{q(\sigma^{2*}|\sigma^{2(j-1)})}\right\}. \] In the current example, a natural candidate distribution for \(\sigma^2\) is the \(IG((n-2)/2,\sum_{i=1}^n (y_i-\mu-\beta x_i-\gamma z_i)^2/2)\), which does not depend on \(\sigma^{2(j-1)}\) (we usually call this an independent MH algorithm). Here, \((\mu,\beta,\gamma)\) are the most recent values of the chains for these three parameters. Since this \(IG\) proposal matches the conditional likelihood kernel of \(\sigma^2\), those terms cancel and the acceptance probability of the independent MH step reduces to the prior ratio: \[ \alpha(\sigma^{2(j-1)},\sigma^{2*}) = \min\left\{1,\frac{\pi(\sigma^{2*})}{\pi(\sigma^{2(j-1)})} \times \frac{q(\sigma^{2(j-1)})}{q(\sigma^{2*})}\right\}= \min\left\{1,\frac{(\sigma^{2*})^{-1/2}\exp\{-0.5 \sigma^{2*}\}}{(\sigma^{2(j-1)})^{-1/2}\exp\{-0.5 \sigma^{2(j-1)}\}}\right\}. \]
Let us implement the above MCMC scheme to learn about \((\mu,\beta,\gamma,\sigma^2)\) for the NBA/WNBA data.
ybar = mean(y)
xbar = mean(x)
zbar = mean(z)
syx = sum(y*x)
syz = sum(y*z)
sxz = sum(x*z)
sx2 = sum(x^2)
sz2 = sum(z^2)
c(ybar,xbar,zbar,syx,sxz,sx2,sz2)
## [1] 1.387779e-17 7.610481e-16 5.000000e-01 3.058177e+01 -1.789762e+01
## [6] 3.900000e+01 2.000000e+01
pi.mu = function(mu,beta,gamma,sigma2){
(1+0.025*mu^2)^(-2.5)*dnorm(mu,ybar-beta*xbar-gamma*zbar,sqrt(sigma2/n))
}
pi.beta = function(mu,beta,gamma,sigma2){
exp(-0.05*abs(beta))*dnorm(beta,(syx-mu*n*xbar-gamma*sxz)/sx2,sqrt(sigma2/sx2))
}
pi.gamma = function(mu,beta,gamma,sigma2){
exp(-0.025*gamma^2)*dnorm(gamma,(syz-mu*n*zbar-beta*sxz)/sz2,sqrt(sigma2/sz2))
}
logpi.mu = function(mu,beta,gamma,sigma2){
log((1+0.025*mu^2)^(-2.5)) + dnorm(mu,ybar-beta*xbar-gamma*zbar,sqrt(sigma2/n),log=TRUE)
}
logpi.beta = function(mu,beta,gamma,sigma2){
log(exp(-0.05*abs(beta))) + dnorm(beta,(syx-mu*n*xbar-gamma*sxz)/sx2,sqrt(sigma2/sx2),log=TRUE)
}
logpi.gamma = function(mu,beta,gamma,sigma2){
log(exp(-0.025*gamma^2)) + dnorm(gamma,(syz-mu*n*zbar-beta*sxz)/sz2,sqrt(sigma2/sz2),log=TRUE)
}
ols.fit = lm(y~x+z)
ols.coef = ols.fit$coef
mu = ols.coef[1]
beta = ols.coef[2]
gamma = ols.coef[3]
sigma2 = mean((y-ols.fit$fit)^2)
c(mu,beta,gamma,sqrt(sigma2))
## (Intercept) x z
## 0.3952308 0.4213948 -0.7904616 0.5895565
# proposal parameters
nu = 2
tau.mu = 1
tau.beta = 0.1
tau.gamma = 1
# MCMC set up
M0 = 10000
M = 10000
L = 1
niter = M0+L*M
draws = matrix(0,niter,4)
# MCMC algorithm
for (j in 1:niter){
# sampling mu
mu.star = rnorm(1,mu,tau.mu)
acceptance = min(0,logpi.mu(mu.star,beta,gamma,sigma2)-logpi.mu(mu,beta,gamma,sigma2))
if (log(runif(1))<acceptance){
mu = mu.star
}
# sampling beta
beta.star = rnorm(1,beta,tau.beta)
acceptance = min(0,logpi.beta(mu,beta.star,gamma,sigma2)-logpi.beta(mu,beta,gamma,sigma2))
if (log(runif(1))<acceptance){
beta = beta.star
}
# sampling gamma
gamma.star = rnorm(1,gamma,tau.gamma)
acceptance = min(0,logpi.gamma(mu,beta,gamma.star,sigma2)-logpi.gamma(mu,beta,gamma,sigma2))
if (log(runif(1))<acceptance){
gamma = gamma.star
}
# sampling sigma2
sigma2.star = 1/rgamma(1,(n-2)/2,sum((y-mu-beta*x-gamma*z)^2)/2)
acceptance = min(1,(sigma2.star/sigma2)^(-1/2)*exp(-0.5*(sigma2.star-sigma2)))
if (runif(1)<acceptance){
sigma2 = sigma2.star
}
# Storing the draws
draws[j,] = c(mu,beta,gamma,sigma2)
}
draws = draws[seq(M0+1,niter,by=L),]
names = c("mu","beta","gamma","sigma2")
par(mfrow=c(3,4))
for (i in 1:4)
ts.plot(draws[,i],xlab="Iterations",ylab="",main=names[i])
for (i in 1:4)
acf(draws[,i],main="")
for (i in 1:4)
hist(draws[,i],prob=TRUE,main="",xlab="")
It becomes obvious from the previous MCMC algorithm that one could have (maybe should have!) considered sampling \((\mu,\beta,\gamma)\) jointly. In this case, the likelihood is better written in matrix form: \[ L(\mu,\beta,\gamma|\sigma^2,\mbox{data}) \propto \exp\left\{-\frac{1}{2\sigma^2} (y-X\delta)'(y-X\delta)\right\} \propto \exp\left\{-\frac{1}{2\sigma^2}(\delta-{\hat \delta})'X'X(\delta-{\hat \delta})\right\}, \] which is the kernel of a trivariate Gaussian with mean vector \({\hat \delta}=(X'X)^{-1}X'y\) and covariance matrix \(\sigma^2(X'X)^{-1}\), where \(\delta=(\mu,\beta,\gamma)'\). Below we will implement two algorithms:
OLS-based independent MH proposal: \(q_{ols}(\delta) \equiv N({\hat \delta},\sigma^2 (X'X)^{-1})\). Since this proposal is proportional to the conditional likelihood of \(\delta\), the likelihood and proposal terms cancel in the MH ratio, leaving only the prior ratio: \[ \alpha = \mbox{min}\left\{1,\frac{p(\mu^*)p(\beta^*)p(\gamma^*)}{p(\mu^{(i-1)})p(\beta^{(i-1)})p(\gamma^{(i-1)})}\right\} \]
Random-walk MH proposal: \(q_{rw}(\delta) \equiv N(\delta^{(i-1)},V)\), where \(V=\mbox{diag}(\tau_\mu^2,\tau_\beta^2,\tau_\gamma^2)\). Since this proposal is symmetric, \[ \alpha = \mbox{min}\left\{1,\frac{p(\mu^*)p(\beta^*)p(\gamma^*)L(\mu^*,\beta^*,\gamma^*,\sigma^2|\mbox{data})} {p(\mu^{(i-1)})p(\beta^{(i-1)})p(\gamma^{(i-1)})L(\mu^{(i-1)},\beta^{(i-1)},\gamma^{(i-1)},\sigma^2|\mbox{data})}\right\} \] or, more compactly, writing \(p(\delta)=p(\mu)p(\beta)p(\gamma)\), \[ \alpha = \mbox{min}\left\{1,\frac{p(\delta^*)L(\delta^*,\sigma^2|\mbox{data})}{ p(\delta^{(i-1)})L(\delta^{(i-1)},\sigma^2|\mbox{data})}\right\} \]
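Before turning to the implementation of the OLS-based proposal, here is a minimal sketch of what a single joint random-walk update for \(\delta\) could look like. The tuning standard deviations in tau.delta are arbitrary illustrative choices (not tuned values), and the snippet assumes the prior functions above and current values delta and sigma2 are available; it is not part of the runs compared below.
X = cbind(1,x,z)
logprior.delta = function(delta){
  log(prior.mu(delta[1]))+log(prior.beta(delta[2]))+log(prior.gamma(delta[3]))
}
loglike.delta = function(delta,sigma2){-0.5*sum((y-X%*%delta)^2)/sigma2}
tau.delta = c(0.3,0.3,0.5)   # tuning standard deviations (assumed values)
delta.star = delta + rnorm(3,0,tau.delta)
logacc = logprior.delta(delta.star)+loglike.delta(delta.star,sigma2)-
  logprior.delta(delta)-loglike.delta(delta,sigma2)
if (log(runif(1))<logacc){
  delta = delta.star
}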
X = cbind(1,x,z)
iXtX = solve(t(X)%*%X)
ciXtX = t(chol(iXtX))
deltahat = iXtX%*%t(X)%*%y
sigma2hat = mean((y-X%*%deltahat)^2)
delta = deltahat
sigma2 = sigma2hat
draws1 = matrix(0,niter,4)
# MCMC algorithm
for (j in 1:niter){
# sampling (mu,beta,gamma)
deltastar = deltahat + sqrt(sigma2)*ciXtX%*%rnorm(3)
A = prior.mu(deltastar[1])*prior.beta(deltastar[2])*prior.gamma(deltastar[3])
B = prior.mu(delta[1])*prior.beta(delta[2])*prior.gamma(delta[3])
acceptance = min(1,A/B)
if (runif(1)<acceptance){
delta = deltastar
mu = delta[1]
beta = delta[2]
gamma = delta[3]
}
# sampling sigma2
sigma2.star = 1/rgamma(1,(n-2)/2,sum((y-mu-beta*x-gamma*z)^2)/2)
acceptance = min(1,(sigma2.star/sigma2)^(-1/2)*exp(-0.5*(sigma2.star-sigma2)))
if (runif(1)<acceptance){
sigma2 = sigma2.star
}
# Storing the draws
draws1[j,] = c(mu,beta,gamma,sigma2)
}
draws1 = draws1[seq(M0+1,niter,by=L),]
The results are below:
names = c("mu","beta","gamma","sigma2")
par(mfrow=c(3,4))
for (i in 1:4)
ts.plot(draws1[,i],xlab="Iterations",ylab="",main=names[i])
for (i in 1:4)
acf(draws1[,i],main="")
for (i in 1:4)
hist(draws1[,i],prob=TRUE,main="",xlab="")
Posterior dependencies among the parameters:
par(mfrow=c(2,3))
plot(draws[,1],draws[,2],xlab=expression(mu),ylab=expression(beta))
points(deltahat[1],deltahat[2],pch=16,col=2,cex=2)
plot(draws[,1],draws[,3],xlab=expression(mu),ylab=expression(gamma))
points(deltahat[1],deltahat[3],pch=16,col=2,cex=2)
plot(draws[,2],draws[,3],xlab=expression(beta),ylab=expression(gamma))
points(deltahat[2],deltahat[3],pch=16,col=2,cex=2)
plot(draws[,1],draws[,4],xlab=expression(mu),ylab=expression(sigma^2))
points(deltahat[1],sigma2hat,pch=16,col=2,cex=2)
plot(draws[,2],draws[,4],xlab=expression(beta),ylab=expression(sigma^2))
points(deltahat[2],sigma2hat,pch=16,col=2,cex=2)
plot(draws[,3],draws[,4],xlab=expression(gamma),ylab=expression(sigma^2))
points(deltahat[3],sigma2hat,pch=16,col=2,cex=2)
# mu
tab = round(rbind(summary(draws[,1]),summary(draws1[,1])),3)
rownames(tab) = c("MCMC1","MCMC2")
tab
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## MCMC1 -0.431 0.192 0.396 0.371 0.540 1.092
## MCMC2 -0.608 0.223 0.396 0.394 0.564 1.643
# beta
tab = round(rbind(summary(draws[,2]),summary(draws1[,2])),3)
rownames(tab) = c("MCMC1","MCMC2")
tab
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## MCMC1 -0.307 0.278 0.428 0.443 0.608 1.229
## MCMC2 -0.692 0.261 0.424 0.421 0.578 1.461
# gamma
tab = round(rbind(summary(draws[,3]),summary(draws1[,3])),3)
rownames(tab) = c("MCMC1","MCMC2")
tab
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## MCMC1 -2.077 -1.056 -0.789 -0.743 -0.40 0.811
## MCMC2 -2.658 -1.104 -0.791 -0.790 -0.48 1.126
# sigma
tab = round(rbind(summary(sqrt(draws[,4])),summary(sqrt(draws1[,4]))),3)
rownames(tab) = c("MCMC1","MCMC2")
tab
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## MCMC1 0.409 0.577 0.623 0.630 0.676 1.121
## MCMC2 0.430 0.577 0.625 0.631 0.675 1.006