I have randomly picked 40 basketball players (20 from the WNBA and 20 from the NBA) from the 2014-2015 season; all players are centers (typically the tallest players). We recorded their heights (in inches) and weights (in pounds). The data are summarized below, where \(x\) is height, \(y\) is weight and \(z\) is a dummy variable such that \(z=0\) for NBA players and \(z=1\) for WNBA players. We standardized the data for convenience.
y = c(210,257,245,235,250,250,228,240,235,270,235,250,270,265,245,260,250,275,228,270,
198,202,210,200,198,175,215,145,205,203,182,185,230,210,199,210,205,180,250,215)
x = c(82,84,83,81,83,85,82,84,83,83,82,85,83,84,81,84,84,84,82,82,
77,75,78,78,76,71,77,77,76,77,76,76,76,77,80,76,76,76,76,80)
z = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
y = (y-mean(y))/sd(y)
x = (x-mean(x))/sd(x)
n = length(y)
Let us perform simple exploratory data analysis:
all = rbind(summary(y),
summary(x))
rownames(all) = c("Weight","Height")
round(all,1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## Weight -2.6 -0.7 0.1 0 0.8 1.6
## Height -2.4 -1.0 0.2 0 0.9 1.4
Now for the NBA players:
nba = rbind(summary(y[z==0]),
summary(x[z==0]))
rownames(nba) = c("Weight","Height")
round(nba,1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## Weight -0.5 0.3 0.8 0.8 1.2 1.6
## Height 0.3 0.6 0.9 0.9 1.2 1.4
Now for the WNBA players:
wnba = rbind(summary(y[z==1]),
summary(x[z==1]))
rownames(wnba) = c("Weight","Height")
round(wnba,1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## Weight -2.6 -1 -0.7 -0.8 -0.5 0.8
## Height -2.4 -1 -1.0 -0.9 -0.8 0.1
How about some plots?
plot(x,y,col=2+z,pch=16,xlab="Heights (standardized)",ylab="Weights (standardized)")
legend("topleft",legend=c("NBA","WNBA"),col=2:3,pch=16,bty="n")
abline(lm(y~x),lwd=3)
abline(lm(y[z==0]~x[z==0]),lwd=3,col=2)
abline(lm(y[z==1]~x[z==1]),lwd=3,col=3)
OLS regression:
summary(lm(y~x+z))
##
## Call:
## lm(formula = y ~ x + z)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.86650 -0.33088 -0.00557 0.25042 1.66045
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3952 0.2489 1.588 0.1209
## x 0.4214 0.2322 1.815 0.0777 .
## z -0.7905 0.4586 -1.724 0.0931 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.613 on 37 degrees of freedom
## Multiple R-squared: 0.6435, Adjusted R-squared: 0.6242
## F-statistic: 33.39 on 2 and 37 DF, p-value: 5.162e-09
We will assume that the data \(\{(y_i,x_i,z_i), i=1,\ldots,n\}\) form a random sample from the linear regression model \[ y_i|x_i,z_i,\mu,\beta,\gamma,\sigma \sim N(\mu+\beta x_i + \gamma z_i, \sigma^2), \] where \(z_i\) is a dummy variable taking values \(0\) or \(1\), so \(\gamma\) is the intercept shift when \(z_i=1\).
The likelihood function of \((\mu,\beta,\gamma,\sigma^2)\) is given by \[ L(\mu,\beta,\gamma,\sigma^2|\mbox{data}) \propto (\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\mu-\beta x_i-\gamma z_i)^2\right\}. \]
The prior for the vector \(\theta\) is such that \[ p(\mu,\beta,\gamma,\sigma^2) = p(\mu)p(\beta)p(\gamma)p(\sigma^2), \] where \(\mu,\beta,\gamma \in (-\infty,\infty)\) and \(\sigma^2>0\). For illustration, we will use the following marginal priors: \[\begin{eqnarray*} \mu &\sim& t_4(0,10)\\ \beta &\sim& Laplace(0,10)\\ \gamma &\sim& N(0,20)\\ \sigma^2 &\sim& Gamma(1/2,1/2) \end{eqnarray*}\] such that \[\begin{eqnarray*} p(\mu) &\propto& \left(1+\frac{\mu^2}{40}\right)^{-2.5}\\ p(\beta) &\propto& \exp\{-0.05|\beta|\}\\ p(\gamma) &\propto& \exp\{-0.025\gamma^2\}\\ p(\sigma^2) &\propto& (\sigma^2)^{-1/2}\exp\{-0.5 \sigma^2\}. \end{eqnarray*}\]
prior.mu = function(mu){(1+mu^2/40)^(-2.5)}
prior.beta = function(beta){exp(-0.05*abs(beta))}
prior.gamma = function(gamma){exp(-0.025*gamma^2)}
prior.sigma2 = function(sigma2){(sigma2)^(-1/2)*exp(-0.5*sigma2)}
par(mfrow=c(2,2))
curve(prior.mu,from=-25,to=25,n=1000,xlab="",ylab="Density",main="Prior of mu")
curve(prior.beta,from=-100,to=100,n=1000,xlab="",ylab="Density",main="Prior of beta")
curve(prior.gamma,from=-25,to=25,n=1000,xlab="",ylab="Density",main="Prior of gamma")
curve(prior.sigma2,from=0.001,to=1,n=1000,xlab="",ylab="Density",main="Prior of sigma2")
As usual, we Bayesians combine the various sources of information into the posterior distribution: \[ p(\mu,\beta,\gamma,\sigma^2|\mbox{data}) \propto p(\mu,\beta,\gamma,\sigma^2) L(\mu,\beta,\gamma,\sigma^2|\mbox{data}). \] This posterior has no known form, so Monte Carlo approximation will be our approach. Notice that the parameter vector has dimension \(p=4\), and the sampling importance resampling (SIR) algorithm might start to become difficult to implement because of the need for good proposal/candidate distributions. The good news is that, since the late 1980s and early 1990s, we can use Markov chain Monte Carlo (MCMC) algorithms such as the Gibbs sampler and the Metropolis-Hastings algorithm.
We will draw from the posterior \(p(\theta|\mbox{data})\) by iteratively sampling the components of \(\theta\) from their full conditional distributions. More specifically, we start from an initial value \((\mu,\beta,\gamma,\sigma^2)^{(0)}\) and, for \(j=1,\ldots,M\), sample from the four full conditionals (the order does not matter).
In the end, after a burn-in period, say \(M_0\) draws, then \[ \left\{(\mu,\beta,\gamma,\sigma^2)^{(M_0+1)},\ldots,(\mu,\beta,\gamma,\sigma^2)^{(M)}\right\} \sim p(\mu,\beta,\gamma,\sigma^2|\mbox{data}). \] From there we proceed exactly as we did with the SIR algorithm.
The conditional likelihood function for \(\mu\), conditional on \((\beta,\gamma,\sigma^2)\), is essentially the kernel of a Gaussian distribution with mean \({\bar y}^*_n\) and variance \(\sigma^2/n\), where \[ {\bar y}^*_n = \frac{1}{n} \sum_{i=1}^n (y_i-\beta x_i - \gamma z_i) = {\bar y}_n - \beta {\bar x}_n-\gamma {\bar z}_n. \] Therefore, \[ p(\mu|\beta,\gamma,\sigma^2,\mbox{data}) \propto \left(1+0.025\mu^2\right)^{-2.5}p_N(\mu;{\bar y}_n - \beta {\bar x}_n-\gamma {\bar z}_n,\sigma^2/n), \] where \(p_N(\cdot;m,v)\) denotes the Gaussian density with mean \(m\) and variance \(v\).
Similarly, \[ p(\beta|\mu,\gamma,\sigma^2,\mbox{data}) \propto \exp\{-0.05|\beta|\}p_N\left(\beta;\frac{s_{yx}-\mu n{\bar x}_n-\gamma s_{xz}}{s_x^2} ,\frac{\sigma^2}{s_x^2}\right), \] where \(s_{yx}=\sum_{i=1}^n y_ix_i\), \(s_{xz}=\sum_{i=1}^n x_iz_i\) and \(s_x^2=\sum_{i=1}^n x_i^2\).
Similarly, \[ p(\gamma|\mu,\beta,\sigma^2,\mbox{data}) \propto \exp\{-0.025\gamma^2\}p_N\left(\gamma;\frac{s_{yz}-\mu n{\bar z}_n-\beta s_{xz}}{s_z^2},\frac{\sigma^2}{s_z^2}\right), \] where \(s_{yz}=\sum_{i=1}^n y_iz_i\) and \(s_z^2=\sum_{i=1}^n z_i^2\).
The conditional likelihood function for \(\sigma^2\), conditional on \((\mu,\beta,\gamma)\), is essentially the kernel of an inverse gamma distribution: \[ L(\sigma^2|\mu,\beta,\gamma,\mbox{data}) \propto (\sigma^2)^{-n/2} \exp\left\{-0.5 \frac{\sum_{i=1}^n (y_i-\mu-\beta x_i-\gamma z_i)^2}{\sigma^2}\right\}. \] Therefore, \[ p(\sigma^2|\mu,\beta,\gamma,\mbox{data}) \propto (\sigma^2)^{-1/2}\exp\{-0.5 \sigma^2\}p_{IG}\left(\sigma^2;(n-2)/2,\sum_{i=1}^n (y_i-\mu-\beta x_i-\gamma z_i)^2/2\right). \]
When \(\sigma^2 \sim IG(a,b)\), its p.d.f. is given by \[ p(\sigma^2) = \frac{b^a}{\Gamma(a)} (\sigma^2)^{-(a+1)}\exp\{-b/\sigma^2\} \qquad \sigma^2,a,b>0. \] In addition, \(E(\sigma^2) = b/(a-1)\) if \(a>1\), Mode\((\sigma^2) = b/(a+1)\), and \(V(\sigma^2) = b^2/((a-1)^2(a-2))\) if \(a>2\).
When \(\phi \sim G(a,b)\), its p.d.f. is given by \[ p(\phi) = \frac{b^a}{\Gamma(a)} \phi^{a-1}\exp\{-b\phi\} \qquad \phi,a,b>0. \] In addition, \(E(\phi) = a/b\), Mode\((\phi) = (a-1)/b\) if \(a>1\), and \(V(\phi) = a/b^2\).
If \(\sigma^2 \sim IG(a,b)\), then \(\phi=1/\sigma^2 \sim G(a,b)\). It follows that \(1/E(\sigma^2)=(a-1)/b \neq a/b = E(1/\sigma^2)\). This is a general result in probability: In general, \(E(g(X)) \neq g(E(X))\).
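A quick Monte Carlo illustration of this fact (the values of \(a\) and \(b\) below are arbitrary choices used only for this check):
set.seed(1234)
# draws from sigma2 ~ IG(a,b), obtained as reciprocals of Gamma(a,b) draws
a = 3
b = 2
sigma2.draws = 1/rgamma(100000,a,b)
# E(sigma2) = b/(a-1) = 1, while E(1/sigma2) = a/b = 1.5
c(mean(sigma2.draws),b/(a-1))
c(mean(1/sigma2.draws),a/b)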
The Markov chain Monte Carlo algorithm runs as follows:
1) Initial value: \(\theta^{(0)} = (\mu^{(0)},\beta^{(0)},\gamma^{(0)},\sigma^{2(0)})\)
2) For \(j=1,\ldots,M_0+M\), cycle through the full conditionals:
2.a) Sample \(\mu^{(j)}\) from \(\pi(\mu) \equiv \left(1+0.025\mu^2\right)^{-2.5}p_N(\mu;{\bar y}_n - \beta^{(j-1)} {\bar x}_n-\gamma^{(j-1)} {\bar z}_n,\sigma^{2(j-1)}/n)\)
2.b) Sample \(\beta^{(j)}\) from \(\pi(\beta) \equiv \exp\{-0.05|\beta|\}p_N(\beta;(s_{yx}-\mu^{(j)} n{\bar x}_n-\gamma^{(j-1)} s_{xz})/s_x^2,\sigma^{2(j-1)}/s_x^2)\)
2.c) Sample \(\gamma^{(j)}\) from \(\pi(\gamma) \equiv \exp\{-0.025\gamma^2\}p_N(\gamma;(s_{yz}-\mu^{(j)} n{\bar z}_n-\beta^{(j)} s_{xz})/s_z^2,\sigma^{2(j-1)}/s_z^2)\)
2.d) Sample \(\sigma^{2(j)}\) from \(\pi(\sigma^2) \equiv (\sigma^2)^{-1/2}\exp\{-0.5 \sigma^2\}p_{IG}(\sigma^2;(n-2)/2,\sum_{i=1}^n (y_i-\mu^{(j)}-\beta^{(j)} x_i-\gamma^{(j)} z_i)^2/2)\)
If all full conditionals are of known form, we usually call this algorithm the Gibbs sampler; otherwise it is a generic Markov chain Monte Carlo (MCMC) algorithm. That would have been the case had the priors of \((\mu,\beta,\gamma)\) and \(\sigma^2\) been trivariate Gaussian and inverse gamma, respectively. The question now is: how do we sample from the full conditionals \(\pi(\mu)\), \(\pi(\beta)\), \(\pi(\gamma)\) and \(\pi(\sigma^2)\)?
A random-walk proposal for sampling from, say, \(\pi(\mu)\) draws \(\mu^*\) from a proposal density \(q\) that depends on the current value of the chain, \(\mu^{(j-1)}\), and the proposed value \(\mu^*\) only through their difference, i.e. \(q(\mu^*|\mu^{(j-1)}) \equiv q(|\mu^*-\mu^{(j-1)}|)\). Because this proposal is symmetric, the proposal densities cancel in the Metropolis-Hastings ratio and the acceptance probability reduces to \[ \alpha(\mu^{(j-1)},\mu^*) = \min\left\{1,\frac{\pi(\mu^*)}{\pi(\mu^{(j-1)})}\right\}. \] If \(\mu^*\) is accepted, then \(\mu^{(j)}=\mu^*\); otherwise \(\mu^{(j)}=\mu^{(j-1)}\). The same approach applies to \(\beta\) and \(\gamma\).
A common choice of random-walk proposal is the normal or Student’s \(t\) with variance \(\tau_\mu^2\) (the tuning variance), i.e. \[ \mu^*|\mu^{(j-1)} \sim N(\mu^{(j-1)},\tau_\mu^2) \ \ \ \ \mbox{or} \ \ \ \ \mu^*|\mu^{(j-1)} \sim t_\nu(\mu^{(j-1)},\tau_\mu^2), \] for some small value of \(\nu\), say \(\nu=1\) or \(\nu=2\). I tend to prefer the Student’s \(t\) distribution since it has heavier tails than the Gaussian and, consequently, proposes values of \(\mu^*\) over a wider range (this may be crucial to avoid the chain getting stuck in local modes). The choice of \(\tau_\mu\) determines how well the Markov chain mixes. If \(\tau_\mu\) is too small, most proposed moves will be tiny and almost always accepted, so the chain explores the posterior very slowly. If \(\tau_\mu\) is too large, most proposed moves will be large and rarely accepted, and the chain will get stuck. A rule of thumb is to monitor the acceptance rate of the MCMC scheme and check whether it lies roughly between 10% and 30%.
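To make the tuning trade-off concrete, here is a small self-contained illustration (not part of the NBA/WNBA analysis): a random-walk Metropolis sampler for a standard normal target, run with three arbitrary tuning values; the target and the values of tau are illustrative assumptions only.
set.seed(4321)
logtarget = function(x){dnorm(x,log=TRUE)}
rwmh.rate = function(tau,M=10000){
  x = 0
  accept = 0
  for (j in 1:M){
    x.star = rnorm(1,x,tau)
    # symmetric proposal: accept with probability min{1, target ratio}
    if (log(runif(1)) < logtarget(x.star)-logtarget(x)){
      x = x.star
      accept = accept+1
    }
  }
  accept/M
}
round(c(rwmh.rate(0.1),rwmh.rate(2.5),rwmh.rate(50)),2)
# tiny tau: nearly every move is accepted; huge tau: hardly any move is accepted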
Let \(q(\sigma^{2*}|\sigma^{2(j-1)})\) be a generic proposal for the variance parameter \(\sigma^2\). The above acceptance probability is now written as \[ \alpha(\sigma^{2(j-1)},\sigma^{2*}) = \min\left\{1,\frac{\pi(\sigma^{2*})}{\pi(\sigma^{2(j-1)})} \times \frac{q(\sigma^{2(j-1)}|\sigma^{2*})}{q(\sigma^{2*}|\sigma^{2(j-1)})}\right\}. \] In the current example, a natural candidate distribution for \(\sigma^2\) is the \(IG((n-2)/2,\sum_{i=1}^n (y_i-\mu-\beta x_i-\gamma z_i)^2/2)\), which does not depend on \(\sigma^{2(j-1)}\) (we usually call this an independent MH algorithm). Here, \((\mu,\beta,\gamma)\) are the most recent values of the chains for these three parameters. Since this \(IG\) proposal matches the conditional likelihood kernel of \(\sigma^2\), those terms cancel and the acceptance probability of the independent MH step reduces to the prior ratio: \[ \alpha(\sigma^{2(j-1)},\sigma^{2*}) = \min\left\{1,\frac{\pi(\sigma^{2*})}{\pi(\sigma^{2(j-1)})} \times \frac{q(\sigma^{2(j-1)})}{q(\sigma^{2*})}\right\}= \min\left\{1,\frac{(\sigma^{2*})^{-1/2}\exp\{-0.5 \sigma^{2*}\}}{(\sigma^{2(j-1)})^{-1/2}\exp\{-0.5 \sigma^{2(j-1)}\}}\right\}. \]
Let us implement the above MCMC scheme to learn about \((\mu,\beta,\gamma,\sigma^2)\) for the NBA/WNBA data.
ybar = mean(y)
xbar = mean(x)
zbar = mean(z)
syx = sum(y*x)
syz = sum(y*z)
sxz = sum(x*z)
sx2 = sum(x^2)
sz2 = sum(z^2)
c(ybar,xbar,zbar,syx,sxz,sx2,sz2)
## [1] 1.387779e-17 7.610481e-16 5.000000e-01 3.058177e+01 -1.789762e+01
## [6] 3.900000e+01 2.000000e+01
pi.mu = function(mu,beta,gamma,sigma2){
(1+0.025*mu^2)^(-2.5)*dnorm(mu,ybar-beta*xbar-gamma*zbar,sqrt(sigma2/n))
}
pi.beta = function(mu,beta,gamma,sigma2){
exp(-0.05*abs(beta))*dnorm(beta,(syx-mu*n*xbar-gamma*sxz)/sx2,sqrt(sigma2/sx2))
}
pi.gamma = function(mu,beta,gamma,sigma2){
exp(-0.025*gamma^2)*dnorm(gamma,(syz-mu*n*zbar-beta*sxz)/sz2,sqrt(sigma2/sz2))
}
logpi.mu = function(mu,beta,gamma,sigma2){
log((1+0.025*mu^2)^(-2.5)) + dnorm(mu,ybar-beta*xbar-gamma*zbar,sqrt(sigma2/n),log=TRUE)
}
logpi.beta = function(mu,beta,gamma,sigma2){
log(exp(-0.05*abs(beta))) + dnorm(beta,(syx-mu*n*xbar-gamma*sxz)/sx2,sqrt(sigma2/sx2),log=TRUE)
}
logpi.gamma = function(mu,beta,gamma,sigma2){
log(exp(-0.025*gamma^2)) + dnorm(gamma,(syz-mu*n*zbar-beta*sxz)/sz2,sqrt(sigma2/sz2),log=TRUE)
}
ols.fit = lm(y~x+z)
ols.coef = ols.fit$coef
mu = ols.coef[1]
beta = ols.coef[2]
gamma = ols.coef[3]
sigma2 = mean((y-ols.fit$fit)^2)
c(mu,beta,gamma,sqrt(sigma2))
## (Intercept) x z
## 0.3952308 0.4213948 -0.7904616 0.5895565
# proposal parameters
nu = 2
tau.mu = 1
tau.beta = 0.1
tau.gamma = 1
# MCMC set up
M0 = 10000
M = 10000
L = 1
niter = M0+L*M
draws = matrix(0,niter,4)
# MCMC algorithm
for (j in 1:niter){
# sampling mu
mu.star = rnorm(1,mu,tau.mu)
acceptance = min(0,logpi.mu(mu.star,beta,gamma,sigma2)-logpi.mu(mu,beta,gamma,sigma2))
if (log(runif(1))<acceptance){
mu = mu.star
}
# sampling beta
beta.star = rnorm(1,beta,tau.beta)
acceptance = min(0,logpi.beta(mu,beta.star,gamma,sigma2)-logpi.beta(mu,beta,gamma,sigma2))
if (log(runif(1))<acceptance){
beta = beta.star
}
# sampling gamma
gamma.star = rnorm(1,gamma,tau.gamma)
acceptance = min(0,logpi.gamma(mu,beta,gamma.star,sigma2)-logpi.gamma(mu,beta,gamma,sigma2))
if (log(runif(1))<acceptance){
gamma = gamma.star
}
# sampling sigma2
sigma2.star = 1/rgamma(1,(n-2)/2,sum((y-mu-beta*x-gamma*z)^2)/2)
acceptance = min(1,(sigma2.star/sigma2)^(-1/2)*exp(-0.5*(sigma2.star-sigma2)))
if (runif(1)<acceptance){
sigma2 = sigma2.star
}
# Storing the draws
draws[j,] = c(mu,beta,gamma,sigma2)
}
draws = draws[seq(M0+1,niter,by=L),]
names = c("mu","beta","gamma","sigma2")
par(mfrow=c(3,4))
for (i in 1:4)
ts.plot(draws[,i],xlab="Iterations",ylab="",main=names[i])
for (i in 1:4)
acf(draws[,i],main="")
for (i in 1:4)
hist(draws[,i],prob=TRUE,main="",xlab="")
It becomes obvious from the previous MCMC algorithm that one could have (maybe should have!) considered sampling \((\mu,\beta,\gamma)\) jointly. In this case, the likelihood is better written in matrix form: \[ L(\mu,\beta,\gamma|\sigma^2,\mbox{data}) \propto \exp\left\{-\frac{1}{2\sigma^2} (y-X\delta)'(y-X\delta)\right\} \propto \exp\left\{-\frac{1}{2\sigma^2}(\delta-{\hat \delta})'X'X(\delta-{\hat \delta})\right\}, \] which is the kernel of a trivariate Gaussian with mean vector \({\hat \delta}=(X'X)^{-1}X'y\) and covariance matrix \(\sigma^2(X'X)^{-1}\), where \(\delta=(\mu,\beta,\gamma)'\). Below we will implement two algorithms:
OLS-based independent MH proposal: \(q_{ols}(\delta) \equiv N({\hat \delta},\sigma^2 (X'X)^{-1})\). Since this proposal is proportional to the conditional likelihood of \(\delta\), the likelihood and proposal terms cancel in the MH ratio, leaving only the prior ratio: \[ \alpha = \mbox{min}\left\{1,\frac{p(\mu^*)p(\beta^*)p(\gamma^*)}{p(\mu^{(i-1)})p(\beta^{(i-1)})p(\gamma^{(i-1)})}\right\} \]
Random-walk MH proposal: \(q_{rw}(\delta) \equiv N(\delta^{(i-1)},V)\), where \(V=\mbox{diag}(\tau_\mu^2,\tau_\beta^2,\tau_\gamma^2)\). Since this proposal is symmetric, \[ \alpha = \mbox{min}\left\{1,\frac{p(\mu^*)p(\beta^*)p(\gamma^*)L(\mu^*,\beta^*,\gamma^*,\sigma^2|\mbox{data})} {p(\mu^{(i-1)})p(\beta^{(i-1)})p(\gamma^{(i-1)})L(\mu^{(i-1)},\beta^{(i-1)},\gamma^{(i-1)},\sigma^2|\mbox{data})}\right\} \] or, more compactly, writing \(p(\delta)=p(\mu)p(\beta)p(\gamma)\), \[ \alpha = \mbox{min}\left\{1,\frac{p(\delta^*)L(\delta^*,\sigma^2|\mbox{data})}{ p(\delta^{(i-1)})L(\delta^{(i-1)},\sigma^2|\mbox{data})}\right\} \]
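Before turning to the implementation of the OLS-based proposal, here is a minimal sketch of what a single joint random-walk update for \(\delta\) could look like. The tuning standard deviations in tau.delta are arbitrary illustrative choices (not tuned values), and the snippet assumes the prior functions above and current values delta and sigma2 are available; it is not part of the runs compared below.
X = cbind(1,x,z)
logprior.delta = function(delta){
  log(prior.mu(delta[1]))+log(prior.beta(delta[2]))+log(prior.gamma(delta[3]))
}
loglike.delta = function(delta,sigma2){-0.5*sum((y-X%*%delta)^2)/sigma2}
tau.delta = c(0.3,0.3,0.5)   # tuning standard deviations (assumed values)
delta.star = delta + rnorm(3,0,tau.delta)
logacc = logprior.delta(delta.star)+loglike.delta(delta.star,sigma2)-
  logprior.delta(delta)-loglike.delta(delta,sigma2)
if (log(runif(1))<logacc){
  delta = delta.star
}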
X = cbind(1,x,z)
iXtX = solve(t(X)%*%X)
ciXtX = t(chol(iXtX))
deltahat = iXtX%*%t(X)%*%y
sigma2hat = mean((y-X%*%deltahat)^2)
delta = deltahat
sigma2 = sigma2hat
draws1 = matrix(0,niter,4)
# MCMC algorithm
for (j in 1:niter){
# sampling (mu,beta,gamma)
deltastar = deltahat + sqrt(sigma2)*ciXtX%*%rnorm(3)
A = prior.mu(deltastar[1])*prior.beta(deltastar[2])*prior.gamma(deltastar[3])
B = prior.mu(delta[1])*prior.beta(delta[2])*prior.gamma(delta[3])
acceptance = min(1,A/B)
if (runif(1)<acceptance){
delta = deltastar
mu = delta[1]
beta = delta[2]
gamma = delta[3]
}
# sampling sigma2
sigma2.star = 1/rgamma(1,(n-2)/2,sum((y-mu-beta*x-gamma*z)^2)/2)
acceptance = min(1,(sigma2.star/sigma2)^(-1/2)*exp(-0.5*(sigma2.star-sigma2)))
if (runif(1)<acceptance){
sigma2 = sigma2.star
}
# Storing the draws
draws1[j,] = c(mu,beta,gamma,sigma2)
}
draws1 = draws1[seq(M0+1,niter,by=L),]
The results are below:
names = c("mu","beta","gamma","sigma2")
par(mfrow=c(3,4))
for (i in 1:4)
ts.plot(draws1[,i],xlab="Iterations",ylab="",main=names[i])
for (i in 1:4)
acf(draws1[,i],main="")
for (i in 1:4)
hist(draws1[,i],prob=TRUE,main="",xlab="")
Posterior dependencies among the parameters:
par(mfrow=c(2,3))
plot(draws[,1],draws[,2],xlab=expression(mu),ylab=expression(beta))
points(deltahat[1],deltahat[2],pch=16,col=2,cex=2)
plot(draws[,1],draws[,3],xlab=expression(mu),ylab=expression(gamma))
points(deltahat[1],deltahat[3],pch=16,col=2,cex=2)
plot(draws[,2],draws[,3],xlab=expression(beta),ylab=expression(gamma))
points(deltahat[2],deltahat[3],pch=16,col=2,cex=2)
plot(draws[,1],draws[,4],xlab=expression(mu),ylab=expression(sigma^2))
points(deltahat[1],sigma2hat,pch=16,col=2,cex=2)
plot(draws[,2],draws[,4],xlab=expression(beta),ylab=expression(sigma^2))
points(deltahat[2],sigma2hat,pch=16,col=2,cex=2)
plot(draws[,3],draws[,4],xlab=expression(gamma),ylab=expression(sigma^2))
points(deltahat[3],sigma2hat,pch=16,col=2,cex=2)
# mu
tab = round(rbind(summary(draws[,1]),summary(draws1[,1])),3)
rownames(tab) = c("MCMC1","MCMC2")
tab
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## MCMC1 -0.431 0.192 0.396 0.371 0.540 1.092
## MCMC2 -0.608 0.223 0.396 0.394 0.564 1.643
# beta
tab = round(rbind(summary(draws[,2]),summary(draws1[,2])),3)
rownames(tab) = c("MCMC1","MCMC2")
tab
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## MCMC1 -0.307 0.278 0.428 0.443 0.608 1.229
## MCMC2 -0.692 0.261 0.424 0.421 0.578 1.461
# gamma
tab = round(rbind(summary(draws[,3]),summary(draws1[,3])),3)
rownames(tab) = c("MCMC1","MCMC2")
tab
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## MCMC1 -2.077 -1.056 -0.789 -0.743 -0.40 0.811
## MCMC2 -2.658 -1.104 -0.791 -0.790 -0.48 1.126
# sigma
tab = round(rbind(summary(sqrt(draws[,4])),summary(sqrt(draws1[,4]))),3)
rownames(tab) = c("MCMC1","MCMC2")
tab
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## MCMC1 0.409 0.577 0.623 0.630 0.676 1.121
## MCMC2 0.430 0.577 0.625 0.631 0.675 1.006