Housing data for 506 census tracts of Boston from the 1970 census. The dataframe BostonHousing contains the original data by Harrison and Rubinfeld (1978), while the dataframe BostonHousing2 contains the corrected version with additional spatial information (see references below). The original data are 506 observations on 14 variables, medv being the target variable.
The \(y\) variable is medv, the median value of owner-occupied homes in USD 1000's, while the \(x\) variables are:

rm
: average number of rooms per dwelling

lstat
: percentage of lower status of the population

crim
: per capita crime rate by town

age
: proportion of owner-occupied units built prior to 1940

tax
: full-value property-tax rate per USD 10,000

ptratio
: pupil-teacher ratio by town

We will fit the standard Gaussian linear regression model, where \[ y_i|x_i,\beta,\sigma^2 \sim N(x_i'\beta,\sigma^2), \quad i=1,\ldots,n, \] or, in matrix notation, \[ y = X \beta + \varepsilon, \qquad \varepsilon \sim N(0,\sigma^2 I_n). \]
#install.packages("mlbench")
library("mlbench")
data(BostonHousing2)
attach(BostonHousing2)
y = medv                                   # response: median home value (medv)
X = cbind(1,rm,lstat,crim,age,tax,ptratio) # design matrix, first column is the intercept
p = ncol(X)                                # number of regression coefficients
n = length(y)                              # number of observations
The likelihood function is \[
L(\beta,\sigma^2|y,X) =
(2\pi\sigma^2)^{-n/2}\exp\left\{-0.5(y-X\beta)'(y-X\beta)/\sigma^2\right\},
\] and the maximum likelihood estimates \({\hat \beta}_{MLE}\) and \({\hat \sigma}_{MLE}\) are obtained as \[
({\hat \beta}_{MLE},{\hat \sigma}_{MLE}) = \mbox{arg}\max_{\beta,\sigma} L(\beta,\sigma^2|y,X),
\] and are equal to \[
{\hat \beta}_{MLE} = (X'X)^{-1}X'y \ \ \mbox{and} \ \
{\hat \sigma}^2_{MLE} = (y-X{\hat \beta})'(y-X{\hat \beta})/n,
\] where \(p\) is the number of columns in \(X\). The same estimates are produced by R's built-in lm function, which, however, reports the unbiased residual standard error \({\hat \sigma} = \sqrt{(y-X{\hat \beta})'(y-X{\hat \beta})/(n-p)}\) rather than the MLE.
beta.mle = solve(t(X)%*%X)%*%t(X)%*%y  # (X'X)^{-1} X'y
y.hat = X%*%beta.mle                   # fitted values
e.hat = y-y.hat                        # residuals
sig.mle = sqrt(mean(e.hat^2))          # MLE of sigma
table = round(rbind(cbind(beta.mle,lm(y~X-1)$coef),
c(sig.mle,sqrt((n-p)/n)*summary(lm(y~X-1))$sigma)),4)  # rescale lm's unbiased sigma to the MLE
colnames(table) = c("Our function","lm function")
rownames(table)[1] = "constant"
rownames(table)[p+1] = "sigma"
table
## Our function lm function
## constant 17.2841 17.2841
## rm 4.4755 4.4755
## lstat -0.5764 -0.5764
## crim -0.0578 -0.0578
## age 0.0207 0.0207
## tax -0.0017 -0.0017
## ptratio -0.8724 -0.8724
## sigma 5.1659 5.1659
summary(lm(y~X-1))
##
## Call:
## lm(formula = y ~ X - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.3896 -3.2349 -0.7797 1.8441 29.3208
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## X 17.284103 3.999730 4.321 1.87e-05 ***
## Xrm 4.475544 0.436292 10.258 < 2e-16 ***
## Xlstat -0.576396 0.054278 -10.619 < 2e-16 ***
## Xcrim -0.057824 0.033905 -1.705 0.0887 .
## Xage 0.020686 0.010905 1.897 0.0584 .
## Xtax -0.001699 0.001983 -0.857 0.3920
## Xptratio -0.872446 0.124820 -6.990 8.87e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.202 on 499 degrees of freedom
## Multiple R-squared: 0.9549, Adjusted R-squared: 0.9543
## F-statistic: 1510 on 7 and 499 DF, p-value: < 2.2e-16
We need to specify the prior for \((\beta,\sigma^2)\): \[ \beta \sim N(b,V) \ \ \mbox{and} \ \ \sigma^2 \sim IG(c,d). \] The full posterior distribution \[ p(\beta,\sigma^2|y,X) \propto L(\beta,\sigma^2|y,X)p(\beta)p(\sigma^2), \] is intractable in closed form. Below we devise a simple Gibbs sampler for approximate Bayesian inference. The full conditional distributions are \[ (\beta|\sigma^2,X,y) \sim N(b_1,V_1), \] where \(V_1^{-1}=V^{-1}+X'X/\sigma^2\) and \(V_1^{-1}b_1=V^{-1}b+X'y/\sigma^2\) are both functions of \(\sigma^2\), and \[ (\sigma^2|\beta,X,y) \sim IG(c_1,d_1), \] where \(c_1=c+n/2\) and \(d_1=d+(y-X\beta)'(y-X\beta)/2\), which is a function of \(\beta\).
# Prior knowledge
b = rep(0,p)
V = diag(100,p)
c = 5/2
d = 5/2
# Gibbs sampler
sigma2 = 1
M0 = 1000
M = 1000
thin = 1
niter = M0+M*thin
betas = matrix(0,niter,p)
sigs = rep(0,niter)
for (iter in 1:niter){
  # draw beta | sigma2, y ~ N(b1, V1)
  V1 = solve(t(X)%*%X/sigma2+solve(V))
  b1 = V1%*%(t(X)%*%y/sigma2+solve(V)%*%b)
  beta = b1 + t(chol(V1))%*%rnorm(p)
  # draw sigma2 | beta, y ~ IG(c1, d1)
  c1 = c+n/2
  d1 = d + sum((y-X%*%beta)^2)/2
  sigma2 = 1/rgamma(1,c1,d1)
  # store the draws (sigs stores sigma, not sigma2)
  betas[iter,] = beta
  sigs[iter] = sqrt(sigma2)
}
ind = seq(M0+1,niter,by=thin)
betas = betas[ind,]
sigs = sigs[ind]
mean.beta = apply(betas,2,mean)
table = round(cbind(beta.mle,mean.beta),4)
colnames(table) = c("ML","Bayes")
rownames(table)[1] = "Constant"
table
## ML Bayes
## Constant 17.2841 14.8783
## rm 4.4755 4.7000
## lstat -0.5764 -0.5644
## crim -0.0578 -0.0596
## age 0.0207 0.0202
## tax -0.0017 -0.0018
## ptratio -0.8724 -0.8238
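Posterior means are only one summary of the draws; equal-tail credible intervals are just as easy to read off the same matrix. The sketch below is illustrative: the helper name credible.interval is ours, and simulated draws stand in for the post-burn-in matrix betas so that the snippet runs standalone.

```r
# Sketch: 95% equal-tail credible intervals from a matrix of posterior draws.
# Each column of `draws` holds the sampled values of one coefficient;
# in the analysis above this would be the post-burn-in matrix `betas`.
credible.interval = function(draws, level = 0.95) {
  alpha = 1 - level
  apply(draws, 2, quantile, probs = c(alpha/2, 1 - alpha/2))
}
set.seed(1)
draws = cbind(rnorm(1000, 4.5, 0.4),    # fake draws mimicking rm
              rnorm(1000, -0.56, 0.05)) # fake draws mimicking lstat
ci = credible.interval(draws)
colnames(ci) = c("rm", "lstat")
ci
```

A coefficient whose interval excludes zero (as for rm and lstat here) plays the role of the significance stars in the lm output.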
Trace plots are commonly used to informally check the mixing of the draws in an MCMC scheme.
par(mfrow=c(2,4))
for (i in 1:p)
ts.plot(betas[,i],ylab="",main=paste("beta(",i-1,")",sep=""))
ts.plot(sigs,ylab="",main="sigma")
The autocorrelation functions are used to assess how correlated the chains are.
par(mfrow=c(2,4))
for (i in 1:p)
acf(betas[,i],main=paste("beta(",i-1,")",sep=""))
acf(sigs,ylab="",main="sigma")
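The autocorrelations can also be folded into a single number, the effective sample size, \(M/(1+2\sum_k \rho_k)\). The sketch below uses a simplified truncation rule (stop summing at the first non-positive autocorrelation); the function name ess and the simulated chains are illustrative, and packages such as coda provide a more careful effectiveSize.

```r
# Sketch: effective sample size from the empirical autocorrelations.
# ESS = M / (1 + 2 * sum of autocorrelations), truncating the sum at the
# first non-positive autocorrelation (a simple common rule).
ess = function(x) {
  M = length(x)
  rho = acf(x, lag.max = M - 1, plot = FALSE)$acf[-1]  # drop lag 0
  cutoff = which(rho <= 0)[1]
  if (!is.na(cutoff)) rho = rho[seq_len(cutoff - 1)]
  M / (1 + 2 * sum(rho))
}
set.seed(2)
iid = rnorm(1000)                                    # independent draws: ESS near 1000
ar = filter(rnorm(1000), 0.9, method = "recursive")  # sticky AR(1) chain: much smaller ESS
c(ess(iid), ess(ar))
```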
Below are Monte Carlo approximations to the marginal posterior distributions.
par(mfrow=c(2,4))
for (i in 1:p){
hist(betas[,i],xlab="",prob=TRUE,main=paste("beta(",i-1,")",sep=""))
abline(v=0,col=2,lwd=2)
}
hist(sigs,xlab="",prob=TRUE,main="sigma")
Our package bayeslm implements an efficient sampler for Gaussian Bayesian linear regression. Instead of a regular Gibbs sampler, it uses an elliptical slice sampler. The function has several built-in priors, and users can also provide their own prior (written as an R function). It is based on our paper entitled Efficient sampling for Gaussian linear regression with arbitrary priors, Journal of Computational and Graphical Statistics, 2019, 28, 142-154, written with Richard Hahn and Jingyu He. You can download the paper here.
#install.packages("bayeslm")
library("bayeslm")
X = cbind(rm,lstat,crim,age,tax,ptratio)
bayesfit = bayeslm(y,X,icept=TRUE,prior="ridge")
## ridge prior
## fixed running time 0.00212838
## sampling time 0.476943
summary(bayesfit)
## Average number of rejections before one acceptance :
## 0.75195
## Summary of beta draws
## based on 16000 valid draws (number of burn in is 4000)
## Summary of Posterior draws
## Moments
## mean std dev eff sample size
## intercept 17.2779 4.032 4938 ***
## rm 4.4592 0.442 3825 ***
## lstat -0.5610 0.057 1696 ***
## crim -0.0590 0.034 6832 *
## age 0.0183 0.011 2814 .
## tax -0.0018 0.002 5604
## ptratio -0.8647 0.127 7787 ***
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Summary of sigma draws
## Mean of standard deviation is 5.325903
## S.d. of standard deviation samples is 0.1680139
## Effective sample size of s.d. is 14283.68
mean.beta1 = apply(bayesfit$beta,2,mean)
table = round(cbind(mean.beta,mean.beta1),4)
colnames(table) = c("Bayes (ours)","bayeslm")
rownames(table)[1] = "Constant"
table
## Bayes (ours) bayeslm
## Constant 14.8783 17.1918
## rm 4.7000 4.4687
## lstat -0.5644 -0.5609
## crim -0.0596 -0.0589
## age 0.0202 0.0184
## tax -0.0018 -0.0019
## ptratio -0.8238 -0.8635