This homework assignment must be completed individually. To ensure a comprehensive understanding of the material, students will be selected at random for a brief oral examination to discuss their solutions and the underlying methodology.
For \(X\) and \(Y\) independent random variables with \(X \sim Gamma(\alpha,1)\) and \(Y \sim Gamma(\beta,1)\), it can be shown that \[ B_g = \frac{X}{X+Y} \sim Beta(\alpha,\beta). \]
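This representation translates directly into a two-line sampler. A minimal sketch, using the name rbetag adopted below (the argument names are ours):

```r
# Gamma-ratio sampler for Beta(alpha, beta): B = X/(X+Y) with independent
# X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1).
rbetag = function(M, alpha, beta) {
  X = rgamma(M, alpha, 1)
  Y = rgamma(M, beta, 1)
  X/(X + Y)
}
set.seed(1234)
summary(rbetag(10000, 0.7, 0.7))
```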
For small \(\alpha\) and \(\beta\) (e.g., \(\alpha, \beta < 1\)), the following rejection scheme is elegantly simple and avoids gamma sampling altogether:
Step 1: Generate \(U_1, U_2 \sim \mbox{Uniform}(0, 1)\);
Step 2: Set \(X = U_1^{1/\alpha}\) and \(Y = U_2^{1/\beta}\);
Step 3: If \(X + Y \leq 1\), then \[ B_j = \frac{X}{X + Y} \sim Beta(\alpha,\beta). \] Otherwise, repeat.
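The three steps above can be sketched as the following rejection sampler, using the name rbetaj adopted below (a literal, one-draw-at-a-time translation; it can be vectorized):

```r
# Joehnk's rejection sampler for Beta(alpha, beta); efficient for small
# alpha, beta, where the acceptance event X + Y <= 1 remains likely.
rbetaj = function(M, alpha, beta) {
  draws = numeric(M)
  i = 1
  while (i <= M) {
    X = runif(1)^(1/alpha)          # Steps 1-2
    Y = runif(1)^(1/beta)
    if (X + Y <= 1) {               # Step 3: accept
      draws[i] = X/(X + Y)
      i = i + 1
    }                               # otherwise repeat
  }
  draws
}
set.seed(1234)
summary(rbetaj(2000, 0.7, 0.7))
```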
Jöhnk, M. D. (1964). Erzeugung von Beta-verteilten und Gamma-verteilten Zufallszahlen [Generation of beta-distributed and gamma-distributed random numbers]. Metrika, 8(1), 5–15. Note that this paper is in German. The most comprehensive English resource for this and nearly all other classical sampling algorithms is Luc Devroye's definitive text: Devroye, L. (1986). Non-Uniform Random Variate Generation. Springer-Verlag, which provides the full mathematical proof and a performance analysis of Jöhnk's method in Chapter 9. A standard reference for the implementation of the algorithm is Kennedy, W. J., & Gentle, J. E. (1980). Statistical Computing. Marcel Dekker.
Let us call the first sampling scheme rbetag and the second rbetaj.
Compare draws from these two samplers with draws from the R function rbeta, for the case where \(B \sim Beta(0.7,0.7)\). You can, for instance, compare the histograms of the draws to the exact beta density (dbeta in R).
Obtain the Monte Carlo approximation of \(Pr(B<0.2)=0.2504119\).
Repeat the sampling \(N=1,000\) times and compare the root mean squared errors (RMSEs) and mean absolute errors (MAEs) of the three samplers.
What happens when \(\alpha=2.5\) and \(\beta=2.5\), where \(Pr(B<0.2)=0.07718863\)?
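A possible skeleton for the comparison asked for above (the sampler definitions are repeated here so the chunk is self-contained, with a vectorized version of the rejection loop; N is reduced from 1,000 for speed and should be increased for the actual exercise):

```r
# Compare rbetag, rbetaj and R's rbeta for Pr(B < 0.2), B ~ Beta(0.7, 0.7)
rbetag = function(M, a, b) { X = rgamma(M, a, 1); Y = rgamma(M, b, 1); X/(X + Y) }
rbetaj = function(M, a, b) {
  out = numeric(0)
  while (length(out) < M) {          # vectorized rejection: propose M at a time
    X = runif(M)^(1/a); Y = runif(M)^(1/b)
    keep = (X + Y) <= 1
    out = c(out, (X/(X + Y))[keep])
  }
  out[1:M]
}
set.seed(1234)
M = 1000                             # draws per replication
N = 100                              # replications (the exercise asks for 1,000)
truth = pbeta(0.2, 0.7, 0.7)         # 0.2504119
est = matrix(NA, N, 3)
for (j in 1:N) {
  est[j, 1] = mean(rbetag(M, 0.7, 0.7) < 0.2)
  est[j, 2] = mean(rbetaj(M, 0.7, 0.7) < 0.2)
  est[j, 3] = mean(rbeta(M, 0.7, 0.7)  < 0.2)
}
rmse = sqrt(colMeans((est - truth)^2))
mae  = colMeans(abs(est - truth))
round(rbind(rmse, mae), 4)
# For alpha = beta = 2.5, rerun with truth = pbeta(0.2, 2.5, 2.5) and note
# that rbetaj's acceptance rate drops sharply.
```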
We collected data from 15 volunteers (my students and myself).
n = 15
data = matrix(c(178,70,175,80,184,82,200,89,183,80,173,65,185,70,162,70,
190,85,174,80,180,87,165,74,176,72,174,63,196,113),n,2,byrow=TRUE)
height = scale(data[,1])
weight = scale(data[,2])
par(mfrow=c(1,1))
plot(height,weight,xlab="Height (standardized)",ylab="Weight (standardized)",pch=16)
points(height[n],weight[n],col=2,pch=16,cex=1.5)
abline(v=0,lty=2)
abline(h=0,lty=2)
Below we use the lm (linear model) function from R to run the OLS estimation for the regression \[ y_i = \beta \ x_i + \varepsilon_i \qquad\qquad \varepsilon_i \ \ \mbox{iid} \ \ N(0,\sigma^2), \] where \(x_i=\mbox{height}_i\) and \(y_i=\mbox{weight}_i\), for \(i=1,\ldots,n\). From the frequentist/Fisherian approach, we have learned that the MLEs of \(\beta\) and \(\sigma^2\) are, respectively, \[ {\widehat \beta} = \frac{\sum_{i=1}^n y_i x_i}{\sum_{i=1}^n x^2_i} \qquad \mbox{and} \qquad {\widehat \sigma}^2 = \frac{\sum_{i=1}^n (y_i-{\widehat y}_i)^2}{n}. \] We also know the sampling distribution of these estimators: \[ {\widehat \beta}|{\widehat \sigma}^2 \sim t_{n-1}\left(\beta,\frac{{\widehat \sigma}^2}{\sum_{i=1}^n x_i^2}\right) \qquad \mbox{and} \qquad {\widehat \sigma}^2|\sigma^2 \sim IG\left(\frac{n-1}{2},\frac{(n-1)\sigma^2}{2}\right), \] from which confidence intervals and hypothesis testing can be derived. An unbiased estimator of \(\sigma^2\) is \(s^2 = n{\widehat \sigma}^2/(n-1)\).
bhat = sum(height*weight)/sum(height^2)
sighat = sqrt(sum((weight-bhat*height)^2)/n)
s = sqrt(n*sighat^2/(n-1))
se = s/sqrt(sum(height^2))
round(c(bhat,se),4)
## [1] 0.6875 0.1941
round(c(sighat,s),4)
## [1] 0.7016 0.7262
# The -1 is to inform the lm function that the regression has no intercept
summary(lm(weight~height-1))
##
## Call:
## lm(formula = weight ~ height - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0537 -0.5485 -0.1120 0.4731 1.7021
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## height 0.6875 0.1941 3.542 0.00325 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7262 on 14 degrees of freedom
## Multiple R-squared: 0.4727, Adjusted R-squared: 0.435
## F-statistic: 12.55 on 1 and 14 DF, p-value: 0.003249
par(mfrow=c(1,1))
plot(height,weight,xlab="Height (standardized)",ylab="Weight (standardized)")
points(height[n],weight[n],col=2,pch=16,cex=2)
abline(lm(weight~height-1),col=2,lwd=2)
abline(lm(weight[1:(n-1)]~height[1:(n-1)]-1),lwd=2)
legend("topleft",legend=c("With atypical","Without atypical "),col=2:1,lwd=2,lty=1,bty="n")
abline(v=0,lty=2)
abline(h=0,lty=2)
Let us assume the following prior for \(\beta\) and \(\sigma^2\) \[ p(\beta,\sigma^2) = p(\beta|\sigma^2)p(\sigma^2), \] with \(\beta|\sigma^2 \sim N(b_0,\sigma^2B_0)\) and \(\sigma^2 \sim IG(c_0,d_0)\), and hyperparameters \(b_0=0\), \(B_0=1\), \(c_0=2\) and \(d_0=1\). Notice that this is a conjugate prior for \((\beta,\sigma^2)\), so closed-form posterior inference is readily available.
b0=0
B0=1
c0=2
d0=1
M = 1000
sig2 = 1/rgamma(M,c0,d0)
beta = rnorm(M,b0,sqrt(sig2*B0))
par(mfrow=c(1,1))
boxplot(cbind(beta,sqrt(sig2)),names=c(expression(beta),expression(sigma)),outline=F,xlab="",horizontal=TRUE)
points(bhat,1,col=2,pch=16,cex=2)
points(s,2,col=2,pch=16,cex=2)
title("Prior draws (MLE in red)")
Implement the SIR algorithm to obtain draws from the posterior \(p(\beta,\sigma^2|\mbox{data})\). You can either use the prior for \((\beta,\sigma^2)\) as proposal density or you can use the following proposal: \(q(\beta) \equiv N(0.7,0.25)\) and \(q(\sigma^2) \equiv IG(7,5)\).
Note: these proposal densities are inspired by the sampling distributions of the corresponding estimators. Recall that the proposal distribution can be essentially anything, built with or without the data: it does not alter the prior (or the target posterior) in any way; it only affects the efficiency of the algorithm.
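One possible SIR sketch using the prior as proposal (the data block from above is repeated so the chunk runs on its own; the number of draws and the seed are arbitrary choices):

```r
# SIR with the prior as proposal: draw from the prior, weight by the
# likelihood, then resample proportionally to the weights.
n = 15
data = matrix(c(178,70,175,80,184,82,200,89,183,80,173,65,185,70,162,70,
                190,85,174,80,180,87,165,74,176,72,174,63,196,113),n,2,byrow=TRUE)
x = as.numeric(scale(data[,1]))            # height (standardized)
y = as.numeric(scale(data[,2]))            # weight (standardized)
b0 = 0; B0 = 1; c0 = 2; d0 = 1
set.seed(1234)
M = 10000
sig2 = 1/rgamma(M, c0, d0)                 # sigma2 ~ IG(c0, d0)
beta = rnorm(M, b0, sqrt(sig2*B0))         # beta|sigma2 ~ N(b0, sigma2*B0)
logw = numeric(M)                          # log importance weights = log-likelihood
for (i in 1:M) logw[i] = sum(dnorm(y, beta[i]*x, sqrt(sig2[i]), log=TRUE))
w = exp(logw - max(logw))                  # stabilize before exponentiating
idx = sample(1:M, M, replace=TRUE, prob=w) # resampling step
beta.post = beta[idx]; sig2.post = sig2[idx]
round(c(mean(beta.post), mean(sqrt(sig2.post))), 3)
```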
Implement the Gibbs sampler to obtain draws from the posterior \(p(\beta,\sigma^2|\mbox{data})\). Check the mixing of the Markov chains and the convergence of the draws to the posterior.
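A sketch of the Gibbs sampler; the full conditionals below are derived from the conjugate prior above (\(\beta|\sigma^2 \sim N(b_0,\sigma^2 B_0)\), \(\sigma^2 \sim IG(c_0,d_0)\)), so verify the algebra against your own derivation:

```r
# Gibbs sampler: alternate between the two full conditionals.
n = 15
data = matrix(c(178,70,175,80,184,82,200,89,183,80,173,65,185,70,162,70,
                190,85,174,80,180,87,165,74,176,72,174,63,196,113),n,2,byrow=TRUE)
x = as.numeric(scale(data[,1])); y = as.numeric(scale(data[,2]))
b0 = 0; B0 = 1; c0 = 2; d0 = 1
sxx = sum(x^2); sxy = sum(x*y)
set.seed(1234)
M = 5000; burn = 1000
beta = 0; sig2 = 1                         # arbitrary starting values
draws = matrix(NA, M, 2)
for (m in 1:M) {
  # beta | sig2, data ~ N(b1, sig2*B1)
  B1 = 1/(sxx + 1/B0)
  b1 = B1*(sxy + b0/B0)
  beta = rnorm(1, b1, sqrt(sig2*B1))
  # sig2 | beta, data ~ IG(c1, d1); the prior on beta contributes one term
  c1 = c0 + (n + 1)/2
  d1 = d0 + 0.5*(sum((y - beta*x)^2) + (beta - b0)^2/B0)
  sig2 = 1/rgamma(1, c1, d1)
  draws[m,] = c(beta, sig2)
}
draws = draws[-(1:burn),]
round(colMeans(draws), 3)
# ts.plot(draws[,1]) and acf(draws[,1]) can be used to check mixing.
```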
Repeat the previous exercise, but replace the Gibbs sampler with a Metropolis algorithm using the proposals \(q(\beta) \equiv N(0.7,0.25)\) and \(q(\sigma^2) \equiv IG(7,5)\) that already appeared in 1).
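A sketch of the independence Metropolis step; note that we read \(N(0.7,0.25)\) as having variance 0.25 (sd 0.5), which is an assumption about the notation:

```r
# Independence Metropolis: propose from fixed q(beta), q(sig2) and accept
# with probability min(1, [p(theta')q(theta)] / [p(theta)q(theta')]).
n = 15
data = matrix(c(178,70,175,80,184,82,200,89,183,80,173,65,185,70,162,70,
                190,85,174,80,180,87,165,74,176,72,174,63,196,113),n,2,byrow=TRUE)
x = as.numeric(scale(data[,1])); y = as.numeric(scale(data[,2]))
b0 = 0; B0 = 1; c0 = 2; d0 = 1
lig = function(s2, a, b) -(a + 1)*log(s2) - b/s2   # IG(a,b) log-kernel
logpost = function(beta, sig2)                      # log posterior kernel
  sum(dnorm(y, beta*x, sqrt(sig2), log=TRUE)) +
  dnorm(beta, b0, sqrt(sig2*B0), log=TRUE) + lig(sig2, c0, d0)
set.seed(1234)
M = 10000; burn = 2000
beta = 0.7; sig2 = 1
draws = matrix(NA, M, 2)
for (m in 1:M) {
  beta.s = rnorm(1, 0.7, 0.5)          # proposal q(beta) = N(0.7, 0.25)
  sig2.s = 1/rgamma(1, 7, 5)           # proposal q(sig2) = IG(7, 5)
  logratio = logpost(beta.s, sig2.s) - logpost(beta, sig2) +
             dnorm(beta, 0.7, 0.5, log=TRUE) - dnorm(beta.s, 0.7, 0.5, log=TRUE) +
             lig(sig2, 7, 5) - lig(sig2.s, 7, 5)
  if (log(runif(1)) < logratio) { beta = beta.s; sig2 = sig2.s }
  draws[m,] = c(beta, sig2)
}
round(colMeans(draws[-(1:burn),]), 3)
```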
If the three algorithms are running properly, the results should be the same up to small Monte Carlo errors.