The goal in this homework is to illustrate Bayesian posterior
inference for the parameters of a skew-normal distribution, following
Azzalini's (1985) paper. Additional references are Dalla Valle (2004)
and Bayes and Branco (2007). A continuous random variable \(X\) is skew-normal with location parameter
\(\mu\), scale parameter \(\sigma^2\), and skewness parameter \(\alpha\), denoted by \(X \sim SN(\mu,\sigma^2,\alpha)\), when its
probability density function is \[
p(x|\mu,\sigma^2,\alpha) = 2 \frac{1}{\sigma}
\phi\left(\frac{x-\mu}{\sigma}\right)\Phi\left(\alpha\frac{x-\mu}{\sigma}\right),
\] where \(-\infty < x,\mu,\alpha
< \infty\) and \(\sigma>0\). Here \(\phi\) and \(\Phi\) are the probability density function
and the cumulative distribution function of the standard normal
distribution. In R, the quantities \(\phi\left(\frac{x-\mu}{\sigma}\right)\) and \(\Phi\left(\alpha\frac{x-\mu}{\sigma}\right)\) would be
dnorm((x-mu)/sigma) and
pnorm(alpha*(x-mu)/sigma), respectively.
In addition, \[\begin{eqnarray*}
E(x|\mu,\sigma^2,\alpha) &=& \mu + \sigma \delta
\sqrt{\frac{2}{\pi}}\\
V(x|\mu,\sigma^2,\alpha) &=& \sigma^2 \left(1-\frac{2}{\pi}
\delta^2 \right)\\
\delta &=& \frac{\alpha}{\sqrt{1+\alpha^2}} \qquad -1 <
\delta < 1.
\end{eqnarray*}\] The skewness index, denoted here by \(\gamma\), is given by \[
\gamma = \sqrt{\frac{2}{\pi}}
\left(\frac{4}{\pi}-1\right)\left(\frac{\alpha}{\sqrt{1+\alpha^2}}\right)^3\left(1-\frac{2}{\pi}\frac{\alpha^2}{1+\alpha^2}\right)^{-3/2},
\] which lies in \((-0.9953,0.9953)\), so the skew-normal
accommodates only small to moderate skewness.
For \(\mu=0\) and \(\sigma=1\), choose a few values of \(\alpha\) and plot the densities \(p(x|\alpha)\). Also compute (i) \(E(x|\alpha)\), (ii) \(V(x|\alpha)\), and (iii) the skewness index \(\gamma\) for the chosen values of \(\alpha\). Comment on your findings.
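As a sketch of one way to approach this (the \(\alpha\) values below are our own illustrative choices), the densities and the moment formulas above can be evaluated directly:

```r
# Density of SN(0, 1, alpha)
dsn = function(x, alpha) 2*dnorm(x)*pnorm(alpha*x)

alphas = c(0, 1, 5, 20)              # illustrative choices of alpha
xs = seq(-4, 4, length = 500)
plot(xs, dsn(xs, alphas[1]), type = "l", ylim = c(0, 0.8),
     xlab = "x", ylab = "density")
for (k in 2:length(alphas)) lines(xs, dsn(xs, alphas[k]), col = k)
legend("topleft", legend = paste("alpha =", alphas),
       col = seq_along(alphas), lty = 1)

# Moments and skewness index via the formulas above
delta = alphas/sqrt(1 + alphas^2)
Ex    = delta*sqrt(2/pi)                         # E(x|alpha)
Vx    = 1 - (2/pi)*delta^2                       # V(x|alpha)
gam   = sqrt(2/pi)*(4/pi - 1)*delta^3*(1 - (2/pi)*delta^2)^(-3/2)
round(cbind(alpha = alphas, Ex, Vx, gamma = gam), 4)
```

Note that \(\alpha=0\) recovers the standard normal (mean 0, variance 1, skewness 0), and \(\gamma\) increases with \(\alpha\) toward its upper bound of about 0.9953.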
Let us assume that \(\sigma=1\), so the only unknown parameters are \(\alpha\), which controls the level of skewness, and \(\mu\), the parameter that controls the location. Let \(\theta=(\mu,\alpha)\), such that the goal is to perform posterior inference for \(\theta\) given the data below by assuming that \[ x_1,\ldots,x_n|\theta \ \ \mbox{iid} \ \ SN(\mu,1,\alpha), \] for \(n=50\).
x = c(
0.1169, 1.7311, 0.0144, 0.4881, 2.3877, -0.0853, 0.0522, 0.7226,
0.8718, 3.0415, 0.6363, 1.3590, 1.5958, 0.4567, 0.9885, 1.7001,
1.0380, 1.0195, 0.3528, 1.4249, 0.3170, 1.3276, -0.1032, 1.1568,
0.0699, 1.6802, 0.2470, 0.4147, 1.6882, 0.7256, 1.3568, 1.1091,
0.0500, 2.2886, 1.4985, 2.7261, 1.8443, -0.0687, 0.9441, 0.6872,
0.5258, 0.4743, 0.4240, 0.7349, 0.4428, 0.1880, 0.4642, 0.2786,
0.2742, 0.5678)
We assume that \(\mu\) and \(\alpha\) are a priori independent, with \(\mu \sim U(-1,1)\) and \(\alpha \sim N(0,30^2)\), so that \(p(\mu,\alpha) \propto \exp\left\{-0.5\alpha^2/900\right\}\) for \(\mu \in (-1,1)\). Below we provide R code for pointwise evaluation of the likelihood function as well as the posterior density (up to a normalizing constant).
# Likelihood with sigma = 1: product of 2*phi(x-mu)*Phi(alpha*(x-mu))
likelihood = function(mu,alpha){
  prod(2*dnorm(x-mu)*pnorm(alpha*(x-mu)))
}
# Unnormalized posterior: likelihood times the N(0,30^2) prior on alpha;
# the U(-1,1) prior on mu contributes only the support restriction |mu| < 1
posterior = function(mu,alpha){
  if (abs(mu) >= 1) return(0)
  likelihood(mu,alpha)*dnorm(alpha,0,30)
}
Using the R functions provided above, we plot contours of the posterior distribution \(p(\mu,\alpha|\mbox{data})\), with \(\mu\) and \(\alpha\) on grids of size \(N=200\) over the intervals \((-1,1)\) and \((-5,100)\), respectively. Notice the non-standard relationship between \(\mu\) and \(\alpha\), a posteriori.
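A minimal sketch of how such a contour plot can be produced (the data and the likelihood/posterior functions are repeated here so the chunk runs on its own; the grid sizes follow the text):

```r
# Data and posterior repeated from above so this chunk is self-contained
x = c(0.1169, 1.7311, 0.0144, 0.4881, 2.3877, -0.0853, 0.0522, 0.7226,
      0.8718, 3.0415, 0.6363, 1.3590, 1.5958, 0.4567, 0.9885, 1.7001,
      1.0380, 1.0195, 0.3528, 1.4249, 0.3170, 1.3276, -0.1032, 1.1568,
      0.0699, 1.6802, 0.2470, 0.4147, 1.6882, 0.7256, 1.3568, 1.1091,
      0.0500, 2.2886, 1.4985, 2.7261, 1.8443, -0.0687, 0.9441, 0.6872,
      0.5258, 0.4743, 0.4240, 0.7349, 0.4428, 0.1880, 0.4642, 0.2786,
      0.2742, 0.5678)
likelihood = function(mu, alpha) prod(2*dnorm(x - mu)*pnorm(alpha*(x - mu)))
posterior  = function(mu, alpha) likelihood(mu, alpha)*dnorm(alpha, 0, 30)

N      = 200
mus    = seq(-1, 1, length = N)      # grid for mu
alphas = seq(-5, 100, length = N)    # grid for alpha
post   = matrix(0, N, N)
for (i in 1:N)
  for (j in 1:N)
    post[i, j] = posterior(mus[i], alphas[j])
contour(mus, alphas, post, drawlabels = FALSE,
        xlab = expression(mu), ylab = expression(alpha))
```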
Assume that \(\mbox{data}=\{x_1,\ldots,x_n\}\) is the data above (\(n=50\)).
Propose a sampling importance resampling (SIR) algorithm to sample from the posterior of \(\theta\), i.e. \[ p(\mu,\alpha|\mbox{data}), \] and obtain SIR-based approximations to the posterior summaries reported below.
Explain your choice for the proposal distribution, the SIR sample size (\(M\)) and the size of the resample (\(N\)).
Suggestion: The results below are based on SIR sample size of \(M=100000\) and resample size of \(N=20000\).
## Mean Median St.Dev 5th perc. 95th perc.
## mu -0.07 -0.09 0.07 -0.15 0.06
## alpha 22.71 17.61 16.93 4.23 57.08
## [1] -0.568
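One possible implementation, sketched below, takes the prior itself as the proposal distribution, so the unnormalized importance weights reduce to the likelihood values. The proposal choice and the seed are ours; other proposals are equally valid.

```r
set.seed(1234)
x = c(0.1169, 1.7311, 0.0144, 0.4881, 2.3877, -0.0853, 0.0522, 0.7226,
      0.8718, 3.0415, 0.6363, 1.3590, 1.5958, 0.4567, 0.9885, 1.7001,
      1.0380, 1.0195, 0.3528, 1.4249, 0.3170, 1.3276, -0.1032, 1.1568,
      0.0699, 1.6802, 0.2470, 0.4147, 1.6882, 0.7256, 1.3568, 1.1091,
      0.0500, 2.2886, 1.4985, 2.7261, 1.8443, -0.0687, 0.9441, 0.6872,
      0.5258, 0.4743, 0.4240, 0.7349, 0.4428, 0.1880, 0.4642, 0.2786,
      0.2742, 0.5678)
likelihood = function(mu, alpha) prod(2*dnorm(x - mu)*pnorm(alpha*(x - mu)))

M = 100000   # SIR (proposal) sample size
N = 20000    # resample size
# Draw from the proposal = prior: mu ~ U(-1,1), alpha ~ N(0,30^2).
# Target/proposal then reduces to the likelihood, i.e. w_i = L(mu_i, alpha_i).
mu.d    = runif(M, -1, 1)
alpha.d = rnorm(M, 0, 30)
w = numeric(M)
for (i in 1:M) w[i] = likelihood(mu.d[i], alpha.d[i])
# Resample with probabilities proportional to the weights
idx     = sample(M, N, replace = TRUE, prob = w)
mu.s    = mu.d[idx]
alpha.s = alpha.d[idx]
rbind(mu    = c(mean(mu.s), quantile(mu.s, c(.05, .5, .95))),
      alpha = c(mean(alpha.s), quantile(alpha.s, c(.05, .5, .95))))
```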
Propose a random-walk Metropolis (RWM) scheme to sample from the posterior of \(\theta\), i.e. \[ p(\mu,\alpha|\mbox{data}), \] and obtain RWM-based approximations to the posterior summaries reported below.
Explain your choice of initial values and the tuning parameters of the RWM algorithm. Also, explain how you chose the burn-in size (\(M_0\)), the skip size (\(L\)) and the number of posterior draws (\(M\)). Compare the results from 1) and 3).
Suggestion: \(M_0=10000\), \(L=100\) and \(M=20000\). For the proposals for \(\mu\) and \(\alpha\), we suggest using \(\mu^*\sim N(\mu^{(j)},0.25^2)\) and \(\alpha^* \sim N(\alpha^{(j)},5^2)\), where \((\mu,\alpha)^{(j)}\) is the current state of the Markov chain. Below are the results of our implementation.
## Mean Median St.Dev 5th perc. 95th perc.
## mu -0.07 -0.09 0.07 -0.15 0.07
## alpha 22.81 17.90 17.28 4.20 57.25
## [1] -0.553
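A sketch of one such RWM sampler, working on the log scale for numerical stability. The proposal standard deviations 0.25 and 5 follow the suggestion above, but for brevity this illustration runs a shorter chain, with no thinning, than the suggested \(M_0\), \(L\) and \(M\):

```r
set.seed(4321)
x = c(0.1169, 1.7311, 0.0144, 0.4881, 2.3877, -0.0853, 0.0522, 0.7226,
      0.8718, 3.0415, 0.6363, 1.3590, 1.5958, 0.4567, 0.9885, 1.7001,
      1.0380, 1.0195, 0.3528, 1.4249, 0.3170, 1.3276, -0.1032, 1.1568,
      0.0699, 1.6802, 0.2470, 0.4147, 1.6882, 0.7256, 1.3568, 1.1091,
      0.0500, 2.2886, 1.4985, 2.7261, 1.8443, -0.0687, 0.9441, 0.6872,
      0.5258, 0.4743, 0.4240, 0.7349, 0.4428, 0.1880, 0.4642, 0.2786,
      0.2742, 0.5678)
# Log posterior (sigma = 1): log-likelihood plus log priors; -Inf outside |mu| < 1
logpost = function(mu, alpha) {
  if (abs(mu) >= 1) return(-Inf)
  sum(log(2) + dnorm(x - mu, log = TRUE) +
      pnorm(alpha*(x - mu), log.p = TRUE)) +
    dnorm(alpha, 0, 30, log = TRUE)
}

M0 = 1000                     # burn-in (shortened for illustration)
M  = 5000                     # retained draws (shortened for illustration)
sd.mu = 0.25; sd.alpha = 5    # random-walk proposal standard deviations
draws = matrix(0, M, 2, dimnames = list(NULL, c("mu", "alpha")))
cur = c(0, 1)                 # initial values
lp.cur = logpost(cur[1], cur[2])
for (t in 1:(M0 + M)) {
  prop    = cur + rnorm(2, 0, c(sd.mu, sd.alpha))   # random-walk proposal
  lp.prop = logpost(prop[1], prop[2])
  if (log(runif(1)) < lp.prop - lp.cur) {           # Metropolis accept/reject
    cur = prop; lp.cur = lp.prop
  }
  if (t > M0) draws[t - M0, ] = cur
}
apply(draws, 2, quantile, c(.05, .5, .95))
```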
SIR-based posterior summaries
## Mean Median St.Dev 5th perc. 95th perc.
## mu -0.07 -0.09 0.07 -0.15 0.06
## alpha 22.71 17.61 16.93 4.23 57.08
RWM-based posterior summaries
## Mean Median St.Dev 5th perc. 95th perc.
## mu -0.07 -0.09 0.07 -0.15 0.07
## alpha 22.81 17.90 17.28 4.20 57.25