The skew-normal model

The goal in this homework is to illustrate Bayesian posterior inference for the parameters of a skew-normal distribution, following Azzalini’s (1985) paper. Additional references are Dalla Valle (2004) Bayes and Branco (2007). A continuous random variable \(x\) is skew-normal with location parameter \(\mu\), scale parameter \(\sigma^2\) and skewness parameter \(\alpha\), denoted by \(X \sim SN(\mu,\sigma^2,\alpha)\), when its probability density function is \[ p(x|\mu,\sigma^2,\alpha) = 2 \frac{1}{\sigma} \phi\left(\frac{x-\mu}{\sigma}\right)\Phi\left(\alpha\frac{x-\mu}{\sigma}\right), \] where \(-\infty < x,\mu,\alpha < \infty\) and \(\sigma>0\). Here \(\phi\) and \(\Phi\) are the probability density function and the cumulative distribution function of the standard normal distribution. In R, these quantities would be dnorm((x-theta)/sigma) and pnorm((x-theta)/sigma), respectively.
In addition, \[\begin{eqnarray*} E(x|\mu,\sigma^2,\alpha) &=& \mu + \sigma \delta \sqrt{\frac{2}{\pi}}\\ V(x|\mu,\sigma^2,\alpha) &=& \sigma^2 \left(1-\frac{2}{\pi} \delta^2 \right)\\ \delta &=& \frac{\alpha}{\sqrt{1+\alpha^2}} \qquad -1 < \delta < 1. \end{eqnarray*}\] The skewness index, denoted here by \(\gamma\), is given by \[ \gamma = \sqrt{\frac{2}{\pi}} \left(\frac{4}{\pi}-1\right)\left(\frac{\alpha}{\sqrt{1+\alpha^2}}\right)^3\left(1-\frac{2}{\pi}\frac{\alpha^2}{1+\alpha^2}\right)^{-3/2}, \] which lies in \((-0.9953,0.9953)\), so the skew-normal only deals with small and moderate skewness.

Question 1)

For \(\mu=0\) and \(\sigma=1\), choose a few values of \(\alpha\) and plot the densities \(p(x|\alpha)\). Also, compute i) \(E(x|\alpha)\), ii) \(V(x|\alpha)\), and iii) the skewness indexes, \(\gamma\), for the chosen values of \(\alpha\). Comment your findings.

Data and \(SN(\mu,1,\alpha)\)

Let us assume that \(\sigma=1\), so the only unknown parameters are \(\alpha\), which controls the level of skewness, and \(\mu\), the parameter that controls the location. Let \(\theta=(\mu,\alpha)\), such that the goal is to perform posterior inference for \(\theta\) given the data below by assuming that \[ x_1,\ldots,x_n|\theta \ \ \mbox{iid} \ \ SN(\mu,1,\alpha), \] for \(n=50\).

x = c(
 0.1169,  1.7311,  0.0144,  0.4881,  2.3877, -0.0853,  0.0522,  0.7226,  
 0.8718,  3.0415,  0.6363,  1.3590,  1.5958,  0.4567,  0.9885,  1.7001,  
 1.0380,  1.0195,  0.3528,  1.4249,  0.3170,  1.3276, -0.1032,  1.1568,  
 0.0699,  1.6802,  0.2470,  0.4147,  1.6882,  0.7256,  1.3568,  1.1091,  
 0.0500,  2.2886,  1.4985,  2.7261,  1.8443, -0.0687,  0.9441,  0.6872,  
 0.5258,  0.4743,  0.4240,  0.7349,  0.4428,  0.1880,  0.4642,  0.2786,
 0.2742,  0.5678)

Prior of \((\mu,\alpha)\)

We assume that \(\mu\) and \(\alpha\) are a priori independent with \(\mu\) is \(U(-1,1)\) and \(\alpha\) is \(N(0,30^2)\), such that \(p(\mu,\alpha) \propto \exp\left\{-0.5\alpha^2/900\right\}\). Below we provide R code for pointwise evaluation of the likelihood function as well as the posterior density (up to a normalizing constant).

likelihood = function(mu,alpha){
  prod(2*dnorm(x-mu)*pnorm(alpha*(x-mu)))
}
posterior = function(mu,alpha){
 likelihood(mu,alpha)*dnorm(alpha,0,30)
}

Posterior contours

Here we plot contours of the posterior distribution \(p(\mu,\alpha|\mbox{data})\), with \(\mu\) and \(\alpha\) on grids of size \(N=200\) in the intervals \((-1,1)\) and \((-5,100)\), respectively. Notice the non-standard relationship between \(\mu\) and \(\alpha\), a posteriori. Using the R functions provided above, we show below contours of the posterior distribution \(p(\mu,\alpha|\mbox{data})\).

More questions

Assume that \(\mbox{data}=\{x_1,\ldots,x_n\}\) is the data above (\(n=50\)).

Question 2) SIR-based posterior inference

Propose a sampling importance resampling (SIR) algorithm to sample from the posterior of \(\theta\), i.e.  \[ p(\mu,\alpha|\mbox{data}), \] and obtain SIR-based approximations to

  1. \(E(\mu|\mbox{data})\) and \(E(\alpha|\mbox{data})\),
  2. Median\((\mu|\mbox{data})\) and Median\((\alpha|\mbox{data})\)
  3. \(V(\mu|\mbox{data})\) and \(V(\alpha|\mbox{data})\),
  4. 5th and 95th percentiles of the marginal posteriors \(p(\mu|\mbox{data})\) and \(p(\alpha|\mbox{data})\), and
  5. The posterior correlation between \(\mu\) and \(\alpha\).

Explain your choice for the proposal distribution, the SIR sample size (\(M\)) and the size of the resample (\(N\)).

Suggestion: The results below are based on SIR sample size of \(M=100000\) and resample size of \(N=20000\).

SIR-based posterior draws and marginal posterior densities

SIR-based posterior summaries

##        Mean Median St.Dev 5th perc. 95th perc.
## mu    -0.07  -0.09   0.07     -0.15       0.06
## alpha 22.71  17.61  16.93      4.23      57.08

Posterior correlation between \(\mu\) and \(\alpha\).

## [1] -0.568

Question 3) RWM-based posterior inference

Propose a random-walk Metropolis (RWM) scheme to sample from the posterior of \(\theta\), i.e.  \[ p(\mu,\alpha|\mbox{data}), \] and obtain RWM-based approximations to

  1. \(E(\mu|\mbox{data})\) and \(E(\alpha|\mbox{data})\),
  2. \(V(\mu|\mbox{data})\) and \(V(\alpha|\mbox{data})\),
  3. 5th and 95th percentiles of the marginal posteriors \(p(\mu|\mbox{data})\) and \(p(\alpha|\mbox{data})\), and
  4. The posterior correlation between \(\mu\) and \(\alpha\).

Explain your choice of initial values and the the tuning parameter of the RWM algorithm. Also, explain how you chose the burn-in size (\(M_0\)), the skip size (\(L\)) and the size of the posterior draws (\(M\)). Compare the results from 1) and 3).

Suggestion: \(M_0=10000\), \(L=100\) and \(N=20000\). For the proposals for \(\mu\) and \(\alpha\), we suggest using \(\mu^*\sim N(\mu^{(j)},0.25^2)\) and \(\alpha^* \sim N(\alpha^{(j)},5^2)\), where \((\mu,\alpha)^{(j)}\) is the current state of the Markov chain. Below are the results of our implementation.

Random-walk Metropolis chains/draws and sample autocorrelation of chains/draws

SIR-based graphical posterior summaries

Marginal posterior summaries

##        Mean Median St.Dev 5th perc. 95th perc.
## mu    -0.07  -0.09   0.07     -0.15       0.07
## alpha 22.81  17.90  17.28      4.20      57.25

Posterior correlation between \(\mu\) and \(\alpha\).

## [1] -0.553

Comparing the results

SIR-based posterior summaries

##        Mean Median St.Dev 5th perc. 95th perc.
## mu    -0.07  -0.09   0.07     -0.15       0.06
## alpha 22.71  17.61  16.93      4.23      57.08

RWM-based posterior summaries

##        Mean Median St.Dev 5th perc. 95th perc.
## mu    -0.07  -0.09   0.07     -0.15       0.07
## alpha 22.81  17.90  17.28      4.20      57.25