Introduction to the skew-normal distribution

The goal of this exercise is to illustrate Bayesian model estimation, model selection and model averaging. We will model the collected data with the skew-normal distribution, following Azzalini's (1985) paper. Additional references are Dalla Valle (2004) and Bayes and Branco (2007).

A continuous random variable \(X\) is skew-normal with location parameter \(\theta\), scale parameter \(\sigma^2\) and skewness parameter \(\alpha\), denoted by \(X \sim SN(\theta,\sigma^2,\alpha)\), when its probability density function is \[ p(x|\theta,\sigma^2,\alpha) = \frac{2}{\sigma} \phi\left(\frac{x-\theta}{\sigma}\right)\Phi\left(\alpha\frac{x-\theta}{\sigma}\right), \] where \(-\infty < x,\theta,\alpha < \infty\) and \(\sigma>0\). Here \(\phi\) and \(\Phi\) are the probability density function and the cumulative distribution function of the standard normal distribution; in R, these quantities would be dnorm((x-theta)/sigma) and pnorm(alpha*(x-theta)/sigma). In addition, \[\begin{eqnarray*} E(x|\theta,\sigma^2,\alpha) &=& \theta + \sigma \delta \sqrt{\frac{2}{\pi}}\\ V(x|\theta,\sigma^2,\alpha) &=& \sigma^2 \left(1-\frac{2}{\pi} \delta^2 \right)\\ \delta &=& \frac{\alpha}{\sqrt{1+\alpha^2}} \qquad -1 < \delta < 1. \end{eqnarray*}\] The skewness index is given by \[ \gamma = \sqrt{\frac{2}{\pi}} \left(\frac{4}{\pi}-1\right)\left(\frac{\alpha}{\sqrt{1+\alpha^2}}\right)^3\left(1-\frac{2}{\pi}\frac{\alpha^2}{1+\alpha^2}\right)^{-3/2}, \] which lies in \((-0.9953,0.9953)\), so the skew-normal can only accommodate small to moderate skewness.
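
As a quick numerical sanity check (a sketch, not part of the original text; dsn is a new helper name), we can verify in R that the density integrates to one and that the mean formula holds for an arbitrary choice of parameters:

dsn = function(x,theta,sigma,alpha){2/sigma*dnorm((x-theta)/sigma)*pnorm(alpha*(x-theta)/sigma)}
integrate(function(x) dsn(x,1,2,3),-Inf,Inf)$value    # should be ~ 1
integrate(function(x) x*dsn(x,1,2,3),-Inf,Inf)$value  # numerical E(x)
1+2*(3/sqrt(1+3^2))*sqrt(2/pi)                        # theta + sigma*delta*sqrt(2/pi)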

Below are R functions I have created to evaluate and draw from the skew-normal density when \(\sigma=1\).

f0 = function(x,theta){dnorm(x-theta)}                              # N(theta,1) density

f1 = function(x,theta,alpha){2*f0(x,theta)*pnorm(alpha*(x-theta))}  # SN(theta,1,alpha) density

sim.skewnormal = function(n,alpha,theta){
  # Rejection sampler: proposal N(theta,1) with envelope 2*dnorm, so each
  # proposal is accepted with probability pnorm(alpha*(draw-theta)),
  # about 1/2 on average; M = 3n proposals are usually enough for n draws.
  M      = 3*n
  draw   = rnorm(M,theta,1)
  accept = f1(draw,theta,alpha)/(dnorm(draw,theta,1)*2)
  xobs   = draw[runif(M)<accept][1:n]
  return(xobs)
}
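
The rejection step above uses the envelope \(2\phi(x-\theta) \ge f_1(x)\), so each proposal is accepted with probability \(\Phi(\alpha(x-\theta))\), about one half on average, which is why M = 3n proposals are usually enough. As an aside (a sketch, not part of the original code; sim.skewnormal2 is a new name), the well-known stochastic representation of the skew-normal gives an exact, loop-free simulator:

# X = theta + delta*|Z1| + sqrt(1-delta^2)*Z2, Z1,Z2 iid N(0,1), is SN(theta,1,alpha)
sim.skewnormal2 = function(n,alpha,theta){
  delta = alpha/sqrt(1+alpha^2)
  theta + delta*abs(rnorm(n)) + sqrt(1-delta^2)*rnorm(n)
}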

Let us now compare the normal and the skew-normal distributions.

theta  = 0
alphas = seq(-3.5,4,by=0.5)
deltas = alphas/sqrt(1+alphas^2)

Expectation = theta +  deltas*sqrt(2/pi)
Variance    = 1-2*deltas^2/pi
Skewness    = (4-pi)/2*(deltas*sqrt(2/pi))^3/(1-2*deltas^2/pi)^(3/2)
Exkurtosis  = 2*(pi-3)*(deltas*sqrt(2/pi))^4/(1-2*deltas^2/pi)^2

tab = round(cbind(Expectation,Variance,Skewness,Exkurtosis),2)
tab = cbind(theta,alphas,tab)
tab
##       theta alphas Expectation Variance Skewness Exkurtosis
##  [1,]     0   -3.5       -0.77     0.41    -0.73       0.58
##  [2,]     0   -3.0       -0.76     0.43    -0.67       0.51
##  [3,]     0   -2.5       -0.74     0.45    -0.58       0.42
##  [4,]     0   -2.0       -0.71     0.49    -0.45       0.31
##  [5,]     0   -1.5       -0.66     0.56    -0.30       0.18
##  [6,]     0   -1.0       -0.56     0.68    -0.14       0.06
##  [7,]     0   -0.5       -0.36     0.87    -0.02       0.01
##  [8,]     0    0.0        0.00     1.00     0.00       0.00
##  [9,]     0    0.5        0.36     0.87     0.02       0.01
## [10,]     0    1.0        0.56     0.68     0.14       0.06
## [11,]     0    1.5        0.66     0.56     0.30       0.18
## [12,]     0    2.0        0.71     0.49     0.45       0.31
## [13,]     0    2.5        0.74     0.45     0.58       0.42
## [14,]     0    3.0        0.76     0.43     0.67       0.51
## [15,]     0    3.5        0.77     0.41     0.73       0.58
## [16,]     0    4.0        0.77     0.40     0.78       0.63

Now graphically,

nalpha = length(alphas)
xxx    = seq(-4,4,length=1000)
par(mfrow=c(4,4),mai=c(0.45,0.45,0.2,0.1))
for (i in 1:nalpha){
  plot(xxx,f0(xxx,theta),type="l",ylim=c(0,0.7),xlab="",ylab="",lwd=2)
  lines(xxx,f1(xxx,theta,alphas[i]),col=2,lwd=2)
  legend("topleft",legend=paste("alpha=",alphas[i],sep=""),bty="n")
  abline(v=Expectation[i],col=4,lwd=2,lty=2)
  legend("topright",legend=c(
  paste("Var=",tab[i,4],sep=""),
  paste("Skew=",tab[i,5],sep=""),
  paste("Exkurt=",tab[i,6],sep="")),bty="n")
  mtext(side=1,line=2,"x")
  mtext(side=2,line=2,"Density",cex=1)
}

Simulating skew-normal data

Let us now obtain some skew-normal data, \(x_{1,obs},\ldots,x_{n,obs}\). I wrote the R function sim.skewnormal to simulate \(n\) observations from \(SN(\theta,1,\alpha)\), i.e. when \(\sigma=1\); here, however, we instead use Azzalini's frontier data from the sn package, a sample simulated from \(SN(0,1,5)\) (the simulation alternative is left commented out below).

set.seed(2335)
#alpha = -2
#theta = 0
#n     = 20
#xobs  = sim.skewnormal(n,alpha,theta)

# Azzalini's data - SN(0,1,5)
# install.packages("sn")
data(frontier, package="sn")
xobs = frontier
n    = length(xobs)
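
As a quick look at the data (a sketch, not part of the original exercise): for \(SN(0,1,5)\) the formulas above give mean roughly 0.78 and variance roughly 0.39, which the sample moments should approximately match.

delta5 = 5/sqrt(1+5^2)
c(mean=mean(xobs),var=var(xobs))               # sample moments
c(mean=delta5*sqrt(2/pi),var=1-2*delta5^2/pi)  # theoretical SN(0,1,5) moments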

Posterior inference via sampling importance resampling

In what follows, we will assume that \(\alpha\) is fixed and entertain posterior inference for various values of this skewness parameter. We also assume a fairly vague prior for \(\theta\), namely \(\theta \sim N(0,1)\).

The rows of the matrix theta.post (below) contain draws from \(p(\theta|x_{obs},\alpha)\) for values of \(\alpha\) in \(\{0,1,\ldots,19\}\). More explicitly, we are simultaneously running 20 competing models indexed by \(\alpha\). We use the sampling importance resampling (SIR) algorithm to obtain \(M\) draws from each posterior based on \(M\) draws from the prior of \(\theta\), \(N(0,1)\): each prior draw \(\theta^{(i)}\) receives weight proportional to its likelihood, \(w^{(i)} \propto p(x_{obs}|\theta^{(i)},\alpha)\), and the prior draws are then resampled with replacement using these weights.

set.seed(12345)
alphas     = seq(0,19,by=1)
nalpha     = length(alphas)
deltas     = alphas/sqrt(1+alphas^2)
M          = 100000
theta.draw = rnorm(M,0,1)                  # M draws from the prior of theta
w          = matrix(0,nalpha,M)            # SIR weights: likelihood of each prior draw
for (j in 1:nalpha)
  for (i in 1:M)
    w[j,i] = prod(f1(xobs,theta.draw[i],alphas[j]))
logpred = log(apply(w,1,sum))              # log predictive of each model, up to -log(M)
theta.post = matrix(0,nalpha,M)
for (j in 1:nalpha)
  theta.post[j,] = theta.draw[sample(1:M,size=M,replace=TRUE,prob=w[j,])]  # resampling step
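
With larger samples the products of densities inside the double loop can underflow. A vectorized and numerically safer variant (a sketch, not the original code; D, logw, mx and logpred2 are new names) works with log-weights; logpred2 equals logpred minus \(\log M\), a constant that cancels when comparing models.

D    = outer(xobs,theta.draw,"-")                 # n x M matrix of x_i - theta^(l)
logw = matrix(0,nalpha,M)
for (j in 1:nalpha)
  logw[j,] = colSums(log(2)+dnorm(D,log=TRUE)+pnorm(alphas[j]*D,log.p=TRUE))
mx       = apply(logw,1,max)
logpred2 = mx+log(rowMeans(exp(logw-mx)))         # stable log phat(xobs|alpha)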

Posterior of \(\theta\)

par(mfrow=c(5,4),mai=c(0.45,0.45,0.2,0.1))
for (j in 1:nalpha){
  hist(theta.post[j,],prob=TRUE,xlab="",main="",ylab="",xlim=range(theta.post))
  box()
  title(paste("Log predictive = ",round(logpred[j],2),sep=""))
  if (alphas[j]<=0){
    legend("topleft",legend=paste("alpha=",alphas[j],sep=""),bty="n")
  }else{
    legend("topright",legend=paste("alpha=",alphas[j],sep=""),bty="n")
  }
  mtext(side=1,line=2,expression(theta))
  mtext(side=2,line=2,"Density",cex=1)
}

Other posterior quantities

We compute the posterior distribution of \(E(x|\theta,\alpha) \equiv g(\theta,\alpha)\), the posterior model probabilities \(Pr(\alpha|x_{obs})\) and the posterior distribution of \(E(x)\), averaged over \(\alpha\) (Bayesian model averaging). We use the prior draws of \(\theta\), \(\{\theta^{(1)},\ldots,\theta^{(M)}\}\), to approximate the predictive \[ p(x_{obs}|\alpha) = \int_{-\infty}^{\infty} p(x_{obs}|\alpha,\theta)p(\theta)d\theta \] by \[ {\hat p}(x_{obs}|\alpha) = \frac{1}{M} \sum_{i=1}^M p(x_{obs}|\alpha,\theta^{(i)}). \]

postmean  = apply(theta.post,1,mean)+deltas*sqrt(2/pi)  # posterior mean of E(x|theta,alpha)
ProbM     = exp(logpred)
ProbM     = ProbM/sum(ProbM)                            # Pr(alpha|xobs)
upostmean = sum(postmean*ProbM)                         # model-averaged mean of E(x)
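
The normalization above is equivalent to, but less safe than, working on the log scale (a sketch; ProbM.safe is a new name): subtracting the largest log predictive before exponentiating prevents exp() from over- or underflowing.

ProbM.safe = exp(logpred-max(logpred))
ProbM.safe = ProbM.safe/sum(ProbM.safe)
max(abs(ProbM.safe-ProbM))   # essentially zero here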

L = min(theta.post)
U = max(theta.post)
par(mfrow=c(1,1),mai=rep(0.8,4))
boxplot(t(theta.post),outline=FALSE,ylim=c(L,U),names=alphas,xlab="alpha",
        ylab="Posterior of theta")
legend("right",legend=paste("Unconditional posterior mean = ",round(upostmean,4),sep=""),bty="n")
text(nalpha/2,U,expression(paste("Posterior mean of E(X|theta) = ",theta+(alpha/sqrt(1+alpha^2))*sqrt(2/pi),sep="")),col=2)
text(1:nalpha,U-0.15,round(postmean,2))
text(nalpha/2,L+0.15,"Posterior model probability",col=2)
text(1:nalpha,L,round(ProbM,3))

\(Pr(\alpha|x_{obs})\)

par(mfrow=c(1,1),mai=rep(0.8,4))
plot(alphas,ProbM,lwd=3,type="h",xlab=expression(alpha),ylab="Posterior model probability")

Joint (direct) posterior inference for \((\theta,\alpha)\)

We now consider \(\alpha\) a real number and incorporate both \(\theta\) and \(\alpha\) into our posterior analysis. We maintain the prior of \(\theta\), \(N(0,1)\), and add the prior \(\alpha \sim t_2(0,1/\sqrt{2})\), a Student's \(t\) distribution with location zero, scale \(1/\sqrt{2}\) (squared scale \(1/2\)) and 2 degrees of freedom. This prior is induced by the prior \(\delta \sim U(-1,1)\): since \(\delta = \alpha/\sqrt{1+\alpha^2}\) and \(d\delta/d\alpha = (1+\alpha^2)^{-3/2}\), the change of variables gives \[ p(\alpha) = \frac{1}{2}\left(1+\alpha^2\right)^{-3/2}, \] which is exactly the \(t_2\) density scaled by \(1/\sqrt{2}\).
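
A quick empirical check of this change of variables (a sketch, not part of the original analysis; delta.draw and alpha.sim are new names): transform uniform draws of \(\delta\) and compare their empirical CDF with that of a \(t_2\) scaled by \(1/\sqrt{2}\).

delta.draw = runif(200000,-1,1)                      # delta ~ U(-1,1)
alpha.sim  = delta.draw/sqrt(1-delta.draw^2)         # implied draws of alpha
plot(ecdf(alpha.sim),xlim=c(-5,5),main="Induced prior on alpha",xlab=expression(alpha))
curve(pt(sqrt(2)*x,df=2),-5,5,add=TRUE,col=2,lwd=2)  # CDF of (1/sqrt(2))*t_2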

set.seed(12345)
M          = 1000000
theta.draw = rnorm(M,0,1)              # prior draws of theta
alpha.draw = (1/sqrt(2))*rt(M,df=2)    # from delta ~ Unif(-1,1): alpha ~ t_2 scaled by 1/sqrt(2)
w          = rep(0,M)
for (i in 1:M)
    w[i] = prod(f1(xobs,theta.draw[i],alpha.draw[i]))   # SIR weight = likelihood
ind = sample(1:M,size=M,replace=TRUE,prob=w)            # resampling step

theta.post = theta.draw[ind]
alpha.post = alpha.draw[ind]
postmean   = theta.post+alpha.post/sqrt(1+alpha.post^2)*sqrt(2/pi)  # draws of E(x|theta,alpha)
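
Because SIR reweights prior draws by the likelihood, the weights can be very uneven when the prior and the posterior disagree. A simple diagnostic (a sketch, using the standard importance-sampling effective sample size; wn is a new name):

wn = w/sum(w)        # normalized SIR weights
round(1/sum(wn^2))   # effective sample size; values far below M signal degeneracy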

par(mfrow=c(2,2))
plot(theta.post[alpha.post<40],alpha.post[alpha.post<40],xlab=expression(theta),ylab=expression(alpha),
main="Posterior draws")
hist(alpha.post,xlab="",main=expression(alpha),ylab="Posterior density",prob=TRUE);box()
hist(alpha.post[alpha.post<20],xlab="",main=expression(alpha),ylab="Posterior density",prob=TRUE);box()
hist(postmean,main=expression(theta+(alpha/sqrt(1+alpha^2))*sqrt(2/pi)),xlab="",ylab="Posterior density",prob=TRUE);box()

xbar = mean(xobs)
se   = sqrt(var(xobs)/length(xobs))
# reference lines on the last panel (histogram of postmean): sample mean +/- 2 s.e.
abline(v=xbar,col=4,lwd=2)
abline(v=xbar-2*se,col=4,lty=2,lwd=2)
abline(v=xbar+2*se,col=4,lty=2,lwd=2)

References

Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12, 171-178.

Bayes, C. L. and Branco, M. D. (2007). Bayesian inference for the skewness parameter of the scalar skew-normal distribution. Brazilian Journal of Probability and Statistics, 21, 141-163.

Dalla Valle, A. (2004). The skew-normal distribution. In Genton, M. G. (ed.), Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality. Chapman & Hall/CRC, 3-24.