Introduction to the skew-normal distribution

The goal of this exercise is to illustrate Bayesian model estimation, model selection and model averaging. We will model the collected data with the skew-normal distribution, following Azzalini's (1985) paper. Additional references are Dalla Valle (2004) and Bayes and Branco (2007).

A continuous random variable \(X\) is skew-normal with location parameter \(\theta\), scale parameter \(\sigma^2\) and skewness parameter \(\alpha\), denoted by \(X \sim SN(\theta,\sigma^2,\alpha)\), when its probability density function is \[ p(x|\theta,\sigma^2,\alpha) = \frac{2}{\sigma} \phi\left(\frac{x-\theta}{\sigma}\right)\Phi\left(\alpha\frac{x-\theta}{\sigma}\right), \] where \(-\infty < x,\theta,\alpha < \infty\) and \(\sigma>0\). Here \(\phi\) and \(\Phi\) are the probability density function and the cumulative distribution function of the standard normal distribution; in R, these quantities would be dnorm((x-theta)/sigma) and pnorm(alpha*(x-theta)/sigma). In addition, \[\begin{eqnarray*} E(x|\theta,\sigma^2,\alpha) &=& \theta + \sigma \delta \sqrt{\frac{2}{\pi}}\\ V(x|\theta,\sigma^2,\alpha) &=& \sigma^2 \left(1-\frac{2}{\pi} \delta^2 \right)\\ \delta &=& \frac{\alpha}{\sqrt{1+\alpha^2}} \qquad -1 < \delta < 1. \end{eqnarray*}\] The skewness index is given by \[ \gamma = \sqrt{\frac{2}{\pi}} \left(\frac{4}{\pi}-1\right)\left(\frac{\alpha}{\sqrt{1+\alpha^2}}\right)^3\left(1-\frac{2}{\pi}\frac{\alpha^2}{1+\alpha^2}\right)^{-3/2}, \] which lies in \((-0.9953,0.9953)\), so the skew-normal can only accommodate small to moderate skewness.
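
As a quick numerical sanity check (a sketch, not part of the original text; dsn is a new helper name), we can verify in R that the density integrates to one and that the mean formula holds for an arbitrary choice of parameters:

dsn = function(x,theta,sigma,alpha){2/sigma*dnorm((x-theta)/sigma)*pnorm(alpha*(x-theta)/sigma)}
integrate(function(x) dsn(x,1,2,3),-Inf,Inf)$value    # should be ~ 1
integrate(function(x) x*dsn(x,1,2,3),-Inf,Inf)$value  # numerical E(x)
1+2*(3/sqrt(1+3^2))*sqrt(2/pi)                        # theta + sigma*delta*sqrt(2/pi)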

Below are R functions I have created to evaluate and draw from the skew-normal density when \(\sigma=1\).

f0 = function(x,theta){dnorm(x-theta)}                              # N(theta,1) density

f1 = function(x,theta,alpha){2*f0(x,theta)*pnorm(alpha*(x-theta))}  # SN(theta,1,alpha) density

sim.skewnormal = function(n,alpha,theta){
  # Rejection sampler: proposal N(theta,1) with envelope 2*dnorm, so each
  # proposal is accepted with probability pnorm(alpha*(draw-theta)),
  # about 1/2 on average; M = 3n proposals are usually enough for n draws.
  M      = 3*n
  draw   = rnorm(M,theta,1)
  accept = f1(draw,theta,alpha)/(dnorm(draw,theta,1)*2)
  xobs   = draw[runif(M)<accept][1:n]
  return(xobs)
}
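
The rejection step above uses the envelope \(2\phi(x-\theta) \ge f_1(x)\), so each proposal is accepted with probability \(\Phi(\alpha(x-\theta))\), about one half on average, which is why M = 3n proposals are usually enough. As an aside (a sketch, not part of the original code; sim.skewnormal2 is a new name), the well-known stochastic representation of the skew-normal gives an exact, loop-free simulator:

# X = theta + delta*|Z1| + sqrt(1-delta^2)*Z2, Z1,Z2 iid N(0,1), is SN(theta,1,alpha)
sim.skewnormal2 = function(n,alpha,theta){
  delta = alpha/sqrt(1+alpha^2)
  theta + delta*abs(rnorm(n)) + sqrt(1-delta^2)*rnorm(n)
}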

Let us now compare the normal and the skew-normal distributions.

theta  = 0
alphas = seq(-3.5,4,by=0.5)
deltas = alphas/sqrt(1+alphas^2)

Expectation = theta +  deltas*sqrt(2/pi)
Variance    = 1-2*deltas^2/pi
Skewness    = (4-pi)/2*(deltas*sqrt(2/pi))^3/(1-2*deltas^2/pi)^(3/2)
Exkurtosis  = 2*(pi-3)*(deltas*sqrt(2/pi))^4/(1-2*deltas^2/pi)^2

tab = round(cbind(Expectation,Variance,Skewness,Exkurtosis),2)
tab = cbind(theta,alphas,tab)
tab
##       theta alphas Expectation Variance Skewness Exkurtosis
##  [1,]     0   -3.5       -0.77     0.41    -0.73       0.58
##  [2,]     0   -3.0       -0.76     0.43    -0.67       0.51
##  [3,]     0   -2.5       -0.74     0.45    -0.58       0.42
##  [4,]     0   -2.0       -0.71     0.49    -0.45       0.31
##  [5,]     0   -1.5       -0.66     0.56    -0.30       0.18
##  [6,]     0   -1.0       -0.56     0.68    -0.14       0.06
##  [7,]     0   -0.5       -0.36     0.87    -0.02       0.01
##  [8,]     0    0.0        0.00     1.00     0.00       0.00
##  [9,]     0    0.5        0.36     0.87     0.02       0.01
## [10,]     0    1.0        0.56     0.68     0.14       0.06
## [11,]     0    1.5        0.66     0.56     0.30       0.18
## [12,]     0    2.0        0.71     0.49     0.45       0.31
## [13,]     0    2.5        0.74     0.45     0.58       0.42
## [14,]     0    3.0        0.76     0.43     0.67       0.51
## [15,]     0    3.5        0.77     0.41     0.73       0.58
## [16,]     0    4.0        0.77     0.40     0.78       0.63

Now graphically,

nalpha = length(alphas)
xxx    = seq(-4,4,length=1000)
par(mfrow=c(4,4),mai=c(0.45,0.45,0.2,0.1))
for (i in 1:nalpha){
  plot(xxx,f0(xxx,theta),type="l",ylim=c(0,0.7),xlab="",ylab="",lwd=2)
  lines(xxx,f1(xxx,theta,alphas[i]),col=2,lwd=2)
  legend("topleft",legend=paste("alpha=",alphas[i],sep=""),bty="n")
  abline(v=Expectation[i],col=4,lwd=2,lty=2)
  legend("topright",legend=c(
  paste("Var=",tab[i,4],sep=""),
  paste("Skew=",tab[i,5],sep=""),
  paste("Exkurt=",tab[i,6],sep="")),bty="n")
  mtext(side=1,line=2,"x")
  mtext(side=2,line=2,"Density",cex=1)
}

Simulating skew-normal data

Let us now obtain some skew-normal data, \(x_{1,obs},\ldots,x_{n,obs}\). I wrote the R function sim.skewnormal to simulate \(n\) observations from \(SN(\theta,1,\alpha)\), i.e. when \(\sigma=1\); here, however, we instead use Azzalini's frontier data from the sn package, a sample simulated from \(SN(0,1,5)\) (the simulation alternative is left commented out below).

set.seed(2335)
#alpha = -2
#theta = 0
#n     = 20
#xobs  = sim.skewnormal(n,alpha,theta)

# Azzalini's data - SN(0,1,5)
# install.packages("sn")
data(frontier, package="sn")
xobs = frontier
n    = length(xobs)
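
As a quick look at the data (a sketch, not part of the original exercise): for \(SN(0,1,5)\) the formulas above give mean roughly 0.78 and variance roughly 0.39, which the sample moments should approximately match.

delta5 = 5/sqrt(1+5^2)
c(mean=mean(xobs),var=var(xobs))               # sample moments
c(mean=delta5*sqrt(2/pi),var=1-2*delta5^2/pi)  # theoretical SN(0,1,5) moments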

Posterior inference via sampling importance resampling

In what follows, we will assume that \(\alpha\) is fixed and entertain posterior inference for various values of this skewness parameter. We also assume a fairly vague prior for \(\theta\), namely \(\theta \sim N(0,1)\).

The rows of the matrix theta.post (below) contain draws from \(p(\theta|x_{obs},\alpha)\) for values of \(\alpha\) in \(\{0,1,\ldots,19\}\). More explicitly, we are simultaneously running 20 competing models indexed by \(\alpha\). We use the sampling importance resampling (SIR) algorithm to obtain \(M\) draws from each posterior based on \(M\) draws from the prior of \(\theta\), \(N(0,1)\): each prior draw \(\theta^{(i)}\) receives weight proportional to its likelihood, \(w^{(i)} \propto p(x_{obs}|\theta^{(i)},\alpha)\), and the prior draws are then resampled with replacement using these weights.

set.seed(12345)
alphas     = seq(0,19,by=1)
nalpha     = length(alphas)
deltas     = alphas/sqrt(1+alphas^2)
M          = 100000
theta.draw = rnorm(M,0,1)                  # M draws from the prior of theta
w          = matrix(0,nalpha,M)            # SIR weights: likelihood of each prior draw
for (j in 1:nalpha)
  for (i in 1:M)
    w[j,i] = prod(f1(xobs,theta.draw[i],alphas[j]))
logpred = log(apply(w,1,sum))              # log predictive of each model, up to -log(M)
theta.post = matrix(0,nalpha,M)
for (j in 1:nalpha)
  theta.post[j,] = theta.draw[sample(1:M,size=M,replace=TRUE,prob=w[j,])]  # resampling step
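
With larger samples the products of densities inside the double loop can underflow. A vectorized and numerically safer variant (a sketch, not the original code; D, logw, mx and logpred2 are new names) works with log-weights; logpred2 equals logpred minus \(\log M\), a constant that cancels when comparing models.

D    = outer(xobs,theta.draw,"-")                 # n x M matrix of x_i - theta^(l)
logw = matrix(0,nalpha,M)
for (j in 1:nalpha)
  logw[j,] = colSums(log(2)+dnorm(D,log=TRUE)+pnorm(alphas[j]*D,log.p=TRUE))
mx       = apply(logw,1,max)
logpred2 = mx+log(rowMeans(exp(logw-mx)))         # stable log phat(xobs|alpha)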

Posterior of \(\theta\)

par(mfrow=c(5,4),mai=c(0.45,0.45,0.2,0.1))
for (j in 1:nalpha){
  hist(theta.post[j,],prob=TRUE,xlab="",main="",ylab="",xlim=range(theta.post))
  box()
  title(paste("Log predictive = ",round(logpred[j],2),sep=""))
  if (alphas[j]<=0){
    legend("topleft",legend=paste("alpha=",alphas[j],sep=""),bty="n")
  }else{
    legend("topright",legend=paste("alpha=",alphas[j],sep=""),bty="n")
  }
  mtext(side=1,line=2,expression(theta))
  mtext(side=2,line=2,"Density",cex=1)
}

Other posterior quantities

We compute the posterior distribution of \(E(x|\theta,\alpha) \equiv g(\theta,\alpha)\), the posterior model probabilities \(Pr(\alpha|x_{obs})\) and the posterior distribution of \(E(x)\), averaged over \(\alpha\) (Bayesian model averaging). We use the prior draws of \(\theta\), \(\{\theta^{(1)},\ldots,\theta^{(M)}\}\), to approximate the predictive \[ p(x_{obs}|\alpha) = \int_{-\infty}^{\infty} p(x_{obs}|\alpha,\theta)p(\theta)d\theta \] by \[ {\hat p}(x_{obs}|\alpha) = \frac{1}{M} \sum_{i=1}^M p(x_{obs}|\alpha,\theta^{(i)}). \]

postmean  = apply(theta.post,1,mean)+deltas*sqrt(2/pi)  # posterior mean of E(x|theta,alpha)
ProbM     = exp(logpred)
ProbM     = ProbM/sum(ProbM)                            # Pr(alpha|xobs)
upostmean = sum(postmean*ProbM)                         # model-averaged mean of E(x)
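
The normalization above is equivalent to, but less safe than, working on the log scale (a sketch; ProbM.safe is a new name): subtracting the largest log predictive before exponentiating prevents exp() from over- or underflowing.

ProbM.safe = exp(logpred-max(logpred))
ProbM.safe = ProbM.safe/sum(ProbM.safe)
max(abs(ProbM.safe-ProbM))   # essentially zero here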

L = min(theta.post)
U = max(theta.post)
par(mfrow=c(1,1),mai=rep(0.8,4))
boxplot(t(theta.post),outline=FALSE,ylim=c(L,U),names=alphas,xlab="alpha",
        ylab="Posterior of theta")
legend("right",legend=paste("Unconditional posterior mean = ",round(upostmean,4),sep=""),bty="n")
text(nalpha/2,U,expression(paste("Posterior mean of E(X|theta) = ",theta+(alpha/sqrt(1+alpha^2))*sqrt(2/pi),sep="")),col=2)
text(1:nalpha,U-0.15,round(postmean,2))
text(nalpha/2,L+0.15,"Posterior model probability",col=2)
text(1:nalpha,L,round(ProbM,3))

\(Pr(\alpha|x_{obs})\)

par(mfrow=c(1,1),mai=rep(0.8,4))
plot(alphas,ProbM,lwd=3,type="h",xlab=expression(alpha),ylab="Posterior model probability")

Joint (direct) posterior inference for \((\theta,\alpha)\)

We now consider \(\alpha\) a real number and incorporate both \(\theta\) and \(\alpha\) into our posterior analysis. We maintain the prior of \(\theta\), \(N(0,1)\), and add the prior \(\alpha \sim t_2(0,1/\sqrt{2})\), a Student's \(t\) distribution with location zero, scale \(1/\sqrt{2}\) (squared scale \(1/2\)) and 2 degrees of freedom. This prior is induced by the prior \(\delta \sim U(-1,1)\): since \(\delta = \alpha/\sqrt{1+\alpha^2}\) and \(d\delta/d\alpha = (1+\alpha^2)^{-3/2}\), the change of variables gives \[ p(\alpha) = \frac{1}{2}\left(1+\alpha^2\right)^{-3/2}, \] which is exactly the \(t_2\) density scaled by \(1/\sqrt{2}\).
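
A quick empirical check of this change of variables (a sketch, not part of the original analysis; delta.draw and alpha.sim are new names): transform uniform draws of \(\delta\) and compare their empirical CDF with that of a \(t_2\) scaled by \(1/\sqrt{2}\).

delta.draw = runif(200000,-1,1)                      # delta ~ U(-1,1)
alpha.sim  = delta.draw/sqrt(1-delta.draw^2)         # implied draws of alpha
plot(ecdf(alpha.sim),xlim=c(-5,5),main="Induced prior on alpha",xlab=expression(alpha))
curve(pt(sqrt(2)*x,df=2),-5,5,add=TRUE,col=2,lwd=2)  # CDF of (1/sqrt(2))*t_2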

set.seed(12345)
M          = 1000000
theta.draw = rnorm(M,0,1)              # prior draws of theta
alpha.draw = (1/sqrt(2))*rt(M,df=2)    # from delta ~ Unif(-1,1): alpha ~ t_2 scaled by 1/sqrt(2)
w          = rep(0,M)
for (i in 1:M)
    w[i] = prod(f1(xobs,theta.draw[i],alpha.draw[i]))   # SIR weight = likelihood
ind = sample(1:M,size=M,replace=TRUE,prob=w)            # resampling step

theta.post = theta.draw[ind]
alpha.post = alpha.draw[ind]
postmean   = theta.post+alpha.post/sqrt(1+alpha.post^2)*sqrt(2/pi)  # draws of E(x|theta,alpha)
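
Because SIR reweights prior draws by the likelihood, the weights can be very uneven when the prior and the posterior disagree. A simple diagnostic (a sketch, using the standard importance-sampling effective sample size; wn is a new name):

wn = w/sum(w)        # normalized SIR weights
round(1/sum(wn^2))   # effective sample size; values far below M signal degeneracy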

par(mfrow=c(2,2))
plot(theta.post[alpha.post<40],alpha.post[alpha.post<40],xlab=expression(theta),ylab=expression(alpha),
main="Posterior draws")
hist(alpha.post,xlab="",main=expression(alpha),ylab="Posterior density",prob=TRUE);box()
hist(alpha.post[alpha.post<20],xlab="",main=expression(alpha),ylab="Posterior density",prob=TRUE);box()
hist(postmean,main=expression(theta+(alpha/sqrt(1+alpha^2))*sqrt(2/pi)),xlab="",ylab="Posterior density",prob=TRUE);box()

xbar = mean(xobs)
se   = sqrt(var(xobs)/length(xobs))
# reference lines on the last panel (histogram of postmean): sample mean +/- 2 s.e.
abline(v=xbar,col=4,lwd=2)
abline(v=xbar-2*se,col=4,lty=2,lwd=2)
abline(v=xbar+2*se,col=4,lty=2,lwd=2)

References

Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12, 171-178.

Bayes, C. L. and Branco, M. D. (2007). Bayesian inference for the skewness parameter of the scalar skew-normal distribution. Brazilian Journal of Probability and Statistics, 21, 141-163.

Dalla Valle, A. (2004). The skew-normal distribution. In Genton, M. G. (ed.), Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality. Chapman & Hall/CRC, 3-24.