The goal in this exercise is to illustrate Bayesian model estimation, model selection and model averaging. We will use the skew-normal distribution for the collected data, following Azzalini’s (1985) paper. Additional references are Dalla Valle (2004) Bayes and Branco (2007).
A continuous random variable \(x\) is skew-normal with location parameter \(\theta\), scale parameter \(\sigma^2\) and skewness parameter \(\alpha\), denoted by \(X \sim SN(\theta,\sigma^2,\alpha)\), when its probability density function is \[ p(x|\theta,\sigma^2,\alpha) = 2 \frac{1}{\sigma} \phi\left(\frac{x-\theta}{\sigma}\right)\Phi\left(\alpha\frac{x-\theta}{\sigma}\right), \] where \(-\infty < x,\theta,\alpha < \infty\) and \(\sigma>0\). Here \(\phi\) and \(\Phi\) are the probability density function and the cumulative distribution function of the standard normal distribution. In R, these quantities would be pnorm((x-theta)/sigma) and pnorm((x-theta)/sigma). In addition, \[\begin{eqnarray*} E(x|\theta,\sigma^2,\alpha) &=& \theta + \sigma \delta \sqrt{\frac{2}{\pi}}\\ V(x|\theta,\sigma^2,\alpha) &=& \sigma^2 \left(1-\frac{2}{\pi} \delta^2 \right)\\ \delta &=& \frac{\alpha}{\sqrt{1+\alpha^2}} \qquad -1 < \delta < 1. \end{eqnarray*}\] The skewness index is given by \[ \gamma = \sqrt{\frac{2}{\pi}} \left(\frac{4}{\pi}-1\right)\left(\frac{\alpha}{\sqrt{1+\alpha^2}}\right)^3\left(1-\frac{2}{\pi}\frac{\alpha^2}{1+\alpha^2}\right)^{-3/2}, \] which lies in \((-0.9953,0.9953)\), so the skew-normal only deals with small and moderate skewness.
f0 = function(x,theta){dnorm(x-theta)}
f1 = function(x,theta,alpha){2*f0(x,theta)*pnorm(alpha*(x-theta))}
sim.skewnormal = function(n,alpha,theta){
M = 3*n
draw = rnorm(M,theta,1)
accept = f1(draw,theta,alpha)/(dnorm(draw,theta,1)*2)
xobs = draw[runif(M)<accept][1:n]
return(xobs)
}
theta = 0
alphas = seq(-3.5,4,by=0.5)
deltas = alphas/sqrt(1+alphas^2)
Expectation = theta + deltas*sqrt(2/pi)
Variance = 1-2*deltas^2/pi
Skewness = (4-pi)/2*(deltas*sqrt(2/pi))^3/(1-2*deltas^2/pi)^(3/2)
Exkurtosis = 2*(pi-3)*(deltas*sqrt(2/pi))^4/(1-2*deltas^2/pi)^2
tab = round(cbind(Expectation,Variance,Skewness,Exkurtosis),2)
tab = cbind(theta,alphas,tab)
tab
## theta alphas Expectation Variance Skewness Exkurtosis
## [1,] 0 -3.5 -0.77 0.41 -0.73 0.58
## [2,] 0 -3.0 -0.76 0.43 -0.67 0.51
## [3,] 0 -2.5 -0.74 0.45 -0.58 0.42
## [4,] 0 -2.0 -0.71 0.49 -0.45 0.31
## [5,] 0 -1.5 -0.66 0.56 -0.30 0.18
## [6,] 0 -1.0 -0.56 0.68 -0.14 0.06
## [7,] 0 -0.5 -0.36 0.87 -0.02 0.01
## [8,] 0 0.0 0.00 1.00 0.00 0.00
## [9,] 0 0.5 0.36 0.87 0.02 0.01
## [10,] 0 1.0 0.56 0.68 0.14 0.06
## [11,] 0 1.5 0.66 0.56 0.30 0.18
## [12,] 0 2.0 0.71 0.49 0.45 0.31
## [13,] 0 2.5 0.74 0.45 0.58 0.42
## [14,] 0 3.0 0.76 0.43 0.67 0.51
## [15,] 0 3.5 0.77 0.41 0.73 0.58
## [16,] 0 4.0 0.77 0.40 0.78 0.63
Now graphically,
nalpha = length(alphas)
xxx = seq(-4,4,length=1000)
par(mfrow=c(4,4),mai=c(0.45,0.45,0.2,0.1))
for (i in 1:nalpha){
plot(xxx,f0(xxx,theta),type="l",ylim=c(0,0.7),xlab="",ylab="",lwd=2)
lines(xxx,f1(xxx,theta,alphas[i]),col=2,lwd=2)
legend("topleft",legend=paste("alpha=",alphas[i],sep=""),bty="n")
abline(v=Expectation[i],col=4,lwd=2,lty=2)
legend("topright",legend=c(
paste("Var=",tab[i,4],sep=""),
paste("Skew=",tab[i,5],sep=""),
paste("Exkurt=",tab[i,6],sep="")),bty="n")
mtext(side=1,line=2,"x")
mtext(side=2,line=2,"Density",cex=1)
}
Let us simulated some skew-normal data, \(x_{1,obs},\ldots,x_{n,obs}\). I wrote the R function sim.skewnormal to simulate \(n\) observations from the \(N(\theta,1,\alpha)\), i.e. when \(\sigma=1\).
set.seed(2335)
#alpha = -2
#theta = 0
#n = 20
#xobs = sim.skewnormal(n,alpha,theta)
# Azzalini's data - SN(0,1,5)
# install.packages("sn")
data(frontier, package="sn")
xobs = frontier
n = length(xobs)
In what follows, we will assume that \(\alpha\) is fixed and will entertain posterior inference based on various values of this skewness quantity. Also, we assume a fairly vague prior for \(\theta\), i.e. \(\theta \sim N(0,1)\).
The columns of the matrix theta.post (below) contains draws from \(p(\theta|x_{obs},\alpha)\) for values of \(\alpha\) in \(\{0,1,\ldots,20\}\). More explicitly, we are simultaneously running 21 competing models indexed by \(\alpha\). We use the sampling importance resampling (SIR) algorithm to obtain \(M\) draws from the posterior based on \(M\) draws from the prior of \(\theta\), \(N(0,1)\).
set.seed(12345)
alphas = seq(0,19,by=1)
nalpha = length(alphas)
deltas = alphas/sqrt(1+alphas^2)
M = 100000
theta.draw = rnorm(M,0,1)
w = matrix(0,nalpha,M)
for (j in 1:nalpha)
for (i in 1:M)
w[j,i] = prod(f1(xobs,theta.draw[i],alphas[j]))
logpred = log(apply(w,1,sum))
theta.post = matrix(0,nalpha,M)
for (j in 1:nalpha)
theta.post[j,] = theta.draw[sample(1:M,size=M,replace=TRUE,prob=w[j,])]
par(mfrow=c(5,4),mai=c(0.45,0.45,0.2,0.1))
for (j in 1:nalpha){
hist(theta.post[j,],prob=TRUE,xlab="",main="",ylab="",xlim=range(theta.post))
box()
title(paste("Log predictive = ",round(logpred[j],2),sep=""))
if (alphas[j]<=0){
legend("topleft",legend=paste("alpha=",alphas[j],sep=""),bty="n")
}else{
legend("topright",legend=paste("alpha=",alphas[j],sep=""),bty="n")
}
mtext(side=1,line=2,expression(theta))
mtext(side=2,line=2,"Density",cex=1)
}
We compute the posterior distribution of \(E(x|\theta,\alpha) \equiv g(\theta,\alpha)\), the posterior model probabilities \(Pr(\alpha|x_{obs})\) and the posterior distribution of \(E(x)\), averaged over \(\alpha\) (Bayesian model averaging). We use the prior draws of \(\theta\), \(\{\theta^{(1)},\ldots,\theta^{(M)}\}\), to approximate the predictive \[ p(x_{obs}|\alpha) = \int_{-\infty}^{\infty} p(x_{obs}|\alpha,\theta)p(\theta)d\theta \] by \[ {\hat p}(x_{obs}|\alpha) = \frac{1}{M} \sum_{i=1}^M p(x_{obs}|\alpha,\theta^{(i)}). \]
postmean = apply(theta.post,1,mean)+deltas*sqrt(2/pi)
ProbM = exp(logpred)
ProbM = ProbM/sum(ProbM)
upostmean = sum(postmean*ProbM)
L = min(theta.post)
U = max(theta.post)
par(mfrow=c(1,1),mai=rep(0.8,4))
boxplot(t(theta.post),outline=FALSE,ylim=c(L,U),names=alphas,xlab="alpha",
ylab="Posterior of theta")
legend("right",legend=paste("Unconditional posterior mean = ",round(upostmean,4),sep=""),bty="n")
text(nalpha/2,U,expression(paste("Posterior mean of E(X|theta) = ",theta+(alpha/sqrt(1+alpha^2))*sqrt(2/pi),sep="")),col=2)
text(1:nalpha,U-0.15,round(postmean,2))
text(nalpha/2,L+0.15,"Posterior model probability",col=2)
text(1:nalpha,L,round(ProbM,3))
par(mfrow=c(1,1),mai=rep(0.8,4))
plot(alphas,ProbM,lwd=3,type="h",xlab=expression(alpha),ylab="Posterior model probability")
We now consider \(\alpha\) a real number and incorporate both \(\theta\) and \(\alpha\) into our posterior analysis. We maintain the prior of \(\theta\), \(N(0,1)\), and add the prior for \(\alpha \sim t_2(0,1/2)\), a Student’s \(t\) distribution with location at zero, scale at \(1/2\) and 2 degrees of freedom. This prior is induced by the the prior on \(\delta \sim U(-1,1)\).
set.seed(12345)
M = 1000000
theta.draw = rnorm(M,0,1)
alpha.draw = (1/2)*rt(M,df=2) # From delta ~ Unif(-1,1)
w = rep(0,M)
for (i in 1:M)
w[i] = prod(f1(xobs,theta.draw[i],alpha.draw[i]))
ind = sample(1:M,size=M,replace=TRUE,prob=w)
theta.post = theta.draw[ind]
alpha.post = alpha.draw[ind]
postmean = theta.post+alpha.post/sqrt(1+alpha.post^2)*sqrt(2/pi)
par(mfrow=c(2,2))
plot(theta.post[alpha.post<40],alpha.post[alpha.post<40],xlab=expression(theta),ylab=expression(alpha),
main="Posterior draws")
hist(alpha.post,xlab="",main=expression(alpha),ylab="Posterior density",prob=TRUE);box()
hist(alpha.post[alpha.post<20],xlab="",main=expression(alpha),ylab="Posterior density",prob=TRUE);box()
hist(postmean,main=expression(theta+(alpha/sqrt(1+alpha^2))*sqrt(2/pi)),xlab="",ylab="Posterior density",prob=TRUE);box()
xbar = mean(xobs)
se = sqrt(var(xobs)/length(xobs))
abline(v=xbar,col=4,lwd=2)
abline(v=xbar-2*se,col=4,lty=2,lwd=2)
abline(v=xbar+2*se,col=4,lty=2,lwd=2)
Azzalini (1985) A class of distributions which include the normal ones. Scandinavian Journal of Statistics, 12, 171-178.
Dalla Valle (2004) The skew-normal distribution. In Skew- Elliptical Distributions and Their Applications : A Journey Beyond Normality, editor Mark Genton, Chapman and Hall/CRC, Boca Raton.
Bayes and Branco (2007) Bayesian inference for the skewness parameter of the scalar skew-normal distribution. Brazilian Journal of Probability and Statistics, 21(2), 141-163. http://www.jstor.org/stable/43601095