Finite Mixture of Poissons model

We assume that the observations (counts) \(y_1,\ldots,y_n\) are i.i.d. from a finite mixture of \(k\) Poisson distributions, i.e. \[ y_i|\omega,\lambda \sim \sum_{j=1}^k \omega_j \mbox{Poisson}(\lambda_j), \] with weights \(\omega_j \in (0,1)\), \(\sum_{j=1}^k \omega_j=1\) and rates \(\lambda_j>0\), for \(j=1,\ldots,k\). The model is completed with conditionally conjugate priors for \(\omega\) and the \(\lambda_j\)s: \[\begin{eqnarray*} p(\omega,\lambda) &=& p(\omega)\prod_{j=1}^k p(\lambda_j)\\ \lambda_j &\sim& \mbox{Gamma}(a,b) \qquad j=1,\ldots,k\\ \omega &\sim& \mbox{Dirichlet}(c), \end{eqnarray*}\]

where hyperparameters \(a\), \(b\) and \(c=(c_1,\ldots,c_k)\) are all positive numbers.
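
To make the data-generating process concrete, here is a minimal R sketch that simulates counts from such a mixture; the number of components, weights and rates below are illustrative choices only and are not used in the analysis that follows.

set.seed(123)
nsim  = 200                                       # number of simulated counts (illustrative)
w.sim = c(0.3,0.7)                                # mixture weights, summing to one
l.sim = c(5,20)                                   # Poisson rates, one per component
z.sim = sample(1:2,nsim,replace=TRUE,prob=w.sim)  # latent component memberships
y.sim = rpois(nsim,l.sim[z.sim])                  # counts drawn from the selected components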

Data augmentation

The likelihood function for the finite mixture of Poisson distributions (or any other finite mixture of distributions) is intractable because of the product of weighted averages of densities: \[ L(\omega,\lambda;y_{1:n}) = \prod_{i=1}^n \sum_{j=1}^k \omega_j \frac{\lambda_j^{y_i}e^{-\lambda_j}}{y_i!}. \] The data augmentation argument introduces latent indicators \(z_i\), such that \(z_i \in \{1,\ldots,k\}\) indicates the mixture component (membership) of observation \(y_i\). Conditionally on the set of \(z_i\)s, the likelihood function becomes tractable: \[ L(\omega,\lambda;y_{1:n},z_{1:n}) = \prod_{j=1}^k \prod_{i:z_i=j} \frac{\lambda_j^{y_i}e^{-\lambda_j}}{y_i!}, \] with \(Pr(z_i=j) = \omega_j\) for \(i=1,\ldots,n\) and \(j=1,\ldots,k\).
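
To see the contrast in code, the marginal likelihood requires summing over the \(k\) components inside the product over observations. The short sketch below (the helper name loglik.mix is ours, introduced just for illustration) evaluates the marginal log-likelihood for given weights and rates.

loglik.mix = function(y,weights,rates){
  # for each observation, mix the k Poisson densities, then sum the logs
  sum(log(sapply(y,function(yi) sum(weights*dpois(yi,rates)))))
}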

MCMC algorithm

The MCMC algorithm based on the data augmentation argument becomes much simpler, at least computationally. The full conditionals are \[\begin{eqnarray*} p(\lambda_j|y_{1:n},z_{1:n}) &=& p(\lambda_j|\{y_i:z_i=j\},n_j)\\ p(\omega|y_{1:n},z_{1:n}) &=& p(\omega|z_{1:n})\\ p(z_{1:n}|y_{1:n},\omega,\lambda) &=& \prod_{i=1}^n p(z_i|y_i,\omega,\lambda) \end{eqnarray*}\] where \(n_j=\sum_{i=1}^n \delta_j(z_i)\), with \(\delta_j(z_i)=1\) if \(z_i=j\) and zero otherwise, for \(j=1,\ldots,k\).

As the above full conditionals reveal, \(\omega\) and \(\lambda\) are conditionally independent given the latent vector \(z_{1:n}\). More explicitly, \[ \omega|y_{1:n},z_{1:n} \sim \mbox{Dirichlet}(c+n) \ \ \ \mbox{and} \ \ \ \lambda_j|y_{1:n},z_{1:n} \sim \mbox{Gamma}(a+n_j{\bar y}_j,b+n_j) \] where \(n=(n_1,\ldots,n_k)\) contains the numbers of observations allocated to each mixture component, conditionally on \(z_{1:n}\), and \(n_j{\bar y}_j=\sum_{i:z_i=j} y_i\), for \(j=1,\ldots,k\). Finally, the full conditional of \(z_i\) is \[ \pi_{ij} = Pr(z_i=j) = \frac{\omega_j \lambda_j^{y_i}e^{-\lambda_j}(y_i!)^{-1}}{ \sum_{l=1}^k \omega_l \lambda_l^{y_i}e^{-\lambda_l}(y_i!)^{-1} } \] so the membership \(z_i\) is sampled from \(\{1,\ldots,k\}\) with probabilities \(\{\pi_{i1},\ldots,\pi_{ik}\}\). The MCMC algorithm is implemented in the following R script.

MixturePoisson = function(y,k,weights,rates,a,b,c,burnin,M,thin){
  # Gibbs sampler for a k-component mixture of Poissons with Gamma(a,b) priors
  # on the rates and a Dirichlet(c) prior on the weights; the arguments
  # 'weights' and 'rates' are the initial values of the chain.
  niter = burnin+M*thin
  n    = length(y)
  z    = rep(0,n)
  nz   = rep(0,k) 
  sumy = rep(0,k)
  draws = matrix(0,niter,2*k)
  for (iter in 1:niter){
    # sample the latent memberships z_i from their full conditionals
    for (i in 1:n){
      prob = weights*dpois(y[i],rates)
      z[i] = sample(1:k,size=1,prob=prob)
    }
    # component counts n_j and within-component sums of the observations
    for (j in 1:k){
      nz[j] = sum(z==j)
      sumy[j] = sum(y[z==j])
    }
    # sample the rates from their Gamma full conditionals
    rates = rgamma(k,a+sumy,b+nz)
    # sample the weights from the Dirichlet full conditional via normalized Gammas
    aux = rgamma(k,c+nz)
    weights = aux/sum(aux)
    draws[iter,] = c(weights,rates)
  }
  # discard the burn-in draws and thin the chain
  return(draws[seq(burnin+1,niter,by=thin),])
}

Annual counts of major earthquakes

Below we analyse the annual counts of major earthquakes (magnitude 7 and above) for the years 1900-2006. The data come from the book Hidden Markov Models for Time Series: An Introduction Using R (2nd edition) by Walter Zucchini, Iain L. MacDonald and Roland Langrock, Section 1.2.4 (Examples of fitted mixture models: mixtures of Poisson distributions). Table 1.1 (page 4) contains the dataset, Table 1.2 (page 12) the parameters of the fitted mixtures of Poissons, and Figure 1.4 (page 13) the fitted mixture-of-Poissons densities with 1 to 4 components.

quakes = c(13,14,8,10,16,26,32,27,18,32,36,24,22,23,22,18,25,21,21,14,8,11,14,
           23,18,17,19,20,22,19,13,26,13,14,22,24,21,22,26,21,23,24,27,41,31,
           27,35,26,28,36,39,21,17,22,17,19,15,34,10,15,22,18,15,20,15,22,19,
           16,30,27,29,23,20,16,21,21,25,16,18,15,18,14,10,15,8,15,6,11,8,7,
           18,16,13,12,13,20,15,16,12,18,15,16,13,15,16,11,11)
n = length(quakes)

y = quakes
n = length(y)
maxy = max(y)
counts = rep(0,1+maxy)
for (i in 1:(1+maxy))
  counts[i]=sum(y==(i-1))

par(mfrow=c(2,1))
plot(1900:2006,y,xlab="Year",ylab="Annual counts",type="h")

plot(0:maxy,counts,type="h",xlab="Annual counts of major earthquakes",ylab="Frequency",lwd=3)

Let us set up the prior hyperparameters, the initial values of the chain and the MCMC settings, and then run the sampler.

a = 1
b = 1
c = 1
k = 4
rates   = c(10.5,15.5,21,32)
weights = c(0.1,0.35,0.45,0.1)
burnin = 5000
M = 5000
thin = 1
set.seed(2435)
draws = MixturePoisson(y,k,weights,rates,a,b,c,burnin,M,thin)

Let us take a look at the trace plots of the chains.

par(mfrow=c(2,k))
for (i in 1:(2*k))
  ts.plot(draws[,i])

Now, let us deal with label switching.

draws1 = draws
# relabel each draw by sorting the components by their rates
for (iter in 1:M){
  ord = order(draws[iter,(k+1):(2*k)])         # permutation that orders the rates
  draws1[iter,1:k] = draws[iter,ord]           # weights, reordered accordingly
  draws1[iter,(k+1):(2*k)] = draws[iter,k+ord] # rates, now in increasing order
}

par(mfrow=c(2,k))
for (i in 1:k)
  ts.plot(draws1[,i],ylim=c(0,1))
for (i in (k+1):(2*k))
  ts.plot(draws1[,i])
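
After relabelling, posterior summaries of the ordered weights and rates can be read off directly from draws1. A minimal sketch (posterior means and equal-tailed 95% intervals; the object name summ is arbitrary):

summ = t(apply(draws1,2,function(x) c(mean=mean(x),quantile(x,c(0.025,0.975)))))
rownames(summ) = c(paste0("omega",1:k),paste0("lambda",1:k))
round(summ,3)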

Finally, let us look at the posterior predictive density for a new observation \(y_{n+1}\).

# posterior predictive mass function evaluated on the grid yy, one column per draw
yy = 0:max(y)
ny = length(yy)
deny = matrix(0,ny,M)
for (i in 1:ny)
  deny[i,] = apply(draws1[,1:k]*dpois(yy[i],draws1[,(k+1):(2*k)]),1,sum)

qdeny = t(apply(deny,1,quantile,c(0.05,0.5,0.95)))
  
par(mfrow=c(1,1))
plot(yy,deny[,1],type="l",xlab="Annual counts of major earthquakes",ylab="Relative frequency",lwd=2,ylim=range(counts/n))
for (j in seq(1,M,by=10))
  lines(yy,deny[,j],col=grey(0.9),lwd=2)
lines(yy,counts/n,lwd=3,type="h")
lines(yy,qdeny[,2],col=2,lwd=2,type="b",pch=16)
lines(yy,qdeny[,1],col=3,lwd=2,type="b",pch=16)
lines(yy,qdeny[,3],col=3,lwd=2,type="b",pch=16)