Introduction

This is simply a compilation of links to Stan, rstan and related packages for performing Bayesian inference with Hamiltonian Monte Carlo (HMC) methods. I organized it for my own use, but feel free to dig in.

Stan

The following is taken from the Stan home page (https://mc-stan.org):

Stan is a state-of-the-art platform for statistical modeling and high-performance statistical computation. Thousands of users rely on Stan for statistical modeling, data analysis, and prediction in the social, biological, and physical sciences, engineering, and business. Users specify log density functions in Stan’s probabilistic programming language and get: i) full Bayesian statistical inference with MCMC sampling (NUTS, HMC), ii) approximate Bayesian inference with variational inference (ADVI), and iii) penalized maximum likelihood estimation with optimization (L-BFGS). Stan’s math library provides differentiable probability functions & linear algebra (C++ autodiff). Additional R packages provide expression-based linear modeling, posterior visualization, and leave-one-out cross-validation.
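In rstan, those three inference modes correspond to three entry points: sampling(), vb() and optimizing(). A minimal sketch (the inline toy model is mine, not from the Stan page):

library("rstan")

# A one-parameter toy model, written inline for illustration
toy = "
data { int n; real y[n]; }
parameters { real mu; }
model { mu ~ normal(0,10); y ~ normal(mu,1); }
"
m = stan_model(model_code=toy)
d = list(n=30, y=rnorm(30,2,1))

fit_mcmc = sampling(m, d)    # i)   full Bayes via MCMC (NUTS/HMC)
fit_advi = vb(m, d)          # ii)  approximate Bayes via ADVI
fit_map  = optimizing(m, d)  # iii) penalized MLE via L-BFGS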

Stan User’s Guide

The Stan user’s guide provides example models and programming techniques for coding statistical models in Stan. It also serves as an example-driven introduction to Bayesian modeling and inference.

Marco Inacio (USP/UFSCar) wrote the Stan manual in Portuguese.

RStan - the R interface to Stan

It is distributed on CRAN as the rstan package and its source code is hosted on GitHub. Before installation, make sure you have the necessary C++ toolchain for your system by following the instructions in the Getting Started documents.
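A quick way to verify the toolchain from R (a sketch; the pkgbuild check and the example() call are the verification steps suggested in the Getting Started instructions):

# Should return TRUE once the C++ toolchain is set up correctly
pkgbuild::has_build_tools(debug = TRUE)

# Compile and run a bundled example model as an end-to-end test
example(stan_model, package = "rstan", run.dontrun = TRUE)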

Visit the RStan website at mc-stan.org/rstan for vignettes/tutorials, function documentation, and other information about the package.

There are plenty of examples in the Stan User’s Guide, with varying degrees of complexity: regression models (linear, logistic, probit, multi-logit, ordered logistic, hierarchical logistic and IRT), time-series models (AR, MA, stochastic volatility and hidden Markov models), finite mixture models, clustering models and Gaussian processes, amongst many others.

Below, I list six examples that I have run on my own computer. The first two are copied from Ben Lambert’s page (tutorial1 and tutorial2). Example 3 is taken from Marco Inacio’s Portuguese manual. I wrote the banana-shaped, AR(1)-plus-noise and SV-AR(1) examples. The Stan/rstan part of the SV-AR(1) example was inspired by the stochastic volatility example in the Stan User’s Guide.

Example 1: Simple iid Gaussian model

\[\begin{eqnarray*} y_i|\mu,\sigma^2 &\sim& N(\mu,\sigma^2)\\ p(\sigma) &=& \frac{2}{\pi(1+\sigma^2)}, \quad \sigma>0,\\ \mu &\sim& N(1.7,0.3^2). \end{eqnarray*}\] The prior on \(\sigma\) is a standard half-Cauchy. We simulate \(y_1,\ldots,y_{100}\) from \(N(1.6,0.2^2)\).

Example1.stan

data {
  int n;
  real y[n];
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  for (i in 1:n)
   y[i] ~ normal(mu,sigma);
  mu ~ normal(1.7,0.3);
  sigma ~ cauchy(0,1);
}

Example1.R

install.packages("rstan")
library("rstan")
options(mc.cores=4)

# Simulating some data
n     = 100
y     = rnorm(n,1.6,0.2)

# Running stan code
model = stan_model("Example1.stan")

fit = sampling(model,list(n=n,y=y),iter=200,chains=4)

print(fit)

params = extract(fit)

par(mfrow=c(1,2))
ts.plot(params$mu,xlab="Iterations",ylab="mu")
hist(params$sigma,main="",xlab="sigma")

Example 2: Simple Gaussian linear model

\[\begin{eqnarray*} y_i|x_i,\alpha,\beta,\sigma^2 &\sim& N(\alpha+\beta x_i,\sigma^2)\\ (\alpha,\beta) &\sim& N(0,I_2)\\ \sigma &\sim& \mbox{Half-Cauchy}(0,2.5) \end{eqnarray*}\]

Example2.stan

data {
  int<lower=1> n;
  real x[n];
  real y[n];
}

parameters {
  real beta;
  real alpha;
  real<lower=0> sigma;
}

model {
  alpha ~ normal(0,1);
  beta  ~ normal(0,1);
  sigma ~ cauchy(0,2.5);
  for (i in 1:n)
   y[i] ~ normal(alpha+beta*x[i],sigma);
}

Example2.R

install.packages("rstan")
library("rstan")
options(mc.cores=4)

# Simulating some data
n = 100
x = rnorm(n)
y = rnorm(n,1+2*x,1)

# Running stan code
model = stan_model("Example2.stan")

fit = sampling(model,list(n=n,y=y,x=x),iter=200,chains=4)

print(fit)

params = extract(fit)

par(mfrow=c(2,3))
ts.plot(params$alpha,xlab="Iterations",ylab="alpha")
ts.plot(params$beta,xlab="Iterations",ylab="beta")
ts.plot(params$sigma,xlab="Iterations",ylab="sigma")
hist(params$alpha,main="",xlab="alpha")
hist(params$beta,main="",xlab="beta")
hist(params$sigma,main="",xlab="sigma")

Example 3: Simple Gaussian random-effects model

\[\begin{eqnarray*} y_j|\theta_j,\sigma_j^2 &\sim& N(\theta_j,\sigma^2_j)\\ \theta_j &\sim& N(\mu,\tau^2), \ j=1,\ldots,J. \end{eqnarray*}\] In this example the \(\sigma_j\) are all known. The data below are the classic eight-schools data (\(J=8\)).

Example3.stan

data {
  int<lower=0> J;         // number of schools 
  real y[J];              // estimated treatment effects
  real<lower=0> sigma[J]; // standard error of effect estimates 
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  vector[J] eta;          // unscaled deviation from mu by school
}
transformed parameters {
  vector[J] theta = mu + tau * eta;        // school treatment effects
}
model {
  target += normal_lpdf(eta | 0, 1);       // prior log-density
  target += normal_lpdf(y | theta, sigma); // log-likelihood
}
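Note that this program uses the non-centered parameterization: instead of sampling each \(\theta_j\) directly, Stan samples the standardized deviations \(\eta_j \sim N(0,1)\) and reconstructs \(\theta_j = \mu + \tau\eta_j\). This decouples the school effects from \(\mu\) and \(\tau\) and avoids the funnel-shaped posterior geometry that makes the centered version hard for HMC when \(\tau\) is small.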

Example3.R

install.packages("rstan")
library("rstan")
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

# The eight-schools data
schools_dat=list(J=8,y=c(28,8,-3,7,-1,1,18,12),sigma=c(15,10,16,11,9,11,10,18))

# Running stan code
fit <- stan(file='Example3.stan',data=schools_dat)

print(fit)

plot(fit)

pairs(fit, pars = c("mu", "tau", "lp__"))
la <- extract(fit, permuted = TRUE) # return a list of arrays 
mu <- la$mu
a  <- extract(fit, permuted = FALSE) 
a2 <- as.array(fit)
m  <- as.matrix(fit)
d  <- as.data.frame(fit)

Example 4: Banana-shape posterior

\[ p(\theta_1,\theta_2|y_{1:n}) \propto p_N(\theta_1;b_1,\tau^2)\,p_N(\theta_2;b_2,\tau^2) \prod_{i=1}^n p_N(y_i;\theta_1+\theta_2^2,\sigma^2), \] for \(n=20\), \(b_1=0\), \(b_2=0.5\) and \(\sigma=\tau=1\). We simulate \(y_1,\ldots,y_n\) from \(N(1,1)\).

Example4.stan

data {
  int n;
  real y[n];
}

parameters {
  real theta1;
  real theta2;
}

model {
  for (i in 1:n)
   y[i] ~ normal(theta1+theta2^2,1);
  theta1 ~ normal(0,1);
  theta2 ~ normal(0.5,1);
}

Example4.R

install.packages("rstan")
library("rstan")
options(mc.cores=4)

# Simulating some data
n      = 20
sigy   = 1
sigth  = 1
th10   = 0
th20   = 0.5
y      = rnorm(n,1,sigy)
sigy2  = sigy^2
sigth2 = sigth^2

# Running stan code
model = stan_model("Example4.stan")

fit = sampling(model,list(n=n,y=y),iter=40000,chains=1)

print(fit)
params = extract(fit)

draw.stan = cbind(params$theta1,params$theta2)

par(mfrow=c(2,3))
ts.plot(draw.stan[,1],main="Stan(NUTS)",ylab="",xlab=expression(theta[1]))
acf(draw.stan[,1],main=expression(theta[1]))
plot(density(draw.stan[,1]),type="l",main="",xlab=expression(theta[1]),lwd=2,ylab="Density")
ts.plot(draw.stan[,2],main="",ylab="",xlab=expression(theta[2]))
acf(draw.stan[,2],main=expression(theta[2]))
plot(density(draw.stan[,2]),type="l",main="",xlab=expression(theta[2]),lwd=2,ylab="Density")

Example 5: AR(1) plus noise model

\[\begin{eqnarray*} y_t &=& z_t+\sigma\varepsilon_t\\ z_t &=& x_t 1\{|x_t|\geq d\}\\ x_t &=& \mu+\phi x_{t-1}+\tau u_t, \end{eqnarray*}\] where \(\varepsilon_t\) and \(u_t\) are iid \(N(0,1)\) and the threshold \(d\) zeroes out the latent state whenever it is small in absolute value. The independent priors are: \(\mu \sim N(0,1)\), \(\phi \sim N(0,1)\) truncated to \((0,1)\), \(\tau \sim Gamma(1,1)\), \(\sigma \sim Gamma(1,1)\) and, given \((\mu,\phi,\tau)\), \(d \sim U(0,|\mu/(1-\phi)|+3\tau/\sqrt{1-\phi^2})\).

Example5.stan

data {
  int <lower=0> n;
  real x[n];
  real y[n];
}

parameters {
  real u_err[n];
  real<lower=0> sig;
  real<lower=0> tau;
  real<lower=0, upper=1> phi;
  real mu;
  // declaring the support of d explicitly keeps the sampler inside the
  // parameter-dependent range of its uniform prior
  real<lower=0, upper=fabs(mu/(1-phi))+3*tau/sqrt(1-phi^2)> d;
}

transformed parameters{
  real u[n]; //Level
  real s[n];
  real z[n];
  u[1] = u_err[1];
  s[1] = 1;
  z[1] = u[1];
  for (t in 2:n) {
    u[t] = mu+phi*u[t-1] + tau*u_err[t];
    if (fabs(u[t])>=d){
      s[t] = 1;
    }else{
      s[t]=0;
    }
    z[t] = u[t]*s[t];
  }
}

model{
  u_err ~ normal(0,1);
  y     ~ normal(z,sig);
  mu    ~ normal(0,1);
  phi   ~ normal(0,1);
  tau   ~ gamma(1,1);
  sig   ~ gamma(1,1);
  d ~ uniform(0,fabs(mu/(1-phi))+3*tau/sqrt(1-phi^2));
}

Example5.R

install.packages("rstan")
library("rstan")
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

# simulating some data
n   = 200
mu  = 0.05
phi = 0.95
sig = 1.0
tau = 0.7
d   = 2.5
m0 = mu/(1-phi)         # stationary mean
s0 = tau/sqrt(1-phi^2)  # stationary standard deviation
x  = rnorm(1,m0,s0)
for(t in 2:n)
  x[t] = rnorm(1,mu+phi*x[t-1],tau)
z = x
z[abs(z)<d]=0
y  = rnorm(n,z,sig)

# Running stan code
data <- list(n=n,x=x,y=y)

fit <- stan(file='Example5.stan',data=data)

la <- extract(fit, permuted = TRUE) # return a list of arrays 
acfs = matrix(0,51,n)
for (t in 1:n)
  acfs[,t] = acf(la$u[,t],lag.max=50,plot=FALSE)$acf
qx = t(apply(la$u,2,quantile,c(0.05,0.5,0.95)))
qz = t(apply(la$z,2,quantile,c(0.05,0.5,0.95)))

par(mfrow=c(2,2))
ts.plot(la$sig,xlab="iteration",ylab="",main=expression(sigma))
abline(h=sig,col=2)
ts.plot(la$mu,xlab="iteration",ylab="",main=expression(mu))
abline(h=mu,col=2)
ts.plot(la$phi,xlab="iteration",ylab="",main=expression(phi))
abline(h=phi,col=2)
ts.plot(la$tau,xlab="iteration",ylab="",main=expression(tau))
abline(h=tau,col=2)

par(mfrow=c(2,3))
acf(la$sig,main=expression(sigma))
acf(la$mu,main=expression(mu))
acf(la$phi,main=expression(phi))
acf(la$tau,main=expression(tau))
ts.plot(acfs,xlab="Lag",ylab="ACF",main=expression(x[t]))

par(mfrow=c(2,2))
hist(la$sig,prob=T,xlab="",main=expression(sigma))
abline(v=sig,col=2)
hist(la$mu,prob=T,xlab="",main=expression(mu))
abline(v=mu,col=2)
hist(la$phi,prob=T,xlab="",main=expression(phi))
abline(v=phi,col=2)
hist(la$tau,prob=T,xlab="",main=expression(tau))
abline(v=tau,col=2)

par(mfrow=c(1,1))
ts.plot(qx,ylab="states")
lines(qz[,1],col=2)
lines(qz[,2],col=2)
lines(qz[,3],col=2)
lines(x,col=3)
legend("topright",legend=c("Posterior quantiles",expression(x[t])),col=1:2,lty=1)

Example 6: SV-AR(1) model

For \(t=1,\ldots,n\), \[\begin{eqnarray*} y_t &=& \exp\{h_t/2\}\varepsilon_t\\ h_t &=& \mu+\phi (h_{t-1} - \mu)+\sigma u_t,\\ h_1 &\sim& N(\mu,\sigma^2/(1-\phi^2)), \end{eqnarray*}\] where \(\varepsilon_t\) are iid \(t_\nu(0,1)\) and \(u_t\) are iid \(N(0,1)\). The independent priors are: \(\mu \sim N(0,10^2)\), \(\phi \sim N_{(-1,1)}(0.8,0.3^2)\), \(\sigma \sim Gamma(0.5,0.5)\) and \(\nu \sim Gamma(1,0.05)\).

Example6.stan

data {
  int<lower=0> n;   // # time points (equally spaced)
  vector[n] y;      // mean corrected return at time t
}

parameters {
  real mu;                     // mean log volatility
  real<lower=-1,upper=1> phi;  // persistence of volatility
  real<lower=0> sigma;         // white noise shock scale
  real<lower=2> nu;            // degrees of freedom
  vector[n] h;                 // log volatility at time t
}

model {
  phi ~ normal(0.8,0.3);
  sigma ~ gamma(0.5,0.5);
  mu ~ normal(0,10);
  nu ~ gamma(1,0.05);
  h[1] ~ normal(mu,sigma/sqrt(1-phi*phi));
  for (t in 2:n)
    h[t] ~ normal(mu+phi*(h[t-1]-mu),sigma);
  y ~ student_t(nu, 0, exp(h/2));
}

Example6.R

install.packages("rstan")
install.packages("astsa")
library("rstan")
library("astsa")
options(mc.cores=4)

# Daily closing prices of the German DAX index
price = EuStockMarkets[,"DAX"]
y     = diff(log(price))
y     = y - mean(y)
n     = length(y)

par(mfrow=c(1,1))
ts.plot(y,ylab="",main="")

# Running stan code
M  = 2000
M0 = M/2

model  = stan_model("Example6.stan")

fit.t = sampling(model,list(n=n,y=y),iter=M,chains=1)

params = extract(fit.t)

# A t_nu variable with scale exp(h/2) has standard deviation exp(h/2)*sqrt(nu/(nu-2))
corr = matrix(sqrt(params$nu/(params$nu-2)),M/2,n)

qstdev = t(apply(exp(params$h/2)*corr,2,quantile,c(0.025,0.5,0.975)))

par(mfrow=c(2,3))
ts.plot(params$mu,xlab="Draws",ylab=expression(mu),main="Student's t")
ts.plot(params$phi,xlab="Draws",ylab=expression(phi))
ts.plot(params$sigma,xlab="Draws",ylab=expression(sigma))
hist(params$mu,prob=TRUE,xlab=expression(mu),main="Student's t model")
hist(params$phi,prob=TRUE,xlab=expression(phi),main="")
hist(params$sigma,prob=TRUE,xlab=expression(sigma),main="")

par(mfrow=c(1,2))
ts.plot(params$nu,xlab="Iterations",ylab="",main=expression(nu))
hist(params$nu,xlab="",prob=TRUE,main="")

#Posterior distribution of time-varying standard deviations
par(mfrow=c(1,1))
ts.plot(qstdev,col=c(3,2,3),ylim=c(0,0.2))
lines(0.1*abs(y),type="h")

Other R packages that implement HMC

hmclearn

The R package hmclearn was written by Samuel Thomas and Wanzhu Tu. See their 2021 article in The American Statistician entitled Learning Hamiltonian Monte Carlo in R. They have also prepared vignettes on linear, logistic and Poisson regression models, and on linear and logistic mixed-effects models.
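As a flavor of the package’s interface, here is a minimal sketch of Bayesian linear regression with hmclearn’s hmc() function. The argument names and the built-in linear_posterior/g_linear_posterior log-posterior and gradient are my reading of their paper, so treat them as assumptions and check ?hmc before running.

install.packages("hmclearn")
library("hmclearn")

# Simulated data for y = X beta + N(0,sigma^2) errors
set.seed(123)
n = 100
X = cbind(1, rnorm(n))
y = as.vector(X %*% c(1,2) + rnorm(n))

# hmc() takes a log-posterior and its gradient; the parameter vector here is
# (beta0, beta1, log sigma^2), following the paper's linear regression example
f = hmc(N = 2000, theta.init = c(0,0,1), epsilon = 0.01, L = 20,
        logPOSTERIOR = linear_posterior, glogPOSTERIOR = g_linear_posterior,
        varnames = c("beta0","beta1","log_sigma2"),
        param = list(y = y, X = X))

summary(f)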

I reproduce here bits of Thomas and Tu’s paper that I find interesting:

Probabilistic programming language

The goal of the Stan project is to provide a flexible probabilistic programming language for statistical modeling along with a suite of inference tools for fitting models that are robust, scalable, and efficient.

Stan versus BUGS/JAGS: imperative versus declarative modeling languages

Stan differs from BUGS and JAGS in two primary ways. First, Stan is based on a new imperative probabilistic programming language that is more flexible and expressive than the declarative graphical modeling languages underlying BUGS or JAGS, in ways such as declaring variables with types and supporting local variables and conditional statements. Second, Stan’s Markov chain Monte Carlo (MCMC) techniques are based on Hamiltonian Monte Carlo (HMC), a more efficient and robust sampler than Gibbs sampling or Metropolis-Hastings for models with complex posteriors.

Theoretical results about HMC

Neal (2011) analyzes the scaling benefit of HMC with dimensionality. Hoffman and Gelman (2014) provide practical comparisons of Stan’s adaptive HMC algorithm with Gibbs, Metropolis, and standard HMC samplers.

Stan interfaces with R, Python, MATLAB, Julia, Stata and Mathematica

Stan has the interfaces cmdstan for the command line shell, pystan for Python (Van Rossum et al. 2016), and rstan for R (R Core Team 2016). Stan also provides packages wrapping cmdstan, including MatlabStan for MATLAB, Stan.jl for Julia, StataStan for Stata, and MathematicaStan for Mathematica. These interfaces run on Windows, Mac OS X, and Linux, and are open-source licensed.

rstanarm

Introduction to multilevel modeling using rstanarm: A tutorial for education researchers, by JoonHo Lee, Nicholas Sim, Feng Ji, and Sophia Rabe-Hesketh (April 24, 2018): https://mc-stan.org/users/documentation/case-studies/tutorial_rstanarm.html
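rstanarm fits pre-compiled Stan programs through lme4-style formulas, so no Stan code has to be written by hand. A minimal sketch of a varying-intercept model in the spirit of that tutorial (the data here are simulated, not the tutorial’s):

install.packages("rstanarm")
library("rstanarm")

# Simulated two-level data: 30 students in each of 20 schools
J = 20
school = rep(1:J, each = 30)
y = 1 + rnorm(J, 0, 0.5)[school] + rnorm(30*J)
df = data.frame(y = y, school = factor(school))

# Varying-intercept model, estimated by Stan's NUTS under the hood
fit = stan_lmer(y ~ 1 + (1 | school), data = df)
print(fit)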

Package bayesplot

The bayesplot documentation recommends the following:

If you are just getting started with bayesplot, we recommend starting with the tutorial vignettes, the examples throughout the package documentation, and the paper Visualization in Bayesian Workflow (Gabry et al. 2019, J. R. Stat. Soc. A, 182: 389–402).
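For instance, bayesplot can replace the base-R trace plots and histograms used in the examples above. A sketch, assuming fit is an rstan object such as the one from Example 1:

library("bayesplot")

draws = as.array(fit)                        # iterations x chains x parameters
mcmc_trace(draws, pars = c("mu","sigma"))    # trace plots by chain
mcmc_hist(draws, pars = c("mu","sigma"))     # marginal histograms
mcmc_areas(draws, pars = c("mu","sigma"), prob = 0.9)  # densities with 90% intervals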

Hamiltonian Monte Carlo (HMC) literature

Leimkuhler, Benedict, and Sebastian Reich. 2004. Simulating Hamiltonian Dynamics. Cambridge: Cambridge University Press.

Girolami, Mark, and Ben Calderhead. 2011. “Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (2): 123–214.

Neal, Radford. 2011. “MCMC Using Hamiltonian Dynamics.” In Handbook of Markov Chain Monte Carlo, edited by Steve Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng, 116–62. Chapman; Hall/CRC.

Betancourt, Michael, and Mark Girolami. 2013. “Hamiltonian Monte Carlo for Hierarchical Models.” arXiv 1312.0906. http://arxiv.org/abs/1312.0906.

Betancourt, Michael. 2017. “A Conceptual Introduction to Hamiltonian Monte Carlo.” arXiv 1701.02434. http://arxiv.org/abs/1701.02434.

Hoffman, Matthew D., and Andrew Gelman. 2014. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research 15: 1593–1623. http://jmlr.org/papers/v15/hoffman14a.html.

Betancourt, Michael, and Leo C. Stein. 2011. “The Geometry of Hamiltonian Monte Carlo.” arXiv 1112.4118. http://arxiv.org/abs/1112.4118.

Nishio, Motohide, and Aisaku Arakawa. 2019. “Performance of Hamiltonian Monte Carlo and No-U-Turn Sampler for Estimating Genetic Parameters and Breeding Values.” Genetics Selection Evolution 51: 73.

Gabry, Jonah, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman. 2019. “Visualization in Bayesian Workflow.” Journal of the Royal Statistical Society: Series A 182 (2): 389–402. https://doi.org/10.1111/rssa.12378.

Thomas, Samuel, and Wanzhu Tu. 2021. “Learning Hamiltonian Monte Carlo in R.” The American Statistician.