We will assume that the tropical tuna percentage (TTP) in longliner catches is a function of the sea surface temperature (SST). Here \(y=\)TTP is a number between 0 and 1, a percentage, while, \(x=\)SST is a real and positive number.
Usually we say that \(y \sim Beta(\alpha,\beta)\) when \[ p(y|\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} y^{\alpha-1} (1-y)^{\beta-1}, \qquad 0<y<1 \ \ \ \alpha,\beta>0, \] where \(\Gamma(\cdot)\) is the gamma function \[ \Gamma(z) = \int_0^\infty t^{z-1}e^{-t}dt, \] with \(\Gamma(z)=(z-1)!\) when \(z\) is a positive integer.
It is easy to verify that
\(E(y|\alpha,\beta) = \frac{\alpha}{\alpha+\beta}\)
\(V(y|\alpha,\beta) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\)
Mode\((y|\alpha,\beta) = \frac{\alpha-1}{\alpha+\beta-2}, \qquad \alpha,\beta>1\)
For modeling proportions that depend on covariates, it is suggested to use the following parametrization of the Beta distribution: \[ \beta|\mu,\phi \sim Beta(\mu,\phi), \] where \(\mu=\alpha/(\alpha+\beta)\) and \(\phi=\alpha+\beta\). The inverse transformation is given by \(\alpha=\mu \phi\) and \(\beta=(1-\mu)\phi\). This transformation leads to
\(E(y|\mu,\phi) = \mu\)
\(V(y|\mu,\phi) = \frac{\mu(1-\mu)}{1+\phi}\)
Mode\((y|\mu,\phi) = \frac{\mu\phi-1}{\phi-2}, \qquad \phi>2, \mu>1/\phi\)
We are now apt to link the response \(y\) to the regression \(x\) via \(\mu\), ie.
\[ y_i|\mu_i,\phi \sim Beta(\mu_i,\phi) \ \ \ \mbox{and} \ \ \ \log\left(\frac{\mu_i}{1-\mu_i}\right)= \theta_0 + \theta_1 x_i, \] for \(\theta_0,\theta_1 \in \Re\).
Here are two key references to understand the Beta regression model:
Ferrari SLP, Cribari-Neto F (2004). Beta Regression for Modelling Rates and Proportions. Journal of Applied Statistics, 31(7), 799-815.
Francisco Cribari-Neto & Achim Zelleis, Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. 2010. https://www.jstatsoft.org/article/view/v034i02
#install.packages("rstan")
#install.packages("rstanarm")
#install.packages("statmod")
library(rstan)## Loading required package: StanHeaders
##
## rstan version 2.32.7 (Stan version 2.32.2)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
## change `threads_per_chain` option:
## rstan_options(threads_per_chain = 1)
## Loading required package: Rcpp
## This is rstanarm version 2.32.2
## - See https://mc-stan.org/rstanarm/articles/priors for changes to default priors!
## - Default priors may change, so it's safest to specify priors, even if equivalent to the defaults.
## - For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores())
##
## Attaching package: 'rstanarm'
## The following object is masked from 'package:rstan':
##
## loo
atlantic = read.table("https://hedibert.org/wp-content/uploads/2026/02/atlantic.txt",header=TRUE)
atla = data.frame(atlantic)
ind = atla[,1]==1981
x = atla[ind,6]
y = atla[ind,4]/100
y[y==0] = 0.000255362
par(mfrow=c(1,1))
plot(x,y,xlab="Sea surface temperature",ylab="Tropical tuna percentage",ylim=c(0,0.4))##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.00028 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.8 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.172 seconds (Warm-up)
## Chain 1: 0.218 seconds (Sampling)
## Chain 1: 0.39 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 2.4e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.16 seconds (Warm-up)
## Chain 2: 0.216 seconds (Sampling)
## Chain 2: 0.376 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 2.5e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 0.19 seconds (Warm-up)
## Chain 3: 0.207 seconds (Sampling)
## Chain 3: 0.397 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 2.3e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.23 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.17 seconds (Warm-up)
## Chain 4: 0.22 seconds (Sampling)
## Chain 4: 0.39 seconds (Total)
## Chain 4:
draws = as.matrix(fit1)
xx = seq(min(x),max(x),length=100)
fitted1 = 1/(1+exp(-(fit1$coef[1]+fit1$coef[2]*xx)))
param = c("intercept","slope","precision")
par(mfrow=c(2,3))
for (i in 1:3) ts.plot(draws[,i],xlab="Iterations",ylab="",main=param[i])
for (i in 1:3) hist(draws[,i],prob=TRUE,main="",xlab="")fits = matrix(0,4000,100)
draws1 = matrix(0,4000,100)
for (i in 1:100){
fits[,i] = 1/(1+exp(-(draws[,1]+draws[,2]*xx[i])))
draws1[,i] = rbeta(4000,fits[,i]*draws[,3],(1-fits[,i])*draws[,3])
}
qfits = t(apply(fits,2,quantile,c(0.05,0.5,0.95)))
qfits1 = t(apply(draws1,2,quantile,c(0.05,0.5,0.95)))
par(mfrow=c(1,1))
plot(x,y,xlab="Sea surface temperature",ylab="Tropical tuna percentage",
ylim=c(0,1))
lines(xx,fitted1,col=2,lwd=2)
lines(xx,qfits[,1],col=2,lwd=2,lty=2)
lines(xx,qfits[,3],col=2,lwd=2,lty=2)
lines(xx,qfits1[,1],col=3,lwd=2,lty=2)
lines(xx,qfits1[,3],col=3,lwd=2,lty=2)Here we allow both \(\mu_i\) and \(\phi_i\) to be functions of \(x_i\):
\[ \log\left(\frac{\mu_i}{1-\mu_i}\right)= \theta_0 + \theta_1 x_i \qquad\qquad \mbox{and} \qquad\qquad \log \phi_i = \gamma_0 + \gamma_1 x_i. \]
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000254 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.54 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.372 seconds (Warm-up)
## Chain 1: 0.315 seconds (Sampling)
## Chain 1: 0.687 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 3.5e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.35 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.36 seconds (Warm-up)
## Chain 2: 0.324 seconds (Sampling)
## Chain 2: 0.684 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 3.5e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.35 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 0.391 seconds (Warm-up)
## Chain 3: 0.323 seconds (Sampling)
## Chain 3: 0.714 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 3.2e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.32 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.351 seconds (Warm-up)
## Chain 4: 0.355 seconds (Sampling)
## Chain 4: 0.706 seconds (Total)
## Chain 4:
draws2 = as.matrix(fit2)
fitted2 = 1/(1+exp(-(fit2$coef[1]+fit2$coef[2]*xx)))
param = c("intercept","slope","int.precision","slope.precision")
par(mfrow=c(2,4))
for (i in 1:4) ts.plot(draws2[,i],xlab="Iterations",ylab="",main=param[i])
for (i in 1:4) hist(draws2[,i],prob=TRUE,main="",xlab="")draws21 = matrix(0,4000,100)
for (i in 1:100){
mu.fit = 1/(1+exp(-(draws2[,1]+draws2[,2]*xx[i])))
phi.fit = exp(draws2[,3]+draws2[,4]*xx[i])
draws21[,i] = rbeta(4000,mu.fit*phi.fit,(1-mu.fit)*phi.fit)
}
qfits2 = t(apply(draws21,2,quantile,c(0.05,0.5,0.95)))
par(mfrow=c(1,1))
plot(x,y,xlab="Sea surface temperature",ylab="Tropical tuna percentage",ylim=c(0,1))
lines(xx,fitted1,col=2,lwd=2)
lines(xx,qfits1[,1],col=2,lwd=2,lty=2)
lines(xx,qfits1[,3],col=2,lwd=2,lty=2)
lines(xx,fitted2,col=4,lwd=2)
lines(xx,qfits2[,1],col=4,lwd=2,lty=2)
lines(xx,qfits2[,3],col=4,lwd=2,lty=2)
legend("topleft",legend=c("Model 1: mu(x)","Model 2: mu(x),phi(x)"),col=c(2,4),lty=1,lwd=2,bty="n")