1 Data structure

The following data are taken from Example 11 of Bolstad (2010) Understanding Computational Bayesian Statistics, Wiley Series in Computational Statistics. The data record the age and coronary heart disease status of \(n=100\) subjects, taken from Hosmer and Lemeshow (1989) Applied Logistic Regression, John Wiley and Sons.

2 Bernoulli model

In this model, the 0/1 (yes/no) data \({\cal D}=\{y_1,\ldots,y_n\}\) are modeled as conditionally independent and identically distributed (iid) Bernoulli(\(\theta\)), i.e. \[ p(y_i|\theta) = \theta^{y_i}(1-\theta)^{1-y_i}, \qquad y_i \in \{0,1\}, \ \theta \in (0,1). \] Therefore, the number of observed 1’s, \(s_n=\sum_{i=1}^n y_i\), follows, conditionally on \(\theta\), a Binomial\((n,\theta)\) distribution, i.e. \[ p(s_n|\theta) = \frac{n!}{s_n!(n-s_n)!}\theta^{s_n}(1-\theta)^{n-s_n}, \] for \(s_n \in \{0,1,\ldots,n\}\). The likelihood function is then \[ L(\theta|s_n) \propto \theta^{s_n}(1-\theta)^{n-s_n}, \] and it is easy to see that the maximum likelihood estimator of \(\theta\) is \[ {\hat \theta}_{mle}=\frac{s_n}{n}. \]
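
These quantities take only a few lines of R; a minimal sketch, assuming the 0/1 disease indicators are stored in a vector `y`. The printed output below shows \(s_n=43\) and \({\hat \theta}_{mle}=0.43\):

```r
# Minimal sketch, assuming the 0/1 disease indicators are in a vector y
sn  <- sum(y)          # number of observed 1's
mle <- sn / length(y)  # maximum likelihood estimate of theta
c(sn, mle)
```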

## [1] 43.00  0.43

2.1 Bayesian inference

Bayesian inference is completed by specifying a prior distribution for \(\theta\). Let us use the conjugate prior for \(\theta\), a \(Beta(a_0,b_0)\), where \[ p(\theta) = \frac{\Gamma(a_0+b_0)}{\Gamma(a_0)\Gamma(b_0)}\theta^{a_0-1}(1-\theta)^{b_0-1}, \qquad \theta \in (0,1), \ a_0,b_0>0, \] with prior mean and variance equal to \[ E(\theta) = \frac{a_0}{a_0+b_0} \ \ \mbox{and} \ \ V(\theta) = \frac{a_0b_0}{(a_0+b_0)^2(a_0+b_0+1)}. \] By conjugacy, the posterior is also Beta, \(\theta|s_n \sim Beta(a_1,b_1)\), with \(a_1=a_0+s_n\) and \(b_1=b_0+n-s_n\).
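
The summaries below can be reproduced with a short R sketch. The hyperparameters are not stated explicitly in the text; \(a_0=4\) and \(b_0=16\) are assumed here because they match the reported prior mean (0.200) and standard deviation (0.087):

```r
# Hypothetical hyperparameters a0 = 4, b0 = 16 (chosen to match the
# prior mean 0.200 and st.dev. 0.087 reported below)
a0 <- 4; b0 <- 16
n  <- 100; sn <- 43
a1 <- a0 + sn; b1 <- b0 + n - sn   # posterior Beta(a1, b1) by conjugacy
beta.summ <- function(a, b)
  c(Mean    = a / (a + b),
    St.Dev. = sqrt(a * b / ((a + b)^2 * (a + b + 1))),
    qbeta(c(0.05, 0.50, 0.95), a, b))
round(rbind("Prior summaries"     = beta.summ(a0, b0),
            "Posterior summaries" = beta.summ(a1, b1)), 3)
```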

##                      Mean St.Dev.    5%   50%   95%
## Prior summaries     0.200   0.087 0.075 0.190 0.359
## Posterior summaries 0.392   0.044 0.320 0.391 0.466

2.2 Prior predictive of \(s_n=\sum_{i=1}^n y_i\) given \((n,a_0,b_0)\)
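
Integrating the Binomial likelihood against the \(Beta(a_0,b_0)\) prior gives the Beta-Binomial prior predictive \[ p(s_n|n,a_0,b_0) = \frac{n!}{s_n!(n-s_n)!}\frac{B(a_0+s_n,b_0+n-s_n)}{B(a_0,b_0)}, \qquad s_n \in \{0,1,\ldots,n\}, \] where \(B(a,b)=\Gamma(a)\Gamma(b)/\Gamma(a+b)\) is the beta function. A minimal sketch of its evaluation, again with the assumed \(a_0=4\), \(b_0=16\):

```r
# Beta-Binomial prior predictive of s_n (hypothetical a0 = 4, b0 = 16);
# computed on the log scale for numerical stability
prior.pred <- function(s, n, a0, b0)
  exp(lchoose(n, s) + lbeta(a0 + s, b0 + n - s) - lbeta(a0, b0))
plot(0:100, prior.pred(0:100, 100, 4, 16), type = "h",
     xlab = expression(s[n]), ylab = "Prior predictive")
```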

2.3 Posterior predictive of \(y_{n+1}\) given \((a_1,b_1)\)

It can be shown that \[ Pr(y_{n+1}=1|s_n,a_0,b_0) = \frac{a_0+s_n}{a_0+b_0+n} = \alpha+\beta {\bar y}_n, \] where \(\alpha=a_0/(a_0+b_0+n)\), \(\beta=n/(a_0+b_0+n)\) and \({\bar y}_n = s_n/n\); this is simply the posterior mean \(E(\theta|s_n)=a_1/(a_1+b_1)\).
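
With the hyperparameters assumed above this evaluates to \(47/120 \approx 0.392\), matching the posterior mean reported earlier:

```r
# Posterior predictive Pr(y[n+1] = 1 | s_n), with the assumed a0 = 4, b0 = 16
(a0 + sn) / (a0 + b0 + n)   # equals the posterior mean a1 / (a1 + b1)
```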

3 Logit and probit regression models

In their simplest forms, the observation \(y_i\) (coronary heart disease status) is related to the covariate \(x_i\) (age) via either a logit or a probit link function: \[\begin{eqnarray*} \theta_i &=& \frac{1}{1+e^{-(\beta_0+\beta_1 x_i)}}\\ \theta_i &=& \Phi(\beta_0+\beta_1 x_i), \end{eqnarray*}\] respectively, where \(\theta_i=Pr(y_i=1|x_i)\) and \(\Phi(\cdot)\) is the standard normal cumulative distribution function.

3.1 Likelihood function of \((\beta_0,\beta_1)\)

Given the data \({\cal D}_n=\{(x_1,y_1),\ldots,(x_n,y_n)\}\), the likelihood function of \((\beta_0,\beta_1)\) is given by \[ L(\beta_0,\beta_1|{\cal D}_n) = \prod_{i=1}^n \theta_i^{y_i} (1-\theta_i)^{1-y_i}, \] where \(\theta_i\) depends on \((\beta_0,\beta_1)\) through the chosen link function.
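
The warnings and estimates below come from minimizing the negative log-likelihoods numerically with `nlm`. A minimal sketch of that step, assuming age is stored in `x` and disease status in `y`; the function names and starting values match the `nlm` calls shown in the output:

```r
# Negative log-likelihoods; sketch assuming covariate x (age), response y (0/1)
minusloglike.logit <- function(beta) {
  theta <- 1 / (1 + exp(-(beta[1] + beta[2] * x)))
  -sum(y * log(theta) + (1 - y) * log(1 - theta))
}
minusloglike.probit <- function(beta) {
  theta <- pnorm(beta[1] + beta[2] * x)
  -sum(y * log(theta) + (1 - y) * log(1 - theta))
}
# Starting values as in the nlm calls shown below
mle.logit  <- nlm(minusloglike.logit,  c(-5, 0.1))$estimate
mle.probit <- nlm(minusloglike.probit, c(-3, 0.07))$estimate
```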

## Warning in nlm(minusloglike.logit, c(-5, 0.1)): NA/Inf replaced by maximum
## positive value

## Warning in nlm(minusloglike.probit, c(-3, 0.07)): NA/Inf replaced by
## maximum positive value
## [1] "betas MLE (logit) : (-5.3095,0.1109)"
## [1] "betas MLE (probit) : (-3.1457,0.0658)"

3.2 Fitted logit and probit regressions
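
The fitted curves follow directly from evaluating the two link functions at the MLEs; a minimal plotting sketch, assuming an age range of roughly 20 to 70 in the data:

```r
# Fitted success probabilities over a grid of ages (assumed range 20-70)
age.grid   <- seq(20, 70, by = 1)
fit.logit  <- 1 / (1 + exp(-(mle.logit[1] + mle.logit[2] * age.grid)))
fit.probit <- pnorm(mle.probit[1] + mle.probit[2] * age.grid)
plot(age.grid, fit.logit, type = "l", ylim = c(0, 1),
     xlab = "Age", ylab = "Pr(CHD = 1)")
lines(age.grid, fit.probit, lty = 2)
legend("topleft", c("logit", "probit"), lty = 1:2, bty = "n")
```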

3.3 Bayesian inference via SIR

Below we assume a bivariate uniform prior distribution for \((\beta_0,\beta_1)\) on the rectangle \({\cal A}=(-10,-1.5) \times (0.01,0.2)\) and use sampling importance resampling (SIR) to convert prior draws into draws from the (approximate) posterior.
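
A minimal SIR sketch under these assumptions, reusing `minusloglike.logit` from above: draw from the uniform prior, weight each draw by its likelihood, and resample with replacement in proportion to the weights:

```r
# Sampling importance resampling (SIR) sketch for the logit model
set.seed(1)
M <- 10000
beta0.draw <- runif(M, -10, -1.5)   # prior draws on the rectangle A
beta1.draw <- runif(M, 0.01, 0.2)
loglike <- sapply(1:M, function(i)
  -minusloglike.logit(c(beta0.draw[i], beta1.draw[i])))
w <- exp(loglike - max(loglike))    # unnormalized importance weights
idx <- sample(1:M, size = M, replace = TRUE, prob = w)
beta.post <- cbind(beta0 = beta0.draw, beta1 = beta1.draw)[idx, ]
```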

4 Model comparison

4.1 Model comparison via posterior predictive summaries

4.2 Computing Bayes factors and posterior model probabilities
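
The prior predictives (marginal likelihoods) below can be approximated in several ways; one minimal sketch, not necessarily the computation used here, exploits the uniform prior: for each regression model the prior predictive is the Monte Carlo average of the likelihood over prior draws, while for the Bernoulli model \(M_1\) it is available in closed form. The prior draws, hyperparameters, and likelihood functions from the earlier sketches are assumed:

```r
# Monte Carlo sketch of the prior predictives, reusing the uniform prior
# draws beta0.draw, beta1.draw from the SIR step above (an assumption)
loglike.at <- function(minusll)
  sapply(1:M, function(i) -minusll(c(beta0.draw[i], beta1.draw[i])))
m2.logit  <- mean(exp(loglike.at(minusloglike.logit)))
m2.probit <- mean(exp(loglike.at(minusloglike.probit)))
# Closed-form prior predictive of the Bernoulli model M1
m1 <- exp(lbeta(a0 + sn, b0 + n - sn) - lbeta(a0, b0))
# Bayes factors and posterior model probabilities (equal prior model probs)
bf.m1.vs.logit     <- m1 / m2.logit
bf.probit.vs.logit <- m2.probit / m2.logit
post.model.prob <- c(M1 = m1, M2.logit = m2.logit, M2.probit = m2.probit) /
  (m1 + m2.logit + m2.probit)
```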

## [1] "Prior predictive M1 = 2.20379574939852e-33"
## [1] "Prior predictive M2 logit = 1.08336812307888e-25"
## [1] "Prior predictive M2 probit = 3.33156263389443e-26"
## [1] "Bayes factor M1 vs M2 logit = 0"
## [1] "Bayes factor M2 probit vs M2 logit = 0.308"
## [1] "Posterior model probability for M1 = 0"
## [1] "Posterior model probability for M2 logit = 0.765"
## [1] "Posterior model probability for M1 probit = 0.235"