The following data are taken from Example 11 of Bolstad (2010) Understanding Computational Bayesian Statistics, Wiley Series in Computational Statistics. The data consist of the age and coronary heart disease status of \(n=100\) subjects, originally from Hosmer and Lemeshow (1989) Applied Logistic Regression, John Wiley and Sons.
In this model, the 0/1 (yes/no) data \({\cal D}=\{y_1,\ldots,y_n\}\) are modeled as conditionally independent and identically distributed (iid) Bernoulli\((\theta)\) random variables, i.e. \[ p(y_i|\theta) = \theta^{y_i}(1-\theta)^{1-y_i}, \qquad y_i \in \{0,1\}, \ \theta \in (0,1). \] Therefore, the number of observed 1’s, \(s_n=\sum_{i=1}^n y_i\), follows, conditionally on \(\theta\), a Binomial distribution, i.e. \[ p(s_n|\theta) = \frac{n!}{s_n!(n-s_n)!}\theta^{s_n}(1-\theta)^{n-s_n}, \] for \(s_n \in \{0,1,\ldots,n\}\). The likelihood function is then \[ L(\theta|s_n) \propto \theta^{s_n}(1-\theta)^{n-s_n}, \] and it is easy to see that the maximum likelihood estimator of \(\theta\) is \[ {\hat \theta}_{mle}=\frac{s_n}{n}. \]
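A minimal sketch of this calculation in R, assuming the 0/1 coronary heart disease indicators are stored in a vector `y` of length \(n=100\) (the variable name is an illustrative assumption):

```r
# y: 0/1 coronary heart disease indicators for the n = 100 subjects (assumed name)
n  <- length(y)
sn <- sum(y)        # number of observed 1's, s_n
c(sn, sn/n)         # s_n and the ML estimate s_n/n
```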
## [1] 43.00 0.43
The Bayesian model is completed by specifying a prior distribution for \(\theta\). Let us use the conjugate prior for \(\theta\), a \(Beta(a_0,b_0)\) distribution, where \[ p(\theta) = \frac{\Gamma(a_0+b_0)}{\Gamma(a_0)\Gamma(b_0)}\theta^{a_0-1}(1-\theta)^{b_0-1}, \qquad \theta \in (0,1), a_0,b_0>0, \] with prior mean and variance equal to \[ E(\theta) = \frac{a_0}{a_0+b_0} \ \ \mbox{and} \ \ V(\theta) = \frac{a_0b_0}{(a_0+b_0)^2(a_0+b_0+1)}. \]
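By conjugacy the posterior is \(Beta(a_0+s_n, b_0+n-s_n)\). A sketch of how the summaries below can be reproduced; the hyperparameter values \(a_0=4\) and \(b_0=16\) are an assumption, chosen to be consistent with the prior mean and standard deviation shown in the output:

```r
# a0, b0: assumed hyperparameters; Beta(4, 16) matches the reported prior
# mean 0.200 and standard deviation 0.087
a0 <- 4; b0 <- 16
a1 <- a0 + sn                  # posterior Beta(a1, b1), with a1 = a0 + s_n
b1 <- b0 + n - sn              # and b1 = b0 + n - s_n

beta.summary <- function(a, b) {
  q <- qbeta(c(0.05, 0.50, 0.95), a, b)
  c(Mean = a / (a + b),
    St.Dev. = sqrt(a * b / ((a + b)^2 * (a + b + 1))),
    "5%" = q[1], "50%" = q[2], "95%" = q[3])
}

rbind("Prior summaries"     = round(beta.summary(a0, b0), 3),
      "Posterior summaries" = round(beta.summary(a1, b1), 3))
```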
##                      Mean St.Dev.    5%   50%   95%
## Prior summaries     0.200   0.087 0.075 0.190 0.359
## Posterior summaries 0.392   0.044 0.320 0.391 0.466
It can be shown that \[ Pr(y_{n+1}=1|s_n,a_0,b_0) = \frac{a_0+s_n}{a_0+b_0+n} = \alpha+\beta {\bar y}_n, \] where \(\alpha=a_0/(a_0+b_0+n)\), \(\beta=n/(a_0+b_0+n)\) and \({\bar y}_n = \frac{s_n}{n}\); in other words, the posterior predictive probability of a success is a compromise between the prior mean \(a_0/(a_0+b_0)\) and the sample proportion \({\bar y}_n\).
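Continuing with the assumed \(Beta(4,16)\) prior, a quick check of this formula in R:

```r
# posterior predictive Pr(y_{n+1} = 1 | s_n): (4 + 43) / (4 + 16 + 100) = 47/120
(a0 + sn) / (a0 + b0 + n)    # ~ 0.392, which equals the posterior mean reported above
```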
In the simplest form, the success probability \(\theta_i = Pr(y_i=1|x_i)\) of the observation \(y_i\) (coronary heart disease status) is related to the covariate \(x_i\) (age) via either a logit or a probit link function: \[\begin{eqnarray*} \theta_i &=& \frac{1}{1+e^{-(\beta_0+\beta_1 x_i)}}\\ \theta_i &=& \Phi(\beta_0+\beta_1 x_i), \end{eqnarray*}\] respectively.
Given the data \({\cal D}_n=\{(x_1,y_1),\ldots,(x_n,y_n)\}\), the likelihood function of \((\beta_0,\beta_1)\) is given by \[ L(\beta_0,\beta_1|{\cal D}_n) = \prod_{i=1}^n \theta_i^{y_i} (1-\theta_i)^{1-y_i}. \]
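The estimates below come from numerically minimizing the negative log-likelihoods with `nlm`. A minimal sketch of that step, assuming the ages and disease indicators are stored in vectors `x` and `y`; the objective-function names `minusloglike.logit` and `minusloglike.probit` and the starting values are taken from the calls shown in the warnings, but the actual code chunk may differ:

```r
# negative log-likelihood under the logit link, for data vectors x (age) and y (CHD status)
minusloglike.logit <- function(beta) {
  theta <- 1 / (1 + exp(-(beta[1] + beta[2] * x)))
  -sum(y * log(theta) + (1 - y) * log(1 - theta))
}

# negative log-likelihood under the probit link
minusloglike.probit <- function(beta) {
  theta <- pnorm(beta[1] + beta[2] * x)
  -sum(y * log(theta) + (1 - y) * log(1 - theta))
}

# minimize both with nlm, using the starting values shown in the warnings below
mle.logit  <- nlm(minusloglike.logit,  c(-5, 0.1))$estimate
mle.probit <- nlm(minusloglike.probit, c(-3, 0.07))$estimate
```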
## Warning in nlm(minusloglike.logit, c(-5, 0.1)): NA/Inf replaced by maximum
## positive value
## Warning in nlm(minusloglike.probit, c(-3, 0.07)): NA/Inf replaced by
## maximum positive value
## [1] "betas MLE (logit) : (-5.3095,0.1109)"
## [1] "betas MLE (probit) : (-3.1457,0.0658)"
Below we assume a simple bivariate uniform prior for \((\beta_0,\beta_1)\) on the rectangle \({\cal A}=(-10,-1.5) \times (0.01,0.2)\) and use sampling importance resampling (SIR) to convert prior draws into (approximately) posterior draws: the prior draws are weighted by their likelihood values and then resampled with replacement, as in the sketch below.
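A minimal sketch of the SIR step under these assumptions; the number of prior draws `M` and the seed are illustrative choices, and `x`, `y` are the age and CHD vectors as before. The same prior draws also give a Monte Carlo estimate of the prior predictive of the logit model, since the prior predictive is the likelihood averaged over draws from the prior:

```r
set.seed(12345)          # arbitrary seed, for reproducibility
M <- 100000              # number of prior draws (illustrative choice)

# draws from the bivariate uniform prior on the rectangle A
beta0.draw <- runif(M, -10, -1.5)
beta1.draw <- runif(M, 0.01, 0.2)

# likelihood of each prior draw under the logit model
like <- sapply(1:M, function(i) {
  theta <- 1 / (1 + exp(-(beta0.draw[i] + beta1.draw[i] * x)))
  prod(theta^y * (1 - theta)^(1 - y))
})

# SIR step: resample the prior draws with weights proportional to the likelihood
ind  <- sample(1:M, size = M, replace = TRUE, prob = like)
post <- cbind(beta0 = beta0.draw, beta1 = beta1.draw)[ind, ]

# Monte Carlo estimate of the prior predictive of the logit model
mean(like)
```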
## [1] "Prior predictive M1 = 2.20379574939852e-33"
## [1] "Prior predictive M2 logit = 1.08336812307888e-25"
## [1] "Prior predictive M2 probit = 3.33156263389443e-26"
## [1] "Bayes factor M1 vs M2 logit = 0"
## [1] "Bayes factor M2 probit vs M2 logit = 0.308"
## [1] "Posterior model probability for M1 = 0"
## [1] "Posterior model probability for M2 logit = 0.765"
## [1] "Posterior model probability for M1 probit = 0.235"