Bernoulli model
Bonus: Monte Carlo approximation
- Approximating \(Pr(S_n)\) by Monte Carlo with Importance Sampling (MCIS)
- Posterior draws via Sampling Importance Resampling (SIR)

Bernoulli model

For \(\theta\) in \((0,1)\) and data=\(\{n,s_n\}\), the likelihood function is given by \[ L(\theta|\mbox{data}) = \frac{n!}{s_n!(n-s_n)!}\theta^{s_n}(1-\theta)^{n-s_n} \]

n= 10
sn=7

a) \(\theta|A \sim Beta(1,1)\)

\[\begin{eqnarray*} p(\theta|A) &\propto& \theta^{1-1}(1-\theta)^{1-1}\\ L(\theta|\mbox{data}) &\propto& \theta^{s_n}(1-\theta)^{n-s_n}\\ \theta|\mbox{data},A &\sim& Beta(s_n+1,n-s_n+1), \end{eqnarray*}\] so \(E(\theta|A)=1/2\) and \[ \theta|\mbox{data},A \sim Beta(8,4) \]

b) \(\theta|B \sim Beta(a,b)\)

\[\begin{eqnarray*} p(\theta|B) &\propto& \theta^{a-1}(1-\theta)^{b-1}\\ L(\theta|\mbox{data}) &\propto& \theta^{s_n}(1-\theta)^{n-s_n}\\ \theta|\mbox{data},B &\sim& Beta(s_n+a,n-s_n+b), \end{eqnarray*}\] so, for \(a=4\) and \(b=2\), then \(E(\theta|B)=2/3\)), and \[ (\theta|\mbox{data},B) \sim Beta(11,5). \]

c) Prior predictive

Recall that \[ p(s_n|C) = \int_\Theta p(s_n|\theta)p(\theta|C)d\theta = \frac{p(s_n|\theta)p(\theta|C)}{p(\theta|\mbox{data},C)}, \] so, the predictive densities are the same for all values of \(\theta\). Try yourself with \(\theta=0.01\), \(\theta=0.5\) and \(\theta=0.75\).

theta=0.5
Psn.A = dbinom(sn,n,theta)*dbeta(theta,1,1)/dbeta(theta,8,4)
Psn.B = dbinom(sn,n,theta)*dbeta(theta,4,2)/dbeta(theta,11,5)
c(Psn.A,Psn.B)

## [1] 0.09090909 0.15984016

d) \(\mbox{logit}(\theta) \sim N(\mu,\sigma^2)\)

It can be easily show that \[ p(\theta|C) = (2\pi\sigma^2)^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}\left[\log\left(\frac{\theta}{1-\theta}\right)-\mu\right]^2\right\} \frac{1}{\theta(1-\theta)}. \] We will approximate the integral \(Pr(s_n|C)\) on a fine grid

priorC = function(theta){
dnorm(log(theta/(1-theta)),mu,sig)/(theta*(1-theta))
}
mu     = 0 
sig    = sqrt(3)
h      = 0.001
thetas = seq(h,1-h,by=h)
Psn.C  = sum(dbinom(sn,n,thetas)*priorC(thetas)*h)
postC  = dbinom(sn,n,thetas)*priorC(thetas)/Psn.C

c(Psn.A,Psn.B,Psn.C)

## [1] 0.09090909 0.15984016 0.08923936

e) Posterior means and variances

When \(\theta \sim Beta(a,b)\), it follows that \[ E(\theta)= \frac{a}{a+b} \ \ \ \mbox{and} \ \ \ V(\theta) = \frac{ab}{(a+b)^2(a+b+1)}. \] We learned that \[ \theta|\mbox{data},A \sim Beta(8,4) \ \ \ \mbox{and} \ \ \ (\theta|\mbox{data},B) \sim Beta(11,5). \]

a=8;b=4
E.A = a/(a+b)
V.A = a*b/((a+b)^2*(a+b+1))
a=11;b=5
E.B = a/(a+b)
V.B = a*b/((a+b)^2*(a+b+1))
c(E.A,E.B)

## [1] 0.6666667 0.6875000

c(sqrt(V.A),sqrt(V.B))

## [1] 0.1307441 0.1124183

For prior C, we need to approximate two more integrals (\(Pr(s_n|C)\) was already approximated!): \[ \mu_C = E(\theta|C) = \frac{\int_0^1 \theta p(s_n|\theta)p(\theta|C)}{Pr(s_n|C)} \ \ \ \mbox{and} \ \ \ V(\theta|C) = \frac{\int_0^1 (\theta-\mu_C)^2 p(s_n|\theta)p(\theta|C)}{Pr(s_n|C)} \]

E.C = sum(thetas*dbinom(sn,n,thetas)*priorC(thetas)*h)/Psn.C
V.C = sum((thetas-E.C)^2*dbinom(sn,n,thetas)*priorC(thetas)*h)/Psn.C
c(E.A,E.B,E.C)

## [1] 0.6666667 0.6875000 0.6734938

c(sqrt(V.A),sqrt(V.B),sqrt(V.C))

## [1] 0.1307441 0.1124183 0.1318999

f) Priors and posteriors

Now, let us plot priors and posterior together

plot(thetas,dbeta(thetas,8,4),ylim=c(0,4),lwd=3,type="l",
xlab=expression(theta),ylab="Density")
lines(thetas,dbeta(thetas,11,5),col=2,lwd=3)
lines(thetas,postC,col=3,lwd=3)
legend("topleft",legend=c("Post A","Post B","Post C"),col=1:3,lwd=3,bty="n")
lines(thetas,dbeta(thetas,1,1),lty=2,lwd=3)
lines(thetas,dbeta(thetas,4,2),lty=2,col=2,lwd=3)
lines(thetas,priorC(thetas),lty=2,col=3,lwd=3)
legend("topright",legend=c("Prior A","Prior B","Prior C"),col=1:3,lwd=3,bty="n",lty=2)

Bonus: Monte Carlo approximation

Approximating \(Pr(S_n)\) by Monte Carlo with Importance Sampling (MCIS)

M       = 10000
theta.d = runif(M)
Psn.A.mc = mean(dbinom(sn,n,theta.d)*dbeta(theta.d,1,1))
Psn.B.mc = mean(dbinom(sn,n,theta.d)*dbeta(theta.d,4,2))
Psn.C.mc = mean(dbinom(sn,n,theta.d)*priorC(theta.d))
tab = cbind(c(Psn.A,Psn.B,Psn.C),c(Psn.A.mc,Psn.B.mc,Psn.C.mc))
rownames(tab) = c("Pred A","Pred B","Pred C")
colnames(tab) = c("Exact","MC")
tab

##             Exact         MC
## Pred A 0.09090909 0.09106378
## Pred B 0.15984016 0.16043537
## Pred C 0.08923936 0.08939873

Posterior draws via Sampling Importance Resampling (SIR)

theta.A = sample(theta.d,size=M,replace=TRUE,prob=dbinom(sn,n,theta.d)*dbeta(theta.d,1,1))
theta.B = sample(theta.d,size=M,replace=TRUE,prob=dbinom(sn,n,theta.d)*dbeta(theta.d,4,2))
theta.C = sample(theta.d,size=M,replace=TRUE,prob=dbinom(sn,n,theta.d)*priorC(theta.d))

Posterior means

tab = cbind(c(E.A,E.B,E.C),c(mean(theta.A),mean(theta.B),mean(theta.C)))
rownames(tab) = c("A","B","C")
colnames(tab) = c("Exact","MC")
tab

##       Exact        MC
## A 0.6666667 0.6654836
## B 0.6875000 0.6889652
## C 0.6734938 0.6720572

Posterior standard deviations

tab = cbind(c(sqrt(V.A),sqrt(V.B),sqrt(V.C)),sqrt(c(var(theta.A),var(theta.B),var(theta.C))))
rownames(tab) = c("A","B","C")
colnames(tab) = c("Exact","MC")
tab

##       Exact        MC
## A 0.1307441 0.1303956
## B 0.1124183 0.1112534
## C 0.1318999 0.1329121

Posterior densities

plot(density(theta.A),lwd=3,ylim=c(0,4),xlab=expression(theta),ylab="Density",lty=2,main="")
lines(density(theta.B),col=2,lwd=3,lty=2)
lines(density(theta.C),col=3,lwd=3,lty=2)
lines(thetas,dbeta(thetas,8,4),lwd=3)
lines(thetas,dbeta(thetas,11,5),col=2,lwd=3)
lines(thetas,postC,col=3,lwd=3)
legend("topleft",legend=c("Post A (Exact)","Post B (Exact)","Post C (Exact)",
                          "Post A (MC)","Post B (MC)","Post C (MC)"),
       col=c(1:3,1:3),lwd=3,bty="n",lty=c(rep(1,3),rep(2,3)))

HW1 - Solution

Advanced Bayesian Statistical Learning

Hedibert Freitas Lopes

2/18/2022