Simple setup: \({\cal X}=\{0,1\}\) and \(\Theta=\{0,1\}\)

In this case the measurement is binary, i.e. it can only take two possible values. Usually, \(X=0\) is a negative result and \(X=1\) is a positive result. Meanwhile, the state of nature is either \(\theta=0\) or \(\theta=1\). In the flu example, \(X\) is the result (positive/negative) of the flu test, while \(\theta=1\) means the actual presence of flu.

A not-so-great test: Likelihood and Prior

Continuing in the flu example, the test \(X\) is such that

Pr.X1.theta0 = 0.40  # false positive: Pr(X=1|theta=0)
Pr.X0.theta1 = 0.05  # false negative: Pr(X=0|theta=1)
Pr.theta1    = 0.7   # prior: Pr(theta=1)

\[\begin{eqnarray*} Pr(X=1|\theta=0) &=& 0.4 \qquad \mbox{(False positive)}\\ Pr(X=0|\theta=1) &=& 0.05 \qquad \mbox{(False negative)}\\ Pr(\theta=1) &=& 0.7 \end{eqnarray*}\]

Marginal prior - \(Pr(X=1)\)

A simple probability result (the law of total probability) \[ P(A) = P(A,B_1)+P(A,B_2)+\cdots+P(A,B_k), \] where \(B_1, B_2, \ldots, B_k\) are mutually exclusive and \(B_1 \cup B_2 \cup \cdots \cup B_k\) is the complete space of possibilities, leads to the prior marginal probability that the test is positive: \[ Pr(X=1) = Pr(X=1|\theta=0)Pr(\theta=0)+Pr(X=1|\theta=1)Pr(\theta=1). \]

Pr.X1 = Pr.X1.theta0*(1-Pr.theta1) + (1-Pr.X0.theta1)*Pr.theta1
Pr.X1
## [1] 0.785

Now the really interesting question is the following. Of that 78.5% whose test turned out positive, what fraction actually had the flu? The answer is \[ Pr(\theta=1|X=1), \] or the probability of the actual presence of flu, \(\theta=1\), after observing a positive test, \(X=1\).

Posterior - \(Pr(\theta=1|X=1)\) & \(Pr(\theta=1|X=0)\)

\[ Pr(\theta=1|X=1) = \frac{Pr(X=1|\theta=1)Pr(\theta=1)}{Pr(X=1)} \] \[ Pr(\theta=1|X=0) = \frac{Pr(X=0|\theta=1)Pr(\theta=1)}{Pr(X=0)} \]

Pr.theta1.X1 = (1-Pr.X0.theta1)*Pr.theta1/Pr.X1
Pr.theta1.X0 = Pr.X0.theta1*Pr.theta1/(1-Pr.X1)

c(Pr.theta1.X1,Pr.theta1.X0)
## [1] 0.8471338 0.1627907

To sum up, the initial probability of flu was 0.7, which changed to 0.8471338 if the test was positive and to 0.1627907 if the test was negative.
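As a cross-check (not part of the derivation above), the same two posteriors can be obtained from the odds form of Bayes' rule, where the prior odds are multiplied by the likelihood ratio of the observed test result:

```r
# Odds form of Bayes' rule: posterior odds = likelihood ratio x prior odds
# (values as defined earlier; repeated here so the chunk runs on its own)
Pr.X1.theta0 = 0.40; Pr.X0.theta1 = 0.05; Pr.theta1 = 0.7
prior.odds = Pr.theta1/(1-Pr.theta1)
LR.pos = (1-Pr.X0.theta1)/Pr.X1.theta0    # Pr(X=1|theta=1)/Pr(X=1|theta=0)
LR.neg = Pr.X0.theta1/(1-Pr.X1.theta0)    # Pr(X=0|theta=1)/Pr(X=0|theta=0)
odds.to.prob = function(o) o/(1+o)
c(odds.to.prob(LR.pos*prior.odds), odds.to.prob(LR.neg*prior.odds))
## [1] 0.8471338 0.1627907
```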

Y is a much more reliable test!

Assume that the not-so-reliable test turned out positive, \(X=1\). Now, the doctor performs a more reliable test, say \(Y\).

New Prior (old posterior) - \(Pr(\theta=1|X=1)\)

Pr.theta1.X1
## [1] 0.8471338

New likelihood - \(Pr(Y=0|\theta=1)\) and \(Pr(Y=1|\theta=0)\)

Pr.Y0.theta1 = 0.01
Pr.Y1.theta0 = 0.04

New marginal prior - \(Pr(Y=1|X=1)\)

Pr.Y1.X1 = Pr.Y1.theta0*(1-Pr.theta1.X1) + (1-Pr.Y0.theta1)*Pr.theta1.X1
Pr.Y1.X1
## [1] 0.8447771

Posterior - \(Pr(\theta=1|Y=1,X=1)\)

Pr.theta1.Y1.X1 = (1-Pr.Y0.theta1)*Pr.theta1.X1/Pr.Y1.X1
Pr.theta1.Y1.X1
## [1] 0.9927618
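Because the two tests are assumed conditionally independent given \(\theta\), updating sequentially (first on \(X=1\), then on \(Y=1\)) is equivalent to a single joint update. A minimal sketch of the joint computation, using the same values:

```r
# Joint update with (X=1, Y=1): the likelihood factors under
# conditional independence of X and Y given theta
Pr.X1.theta0 = 0.40; Pr.X0.theta1 = 0.05   # likelihood of test X
Pr.Y1.theta0 = 0.04; Pr.Y0.theta1 = 0.01   # likelihood of test Y
Pr.theta1    = 0.7                         # prior
num1 = (1-Pr.X0.theta1)*(1-Pr.Y0.theta1)*Pr.theta1   # Pr(X=1,Y=1|theta=1)Pr(theta=1)
num0 = Pr.X1.theta0*Pr.Y1.theta0*(1-Pr.theta1)       # Pr(X=1,Y=1|theta=0)Pr(theta=0)
num1/(num0+num1)
## [1] 0.9927618
```

The joint posterior matches the sequential one exactly.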

A much more compelling result, which raises the question of whether \(X\) was needed in the first place. Well, let us check!

Testing \(Y\) but not \(X\)

Pr.Y1 = Pr.Y1.theta0*(1-Pr.theta1) + (1-Pr.Y0.theta1)*Pr.theta1
Pr.Y1
## [1] 0.705

Posterior - \(Pr(\theta=1|Y=1)\)

Pr.theta1.Y1 = (1-Pr.Y0.theta1)*Pr.theta1/Pr.Y1
Pr.theta1.Y1
## [1] 0.9829787

Comparing \(Pr(\theta=1|{\cal H})\) for various \({\cal H}\)

A generic formula combining \(X\) and \(Y\)

The way the flu example was set up, we can see that the two flu tests, \(X\) and \(Y\), are conditionally independent given \(\theta\), each following a Bernoulli distribution, i.e. \(X|\theta \sim Ber(\pi_\theta)\) and \(Y|\theta \sim Ber(\omega_\theta)\), where \(\pi_\theta\) and \(\omega_\theta\) are probabilities of success (a positive test result) which depend on whether or not flu is actually present, i.e. on whether \(\theta=1\) or \(\theta=0\). More explicitly, \[\begin{eqnarray*} X|\theta=1 &\sim& Ber(0.95)\\ X|\theta=0 &\sim& Ber(0.4)\\ Y|\theta=1 &\sim& Ber(0.99)\\ Y|\theta=0 &\sim& Ber(0.04), \end{eqnarray*}\] such that \[\begin{eqnarray*} Pr(x,y|\theta=1) &=& (0.95)^x(0.05)^{1-x}(0.99)^y(0.01)^{1-y}\\ Pr(x,y|\theta=0) &=& (0.4)^x(0.6)^{1-x}(0.04)^y(0.96)^{1-y}, \end{eqnarray*}\] and \[\begin{eqnarray*} Pr(\theta=i|x,y) &=& \frac{Pr(x,y|\theta=i)Pr(\theta=i)}{Pr(x,y)}=\frac{Pr(x,y|\theta=i)Pr(\theta=i)}{Pr(x,y|\theta=0)Pr(\theta=0)+Pr(x,y|\theta=1)Pr(\theta=1)}\\ &=&\frac{Pr(x,y|\theta=i)Pr(\theta=i)}{ (0.4)^x(0.6)^{1-x}(0.04)^y(0.96)^{1-y}(0.3) \ + \ (0.95)^x(0.05)^{1-x}(0.99)^y(0.01)^{1-y}(0.7)}. \end{eqnarray*}\] Or, more explicitly, \[ Pr(\theta=0|x,y) = \frac{(0.4)^x(0.6)^{1-x}(0.04)^y(0.96)^{1-y}(0.3)}{ (0.4)^x(0.6)^{1-x}(0.04)^y(0.96)^{1-y}(0.3) \ + \ (0.95)^x(0.05)^{1-x}(0.99)^y(0.01)^{1-y}(0.7)} \] and \[ Pr(\theta=1|x,y) = \frac{(0.95)^x(0.05)^{1-x}(0.99)^y(0.01)^{1-y}(0.7)}{ (0.4)^x(0.6)^{1-x}(0.04)^y(0.96)^{1-y}(0.3) \ + \ (0.95)^x(0.05)^{1-x}(0.99)^y(0.01)^{1-y}(0.7)}. \] Since the denominators are the same, one could write \[\begin{eqnarray*} Pr(\theta=0|x,y) &\propto& (0.4)^x(0.6)^{1-x}(0.04)^y(0.96)^{1-y}(0.3)\\ Pr(\theta=1|x,y) &\propto& (0.95)^x(0.05)^{1-x}(0.99)^y(0.01)^{1-y}(0.7). \end{eqnarray*}\]

In order to assess which value of \(\theta\) has higher posterior probability, one would not need to compute the prior predictive \(Pr(x,y)\) (the denominator).
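For instance (an illustrative sketch using the same numbers), to decide which value of \(\theta\) is more probable after observing \(x=1, y=0\), it suffices to compare the two unnormalized quantities:

```r
# Compare unnormalized posteriors for (x, y) = (1, 0); no denominator needed
x = 1; y = 0
u0 = 0.40^x*0.60^(1-x)*0.04^y*0.96^(1-y)*0.3   # proportional to Pr(theta=0|x,y)
u1 = 0.95^x*0.05^(1-x)*0.99^y*0.01^(1-y)*0.7   # proportional to Pr(theta=1|x,y)
u1 > u0
## [1] FALSE
```

So a positive \(X\) followed by a negative \(Y\) leaves \(\theta=0\) as the more probable state.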

post = NULL
for (y in 1:0){
  for (x in 1:0){
    # Unnormalized posteriors Pr(x,y|theta=i)Pr(theta=i), i = 0, 1
    Post.theta0 = Pr.X1.theta0^x*(1-Pr.X1.theta0)^(1-x)*Pr.Y1.theta0^y*(1-Pr.Y1.theta0)^(1-y)*(1-Pr.theta1)
    Post.theta1 = (1-Pr.X0.theta1)^x*Pr.X0.theta1^(1-x)*(1-Pr.Y0.theta1)^y*Pr.Y0.theta1^(1-y)*Pr.theta1
    prob  = Post.theta1/(Post.theta0+Post.theta1)  # Pr(theta=1|x,y)
    prob1 = prob/Pr.theta1                         # ratio of posterior to prior
    post  = rbind(post,c(x,y,prob,prob1))
  }
}
colnames(post) = c("x","y","Pr(theta=1|x,y)","Ratio to prior")
round(post,6)
##      x y Pr(theta=1|x,y) Ratio to prior
## [1,] 1 1        0.992762       1.418231
## [2,] 0 1        0.827957       1.182796
## [3,] 1 0        0.054575       0.077965
## [4,] 0 0        0.002021       0.002888

Recall that the prior is \(Pr(\theta=1)=0.7\).
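Finally, collecting the posteriors computed above in one place (a small summary sketch, re-deriving each posterior directly from the likelihoods and the prior):

```r
# Summary: Pr(theta=1 | H) for each history H considered above
Pr.theta1 = 0.7
p.X  = 0.95*Pr.theta1/(0.95*Pr.theta1 + 0.40*(1-Pr.theta1))                  # X=1 only
p.Y  = 0.99*Pr.theta1/(0.99*Pr.theta1 + 0.04*(1-Pr.theta1))                  # Y=1 only
p.XY = 0.95*0.99*Pr.theta1/(0.95*0.99*Pr.theta1 + 0.40*0.04*(1-Pr.theta1))   # X=1 and Y=1
round(c(prior=Pr.theta1, "X=1"=p.X, "Y=1"=p.Y, "X=1,Y=1"=p.XY), 6)
##    prior      X=1      Y=1  X=1,Y=1 
## 0.700000 0.847134 0.982979 0.992762
```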