In this case the measurement is binary, i.e. it can take only two possible values. Usually, \(X=0\) is a negative result and \(X=1\) is a positive result. Meanwhile, the state of nature is either \(\theta=0\) or \(\theta=1\). In the flu example, \(X\) was the result (positive/negative) of the flu test, while \(\theta=1\) meant the actual presence of flu.
Continuing with the flu example, the test \(X\) is such that
Pr.X1.theta0 = 0.40
Pr.X0.theta1 = 0.05
Pr.theta1 = 0.7
\[\begin{eqnarray*} Pr(X=1|\theta=0) &=& 0.4 \qquad \mbox{(False positive)}\\ Pr(X=0|\theta=1) &=& 0.05 \qquad \mbox{(False negative)}\\ Pr(\theta=1) &=& 0.7 \end{eqnarray*}\]
A simple probability result, the law of total probability, \[ P(A) = P(A,B_1)+P(A,B_2)+\cdots+P(A,B_k), \] where \(B_1, B_2, \ldots, B_k\) are mutually exclusive and \(B_1 \cup B_2 \cup \cdots \cup B_k\) is the complete space of possibilities, leads to the prior marginal probability that the test is positive: \[ Pr(X=1) = Pr(X=1|\theta=0)Pr(\theta=0)+Pr(X=1|\theta=1)Pr(\theta=1). \]
Pr.X1 = Pr.X1.theta0*(1-Pr.theta1) + (1-Pr.X0.theta1)*Pr.theta1
Pr.X1
## [1] 0.785
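The same bookkeeping works for any number of possibilities. Below is a quick generic sketch (with made-up numbers, not the flu example) of the law of total probability for \(k=3\):

```r
Pr.B   = c(0.5, 0.3, 0.2)   # Pr(B_1), Pr(B_2), Pr(B_3); must sum to 1
Pr.A.B = c(0.9, 0.4, 0.1)   # Pr(A|B_1), Pr(A|B_2), Pr(A|B_3)
Pr.A   = sum(Pr.A.B*Pr.B)   # P(A) = sum_i P(A|B_i) P(B_i)
Pr.A
## [1] 0.59
```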
Now the really interesting question is the following. Of the 78.5% whose test turned out positive, what fraction actually had the flu? The answer is \[ Pr(\theta=1|X=1), \] or the probability of the actual presence of flu, \(\theta=1\), after observing a positive test, \(X=1\).
By Bayes' theorem, \[ Pr(\theta=1|X=1) = \frac{Pr(X=1|\theta=1)Pr(\theta=1)}{Pr(X=1)} \] \[ Pr(\theta=1|X=0) = \frac{Pr(X=0|\theta=1)Pr(\theta=1)}{Pr(X=0)} \]
Pr.theta1.X1 = (1-Pr.X0.theta1)*Pr.theta1/Pr.X1
Pr.theta1.X0 = Pr.X0.theta1*Pr.theta1/(1-Pr.X1)
c(Pr.theta1.X1,Pr.theta1.X0)
## [1] 0.8471338 0.1627907
To sum up, the initial probability of flu was 0.7, which changed to 0.8471338 if the test was positive and to 0.1627907 if the test was negative.
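This update can be wrapped in a small helper function (a sketch of our own; the name `update.prior` and its arguments are not from the original code). Here `prior` is \(Pr(\theta=1)\), `fpr` the false-positive rate \(Pr(X=1|\theta=0)\), and `fnr` the false-negative rate \(Pr(X=0|\theta=1)\):

```r
update.prior = function(prior, fpr, fnr, x){
  like1 = ifelse(x == 1, 1 - fnr, fnr)    # Pr(X=x|theta=1)
  like0 = ifelse(x == 1, fpr, 1 - fpr)    # Pr(X=x|theta=0)
  # Bayes' theorem: posterior = likelihood*prior / marginal
  like1*prior/(like1*prior + like0*(1 - prior))
}
c(update.prior(0.7, 0.40, 0.05, x = 1), update.prior(0.7, 0.40, 0.05, x = 0))
## [1] 0.8471338 0.1627907
```

It reproduces the two posterior probabilities above from the three inputs alone.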
Assume that the not-so-reliable test turned out positive, \(X=1\). Now the doctor will perform a more reliable test, say \(Y\).
Pr.theta1.X1
## [1] 0.8471338
Pr.Y0.theta1 = 0.01
Pr.Y1.theta0 = 0.04
Pr.Y1.X1 = Pr.Y1.theta0*(1-Pr.theta1.X1) + (1-Pr.Y0.theta1)*Pr.theta1.X1
Pr.Y1.X1
## [1] 0.8447771
Pr.theta1.Y1.X1 = (1-Pr.Y0.theta1)*Pr.theta1.X1/Pr.Y1.X1
Pr.theta1.Y1.X1
## [1] 0.9927618
A much more compelling result, which raises the question of whether \(X\) was needed in the first place. Well, let us check!
Pr.Y1 = Pr.Y1.theta0*(1-Pr.theta1) + (1-Pr.Y0.theta1)*Pr.theta1
Pr.Y1
## [1] 0.705
Pr.theta1.Y1 = (1-Pr.Y0.theta1)*Pr.theta1/Pr.Y1
Pr.theta1.Y1
## [1] 0.9829787
- Before performing any test: \(P(\theta=1)=0.7\)
- After the test result \(X=1\): \(P(\theta=1|X=1)=0.85\) (an increase of 21%)
- After the test result \(Y=1\): \(P(\theta=1|Y=1)=0.98\) (an increase of 40.4%)
- After the test results \(X=Y=1\): \(P(\theta=1|X=1,Y=1)=0.99\) (an increase of 41.8%)
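These increases can be checked directly from the raw numbers (a quick self-contained sketch; the joint posterior uses the fact, spelled out below, that the two tests are conditionally independent given \(\theta\)):

```r
prior   = 0.7
post.X  = 0.95*prior/(0.95*prior + 0.40*(1 - prior))                 # Pr(theta=1|X=1)
post.Y  = 0.99*prior/(0.99*prior + 0.04*(1 - prior))                 # Pr(theta=1|Y=1)
post.XY = 0.95*0.99*prior/(0.95*0.99*prior + 0.40*0.04*(1 - prior))  # Pr(theta=1|X=1,Y=1)
# Percentage increase of each posterior over the prior
round(100*(c(post.X, post.Y, post.XY)/prior - 1), 1)
## [1] 21.0 40.4 41.8
```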
The way the flu example was set up, the two flu tests, \(X\) and \(Y\), are conditionally independent given \(\theta\), each following a Bernoulli distribution, i.e. \(X|\theta \sim Ber(\pi_\theta)\) and \(Y|\theta \sim Ber(\omega_\theta)\), where \(\pi_\theta\) and \(\omega_\theta\) are the probabilities of a positive test result, which depend on whether flu is actually present, i.e. on whether \(\theta=1\) or \(\theta=0\). More explicitly, \[\begin{eqnarray*} X|\theta=1 &\sim& Ber(0.95)\\ X|\theta=0 &\sim& Ber(0.4)\\ Y|\theta=1 &\sim& Ber(0.99)\\ Y|\theta=0 &\sim& Ber(0.04), \end{eqnarray*}\] such that \[\begin{eqnarray*} Pr(x,y|\theta=1) &=& (0.95)^x(0.05)^{1-x}(0.99)^y(0.01)^{1-y}\\ Pr(x,y|\theta=0) &=& (0.4)^x(0.6)^{1-x}(0.04)^y(0.96)^{1-y}, \end{eqnarray*}\] and \[\begin{eqnarray*} Pr(\theta=i|x,y) &=& \frac{Pr(x,y|\theta=i)Pr(\theta=i)}{Pr(x,y)}=\frac{Pr(x,y|\theta=i)Pr(\theta=i)}{Pr(x,y|\theta=0)Pr(\theta=0)+Pr(x,y|\theta=1)Pr(\theta=1)}\\ &=&\frac{Pr(x,y|\theta=i)Pr(\theta=i)}{ (0.4)^x(0.6)^{1-x}(0.04)^y(0.96)^{1-y}(0.3) \ + \ (0.95)^x(0.05)^{1-x}(0.99)^y(0.01)^{1-y}(0.7)}. \end{eqnarray*}\] Or, more explicitly, \[ Pr(\theta=0|x,y) = \frac{(0.4)^x(0.6)^{1-x}(0.04)^y(0.96)^{1-y}(0.3)}{ (0.4)^x(0.6)^{1-x}(0.04)^y(0.96)^{1-y}(0.3) \ + \ (0.95)^x(0.05)^{1-x}(0.99)^y(0.01)^{1-y}(0.7)} \] and \[ Pr(\theta=1|x,y) = \frac{(0.95)^x(0.05)^{1-x}(0.99)^y(0.01)^{1-y}(0.7)}{ (0.4)^x(0.6)^{1-x}(0.04)^y(0.96)^{1-y}(0.3) \ + \ (0.95)^x(0.05)^{1-x}(0.99)^y(0.01)^{1-y}(0.7)}. \] Since the denominators are the same, one could write \[\begin{eqnarray*} Pr(\theta=0|x,y) &\propto& (0.4)^x(0.6)^{1-x}(0.04)^y(0.96)^{1-y}(0.3)\\ Pr(\theta=1|x,y) &\propto& (0.95)^x(0.05)^{1-x}(0.99)^y(0.01)^{1-y}(0.7). \end{eqnarray*}\]
In order to assess which value of \(\theta\) has higher posterior probability, one would not need to compute the prior predictive \(Pr(x,y)\) (the denominator).
post = NULL
for (y in 1:0){
  for (x in 1:0){
    # Unnormalized posteriors: Pr(x,y|theta)*Pr(theta)
    Post.theta0 = Pr.X1.theta0^x*(1-Pr.X1.theta0)^(1-x)*Pr.Y1.theta0^y*(1-Pr.Y1.theta0)^(1-y)*(1-Pr.theta1)
    Post.theta1 = (1-Pr.X0.theta1)^x*Pr.X0.theta1^(1-x)*(1-Pr.Y0.theta1)^y*Pr.Y0.theta1^(1-y)*Pr.theta1
    # Normalize to get Pr(theta=1|x,y), then compare it with the prior
    prob  = Post.theta1/(Post.theta0+Post.theta1)
    prob1 = prob/Pr.theta1
    post  = rbind(post,c(x,y,prob,prob1))
  }
}
colnames(post) = c("x","y","Pr(theta=1|x,y)","Ratio to prior")
round(post,6)
## x y Pr(theta=1|x,y) Ratio to prior
## [1,] 1 1 0.992762 1.418231
## [2,] 0 1 0.827957 1.182796
## [3,] 1 0 0.054575 0.077965
## [4,] 0 0 0.002021 0.002888
Recall that the prior is \(Pr(\theta=1)=0.7\).
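Finally, note that updating sequentially (first on \(X=1\), then on \(Y=1\)) gives exactly the joint posterior in the first row of the table, a consequence of the conditional independence of the two tests given \(\theta\). A quick self-contained check with the same numbers:

```r
prior = 0.7
step1 = 0.95*prior/(0.95*prior + 0.40*(1 - prior))   # Pr(theta=1|X=1)
step2 = 0.99*step1/(0.99*step1 + 0.04*(1 - step1))   # then update on Y=1
joint = 0.95*0.99*prior/(0.95*0.99*prior + 0.40*0.04*(1 - prior))
c(step2, joint)
## [1] 0.9927618 0.9927618
```

The two-step posterior and the one-shot joint posterior agree, so the order in which independent evidence is incorporated does not matter.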