1 Introduction

The Capital Asset Pricing Model (CAPM) is one of the cornerstones of modern financial theory, establishing a linear relationship between an asset’s systematic risk and its expected return. Developed in the 1960s, the model postulates that an investment’s return should be proportional to its exposure to market risk, measured by the Beta (\(\beta\)) coefficient, in addition to a risk-free rate.

The central premise is that investors should be compensated only for risk that cannot be diversified (systematic risk), while company-specific risk can be eliminated through a well-structured portfolio. Despite criticisms regarding its simplifications—such as market efficiency and the focus on a single risk factor—CAPM remains widely used to estimate the cost of equity and evaluate the performance of portfolio managers.

1.1 Behavior of \(\beta\)

In the context of the Capital Asset Pricing Model (CAPM), the Beta (\(\beta\)) coefficient is a measure of an asset’s systematic risk—the risk that cannot be diversified away. It quantifies how sensitive a specific stock’s returns are to the movements of the broader market.+1Mathematically, it is defined as: \[ \beta = \frac{Cov(R, R_m)}{Var(R_m)} \] Where \(R\) and \(R_m\) are the asset and market returns, respectively.

  • \(\beta > 1\) (High Sensitivity) - The asset is more volatile than the market. If the market rises by 10%, a stock with a \(\beta\) of 1.5 is expected to rise by 15%. Conversely, it will likely fall further than the market during a downturn. Stocks in sectors like technology or biotechnology often have high betas.

  • \(\beta = 1\) (Market Neutrality) - The asset moves in lockstep with the market. Its systematic risk is identical to that of the market proxy (like the S&P 500).

  • \(0<\beta<1\) (Low Sensitivity) - The asset is less volatile than the market. It is considered a “defensive” stock. For example, a utility company might have a \(\beta\) of 0.5, meaning it only captures about half of the market’s swings.

  • \(\beta=0\) (No Correlation) - The asset’s returns have no linear relationship with the market’s returns. A risk-free asset, like a Treasury bond (in theory), has a \(\beta\) of 0. Gold is also sometimes cited as having a beta near zero, as its price is often driven by factors unrelated to equity markets.

  • \(\beta<0\) (Inverse Correlation) - The asset moves in the opposite direction of the market. While rare for individual stocks, some “inverse ETFs” or “put options” are designed to have negative betas. Historically, during extreme market crashes, very few traditional assets maintain a negative beta consistently, though some “safe havens” may exhibit this behavior temporarily.

1.2 Petrobras vs S&P500

In this document, we estimate the CAPM parameters for Petrobras (PBR), traded on the NYSE, using the S&P 500 (SPY) as the market proxy. We adopt a Bayesian framework, which allows us to incorporate prior beliefs and obtain a full posterior distribution for the risk parameters.

The CAPM equation is defined as:

\[ R_t = \alpha + \beta R_{mt}+ \epsilon_t \]

Where:

  • \(t\) is the time frequency, usually daily.

  • \(R_t\): Return of the risky asset (PBR).

  • \(R_{mt}\): Return of the market index (S&P 500).

  • \(\alpha\): The “Alpha”, representing abnormal returns.

  • \(\beta\): The “Beta”, measuring the asset’s sensitivity to market movements.

  • \(\epsilon_t \sim N(0, \sigma^2)\): The idiosyncratic error term with variance \(\sigma^2\).

2 Uploading and processing the data

We uploaded data from February 10th 2016 to February 9th 2026, ie about 10 years of daily data or \(n=2514\) observations.

if (!require("pacman")) install.packages("pacman")
## Loading required package: pacman
pacman::p_load(quantmod, brms, tidyverse, PerformanceAnalytics)

symbols <- c("PBR", "SPY")
getSymbols(symbols, src = "yahoo", from = "2016-02-09", to = Sys.Date())
## [1] "PBR" "SPY"
# Computing returns
returns_pbr <- Return.calculate(Cl(PBR), method = "log")
returns_spy <- Return.calculate(Cl(SPY), method = "log")

# Combining the time series and removing missing values
data_capm <- merge(returns_pbr, returns_spy) %>% 
  as.data.frame() %>% 
  drop_na()
colnames(data_capm) <- c("PBR", "Market")

y = data_capm[,1]

x = data_capm[,2]

par(mfrow=c(1,1))
ts.plot(y,ylab="Returns")
lines(x,col=2)
legend("topright",legend=c("PBR","SPY"),col=1:2,lty=1,bty="n",lwd=2)

3 Ordinary least squares

summary(lm(y~x))
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.272099 -0.013966  0.000262  0.014354  0.157838 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.0000107  0.0005673   0.019    0.985    
## x           1.2151846  0.0499735  24.317   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02841 on 2512 degrees of freedom
## Multiple R-squared:  0.1905, Adjusted R-squared:  0.1902 
## F-statistic: 591.3 on 1 and 2512 DF,  p-value: < 2.2e-16
par(mfrow=c(1,1))
plot(x,y,xlab="SPY",ylab="PBR")
title("Jan 3rd 2020 - February 6th 2026")
abline(lm(y~x),col=2,lwd=2)

4 Maximum likelihood estimation

When \(x=(x_1,\ldots,x_n)'\) and \(X\) is an \((n \times 2)\) matrix where the first column contains only ones (for the intercept) and the second column is \(x\), the Gaussian linear model can be represented as \[ y = X \theta + \epsilon \qquad \qquad \epsilon \sim N(0,\sigma^2 I_n) \] with parameters \(\theta=(\alpha,\beta)\) and \(\sigma^2\). The ML estimator of \(\theta\) and \(\sigma^2\) are \[ {\widehat \theta} = (X'X)^{-1}X'y \qquad \mbox{and} \qquad {\widehat \sigma}^2 = \frac{(y-X{\widehat \theta})'(y-X{\widehat \theta})}{n}, \] respectively.

X         = cbind(1,x)
n         = nrow(X)
p         = ncol(X)
theta.mle = solve(t(X)%*%X)%*%t(X)%*%y
y.hat     = X%*%theta.mle
e.hat     = y-y.hat
sig.mle   = sqrt(mean(e.hat^2))

table = round(rbind(cbind(theta.mle,lm(y~X-1)$coef),
              c(sig.mle,(n-p)/n*summary(lm(y~X-1))$sigma)),4)
colnames(table)      = c("Our function","lm function")
rownames(table)[1]   = "constant"
rownames(table)[p+1] = "sigma"
table
##          Our function lm function
## constant       0.0000      0.0000
## x              1.2152      1.2152
## sigma          0.0284      0.0284

5 Closed-form Bayesian inference (conjugacy)

Let us combine the above Gaussian linear model with a conjugate prior for \(\theta,\sigma^2\): \[\begin{eqnarray*} \theta|\sigma^2 &\sim& N(b_0,\sigma^2 B_0)\\ \sigma^2 &\sim& IG(\nu_0/2,\nu_0\sigma_0^2/2), \end{eqnarray*}\] for hyperparameters \(b_0,B_0,\nu_0\) and \(\sigma_0^2\). This prior is conjugate since the posterior follows the same family of distributions as the prior with hyperparameters adjusted by the observed data. More precisely, \[\begin{eqnarray*} \theta|\sigma^2,\mbox{data} &\sim& N(b_1,\sigma^2 B_1)\\ \sigma^2|\mbox{data} &\sim& IG(\nu_1/2,\nu_1\sigma_1^2/2)\\ \theta|\mbox{data} &\sim& t_{\nu_1}(b_1,\sigma_1^2 B_1), \end{eqnarray*}\] where \[\begin{eqnarray*} B_1 &=& (B_0^{-1}+X'X)^{-1}\\ B_1^{-1} b_1 &=& B_0^{-1} b_0 + X'y\\ \nu_1 &=& \nu_0 + n\\ \nu_1\sigma_1^2 &=& \nu_0\sigma_0^2 + (y-X b_1)'y + (b_0-b_1)'B_0^{-1}b_0. \end{eqnarray*}\]

Notice that the above derivations, both MLE and Bayesian with conjugate prior, for the Gaussian linear regression is still valid for \(p>2\) and is the base of the standard Gaussian linear regression framework taught in virtually all introductory econometrics classes.

# Prior hyperparameters
b0       = c(0,1)
B0       = diag(c(1000,250),2)
nu0      = 0.002
sig20    = 1.0

# Posterior hyperparameters
nu0sig20 = nu0*sig20
iB0      = solve(B0)
iB0b0    = iB0%*%b0
B1       = solve(iB0+t(X)%*%X)
tcB1     = t(chol(B1))
b1       = B1%*%(iB0b0+t(X)%*%y)
nu1      = nu0 + n
sig21    = ((nu0sig20+sum((y-X%*%b1)*y)+t(b0-b1)%*%iB0b0)/nu1)[1]
se       = sqrt(nu1/(nu1-2)*sig21*diag(B1))

tab = round(cbind(b1,se,b1/se,2*(1-pt(abs(b1/se),df=nu1))),5)
colnames(tab)=c("Mean","StDev","t","tail")
rownames(tab)=c("Intercept","Market")
tab
##              Mean   StDev        t    tail
## Intercept 0.00001 0.00057  0.02129 0.98302
## Market    1.21255 0.04969 24.40050 0.00000

6 MCMC-based Bayesian inference (conditionally conjugacy)

When the conjugate prior, presented above, is replaced by the following conditionally conjugate one, we loose closed form derivation for posterior inference: \[\begin{eqnarray*} \theta|\sigma^2 &\sim& N(b_0,V_0)\\ \sigma^2 &\sim& IG(c_0,d_0). \end{eqnarray*}\] In this case, a standard solution since the early 1990s is to implement an Markov chain Monte Carlo (MCMC) algorithm. For the Gaussian linear model with conditionally conjugate prior, it is well established that the Gibbs sampler is the simplest and cleanest MCMC scheme.

The Gibbs sampler, in our particular set up, cycles through the two full conditionals \[ p(\theta|\sigma^2,\mbox{data}) \ \ \ \mbox{and} \ \ \ p(\sigma^2|\theta,\mbox{data}), \] both of which are easily derived: \[ N(b_1,V_1) \ \ \ \mbox{and} \ \ \ IG(c_1,d_1) \] where \[\begin{eqnarray*} V_1^{-1} &=& X'X/\sigma^2 + V_0^{-1}\\ V_1^{-1}b_1 &=& X'y/\sigma^2 + V_0^{-1} b_0\\ c_1 &=& c_0 + n/2\\ d_1 &=& d_0 + (y-X'\theta)(y-X'\theta)/2 \end{eqnarray*}\]

# Prior knowledge 
b0 = rep(0,p)
V0 = diag(c(1,0.25))
c0 = 0.001
d0 = 0.001
iV0 = solve(V0)
iV0b0 = iV0%*%b0

# Gibbs sampler
sigma2  = 1 
M0      = 1000
M       = 10000
thin    = 1
niter   = M0+M*thin
thetas  = matrix(0,niter,p)
sigs    = rep(0,niter)
for (iter in 1:niter){
  V1 = solve(t(X)%*%X/sigma2+iV0)
  b1 = V1%*%(t(X)%*%y/sigma2+iV0b0)
  theta = b1 + t(chol(V1))%*%rnorm(p)
  c1 = c0 + n/2
  d1 = d0 + sum((y-X%*%theta)^2)/2
  sigma2 = 1/rgamma(1,c1,d1)
  thetas[iter,] = theta
  sigs[iter] = sqrt(sigma2)
}
ind    = seq(M0+1,niter,by=thin)
thetas = thetas[ind,]
sigs   = sigs[ind]

mean.theta = apply(thetas,2,mean)
table = round(cbind(theta.mle,mean.theta),5)
colnames(table) = c("ML","Bayes")
rownames(table) = c("Intercept","Market")
table
##                ML   Bayes
## Intercept 0.00001 0.00002
## Market    1.21518 1.20305
par(mfrow=c(3,3))
ts.plot(thetas[,1],xlab="Iterations",ylab="",main=expression(alpha))
ts.plot(thetas[,2],xlab="Iterations",ylab="",main=expression(beta))
ts.plot(sigs,xlab="Iterations",ylab="",main=expression(sigma))
acf(thetas[,1],main="")
acf(thetas[,2],main="")
acf(sigs,main="")
hist(thetas[,1],prob=TRUE,xlab="",main="")
hist(thetas[,2],prob=TRUE,xlab="",main="")
hist(sigs,prob=TRUE,xlab="",main="")

7 Comparison

We compare the ML estimate with the posterior densities, based on both conjugate and conditionally conjugate priors, for the parameters \(\alpha\), \(\beta\) and \(\sigma^2\).

par(mfrow=c(1,3))
plot(density(thetas[,1]),xlab="Intercept",main="")
bb = seq(min(thetas[,1]),max(thetas[,1]),length=1000)
den = dt((bb-b1[1,1])/sqrt(sig21*B1[1,1]),df=nu1)/sqrt(sig21*B1[1,1])
lines(bb,den,col=2,lwd=2)
abline(v=theta.mle[1,1],col=3,lwd=2)
legend("topleft",legend=c("MLE","C Prior","CC prior"),col=c(3,2,1),bty="n",lwd=2)

plot(density(thetas[,2]),xlab="Slope",main="")
bb = seq(min(thetas[,2]),max(thetas[,2]),length=1000)
den = dt((bb-b1[2,1])/sqrt(sig21*B1[2,2]),df=nu1)/sqrt(sig21*B1[2,2])
lines(bb,den,col=2,lwd=2)
abline(v=theta.mle[2,1],col=3,lwd=2)
legend("topleft",legend=c("MLE","C Prior","CC prior"),col=c(3,2,1),bty="n",lwd=2)


sigs2 = sigs^2
s2    = seq(min(sigs2),max(sigs2),length=1000)
den   = dgamma(1/s2,nu1/2,nu1*sig21/2)/(s2^2)
den.d = density(sigs2)
plot(den.d$x,den.d$y,xlab="Error variance",main="",ylim=range(den.d$y,den),type="l",ylab="Density")
lines(s2,den,col=2,lwd=2)
abline(v=sig.mle^2,col=3,lwd=2)
legend("topleft",legend=c("MLE","C Prior","CC prior"),col=c(3,2,1),bty="n",lwd=2)

8 Things you had to know to understand this class

  1. Covariance

  2. Variance

  3. Fitting a line via ordinary least squares

  4. Gaussian linear model

  5. Basic matrix algebra

  6. Likelihood function

  7. maximum likelihood estimator

  8. Prior and posterior distributions

  9. Inverse-gamma distribution

  10. Conjugacy

  11. Conditional conjugacy

  12. Monte Carlo posterior simulation

---
title: "CAPM: Bayesian linear modeling"
subtitle: "Petrobras vs S&P 500"
author: "Hedibert Freitas Lopes"
date: "`r Sys.Date()`"
output:
  html_document:
    theme: flatly
    highlight: tango
    toc: true
    toc_depth: 3
    toc_collapsed: true
    code_download: true
    number_sections: true
---

# Introduction

The **Capital Asset Pricing Model (CAPM)** is one of the cornerstones of modern financial theory, establishing a linear relationship between an asset's systematic risk and its expected return. Developed in the 1960s, the model postulates that an investment's return should be proportional to its exposure to market risk, measured by the Beta ($\beta$) coefficient, in addition to a risk-free rate.  

The central premise is that investors should be compensated only for risk that cannot be diversified (systematic risk), while company-specific risk can be eliminated through a well-structured portfolio. Despite criticisms regarding its simplifications—such as market efficiency and the focus on a single risk factor—CAPM remains widely used to estimate the cost of equity and evaluate the performance of portfolio managers.

## Behavior of $\beta$

In the context of the Capital Asset Pricing Model (CAPM), the Beta ($\beta$) coefficient is a measure of an asset's systematic risk—the risk that cannot be diversified away. It quantifies how sensitive a specific stock's returns are to the movements of the broader market.+1Mathematically, it is defined as:
$$
\beta = \frac{Cov(R, R_m)}{Var(R_m)}
$$
Where $R$ and $R_m$ are the asset and market returns, respectively.

* $\beta > 1$ *(High Sensitivity)* - The asset is more volatile than the market. If the market rises by 10%, a stock with a $\beta$ of 1.5 is expected to rise by 15%. Conversely, it will likely fall further than the market during a downturn. Stocks in sectors like technology or biotechnology often have high betas.

* $\beta = 1$ *(Market Neutrality)* - The asset moves in lockstep with the market. Its systematic risk is identical to that of the market proxy (like the S&P 500).

* $0<\beta<1$ *(Low Sensitivity)* - The asset is less volatile than the market. It is considered a "defensive" stock. For example, a utility company might have a $\beta$ of 0.5, meaning it only captures about half of the market's swings.

* $\beta=0$ *(No Correlation)* - The asset's returns have no linear relationship with the market's returns. A risk-free asset, like a Treasury bond (in theory), has a $\beta$ of 0. Gold is also sometimes cited as having a beta near zero, as its price is often driven by factors unrelated to equity markets.

* $\beta<0$ *(Inverse Correlation)* - The asset moves in the opposite direction of the market. While rare for individual stocks, some "inverse ETFs" or "put options" are designed to have negative betas. Historically, during extreme market crashes, very few traditional assets maintain a negative beta consistently, though some "safe havens" may exhibit this behavior temporarily.

## Petrobras vs S&P500

In this document, we estimate the CAPM parameters for **Petrobras (PBR)**, traded on the NYSE, using the **S&P 500 (SPY)** as the market proxy. We adopt a Bayesian framework, which allows us to incorporate prior beliefs and obtain a full posterior distribution for the risk parameters.

The CAPM equation is defined as:

$$
R_t = \alpha + \beta R_{mt}+ \epsilon_t
$$

Where:

* $t$ is the time frequency, usually daily.

* $R_t$: Return of the risky asset (PBR).

* $R_{mt}$: Return of the market index (S&P 500).

* $\alpha$: The "Alpha", representing abnormal returns.

* $\beta$: The "Beta", measuring the asset's sensitivity to market movements.

* $\epsilon_t \sim N(0, \sigma^2)$: The idiosyncratic error term with variance $\sigma^2$.

# Uploading and processing the data

We uploaded data from February 10th 2016 to February 9th 2026, ie about 10 years of daily data or $n=2514$ observations.

```{r fig.align='center', fig.width=10, fig.height=6}
if (!require("pacman")) install.packages("pacman")
pacman::p_load(quantmod, brms, tidyverse, PerformanceAnalytics)

symbols <- c("PBR", "SPY")
getSymbols(symbols, src = "yahoo", from = "2016-02-09", to = Sys.Date())

# Computing returns
returns_pbr <- Return.calculate(Cl(PBR), method = "log")
returns_spy <- Return.calculate(Cl(SPY), method = "log")

# Combining the time series and removing missing values
data_capm <- merge(returns_pbr, returns_spy) %>% 
  as.data.frame() %>% 
  drop_na()
colnames(data_capm) <- c("PBR", "Market")

y = data_capm[,1]

x = data_capm[,2]

par(mfrow=c(1,1))
ts.plot(y,ylab="Returns")
lines(x,col=2)
legend("topright",legend=c("PBR","SPY"),col=1:2,lty=1,bty="n",lwd=2)
```

# Ordinary least squares
```{r fig.align='center', fig.width=10, fig.height=6}
summary(lm(y~x))

par(mfrow=c(1,1))
plot(x,y,xlab="SPY",ylab="PBR")
title("Jan 3rd 2020 - February 6th 2026")
abline(lm(y~x),col=2,lwd=2)
```

# Maximum likelihood estimation
When $x=(x_1,\ldots,x_n)'$ and $X$ is an $(n \times 2)$ matrix where the first column contains only ones (for the intercept) and the second column is $x$, the Gaussian linear model can be represented as
$$
y = X \theta + \epsilon \qquad \qquad \epsilon \sim N(0,\sigma^2 I_n)
$$
with parameters $\theta=(\alpha,\beta)$ and $\sigma^2$.  The ML estimator of $\theta$ and $\sigma^2$ are
$$
{\widehat \theta} = (X'X)^{-1}X'y \qquad \mbox{and} \qquad
{\widehat \sigma}^2 = \frac{(y-X{\widehat \theta})'(y-X{\widehat \theta})}{n},
$$
respectively.

```{r}
X         = cbind(1,x)
n         = nrow(X)
p         = ncol(X)
theta.mle = solve(t(X)%*%X)%*%t(X)%*%y
y.hat     = X%*%theta.mle
e.hat     = y-y.hat
sig.mle   = sqrt(mean(e.hat^2))

table = round(rbind(cbind(theta.mle,lm(y~X-1)$coef),
              c(sig.mle,(n-p)/n*summary(lm(y~X-1))$sigma)),4)
colnames(table)      = c("Our function","lm function")
rownames(table)[1]   = "constant"
rownames(table)[p+1] = "sigma"
table
```


# Closed-form Bayesian inference (conjugacy)
Let us combine the above Gaussian linear model with a conjugate prior
for $\theta,\sigma^2$:
\begin{eqnarray*}
\theta|\sigma^2 &\sim& N(b_0,\sigma^2 B_0)\\
\sigma^2 &\sim& IG(\nu_0/2,\nu_0\sigma_0^2/2),
\end{eqnarray*}
for hyperparameters $b_0,B_0,\nu_0$ and $\sigma_0^2$.  This prior is conjugate since the posterior follows the same family of distributions as the prior with hyperparameters adjusted by the observed data.  More precisely,
\begin{eqnarray*}
\theta|\sigma^2,\mbox{data} &\sim& N(b_1,\sigma^2 B_1)\\
\sigma^2|\mbox{data} &\sim& IG(\nu_1/2,\nu_1\sigma_1^2/2)\\
\theta|\mbox{data} &\sim& t_{\nu_1}(b_1,\sigma_1^2 B_1),
\end{eqnarray*}
where 
\begin{eqnarray*}
B_1 &=& (B_0^{-1}+X'X)^{-1}\\
B_1^{-1} b_1 &=& B_0^{-1} b_0 + X'y\\
\nu_1 &=& \nu_0 + n\\
\nu_1\sigma_1^2 &=& \nu_0\sigma_0^2 + (y-X b_1)'y + (b_0-b_1)'B_0^{-1}b_0.
\end{eqnarray*}


Notice that the above derivations, both MLE and Bayesian with conjugate prior, for the Gaussian linear regression is still valid for $p>2$ and is the base of the standard Gaussian linear regression framework taught in virtually all introductory econometrics classes.


```{r fig.align='center', fig.width=12, fig.height=18}
# Prior hyperparameters
b0       = c(0,1)
B0       = diag(c(1000,250),2)
nu0      = 0.002
sig20    = 1.0

# Posterior hyperparameters
nu0sig20 = nu0*sig20
iB0      = solve(B0)
iB0b0    = iB0%*%b0
B1       = solve(iB0+t(X)%*%X)
tcB1     = t(chol(B1))
b1       = B1%*%(iB0b0+t(X)%*%y)
nu1      = nu0 + n
sig21    = ((nu0sig20+sum((y-X%*%b1)*y)+t(b0-b1)%*%iB0b0)/nu1)[1]
se       = sqrt(nu1/(nu1-2)*sig21*diag(B1))

tab = round(cbind(b1,se,b1/se,2*(1-pt(abs(b1/se),df=nu1))),5)
colnames(tab)=c("Mean","StDev","t","tail")
rownames(tab)=c("Intercept","Market")
tab
```

# MCMC-based Bayesian inference (conditionally conjugacy)
When the conjugate prior, presented above, is replaced by the following conditionally conjugate one, we loose closed form derivation for 
posterior inference:
\begin{eqnarray*}
\theta|\sigma^2 &\sim& N(b_0,V_0)\\
\sigma^2 &\sim& IG(c_0,d_0).
\end{eqnarray*}
In this case, a standard solution since the early 1990s is to implement an Markov chain Monte Carlo (MCMC) algorithm.  For the Gaussian linear model with conditionally conjugate prior, it is well established that the Gibbs sampler is the simplest and cleanest MCMC scheme.

The Gibbs sampler, in our particular set up, cycles through the two full conditionals 
$$
p(\theta|\sigma^2,\mbox{data}) \ \ \ \mbox{and} \ \ \ 
p(\sigma^2|\theta,\mbox{data}),
$$
both of which are easily derived:
$$
N(b_1,V_1) \ \ \ \mbox{and} \ \ \ IG(c_1,d_1)
$$
where
\begin{eqnarray*}
V_1^{-1} &=& X'X/\sigma^2 + V_0^{-1}\\
V_1^{-1}b_1 &=&  X'y/\sigma^2 + V_0^{-1} b_0\\
c_1 &=& c_0  + n/2\\
d_1 &=& d_0 + (y-X'\theta)(y-X'\theta)/2
\end{eqnarray*}


```{r fig.align='center', fig.width=9, fig.height=9}
# Prior knowledge 
b0 = rep(0,p)
V0 = diag(c(1,0.25))
c0 = 0.001
d0 = 0.001
iV0 = solve(V0)
iV0b0 = iV0%*%b0

# Gibbs sampler
sigma2  = 1 
M0      = 1000
M       = 10000
thin    = 1
niter   = M0+M*thin
thetas  = matrix(0,niter,p)
sigs    = rep(0,niter)
for (iter in 1:niter){
  V1 = solve(t(X)%*%X/sigma2+iV0)
  b1 = V1%*%(t(X)%*%y/sigma2+iV0b0)
  theta = b1 + t(chol(V1))%*%rnorm(p)
  c1 = c0 + n/2
  d1 = d0 + sum((y-X%*%theta)^2)/2
  sigma2 = 1/rgamma(1,c1,d1)
  thetas[iter,] = theta
  sigs[iter] = sqrt(sigma2)
}
ind    = seq(M0+1,niter,by=thin)
thetas = thetas[ind,]
sigs   = sigs[ind]

mean.theta = apply(thetas,2,mean)
table = round(cbind(theta.mle,mean.theta),5)
colnames(table) = c("ML","Bayes")
rownames(table) = c("Intercept","Market")
table

par(mfrow=c(3,3))
ts.plot(thetas[,1],xlab="Iterations",ylab="",main=expression(alpha))
ts.plot(thetas[,2],xlab="Iterations",ylab="",main=expression(beta))
ts.plot(sigs,xlab="Iterations",ylab="",main=expression(sigma))
acf(thetas[,1],main="")
acf(thetas[,2],main="")
acf(sigs,main="")
hist(thetas[,1],prob=TRUE,xlab="",main="")
hist(thetas[,2],prob=TRUE,xlab="",main="")
hist(sigs,prob=TRUE,xlab="",main="")
```

# Comparison

We compare the ML estimate with the posterior densities, based on both conjugate and conditionally conjugate priors, for the parameters $\alpha$, $\beta$ and $\sigma^2$.

```{r fig.align='center', fig.width=10, fig.height=4}
par(mfrow=c(1,3))
plot(density(thetas[,1]),xlab="Intercept",main="")
bb = seq(min(thetas[,1]),max(thetas[,1]),length=1000)
den = dt((bb-b1[1,1])/sqrt(sig21*B1[1,1]),df=nu1)/sqrt(sig21*B1[1,1])
lines(bb,den,col=2,lwd=2)
abline(v=theta.mle[1,1],col=3,lwd=2)
legend("topleft",legend=c("MLE","C Prior","CC prior"),col=c(3,2,1),bty="n",lwd=2)

plot(density(thetas[,2]),xlab="Slope",main="")
bb = seq(min(thetas[,2]),max(thetas[,2]),length=1000)
den = dt((bb-b1[2,1])/sqrt(sig21*B1[2,2]),df=nu1)/sqrt(sig21*B1[2,2])
lines(bb,den,col=2,lwd=2)
abline(v=theta.mle[2,1],col=3,lwd=2)
legend("topleft",legend=c("MLE","C Prior","CC prior"),col=c(3,2,1),bty="n",lwd=2)


sigs2 = sigs^2
s2    = seq(min(sigs2),max(sigs2),length=1000)
den   = dgamma(1/s2,nu1/2,nu1*sig21/2)/(s2^2)
den.d = density(sigs2)
plot(den.d$x,den.d$y,xlab="Error variance",main="",ylim=range(den.d$y,den),type="l",ylab="Density")
lines(s2,den,col=2,lwd=2)
abline(v=sig.mle^2,col=3,lwd=2)
legend("topleft",legend=c("MLE","C Prior","CC prior"),col=c(3,2,1),bty="n",lwd=2)
```

# Things you had to know to understand this class

1. Covariance

2. Variance

3. Fitting a line via ordinary least squares

4. Gaussian linear model

5. Basic matrix algebra

6. Likelihood function

7. maximum likelihood estimator

8. Prior and posterior distributions

9. Inverse-gamma distribution

10. Conjugacy

11. Conditional conjugacy

12. Monte Carlo posterior simulation
