Calculate the CAPM with Python in 3 Easy Steps

What will we cover?

In this lesson we will learn about the CAPM and how to calculate it.

The objectives of this tutorial are:

  • Understand the CAPM (Capital Asset Pricing Model).
  • Beta and CAPM calculations.
  • Expected return of an investment.

Step 1: What is the CAPM?

The CAPM describes the relationship between systematic risk and expected return. Several assumptions behind the CAPM formula have been shown not to hold in reality. Still, the formula is widely used.

The formula is as follows.
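In standard textbook notation, the CAPM is usually written as below. The symbol names are chosen here for illustration; they correspond to the variables risk_free_return, beta, and market_return used in the code in Step 3.

```latex
E[R_i] = R_f + \beta_i \left( E[R_m] - R_f \right)
```

Here E[R_i] is the expected return of the investment, R_f is the risk-free return, \beta_i is the Beta of the investment, and E[R_m] is the expected return of the market.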

Step 2: Get some data to make calculations on

Let’s get some data and calculate it.

import numpy as np
import pandas_datareader as pdr
import datetime as dt
import pandas as pd
 
tickers = ['AAPL', 'MSFT', 'TWTR', 'IBM', '^GSPC']
start = dt.datetime(2015, 12, 1)
end = dt.datetime(2021, 1, 1)
 
data = pdr.get_data_yahoo(tickers, start, end, interval="m")
 
data = data['Adj Close']
 
log_returns = np.log(data/data.shift())

Feel free to change the tickers to your choice and remember to update the dates to fit your purpose.
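To see what the log return line does, here is a minimal sketch on made-up prices. The ticker name DEMO and the prices are assumptions for illustration, not real market data.

```python
import numpy as np
import pandas as pd

# Hypothetical prices that grow by 10% each period (not real market data).
prices = pd.DataFrame({"DEMO": [100.0, 110.0, 121.0]})

# Same expression as in the tutorial: shift() moves the prices one row down,
# so each price is divided by the price of the previous row.
log_returns = np.log(prices / prices.shift())

# The first row is NaN (no previous price); the others equal log(1.1).
print(log_returns)
```

Because the first row has no previous price, it is NaN. pandas skips such values when it calculates covariance and variance.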

Step 3: How to calculate CAPM with Python (NumPy and pandas)

The calculations are done quite easily.

When we look at the formula, the risk-free return is often set to 0. Otherwise, the yield of the 10-year Treasury note is used. Here, we use 1.38%. You can replace it with a more up-to-date value.

cov = log_returns.cov()
var = log_returns['^GSPC'].var()
 
beta = cov.loc['AAPL', '^GSPC']/var
 
risk_free_return = 0.0138
market_return = .105
expected_return = risk_free_return + beta*(market_return - risk_free_return)

Notice that you can calculate it for all the tickers simultaneously.
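One way to do that is sketched below. It runs on synthetic returns instead of the downloaded data, so the names STOCK_A and STOCK_B and all the numbers it prints are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic monthly log returns (assumption: a stand-in for the Yahoo! data).
rng = np.random.default_rng(0)
market = rng.normal(0.01, 0.04, 60)
log_returns = pd.DataFrame({
    "STOCK_A": 1.5 * market + rng.normal(0.0, 0.02, 60),
    "STOCK_B": 0.5 * market + rng.normal(0.0, 0.02, 60),
    "^GSPC": market,
})

cov = log_returns.cov()
var = log_returns["^GSPC"].var()

# Dividing the whole '^GSPC' covariance column by the market variance
# gives the Beta of every column in one step.
beta = cov["^GSPC"] / var

risk_free_return = 0.0138
market_return = 0.105
expected_return = risk_free_return + beta * (market_return - risk_free_return)

print(beta)
print(expected_return)
```

The market column gets Beta 1 and therefore an expected return equal to market_return. As a worked example for a single value, a Beta of 1.25 gives 0.0138 + 1.25 × (0.105 − 0.0138) = 0.1278, or about 12.8%.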

Want more?

This is part of a full 2.5-hour video course in 8 parts about Risk and Return.

Calculate the market (S&P 500) BETA with Python for any Stock

What will we cover?

In this lesson we will learn about market Beta with the S&P 500 index, how to calculate it, and how it compares with the calculations from the last lesson.

The objectives of the tutorial are:

  • Understand what market Beta tells you.
  • How to calculate the market (S&P 500) Beta.
  • See how Beta is related to Linear Regression.

Step 1: What is BETA and how to interpret the value

Beta is a measure of a stock’s volatility in relation to the overall market (S&P 500). The S&P 500 index has Beta 1.

High-beta stocks are supposed to be riskier but provide higher potential returns, while low-beta stocks pose less risk but also offer lower returns.

Interpretation

  • Beta above 1: the stock is more volatile than the market, but is expected to give a higher return.
  • Beta below 1: the stock is less volatile than the market, and is expected to give a lower return.

The formula for Beta is the covariance between the stock’s returns and the market’s returns, divided by the variance of the market’s returns.
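In symbols, with R_s for the returns of the stock and R_m for the returns of the market (notation chosen here for illustration), this can be written as:

```latex
\beta = \frac{\operatorname{Cov}(R_s, R_m)}{\operatorname{Var}(R_m)}
```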

This sounds scarier than it is.

The Beta values on financial pages, like Yahoo! Finance, are calculated on monthly prices.

Step 2: Get some historic stock prices with Pandas Datareader

Let’s make an example here.

import numpy as np
import pandas_datareader as pdr
import datetime as dt
import pandas as pd
from sklearn.linear_model import LinearRegression
 
tickers = ['AAPL', 'MSFT', 'TWTR', 'IBM', '^GSPC']
start = dt.datetime(2015, 12, 1)
end = dt.datetime(2021, 1, 1)
 
data = pdr.get_data_yahoo(tickers, start, end, interval="m")
 
data = data['Adj Close']
 
log_returns = np.log(data/data.shift())

Notice that we read the data with interval="m", which gives monthly data.

Step 3: Calculate the BETA

Then the Beta is calculated as follows.

cov = log_returns.cov()
var = log_returns['^GSPC'].var()
 
cov.loc['AAPL', '^GSPC']/var

For Apple, it was 1.25.

You may wonder whether it is related to the Beta value from Linear Regression. Let’s check it out.

X = log_returns['^GSPC'].iloc[1:].to_numpy().reshape(-1, 1)
Y = log_returns['AAPL'].iloc[1:].to_numpy().reshape(-1, 1)
 
lin_regr = LinearRegression()
lin_regr.fit(X, Y)
 
lin_regr.coef_[0, 0]

This also gives 1.25. Hence, the same calculation lies behind both.
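The match is not specific to Apple: the slope of a least-squares line is always the covariance divided by the variance of the x variable. Here is a small sketch that checks this on synthetic returns. Note that it uses NumPy’s polyfit for the regression instead of scikit-learn’s LinearRegression, and that the data is made up.

```python
import numpy as np

# Synthetic returns (assumption: not real market data).
rng = np.random.default_rng(42)
market = rng.normal(0.0, 0.04, 200)
stock = 1.25 * market + rng.normal(0.0, 0.02, 200)

# Beta as covariance divided by variance (both with ddof=1).
beta_cov = np.cov(stock, market)[0, 1] / np.var(market, ddof=1)

# Beta as the slope of a least-squares line: stock = alpha + beta * market.
beta_ols, alpha = np.polyfit(market, stock, 1)

# The two values agree up to floating-point rounding.
print(beta_cov, beta_ols)
```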

Want more?

This is part of a 2.5-hour video course in 8 lessons about Risk and Return.

How to use Linear Regression to Calculate the Beta to the General Market (S&P 500)

What will we cover?

In this lesson we will learn about Linear Regression, how it differs from Correlation, and how to visualize Linear Regression.

The objectives of this tutorial are:

  • Understand the difference between Linear Regression and Correlation.
  • Understand the difference between truly random and correlated variables.
  • Visualize linear regression.

Step 1: Similarities and differences between linear regression and correlation

Let’s first see what the similarities and differences between Linear Regression and Correlation are.

Similarities.

  • Both quantify the direction and strength of the relationship between two variables. Here, we look at stock prices.

Differences.

  • Correlation is a single statistic. It is just a number between -1 and 1 (both inclusive).
  • Linear regression produces an equation.
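The difference can be sketched on made-up data as follows: the correlation is one number, while the regression gives an intercept (alpha) and a slope (beta). The data and variable names are assumptions for illustration, and NumPy’s polyfit is used here for the regression.

```python
import numpy as np

# Synthetic data with a linear relationship plus noise (not stock data).
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 500)
y = 2.0 * x + 0.5 + rng.normal(0.0, 1.0, 500)

# Correlation: a single number between -1 and 1.
corr = np.corrcoef(x, y)[0, 1]

# Linear regression: an equation y = alpha + beta * x.
beta, alpha = np.polyfit(x, y, 1)

print(f"correlation: {corr:.3f}")
print(f"regression:  y = {alpha:.3f} + {beta:.3f} * x")
```

The two are linked: the slope equals the correlation multiplied by the standard deviation of y divided by the standard deviation of x.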

Step 2: Visualize data with no correlation

A great way to learn about relationships between variables is to compare them to random variables.

Let’s start by doing that.

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt
%matplotlib notebook
 
X = np.random.randn(5000)
Y = np.random.randn(5000)
 
fig, ax = plt.subplots()
ax.scatter(X, Y, alpha=.2)

This gives a scatter chart that shows what two uncorrelated variables look like.
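To put a number on it, we can also calculate the correlation of two such random variables. This is a minimal sketch with a fixed seed (an assumption added here so the result is reproducible).

```python
import numpy as np

# Two independent standard normal samples, as in the scatter chart.
rng = np.random.default_rng(0)
X = rng.standard_normal(5000)
Y = rng.standard_normal(5000)

# The sample correlation is close to 0, but rarely exactly 0.
corr = np.corrcoef(X, Y)[0, 1]
print(corr)
```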

Step 3: How to visualize correlated stock prices

To compare that with two correlated variables, we need some data.

tickers = ['AAPL', 'TWTR', 'IBM', 'MSFT', '^GSPC']
start = dt.datetime(2020, 1, 1)
 
data = pdr.get_data_yahoo(tickers, start)
data = data['Adj Close']
log_returns = np.log(data/data.shift())

Let’s make a function to calculate the Linear Regression and visualize it.

def linear_regression(ticker_a, ticker_b):
    X = log_returns[ticker_a].iloc[1:].to_numpy().reshape(-1, 1)
    Y = log_returns[ticker_b].iloc[1:].to_numpy().reshape(-1, 1)
 
    lin_regr = LinearRegression()
    lin_regr.fit(X, Y)
 
    Y_pred = lin_regr.predict(X)
 
    alpha = lin_regr.intercept_[0]
    beta = lin_regr.coef_[0, 0]
 
    fig, ax = plt.subplots()
    ax.set_title("Alpha: " + str(round(alpha, 5)) + ", Beta: " + str(round(beta, 3)))
    ax.scatter(X, Y)
    ax.plot(X, Y_pred, c='r')

The function takes the two tickers and gets their log returns as NumPy arrays. They are reshaped to fit the format the model requires.

Then the Linear Regression model (LinearRegression) is fitted and used to predict values. The alpha and beta are the intercept and the slope of the fitted line. Finally, we scatter plot all the points and draw the prediction line.
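The reshaping is needed because scikit-learn’s LinearRegression expects the input X as a 2-D array with one row per sample and one column per feature. A minimal sketch with made-up numbers:

```python
import numpy as np

# Hypothetical log returns (not real market data).
returns = np.array([0.01, -0.02, 0.03])

# reshape(-1, 1) turns the 1-D array into a single column;
# -1 lets NumPy work out the number of rows.
column = returns.reshape(-1, 1)

print(returns.shape)  # (3,)
print(column.shape)   # (3, 1)
```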

Let’s try linear_regression("AAPL", "^GSPC").

The red line is the prediction line.

Step 4: A few more examples

Another example is linear_regression("AAPL", "MSFT").

And linear_regression("AAPL", "TWTR").

This visually shows that AAPL and TWTR are not as closely correlated as the other examples.

Want more?

This is part of an 8-lesson, 2.5-hour video course with prepared Jupyter Notebooks containing the Python code.

How to Calculate Correlation between Stock Price Movements with Python

What will we cover?

In this lesson we will learn about correlation of assets, calculations of correlation, and risk and coherence.

The learning objectives of this tutorial are:

  • What is correlation and how to use it
  • Calculate correlation
  • Find negatively correlated assets

Step 1: What is Correlation

Correlation is a statistic that measures the degree to which two variables move in relation to each other. Correlation measures association, but doesn’t show if x causes y or vice versa.

The correlation between two stocks is a number from -1 to 1 (both inclusive).

  • A positive correlation means that when stock x goes up, we expect stock y to go up, and vice versa.
  • A negative correlation means that when stock x goes up, we expect stock y to go down, and vice versa.
  • A zero correlation means we cannot say anything about how the two stocks move in relation to each other.

The formula for calculating the correlation is quite a mouthful.
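One common way to write it is the Pearson correlation coefficient, which is what the pandas corr() method calculates by default. The notation below is chosen here for illustration, with x and y as the two return series.

```latex
\rho_{x,y} = \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y}
           = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}
                  {\sqrt{\sum_{i} (x_i - \bar{x})^2} \, \sqrt{\sum_{i} (y_i - \bar{y})^2}}
```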

Step 2: Calculate the Correlation with DataFrames (pandas)

Luckily, pandas DataFrames can calculate it for us. Hence, we do not need to master how to do it by hand.

Let’s get started. First, we need to load some time series of historic stock prices.

See this tutorial on how to work with portfolios.

import pandas as pd
import pandas_datareader as pdr
import datetime as dt
import numpy as np
 
tickers = ['AAPL', 'TWTR', 'IBM', 'MSFT']
start = dt.datetime(2020, 1, 1)
 
data = pdr.get_data_yahoo(tickers, start)
data = data['Adj Close']
 
log_returns = np.log(data/data.shift())

Here we also calculate the log returns.

The correlation can be calculated as follows.

log_returns.corr()

That was easy, right? Remember that we do it on the log returns so that all the stocks are on a comparable scale.

Symbols      AAPL      TWTR       IBM      MSFT
Symbols
AAPL     1.000000  0.531973  0.518204  0.829547
TWTR     0.531973  1.000000  0.386493  0.563909
IBM      0.518204  0.386493  1.000000  0.583205
MSFT     0.829547  0.563909  0.583205  1.000000

Notice that the correlation on the diagonal is 1.0. This is expected, since the diagonal shows the correlation of each stock with itself (AAPL and AAPL, and so forth).

Other than that, we can conclude that AAPL and MSFT are correlated the most.
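With more tickers it gets harder to spot the largest value by eye. Here is one possible way to find it in code. The sketch rebuilds the correlation table above by hand; with real data you would use log_returns.corr() instead.

```python
import numpy as np
import pandas as pd

tickers = ["AAPL", "TWTR", "IBM", "MSFT"]
# Values copied from the correlation table above.
corr = pd.DataFrame(
    [
        [1.000000, 0.531973, 0.518204, 0.829547],
        [0.531973, 1.000000, 0.386493, 0.563909],
        [0.518204, 0.386493, 1.000000, 0.583205],
        [0.829547, 0.563909, 0.583205, 1.000000],
    ],
    index=tickers,
    columns=tickers,
)

# Blank out the diagonal so a ticker is not matched with itself.
values = corr.to_numpy().copy()
np.fill_diagonal(values, np.nan)

# Position of the largest remaining correlation.
row, col = np.unravel_index(np.nanargmax(values), values.shape)
print(tickers[row], tickers[col], values[row, col])  # AAPL MSFT 0.829547
```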

Step 3: Calculate the correlation to the general market

Let’s add the S&P 500 to our DataFrame.

sp500 = pdr.get_data_yahoo("^GSPC", start)
 
log_returns['SP500'] = np.log(sp500['Adj Close']/sp500['Adj Close'].shift())
 
log_returns.corr()

In the result, we see that AAPL and MSFT are the most correlated with the S&P 500 index. This is not surprising, as they make up a big part of the market cap weight in the index.

Step 4: Find Negatively Correlated Assets when Investing using Python

We will add this helper function to help find correlations.

We are in particular interested in negative correlation here.

def test_correlation(ticker):
    df = pdr.get_data_yahoo(ticker, start)
    lr = log_returns.copy()
    lr[ticker] = np.log(df['Adj Close']/df['Adj Close'].shift())
    return lr.corr()

This can help us find assets with a negative correlation.

Why do we want that? Well, to minimize the risk. Read my eBook on the subject if you want to learn more about that.

Now, let’s test.

test_correlation("TLT")

The result shows the negative correlation we are looking for.
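Once we have a correlation matrix, the negative entries can also be picked out in code. The sketch below uses synthetic returns, so the column names STOCK, FOLLOWER and HEDGE are hypothetical stand-ins and not real tickers.

```python
import numpy as np
import pandas as pd

# Synthetic daily log returns (assumption: not real stock or TLT data).
rng = np.random.default_rng(7)
stock = rng.normal(0.0, 0.02, 250)
log_returns = pd.DataFrame({
    "STOCK": stock,
    "FOLLOWER": 0.8 * stock + rng.normal(0.0, 0.01, 250),
    "HEDGE": -0.5 * stock + rng.normal(0.0, 0.01, 250),
})

corr = log_returns.corr()

# Correlations with STOCK, leaving out STOCK itself,
# and keeping only the negative ones.
candidates = corr["STOCK"].drop("STOCK")
negative = candidates[candidates < 0]
print(negative)
```

Only HEDGE is left, since it was built to move against STOCK.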

Step 5: Visualize the negative correlation

This can be visualized to get a better understanding as follows.

import matplotlib.pyplot as plt
%matplotlib notebook
 
def visualize_correlation(ticker1, ticker2):
    df = pdr.get_data_yahoo([ticker1, ticker2], start)
    df = df['Adj Close']
    df = df/df.iloc[0]
    fig, ax = plt.subplots()
    df.plot(ax=ax)
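The line df = df/df.iloc[0] normalizes the prices, so both assets start at 1.0 and can be compared in the same chart even if their prices are on very different scales. A minimal sketch with made-up prices (the column names are hypothetical):

```python
import pandas as pd

# Hypothetical prices on very different scales (not real market data).
df = pd.DataFrame({
    "EXPENSIVE": [3000.0, 3150.0, 2850.0],
    "CHEAP": [20.0, 19.0, 22.0],
})

# Divide every row by the first row, so each column starts at 1.0.
normalized = df / df.iloc[0]
print(normalized)
```

After this, a value of 1.05 means the price is 5% above where it started.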

Let’s try visualize_correlation("AAPL", "TLT").

Here we see that when AAPL goes down, TLT goes up.

We can also look at visualize_correlation("^GSPC", "TLT"), which compares the S&P 500 index and TLT.

Want more?

This is part of a full FREE course with all the code available on my GitHub.