How to use Linear Regression to Calculate the Beta to the General Market (S&P 500)

What will we cover?

In this lesson we will learn about Linear Regression, how it differs from Correlation, and how to visualize Linear Regression.

The objectives of this tutorial are:

  • Understand the difference between Linear Regression and Correlation.
  • Understand the difference between truly random and correlated variables.
  • Visualize linear regression.

Step 1: Similarities and differences between linear regression and correlation

Let’s first look at the similarities and differences between Linear Regression and Correlation.


Similarities:

  • Both quantify the direction and strength of the relationship between two variables. Here we look at stock prices.

Differences:

  • Correlation is a single statistic. It is just a number between -1 and 1 (both inclusive).
  • Linear regression produces an equation.
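
To make the difference concrete, here is a minimal sketch on synthetic data. The relationship y = 2x + 1 plus noise is made up for illustration and is not taken from any stock data. It uses only NumPy: np.corrcoef returns the single correlation number, while np.polyfit with degree 1 returns the two coefficients of a fitted line.

```python
import numpy as np

# Hypothetical data: y depends linearly on x, plus some noise.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=1000)

# Correlation: one number between -1 and 1.
r = np.corrcoef(x, y)[0, 1]

# Linear regression: an equation y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)

print("correlation:", round(r, 3))
print("equation: y =", round(slope, 3), "* x +", round(intercept, 3))
```

The correlation tells us how tightly the points follow a line, but only the regression tells us what that line is.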

Step 2: Visualize data with no correlation

A great way to learn about relationships between variables is to compare them to random variables.

Let’s start by doing that.

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt
%matplotlib notebook
X = np.random.randn(5000)
Y = np.random.randn(5000)
fig, ax = plt.subplots()
ax.scatter(X, Y, alpha=.2)

This gives the following scatter chart, which shows what two uncorrelated variables look like.
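
The visual impression can be checked with a number. The sketch below repeats the experiment with a seeded generator (the seed is an arbitrary choice, added here only so the result is reproducible) and computes the correlation coefficient of the two samples, which should be close to 0.

```python
import numpy as np

# Two independent samples from a standard normal distribution.
rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
X = rng.standard_normal(5000)
Y = rng.standard_normal(5000)

# The correlation coefficient of independent samples is close to 0.
corr = np.corrcoef(X, Y)[0, 1]
print("correlation:", round(corr, 4))
```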

Step 3: How to visualize correlated stock prices

To compare that with two correlated variables, we need some data.

tickers = ['AAPL', 'TWTR', 'IBM', 'MSFT', '^GSPC']
start = dt.datetime(2020, 1, 1)
data = pdr.get_data_yahoo(tickers, start)
data = data['Adj Close']
log_returns = np.log(data/data.shift())
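
The last line divides each price by the previous day's price and takes the logarithm, which gives the daily log returns. As a small sketch of what that computes, here is the same calculation in plain NumPy on a made-up price series (the prices are hypothetical, not real quotes).

```python
import numpy as np

# Hypothetical closing prices on four consecutive days.
prices = np.array([100.0, 102.0, 101.0, 105.0])

# Log return of each day relative to the previous day.
log_returns = np.log(prices[1:] / prices[:-1])

# Equivalent formulation: difference of the log prices.
log_returns_diff = np.diff(np.log(prices))

# Log returns add up: their sum is the log return over the whole period.
total = log_returns.sum()
print(log_returns)
print("total:", total, "check:", np.log(prices[-1] / prices[0]))
```

In the pandas version the first row has no previous price, so it becomes NaN. That is why the first row is skipped with .iloc[1:] later on.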

Let’s make a function to calculate the Linear Regression and visualize it.

def linear_regression(ticker_a, ticker_b):
    X = log_returns[ticker_a].iloc[1:].to_numpy().reshape(-1, 1)
    Y = log_returns[ticker_b].iloc[1:].to_numpy().reshape(-1, 1)
    lin_regr = LinearRegression()
    lin_regr.fit(X, Y)
    Y_pred = lin_regr.predict(X)
    alpha = lin_regr.intercept_[0]
    beta = lin_regr.coef_[0, 0]
    fig, ax = plt.subplots()
    ax.set_title("Alpha: " + str(round(alpha, 5)) + ", Beta: " + str(round(beta, 3)))
    ax.scatter(X, Y)
    ax.plot(X, Y_pred, c='r')

The function takes the two tickers and gets their log returns as NumPy arrays. The arrays are reshaped into column vectors to fit the format required by the model.

Then the Linear Regression model (LinearRegression) is fitted and used to predict values. The alpha and beta are the intercept and slope of the fitted line. Finally, we scatter plot all the points and draw the prediction line.
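
For a single input variable, ordinary least squares has a simple closed form: beta is the covariance of X and Y divided by the variance of X, and alpha is the mean of Y minus beta times the mean of X. The sketch below checks this on synthetic returns. The names market and stock and the values 1.2 and 0.0005 are assumptions made up for the illustration, and np.polyfit stands in for the model fit so that the example needs only NumPy.

```python
import numpy as np

# Hypothetical log returns: the stock moves 1.2 times the market, plus noise.
rng = np.random.default_rng(1)
market = rng.normal(0.0, 0.01, size=500)
stock = 0.0005 + 1.2 * market + rng.normal(0.0, 0.005, size=500)

# Closed-form least squares for one input variable.
beta = np.cov(market, stock, ddof=1)[0, 1] / np.var(market, ddof=1)
alpha = stock.mean() - beta * market.mean()

# A degree-1 least-squares fit gives the same slope and intercept.
slope, intercept = np.polyfit(market, stock, 1)

print("alpha:", round(alpha, 5), "beta:", round(beta, 3))
print("polyfit intercept:", round(intercept, 5), "slope:", round(slope, 3))
```

Note that the slope depends on which series is used as X. In the function above, ticker_a is X, so the reported beta is the slope of ticker_b against ticker_a.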

Let’s try linear_regression("AAPL", "^GSPC").

The red line is the prediction line.

Step 4: A few more examples

Another example is linear_regression("AAPL", "MSFT").

And linear_regression("AAPL", "TWTR").

This shows visually that AAPL and TWTR are not as closely correlated as the other examples.
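
How closely the points follow the line can also be expressed as a number, the coefficient of determination (R squared). For a linear regression with one input variable it equals the squared correlation. The sketch below uses synthetic series, not the real ticker data: tight and loose are hypothetical return series with the same slope against base but different amounts of noise, and r_squared is a helper written for this illustration.

```python
import numpy as np

def r_squared(x, y):
    """Share of the variance of y explained by a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()

# Hypothetical return series: same slope, different amounts of noise.
rng = np.random.default_rng(7)
base = rng.normal(0.0, 0.02, size=750)
tight = 0.9 * base + rng.normal(0.0, 0.01, size=750)
loose = 0.9 * base + rng.normal(0.0, 0.04, size=750)

r2_tight = r_squared(base, tight)
r2_loose = r_squared(base, loose)
print("R squared, tight pair:", round(r2_tight, 3))
print("R squared, loose pair:", round(r2_loose, 3))
```

Both pairs have a similar fitted slope, but the noisier pair has a much lower R squared, which matches a more scattered cloud around the red line.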

Want more?

This is part of an 8-lesson, 2.5-hour video course with prepared Jupyter Notebooks containing the Python code.

