# How to use Linear Regression to Calculate the Beta to the General Market (S&P 500)

In this lesson we will learn about **Linear Regression**, how it differs from **Correlation**, and how to visualize **Linear Regression**.

The objectives of this tutorial are:

- Understand the difference between **Linear Regression** and **Correlation**.
- Understand the difference between **true random** and **correlated** variables.
- Visualize linear regression.

Let’s first look at the similarities and differences between Linear Regression and Correlation.

**Similarities**.

- Both quantify the direction and strength of the relationship between two variables. Here we look at stock prices.

**Differences**.

- Correlation is a single statistic. It is just a number between -1 and 1 (both inclusive).
- Linear regression produces an equation.
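To make the difference concrete, here is a minimal sketch on synthetic data (the values and variable names are made up for illustration, and `np.polyfit` is used only as a quick way to fit a line). Rescaling one variable leaves the correlation unchanged, while the slope of the regression equation changes with the scale.

```python
import numpy as np

# Synthetic data (illustrative only): y depends linearly on x plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 + 2.0 * x + rng.normal(scale=0.5, size=1000)

# Correlation: a single number between -1 and 1
corr = np.corrcoef(x, y)[0, 1]

# Linear regression: an equation y = intercept + slope * x
slope, intercept = np.polyfit(x, y, 1)

# Multiply y by 10: the correlation stays the same, the slope scales by 10
corr_scaled = np.corrcoef(x, 10 * y)[0, 1]
slope_scaled, intercept_scaled = np.polyfit(x, 10 * y, 1)

print("correlation:", round(corr, 3), "after scaling:", round(corr_scaled, 3))
print("slope:", round(slope, 3), "after scaling:", round(slope_scaled, 3))
```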

A great way to learn about relationships between variables is to compare them to random variables.

Let’s start by doing that.

```
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt
%matplotlib notebook

# Two independent samples from the standard normal distribution
X = np.random.randn(5000)
Y = np.random.randn(5000)

# Scatter plot; the low alpha makes dense regions easier to see
fig, ax = plt.subplots()
ax.scatter(X, Y, alpha=.2)
```

This gives a scatter chart that shows what two uncorrelated variables look like.
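As a quick numerical check of the same idea, the sketch below computes the correlation of two independent random samples. The seed is an arbitrary choice added here so the run is repeatable; the result should be close to 0.

```python
import numpy as np

# Two independent standard normal samples (seed chosen arbitrarily)
rng = np.random.default_rng(42)
X = rng.standard_normal(5000)
Y = rng.standard_normal(5000)

# Pearson correlation coefficient between X and Y
corr = np.corrcoef(X, Y)[0, 1]
print("correlation of independent samples:", round(corr, 4))
```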

To compare that with two correlated variables, we need some data.

```
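# Four stocks plus the S&P 500 index (^GSPC)
tickers = ['AAPL', 'TWTR', 'IBM', 'MSFT', '^GSPC']
start = dt.datetime(2020, 1, 1)
data = pdr.get_data_yahoo(tickers, start)
# Keep only the adjusted close prices
data = data['Adj Close']
# Daily log returns; the first row is NaN because it has no previous price
log_returns = np.log(data/data.shift())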
```
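To see what the log return calculation does, here is a small sketch on a made-up price table (the column names `A` and `B` and the prices are hypothetical). It shows that the first row is `NaN` and that log returns add up over time.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for two made-up tickers
prices = pd.DataFrame({"A": [100.0, 110.0, 121.0],
                       "B": [50.0, 50.0, 25.0]})

# Same formula as above: log of today's price over yesterday's price
log_returns = np.log(prices / prices.shift())
print(log_returns)

# Log returns are additive: their sum is the log of last price / first price
total_a = log_returns["A"].sum()
print(total_a, np.log(121.0 / 100.0))
```

The `NaN` in the first row is why the regression function skips the first entry with `.iloc[1:]`.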

Let’s make a function to calculate the Linear Regression and visualize it.

```
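def linear_regression(ticker_a, ticker_b):
    # Skip the first row (NaN) and reshape to the 2D column format scikit-learn expects
    X = log_returns[ticker_a].iloc[1:].to_numpy().reshape(-1, 1)
    Y = log_returns[ticker_b].iloc[1:].to_numpy().reshape(-1, 1)

    # Fit Y = alpha + beta * X and predict Y for every X
    lin_regr = LinearRegression()
    lin_regr.fit(X, Y)
    Y_pred = lin_regr.predict(X)

    # Intercept (alpha) and slope (beta) of the fitted line
    alpha = lin_regr.intercept_[0]
    beta = lin_regr.coef_[0, 0]

    # Scatter plot of the returns with the fitted line in red
    fig, ax = plt.subplots()
    ax.set_title("Alpha: " + str(round(alpha, 5)) + ", Beta: " + str(round(beta, 3)))
    ax.scatter(X, Y)
    ax.plot(X, Y_pred, c='r')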
```

The function takes the two tickers and gets their log returns as **NumPy** arrays. They are reshaped into the column format the model requires.

Then the Linear Regression model (**LinearRegression**) is fitted and used to predict values. The alpha and beta are the intercept and slope of the fitted line. Finally, we draw a scatter plot of all the points together with the prediction line.
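The fitting step can be tried in isolation, without downloading data. The sketch below uses synthetic returns generated with a known intercept and slope (the numbers are made up for illustration) and shows why the function indexes `intercept_[0]` and `coef_[0, 0]`: with a 2D `Y`, scikit-learn returns them as arrays.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic "returns" with a known alpha (0.0005) and beta (1.2); illustrative only
rng = np.random.default_rng(1)
x = rng.normal(scale=0.01, size=500)
y = 0.0005 + 1.2 * x + rng.normal(scale=0.002, size=500)

# Column vectors, as in the function above
X = x.reshape(-1, 1)
Y = y.reshape(-1, 1)

model = LinearRegression().fit(X, Y)
print(model.intercept_.shape, model.coef_.shape)

alpha = model.intercept_[0]   # intercept of the fitted line
beta = model.coef_[0, 0]      # slope of the fitted line
Y_pred = model.predict(X)     # points on the line alpha + beta * X

print("alpha:", round(alpha, 5), "beta:", round(beta, 3))
```

The estimates should land close to the values used to generate the data, and `Y_pred` is the straight line `alpha + beta * X`.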

Let’s try **linear_regression("AAPL", "^GSPC")**.

In the resulting plot, the red line is the prediction line.

Other examples are **linear_regression("AAPL", "MSFT")**

and **linear_regression("AAPL", "TWTR")**.

The last plot shows visually that **AAPL** and **TWTR** are not as closely correlated as the pairs in the other examples.
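A note on argument order: the function regresses `ticker_b` on `ticker_a`, so the slope depends on which ticker comes first. In the usual definition, the beta of a stock to the market is the slope when the market's returns are on the x-axis and the stock's returns on the y-axis, which with the function above would correspond to passing `"^GSPC"` as the first argument. That slope equals the covariance of the two return series divided by the variance of the market returns. The sketch below checks this on synthetic data; the series `market` and `stock` are made up and are not real prices.

```python
import numpy as np

# Synthetic daily returns: the stock moves 1.5x the market plus its own noise
rng = np.random.default_rng(7)
market = rng.normal(scale=0.01, size=1000)
stock = 1.5 * market + rng.normal(scale=0.005, size=1000)

# Beta as covariance(stock, market) / variance(market)
beta_cov = np.cov(stock, market)[0, 1] / np.var(market, ddof=1)

# Beta as the slope of the regression of stock returns on market returns
slope, intercept = np.polyfit(market, stock, 1)

print("beta from covariance:", round(beta_cov, 4))
print("beta from regression:", round(slope, 4))
```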

This lesson is part of an 8-lesson, 2.5-hour video course that comes with prepared Jupyter Notebooks containing the Python code.
