The objective of this tutorial is to learn about Linear Regression, how it differs from Correlation, and how to visualize a Linear Regression.
Let’s first see what the similarities and differences between Linear Regression and Correlation are.
A great way to learn about relationships between variables is to compare correlated variables with random, uncorrelated ones.
Let’s start by doing that.
    import pandas as pd
    import numpy as np
    from sklearn.linear_model import LinearRegression
    import pandas_datareader as pdr
    import datetime as dt
    import matplotlib.pyplot as plt
    %matplotlib notebook

    # Two independent samples from a standard normal distribution
    X = np.random.randn(5000)
    Y = np.random.randn(5000)

    fig, ax = plt.subplots()
    ax.scatter(X, Y, alpha=.2)
This gives the following scatter chart, which shows what two uncorrelated variables look like.
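To put a number on what the scatter chart shows, one option is the Pearson correlation coefficient. The following is a minimal sketch, assuming NumPy's `np.corrcoef` and a fixed seed (the seed and the name `rng` are not part of the code above). For two independent samples the value should be close to 0.

```python
import numpy as np

# Fixed seed is an assumption, used only to make the run reproducible
rng = np.random.default_rng(42)
X = rng.standard_normal(5000)
Y = rng.standard_normal(5000)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is corr(X, Y)
r = np.corrcoef(X, Y)[0, 1]
print(f"Correlation between X and Y: {r:.4f}")
```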
To compare that with two correlated variables, we need some data.
    tickers = ['AAPL', 'TWTR', 'IBM', 'MSFT', '^GSPC']
    start = dt.datetime(2020, 1, 1)

    # Download prices and keep the adjusted close
    data = pdr.get_data_yahoo(tickers, start)
    data = data['Adj Close']

    # Daily log returns: log(P_t / P_{t-1})
    log_returns = np.log(data/data.shift())
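The last line computes daily log returns, log(P_t / P_{t-1}). As a small illustration on made-up prices (the `toy_prices` DataFrame below is hypothetical, not downloaded data), note that the first row is NaN because there is no previous price to compare with.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, for illustration only
toy_prices = pd.DataFrame({"A": [100.0, 110.0, 99.0],
                           "B": [50.0, 50.0, 55.0]})

# shift() moves each column down one row, so this is log(P_t / P_{t-1})
toy_log_returns = np.log(toy_prices / toy_prices.shift())
print(toy_log_returns)
```

This is why the first row is skipped with `.iloc[1:]` before fitting a model on log returns.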
Let’s make a function to calculate the Linear Regression and visualize it.
    def linear_regression(ticker_a, ticker_b):
        # Skip the first row (NaN from shift) and reshape to the 2-D column format scikit-learn expects
        X = log_returns[ticker_a].iloc[1:].to_numpy().reshape(-1, 1)
        Y = log_returns[ticker_b].iloc[1:].to_numpy().reshape(-1, 1)

        lin_regr = LinearRegression()
        lin_regr.fit(X, Y)
        Y_pred = lin_regr.predict(X)

        # With a 2-D Y, intercept_ is a 1-element array and coef_ is a 1x1 array
        alpha = lin_regr.intercept_[0]
        beta = lin_regr.coef_[0, 0]

        fig, ax = plt.subplots()
        ax.set_title("Alpha: " + str(round(alpha, 5)) + ", Beta: " + str(round(beta, 3)))
        ax.scatter(X, Y)
        ax.plot(X, Y_pred, c='r')
The function takes the two tickers and gets their log returns as NumPy arrays. They are reshaped into the column format the model requires.
Then the Linear Regression model (LinearRegression) is fitted and used to predict values. Alpha and beta are the parameters of the fitted line: alpha is the intercept and beta is the slope. Finally, we scatter plot all the points and draw the prediction line.
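This is also where Linear Regression and Correlation meet. For a fit with a single input variable, the slope beta equals the correlation coefficient scaled by the ratio of standard deviations, beta = r * std(Y) / std(X). Correlation tells how tightly the points follow a line, while beta tells how steep that line is. A minimal sketch on synthetic data (the true slope of 0.5, the noise level, and the seed are assumptions chosen for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y depends linearly on x plus some noise (assumed values)
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = 0.5 * x + 0.1 * rng.standard_normal(1000)

model = LinearRegression()
# y is 1-D here, so coef_ has shape (1,) and intercept_ is a scalar
model.fit(x.reshape(-1, 1), y)
beta = model.coef_[0]
alpha = model.intercept_

# Recover the same slope from the correlation coefficient
r = np.corrcoef(x, y)[0, 1]
beta_from_r = r * y.std() / x.std()

print(f"alpha: {alpha:.5f}, beta: {beta:.5f}")
print(f"r: {r:.5f}, r * std(y) / std(x): {beta_from_r:.5f}")
```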
Let’s try linear_regression("AAPL", "^GSPC").
The red line is the prediction line.
Another example is linear_regression("AAPL", "MSFT").
And linear_regression("AAPL", "TWTR").
This visually shows that AAPL and TWTR are not as closely correlated as the pairs in the other examples.
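The visual impression can be backed by a number. With the downloaded data, `log_returns.corr()` would give the pairwise correlations. Since that depends on the download, here is a sketch on synthetic returns instead (all three series and their noise levels are hypothetical): a series that follows the market closely gets a high correlation, and one buried in noise gets a low one.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns; seed and noise levels are assumptions
rng = np.random.default_rng(1)
market = rng.normal(0, 0.01, 500)

synthetic = pd.DataFrame({
    "market": market,
    "close_follower": market + rng.normal(0, 0.005, 500),  # little extra noise
    "loose_follower": market + rng.normal(0, 0.03, 500),   # a lot of extra noise
})

# Pairwise Pearson correlations between all columns
corr = synthetic.corr()
print(corr.round(2))
```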
This is part of an 8-lesson, 2.5-hour video course with prepared Jupyter Notebooks containing the Python code.