What will we cover?
In this lesson we will learn about correlation of assets, calculations of correlation, and risk and coherence.
The learning objectives of this tutorial.
- What is correlation and how to use it
- Calculate correlation
- Find negatively correlated assets
Step 1: What is Correlation
Correlation is a statistic that measures the degree to which two variables move in relation to each other. Correlation measures association, but doesn’t show if x causes y or vice versa.
The correlation between two stocks is a number from -1 to 1 (both inclusive).
- A positive correlation means that when stock x goes up, we expect stock y to go up, and vice versa.
- A negative correlation means that when stock x goes up, we expect stock y to go down, and vice versa.
- A zero correlation means that we cannot say anything about how the two stocks move in relation to each other.
The formula for calculating the correlation is quite a mouthful.
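For reference, the standard Pearson correlation coefficient divides the covariance of the two series by the product of their standard deviations. The following is a minimal sketch of that calculation in plain Python. The function name `pearson` and the sample numbers are made up for illustration and are not part of the original lesson.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


x = [1.0, 2.0, 3.0, 4.0]
same_direction = pearson(x, [2.0, 4.0, 6.0, 8.0])      # y rises with x
opposite_direction = pearson(x, [8.0, 6.0, 4.0, 2.0])  # y falls as x rises
```

With these toy numbers, `same_direction` comes out at the upper end of the range and `opposite_direction` at the lower end, matching the positive and negative cases described in Step 1.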
Step 2: Calculate the Correlation with DataFrames (pandas)
Luckily, pandas DataFrames can calculate it for us, so we do not need to work through the formula by hand.
Let’s get started. First, we need to load some time series of historic stock prices.
import pandas as pd
import pandas_datareader as pdr
import datetime as dt
import numpy as np

tickers = ['AAPL', 'TWTR', 'IBM', 'MSFT']
start = dt.datetime(2020, 1, 1)

# Download historic prices and keep only the adjusted close
data = pdr.get_data_yahoo(tickers, start)
data = data['Adj Close']

# Log return: log of each price divided by the previous row's price
log_returns = np.log(data/data.shift())
The final statement also calculates the log returns.
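To see what the log-return calculation does, here is a small sketch on a hand-made price series (the prices are invented, not market data). `shift()` moves every value down one row, so dividing by the shifted series gives each price over the previous one, and the first row is `NaN` because it has no previous value.

```python
import numpy as np
import pandas as pd

# Invented closing prices for a single hypothetical asset
prices = pd.Series([100.0, 110.0, 99.0], name="PRICE")

# Log return: natural log of each price over the previous price
log_ret = np.log(prices / prices.shift())

# Log returns add up over time: their sum equals log(last price / first price)
total = log_ret.sum()
```

Here `total` equals `np.log(99.0 / 100.0)`, which illustrates one convenient property of log returns: they can simply be summed across periods.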
The correlation can be calculated as follows.
log_returns.corr()
That was easy, right? Remember that we calculate it on the log returns, which keeps stocks with different price levels on the same range.
Symbols      AAPL      TWTR       IBM      MSFT
Symbols
AAPL     1.000000  0.531973  0.518204  0.829547
TWTR     0.531973  1.000000  0.386493  0.563909
IBM      0.518204  0.386493  1.000000  0.583205
MSFT     0.829547  0.563909  0.583205  1.000000
Note that the correlation on the diagonal is 1.0. This is expected, since the diagonal shows the correlation of each stock with itself (AAPL with AAPL, and so forth).
Other than that, we can conclude that AAPL and MSFT are the most correlated pair, at about 0.83.
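Reading the strongest pair off the matrix by eye works for four tickers, but it can also be done in code. The sketch below rebuilds the matrix from the numbers printed above and scans every distinct pair. The variable names are illustrative only.

```python
from itertools import combinations

import pandas as pd

tickers = ["AAPL", "TWTR", "IBM", "MSFT"]
# Values copied from the correlation matrix shown above
corr = pd.DataFrame(
    [
        [1.000000, 0.531973, 0.518204, 0.829547],
        [0.531973, 1.000000, 0.386493, 0.563909],
        [0.518204, 0.386493, 1.000000, 0.583205],
        [0.829547, 0.563909, 0.583205, 1.000000],
    ],
    index=tickers,
    columns=tickers,
)

# Visit each unordered pair once, which also skips the 1.0 diagonal
pairs = {(a, b): corr.loc[a, b] for a, b in combinations(tickers, 2)}

most_correlated = max(pairs, key=pairs.get)   # ('AAPL', 'MSFT')
least_correlated = min(pairs, key=pairs.get)  # ('TWTR', 'IBM')
```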
Step 3: Calculate the correlation to the general market
Let’s add the S&P 500 to our DataFrame.
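# ^GSPC is the Yahoo ticker for the S&P 500 index
sp500 = pdr.get_data_yahoo("^GSPC", start)
# Add the index log returns as an extra column next to the stocks
log_returns['SP500'] = np.log(sp500['Adj Close']/sp500['Adj Close'].shift())
log_returns.corr()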
In the resulting matrix, we see that AAPL and MSFT are the most correlated with the S&P 500 index. This is not surprising, as they make up a large part of the market-cap weight in the index.
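When only the relationship to the market matters, it can help to pull out the single SP500 column of the correlation matrix and sort it. The following sketch uses simulated returns instead of downloaded data, so the column names `FOLLOWS` and `INDEPENDENT` and all numbers are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
market = rng.normal(0, 0.01, 500)  # simulated index log returns

log_returns = pd.DataFrame({
    "FOLLOWS": market + rng.normal(0, 0.002, 500),  # mostly tracks the index
    "INDEPENDENT": rng.normal(0, 0.01, 500),        # unrelated to the index
    "SP500": market,
})

# Take the SP500 column of the correlation matrix, drop its self-correlation,
# and sort so the asset most correlated with the index comes first
to_market = (
    log_returns.corr()["SP500"]
    .drop("SP500")
    .sort_values(ascending=False)
)
```

The same three chained calls should work unchanged on the real `log_returns` DataFrame once the SP500 column has been added.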
Step 4: Find Negatively Correlated Assets when Investing using Python
We will add this helper function to help find correlations.
We are in particular interested in negative correlation here.
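def test_correlation(ticker):
    # Download prices for the candidate ticker over the same period
    df = pdr.get_data_yahoo(ticker, start)
    # Work on a copy so the shared log_returns DataFrame is left unchanged
    lr = log_returns.copy()
    lr[ticker] = np.log(df['Adj Close']/df['Adj Close'].shift())
    return lr.corr()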
This can help us find assets with a negative correlation.
Why do we want that? Well, to minimize the risk. Read my eBook on the subject if you want to learn more about that.
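As a rough illustration of why negative correlation can lower risk, the sketch below combines two simulated return series that tend to move against each other. Everything here is an assumption for demonstration purposes: the data is random, the asset names are hypothetical, and the standard deviation of returns is used as a simple stand-in for risk.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Simulated log returns for two hypothetical assets
asset_a = pd.Series(rng.normal(0, 0.01, 1000))
# asset_b tends to move against asset_a, plus some noise of its own
asset_b = -0.8 * asset_a + pd.Series(rng.normal(0, 0.003, 1000))

correlation = asset_a.corr(asset_b)  # clearly negative by construction

# Simple equal-weight average of the two return series
combined = 0.5 * asset_a + 0.5 * asset_b

# Standard deviation of returns, used here as a simple stand-in for risk
risk_a, risk_b, risk_combined = asset_a.std(), asset_b.std(), combined.std()
```

In this setup `risk_combined` is smaller than both `risk_a` and `risk_b`, because the moves of one series partly cancel the moves of the other.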
Now, let’s test.
Resulting in the following.
This is the negative correlation we are looking for.
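Once a correlation matrix is available, the negative entries for a candidate asset can also be picked out in code instead of by eye. This is a minimal sketch on simulated returns. The column names `STOCK_A`, `STOCK_B` and `CANDIDATE` are hypothetical, and the matrix stands in for what the helper function returns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
base = rng.normal(0, 0.01, 500)  # shared driver of the simulated returns

lr = pd.DataFrame({
    "STOCK_A": base + rng.normal(0, 0.003, 500),
    "STOCK_B": base + rng.normal(0, 0.003, 500),
    "CANDIDATE": -base + rng.normal(0, 0.003, 500),  # moves against the others
})

corr = lr.corr()
# Correlations of the candidate with every other column
candidate_corr = corr["CANDIDATE"].drop("CANDIDATE")
# Keep only the assets the candidate is negatively correlated with
negative = candidate_corr[candidate_corr < 0]
```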
Step 5: Visualize the negative correlation
This can be visualized to get a better understanding as follows.
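import matplotlib.pyplot as plt
# Jupyter magic for interactive plots; only valid inside a notebook
%matplotlib notebook

def visualize_correlation(ticker1, ticker2):
    df = pdr.get_data_yahoo([ticker1, ticker2], start)
    df = df['Adj Close']
    # Normalize both series to start at 1 so they share one axis
    df = df/df.iloc[0]
    fig, ax = plt.subplots()
    df.plot(ax=ax)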
Calling visualize_correlation("AAPL", "TLT") plots the two normalized price series.
There we see that when AAPL goes down, TLT tends to go up.
The same can be done with visualize_correlation("^GSPC", "TLT") (the S&P 500 index and TLT).
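The key step before plotting is the normalization: dividing every column by its first row so that both series start at 1 and can share an axis even when their price levels differ a lot. Here is a small sketch of that step on invented prices (the column names `X` and `Y` are placeholders, not real tickers).

```python
import pandas as pd

# Invented prices for two hypothetical assets at very different price levels
df = pd.DataFrame({"X": [50.0, 55.0, 45.0], "Y": [200.0, 190.0, 210.0]})

# df.iloc[0] is the first row; the division is applied column by column
normalized = df / df.iloc[0]
```

After this step each value reads as a multiple of the starting price, so a value of 1.1 means the asset is 10% above where it started.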
This is part of a full FREE course with all the code available on my GitHub.