Visualize Why Long-term Investing is Less Risky – Pandas and Matplotlib

What will we cover in this tutorial?

We will look at how you can use Pandas Datareader (Pandas) and Matplotlib to create a visualization of why long-term investing is less risky.

Here, risk simply means the risk of losing money.

Specifically, we will investigate how likely it is to lose money (and how much) if you invest with a 1-year horizon versus a 10-year horizon.

Step 1: Establish the data for the investigation

One of the most widely used indices is the S&P 500. It tracks 500 large companies listed on US stock exchanges and is one of the most commonly followed equity indices.

We will use this index and retrieve data from 1970 up until today.

This can be done as follows.

import pandas_datareader as pdr
from datetime import datetime

data = pdr.get_data_yahoo('^GSPC', datetime(1970, 1, 1))

Then the DataFrame data will contain all data from 1970 up until today. The ^GSPC is the ticker for the S&P 500 index.

Step 2: Calculate the annual return from 1970 and forward using Pandas

The annual return for a year is calculated by dividing the last trading value of the year by the first trading value of the year and subtracting 1, then multiplying by 100 to get a percentage.

Calculating this for all years lets you visualize the returns with a histogram as follows.
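The calculation can be tried on its own, without downloading anything. The sketch below is only an illustration: the helper name `annual_return_pct` and the prices are made up and are not real S&P 500 values.

```python
import pandas as pd


def annual_return_pct(prices: pd.Series) -> float:
    """Percentage change from the first to the last value in the series."""
    return (prices.iloc[-1] / prices.iloc[0] - 1) * 100


# Hypothetical closing prices for one year (not real S&P 500 data).
prices = pd.Series([100.0, 104.0, 98.0, 110.0])

# Only the first and last values matter: 110 / 100 - 1 is roughly a 10% gain.
print(annual_return_pct(prices))
```

Note that the values in between do not affect the result, only the first and last trading values of the year do.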

import pandas as pd
import pandas_datareader as pdr
from datetime import datetime
import matplotlib.pyplot as plt


data = pdr.get_data_yahoo('^GSPC', datetime(1970, 1, 1))

years = []
annual_return = []

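# Loop over each calendar year from 1970 to 2020 (inclusive).
for year in range(1970, 2021):
    years.append(year)
    # Select the adjusted closing prices for this year only.
    data_year = data.loc[f'{year}']['Adj Close']
    # Return from the first to the last trading day of the year, in percent.
    annual_return.append((data_year.iloc[-1] / data_year.iloc[0] - 1) * 100)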

df = pd.DataFrame(annual_return, index=years)
bins = list(range(-40, 45, 5))
df.plot.hist(bins=bins, title='1 year')
plt.show()

Notice that we create a new DataFrame with all the annual returns for each of the years and use it to make a histogram.

The result is as follows.

What you see is a histogram showing how many years had an annual return in each range.

Hence, a return between -40% and -35% occurred once, while a return between 0% and 5% occurred 6 times in the years from 1970 to 2020 (inclusive).
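To see how returns end up in the 5% wide bins, the counting can be reproduced with NumPy. This is a sketch on made-up returns, not the real S&P 500 numbers; it assumes the plot counts values the same way `np.histogram` does with the same bin edges.

```python
import numpy as np

# Hypothetical annual returns in percent (not real S&P 500 data).
returns = [-37.0, 2.5, 3.0, 12.0, 26.0, -8.0, 4.9]

# Same bin edges as in the plot: -40, -35, ..., 35, 40.
bins = list(range(-40, 45, 5))
counts, edges = np.histogram(returns, bins=bins)

# Print only the bins that contain at least one year.
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    if count:
        print(f"{left}% to {right}%: {count} year(s)")
```

With this sample, one year falls in the -40% to -35% bin and three years fall in the 0% to 5% bin.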

What does this tell us?

Well, you can lose up to 40%, but you can also gain up to 35% in one year. It also shows that a gain (positive return) is more likely than a loss.
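These observations can also be read off as numbers instead of from the plot. The snippet below is a small sketch on made-up returns (not the real S&P 500 values); with the real data you would apply the same calls to the column of annual returns.

```python
import pandas as pd

# Hypothetical annual returns in percent (not real S&P 500 data).
returns = pd.Series([-37.0, 2.5, 3.0, 12.0, 26.0, -8.0, 4.9])

# Worst and best single-year outcome.
worst = returns.min()
best = returns.max()

# Fraction of years with a positive return.
share_positive = (returns > 0).mean()

print(f"Worst year: {worst}%, best year: {best}%")
print(f"Share of years with a gain: {share_positive:.0%}")
```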

But what if we invested the money for 10 years?

Step 3: Calculate the average annual return in 10-year spans starting from 1970 using Pandas

This is actually quite similar, but with a few changes.

First of all, the average return is calculated using the CAGR (Compound Annual Growth Rate) formula: (end value / start value) ^ (1 / number of years) - 1.
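As a quick illustration of the CAGR formula, here is a minimal sketch. The function name `cagr_pct` and the example values are assumptions made for this illustration only.

```python
def cagr_pct(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate in percent."""
    return ((end_value / start_value) ** (1 / years) - 1) * 100


# Hypothetical example: an investment that doubles over 10 years.
rate = cagr_pct(100.0, 200.0, 10)
print(round(rate, 2))  # about 7.18

# Sanity check: compounding the rate for 10 years gives back the end value.
print(round(100.0 * (1 + rate / 100) ** 10, 2))
```

Notice that doubling in 10 years is not 10% per year. Because the gains compound, roughly 7.18% per year is enough.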

This results in the following code.

import pandas as pd
import pandas_datareader as pdr
from datetime import datetime
import matplotlib.pyplot as plt


data = pdr.get_data_yahoo('^GSPC', datetime(1970, 1, 1))

years = []
avg_annual_return = []
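# Loop over each starting year from 1970 to 2010 (inclusive).
for year in range(1970, 2011):
    years.append(year)
    # Select 10 full years of adjusted closing prices (both end years included).
    data_year = data.loc[f'{year}':f'{year + 9}']['Adj Close']
    # CAGR over the 10-year span, in percent.
    avg_annual_return.append(((data_year.iloc[-1] / data_year.iloc[0]) ** (1 / 10) - 1) * 100)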

df = pd.DataFrame(avg_annual_return, index=years)
bins = list(range(-40, 45, 5))
df.plot.hist(bins=bins, title='10 years')
plt.show()

There are a few changes. One is the formula for the average annual return (as stated above), and the other is that we use 10 years of data. Notice that we only add 9 to the year, because both the start year and the end year are included in the slice.
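The inclusive slicing can be checked on a small stand-in series. This is a sketch with a made-up daily index and placeholder values, not the downloaded data.

```python
import pandas as pd

# Hypothetical daily index covering 1970 through 1981 (values are placeholders).
idx = pd.date_range("1970-01-01", "1981-12-31", freq="D")
series = pd.Series(range(len(idx)), index=idx)

# Slicing by year strings includes both the start year and the end year.
span = series.loc["1970":"1979"]

print(span.index[0].date(), span.index[-1].date())
print(span.index.year.nunique())  # number of distinct years in the slice
```

The slice runs from the first day of 1970 to the last day of 1979, which is 10 calendar years, so adding 9 to the start year is what gives a 10-year span.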

This results in this histogram.

As you can see, only in 3 cases was there a negative return over a 10-year span. Also, the loss was only in the range of -5% to 0%. In all other cases, the return was positive.

Now, isn't that nice?
