Visualize Why Long-term Investing is Less Risky – Pandas and Matplotlib

What will we cover in this tutorial?

We will look at how you can use Pandas, Pandas Datareader, and Matplotlib to visualize why long-term investing is less risky.

Here, risk simply means the risk of losing money.

Specifically, we will investigate how likely it is to lose money (and how much) if you invest with a 1-year perspective vs a 10-year perspective.

Step 1: Establish the data for the investigation

One of the most widely used indices is the S&P 500 index. This index tracks 500 large companies listed on US stock exchanges and is one of the most commonly followed equity indices.

We will use this index and retrieve data from 1970 up until today.

This can be done as follows.

import pandas_datareader as pdr
from datetime import datetime

data = pdr.get_data_yahoo('^GSPC', datetime(1970, 1, 1))

The DataFrame data will then contain daily data from 1970 up until today. ^GSPC is the ticker for the S&P 500 index.

Step 2: Calculate the annual return from 1970 and forward using Pandas

The annual return for a year is calculated by dividing the value on the last trading day of the year by the value on the first trading day, subtracting 1, and multiplying by 100 to get a percentage.
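As a minimal sketch of this formula, the snippet below applies it to a small made-up price series instead of real S&P 500 data. The prices and the helper name annual_return_pct are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical closing prices on four trading days (not real index data).
prices = pd.Series(
    [100.0, 104.0, 98.0, 110.0],
    index=pd.to_datetime(["2019-01-02", "2019-06-28", "2019-12-31", "2020-01-02"]),
)


def annual_return_pct(series, year):
    """Percentage change from the first to the last value within `year`."""
    # Partial string indexing selects all rows of one calendar year.
    in_year = series.loc[str(year)]
    return (in_year.iloc[-1] / in_year.iloc[0] - 1) * 100


# (98 / 100 - 1) * 100, so roughly -2 percent for 2019.
print(annual_return_pct(prices, 2019))
```

Notice that the 2020 value is ignored for 2019, because only rows inside the selected year are used.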

Once you have calculated it for all years, you can visualize the returns with a histogram as follows.

import pandas as pd
import pandas_datareader as pdr
from datetime import datetime
import matplotlib.pyplot as plt


data = pdr.get_data_yahoo('^GSPC', datetime(1970, 1, 1))

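years = []
annual_return = []

for year in range(1970, 2021):
    years.append(year)
    # Select all trading days of the given year (partial string indexing).
    data_year = data.loc[f'{year}']['Adj Close']
    # Return from the first to the last trading day of the year, in percent.
    annual_return.append((data_year.iloc[-1] / data_year.iloc[0] - 1) * 100)

# One row per year, holding that year's return.
df = pd.DataFrame(annual_return, index=years)
# Bin edges from -40% to 40% in steps of 5 percentage points.
bins = list(range(-40, 45, 5))
df.plot.hist(bins=bins, title='1 year')
plt.show()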

Notice that we create a new DataFrame with all the annual returns for each of the years and use it to make a histogram.

The result is as follows.

What you see is a histogram showing in how many years a given annual return occurred.

Hence, a return between -40% and -35% occurred once, while a return between 0% and 5% happened 6 times in the span of years from 1970 to 2020 (inclusive).
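If you want to read the counts as numbers instead of bars, one option is NumPy's histogram function with the same bin edges. The returns below are made-up values, not the actual S&P 500 results.

```python
import numpy as np

# Hypothetical annual returns in percent (illustrative only).
returns = [-37.2, -12.5, 1.3, 3.8, 4.9, 12.0, 18.4, 26.7]

# Same bin edges as in the plot: -40, -35, ..., 40.
bins = list(range(-40, 45, 5))
counts, edges = np.histogram(returns, bins=bins)

# Print only the bins that contain at least one year.
for count, low, high in zip(counts, edges[:-1], edges[1:]):
    if count:
        print(f"{low} to {high}%: {count} year(s)")
```

Each count corresponds to the height of one bar in the histogram.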

What does this tell us?

Well, you can lose up to 40%, but you can also gain up to 35% in one year. It also shows that a gain (positive return) is more likely than a loss.
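The same observations can be put into numbers. Here is a small sketch, again using made-up returns rather than the real ones, that finds the worst year, the best year, and the share of years with a gain.

```python
import pandas as pd

# Hypothetical annual returns in percent (illustrative only).
annual = pd.Series([-37.2, -12.5, 1.3, 3.8, 4.9, 12.0, 18.4, 26.7])

worst = annual.min()
best = annual.max()
# Comparing gives True/False per year; the mean of that is the share of gains.
share_positive = (annual > 0).mean()

print(f"Worst year: {worst}%, best year: {best}%")
print(f"Share of years with a gain: {share_positive:.0%}")
```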

But what if we invested the money for 10 years?

Step 3: Calculate the average annual return in 10-year spans starting from 1970 using Pandas

This is actually quite similar, but with a few changes.

First of all, the average return is calculated using the CAGR (Compound Annual Growth Rate) formula.
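CAGR is the constant yearly growth rate that would turn the starting value into the ending value over the given number of years: (end / start) ** (1 / years) - 1. Here is a minimal sketch with made-up numbers (the function name cagr_pct is an assumption for illustration).

```python
def cagr_pct(start_value, end_value, years):
    """Compound Annual Growth Rate in percent."""
    return ((end_value / start_value) ** (1 / years) - 1) * 100


# Doubling your money over 10 years corresponds to about 7.18% per year.
print(round(cagr_pct(100, 200, 10), 2))  # 7.18

# A plain average (100% / 10 years = 10% per year) would overstate the
# yearly growth, because it ignores compounding.
```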

Applied to the S&P 500 data, this results in the following code.

import pandas as pd
import pandas_datareader as pdr
from datetime import datetime
import matplotlib.pyplot as plt


data = pdr.get_data_yahoo('^GSPC', datetime(1970, 1, 1))

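years = []
avg_annual_return = []
for year in range(1970, 2011):
    years.append(year)
    # Select 10 full years: from the start year through start year + 9 (inclusive).
    data_year = data.loc[f'{year}':f'{year + 9}']['Adj Close']
    # CAGR over the 10-year span, in percent.
    avg_annual_return.append(((data_year.iloc[-1] / data_year.iloc[0]) ** (1 / 10) - 1) * 100)

# One row per starting year, holding the average annual return.
df = pd.DataFrame(avg_annual_return, index=years)
# Same bin edges as for the 1-year histogram, to make the plots comparable.
bins = list(range(-40, 45, 5))
df.plot.hist(bins=bins, title='10 years')
plt.show()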

There are a few changes. One is the formula for the average annual return (as stated above), and the other is that we use 10 years of data. Notice that we only add 9 to the year. This is because both the start year and the end year are included in the slice.
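To see why adding 9 gives 10 years, here is a small sketch with a made-up series that has one value per year. Slicing a date index with year strings includes both end points.

```python
import pandas as pd

# Hypothetical values, one per year from 1970 to 1981 (illustrative only).
dates = pd.to_datetime([f"{y}-12-31" for y in range(1970, 1982)])
values = pd.Series(range(12), index=dates)

year = 1970
# Both the start year and the end year are part of the result.
window = values.loc[f"{year}":f"{year + 9}"]

print(window.index.year.min(), window.index.year.max())  # 1970 1979
print(window.index.year.nunique())  # 10
```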

The 10-year calculation results in the following histogram.

As you can see, only in 3 cases was there a negative return over a 10-year span. Even then, the loss was only in the range -5% to 0%. In all other cases, the return was positive.

Now, isn't that nice?

Fix get_data_yahoo from Pandas Datareader

What will we cover?

If you use get_data_yahoo from Pandas Datareader and it has suddenly stopped working, we will look at how to fix it.

The Error and Problem

Consider this code.

import pandas_datareader as pdr
from datetime import datetime

data = pdr.get_data_yahoo('^GSPC', datetime(1970, 1, 1))

It has been working up until now. But suddenly it fails with the following error.

Traceback (most recent call last):
  File "/Users/rune/PycharmProjects/TEST/test_yahoo.py", line 4, in <module>
    data = pdr.get_data_yahoo('^GSPC', datetime(1970, 1, 1))
  File "/Users/rune/PycharmProjects/TEST/venv/lib/python3.8/site-packages/pandas_datareader/data.py", line 86, in get_data_yahoo
    return YahooDailyReader(*args, **kwargs).read()
  File "/Users/rune/PycharmProjects/TEST/venv/lib/python3.8/site-packages/pandas_datareader/base.py", line 253, in read
    df = self._read_one_data(self.url, params=self._get_params(self.symbols))
  File "/Users/rune/PycharmProjects/TEST/venv/lib/python3.8/site-packages/pandas_datareader/yahoo/daily.py", line 153, in _read_one_data
    resp = self._get_response(url, params=params)
  File "/Users/rune/PycharmProjects/TEST/venv/lib/python3.8/site-packages/pandas_datareader/base.py", line 181, in _get_response
    raise RemoteDataError(msg)
pandas_datareader._utils.RemoteDataError: Unable to read URL: https://finance.yahoo.com/quote/^GSPC/history?period1=10800&period2=1627523999&interval=1d&frequency=1d&filter=history
Response Text:
b'<!DOCTYPE html>\n  <html lang="en-us"><head>\n  <meta http-equiv="content-type" content="text/html; charset=UTF-8">\n      <meta charset="utf-8">\n      <title>Yahoo</title>\n      <meta name="viewport" content="width=device-width,initial-scale=1,minimal-ui">\n      <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">\n      <style>\n  html {\n      height: 100%;\n  }\n  body {\n      background: #fafafc url(https://s.yimg.com/nn/img/sad-panda-201402200631.png) 50% 50%;\n      background-size: cover;\n      height: 100%;\n      text-align: center;\n      font: 300 18px "helvetica neue", helvetica, verdana, tahoma, arial, sans-serif;\n  }\n  table {\n      height: 100%;\n      width: 100%;\n      table-layout: fixed;\n      border-collapse: collapse;\n      border-spacing: 0;\n      border: none;\n  }\n  h1 {\n      font-size: 42px;\n      font-weight: 400;\n      color: #400090;\n  }\n  p {\n      color: #1A1A1A;\n  }\n  #message-1 {\n      font-weight: bold;\n      margin: 0;\n  }\n  #message-2 {\n      display: inline-block;\n      *display: inline;\n      zoom: 1;\n      max-width: 17em;\n      _width: 17em;\n  }\n      </style>\n  <script>\n    document.write(\'<img src="//geo.yahoo.com/b?s=1197757129&t=\'+new Date().getTime()+\'&src=aws&err_url=\'+encodeURIComponent(document.URL)+\'&err=%<pssc>&test=\'+encodeURIComponent(\'%<{Bucket}cqh[:200]>\')+\'" width="0px" height="0px"/>\');var beacon = new Image();beacon.src="//bcn.fp.yahoo.com/p?s=1197757129&t="+new Date().getTime()+"&src=aws&err_url="+encodeURIComponent(document.URL)+"&err=%<pssc>&test="+encodeURIComponent(\'%<{Bucket}cqh[:200]>\');\n  </script>\n  </head>\n  <body>\n  <!-- status code : 404 -->\n  <!-- Not Found on Server -->\n  <table>\n  <tbody><tr>\n      <td>\n      <img src="https://s.yimg.com/rz/p/yahoo_frontpage_en-US_s_f_p_205x58_frontpage.png" alt="Yahoo Logo">\n      <h1 style="margin-top:20px;">Will be right back...</h1>\n      <p id="message-1">Thank you for your 
patience.</p>\n      <p id="message-2">Our engineers are working quickly to resolve the issue.</p>\n      </td>\n  </tr>\n  </tbody></table>\n  </body></html>'

What to do?

The fix

There has been a breaking change and you need to update your Pandas Datareader.

You can upgrade to the newest version as follows.

pip install pandas_datareader --upgrade

This should update it to version 0.10.0 or later.

Then the code should work again.
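If you are unsure which version you ended up with, you can check it from Python. This is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata). The helper names installed_version and is_at_least are made up for illustration.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_at_least(version, minimum):
    """Compare the leading numeric parts of a version string with a tuple."""
    parts = tuple(int(n) for n in re.findall(r"\d+", version)[: len(minimum)])
    # Pad short versions such as "0.10" with zeros before comparing.
    parts += (0,) * (len(minimum) - len(parts))
    return parts >= minimum


version = installed_version("pandas-datareader")
if version is None:
    print("pandas-datareader is not installed")
elif is_at_least(version, (0, 10, 0)):
    print(f"pandas-datareader {version} is recent enough")
else:
    print(f"pandas-datareader {version} is older than 0.10.0, so upgrade it")
```

The versions are compared as numbers and not as text, because as plain strings "0.9.0" would sort after "0.10.0".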