Pandas: Does Stock Market Correlate to Unemployment Rate or Bank Interest Rate?

What will we cover in this tutorial?

We will continue our exploration of the amazing Pandas-datareader. In this tutorial we will further investigate data from World Bank and correlate it with S&P 500 Stock index. We will do this both by visualizing 3 graphs on 2 different y-axis as well as compute the correlation.

Step 1: Get World Bank data

In this tutorial we will only look at data from United States. If you are interested in other tutorials on. World Bank data you should read this one and this one.

To get the data of the World Bank you can use the Pandas-datareader, which has a function to download data if you have the indicator.

pandas_datareader.wb.download(country=Noneindicator=Nonestart=2003end=2005freq=Noneerrors=’warn’**kwargs)

That takes the country and indicator, start, and end year as arguments.

You can find indicators on the webpage of World Bank.

In this tutorial we will use the SL.UEM.TOTL.ZS, the unemployment, total (% of total labor force), and FR.INR.RINR, the interest rate.

To inspect the data you can use the following code.

from pandas_datareader import wb


data = wb.download(indicator=['SL.UEM.TOTL.ZS', 'FR.INR.RINR'], country=['US'], start=1990, end=2019)

uem_data = data.unstack().T.loc['SL.UEM.TOTL.ZS']
uem_data.columns = ['US-unempl']
int_data = data.unstack().T.loc['FR.INR.RINR']
int_data.columns = ['US-int']

data = int_data.join(uem_data)

print(data)

Giving an output similar to this (lines excluded).

        US-int  US-unempl
year                     
1990  6.039744        NaN
1991  4.915352      6.800
1992  3.884240      7.500
1993  3.546689      6.900
1994  4.898356      6.119
1995  6.594069      5.650
1996  6.324008      5.451
1997  6.603407      5.000
1998  7.148192      4.510
1999  6.457135      4.219

For details on the unstacking and transposing, see this tutorial.

Step 2: Join the data from the S&P 500 index

First let’s get the data from S&P 500, which has ticker ^GSPC.

You can use the Pandas-datareader for that.

import pandas_datareader as pdr
import datetime as dt


start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
sp500.name='S&P 500'
print(sp500)

Resulting in the following output.

Date
1990-01-02     359.690002
1990-01-03     358.760010
1990-01-04     355.670013
1990-01-05     352.200012
1990-01-08     353.790009
                 ...     
2019-12-24    3223.379883
2019-12-26    3239.909912
2019-12-27    3240.020020
2019-12-30    3221.290039
2019-12-31    3230.780029
Name: S&P 500, Length: 7559, dtype: float64

Problem! The date is a datetime object in the above Series, while it is a string with a year in the DataFrame with unemployment rate and interest rate above.

To successfully join them, we need to convert them into same format. The best way is to convert them into a datetime. We can do that by using the pd.to_datetime() function.

import pandas_datareader as pdr
import pandas as pd
import datetime as dt
from pandas_datareader import wb


data = wb.download(indicator=['SL.UEM.TOTL.ZS', 'FR.INR.RINR'], country=['US'], start=1990, end=2019)

uem_data = data.unstack().T.loc['SL.UEM.TOTL.ZS']
uem_data.columns = ['US-unempl']
int_data = data.unstack().T.loc['FR.INR.RINR']
int_data.columns = ['US-int']

data = int_data.join(uem_data)
data.index = pd.to_datetime(data.index, format='%Y')


start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
sp500.name='S&P 500'

data = sp500.to_frame().join(data, how='outer')
print(data)

Resulting in the following output.

                S&P 500    US-int  US-unempl
1990-01-01          NaN  6.039744        NaN
1990-01-02   359.690002       NaN        NaN
1990-01-03   358.760010       NaN        NaN
1990-01-04   355.670013       NaN        NaN
1990-01-05   352.200012       NaN        NaN
...                 ...       ...        ...
2019-12-24  3223.379883       NaN        NaN
2019-12-26  3239.909912       NaN        NaN
2019-12-27  3240.020020       NaN        NaN
2019-12-30  3221.290039       NaN        NaN
2019-12-31  3230.780029       NaN        NaN

The problem you see there, is that data from US-int and US-unempl. only has data the first of January every year. To fix that, we can make a linear interpolation of the data by applying the following.

data = data.interpolate(method='linear')

Resulting in.

                S&P 500    US-int  US-unempl
1990-01-01          NaN  6.039744        NaN
1990-01-02   359.690002  6.035318        NaN
1990-01-03   358.760010  6.030891        NaN
1990-01-04   355.670013  6.026464        NaN
1990-01-05   352.200012  6.022037        NaN
...                 ...       ...        ...
2019-12-24  3223.379883  3.478200      3.682
2019-12-26  3239.909912  3.478200      3.682
2019-12-27  3240.020020  3.478200      3.682
2019-12-30  3221.290039  3.478200      3.682
2019-12-31  3230.780029  3.478200      3.682

Notice, that since there is no unemployment data for 1990 in US, it will fill them with NaN until first rate is given.

Step 3: Visualize all three graphs with 3 different y-axis

Now here Pandas are quite strong. By default, you can create a secondary y-axis. As the 3 datasets only need two y-axis, as the unemployment and interest rate can share the same y-axis.

import pandas_datareader as pdr
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
from pandas_datareader import wb


pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 10)
pd.set_option('display.width', 1000)


data = wb.download(indicator=['SL.UEM.TOTL.ZS', 'FR.INR.RINR'], country=['US'], start=1990, end=2019)

uem_data = data.unstack().T.loc['SL.UEM.TOTL.ZS']
uem_data.columns = ['US-unempl']
int_data = data.unstack().T.loc['FR.INR.RINR']
int_data.columns = ['US-int']

data = int_data.join(uem_data)
data.index = pd.to_datetime(data.index, format='%Y')


start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
sp500.name='S&P 500'

data = sp500.to_frame().join(data, how='outer')
data = data.interpolate(method='linear')

ax = data['S&P 500'].plot(legend=True)
ax = data[['US-int','US-unempl']].plot(ax=ax, secondary_y=True, legend=True)

print(data.corr())

plt.show()

Where the correlation is given here.

            S&P 500    US-int  US-unempl
S&P 500    1.000000 -0.408429  -0.453315
US-int    -0.408429  1.000000  -0.470103
US-unempl -0.453315 -0.470103   1.000000

Which is surprisingly low. Visually, you can see it here.

The S&P 500 stock index, US interest rate and US unemployment rate

Pandas: Determine Correlation Between GDP and Stock Market

What will we cover in this tutorial?

In this tutorial we will explore some aspects of the Pandas-Datareader, which is an invaluable way to get data from many sources, including the World Bank and Yahoo! Finance.

In this tutorial we will investigate if the GDP of a country is correlated to the stock market.

Step 1: Get GDP data from World Bank

In the previous tutorial we looked at the GDP per capita and compared it between countries. GDP per capita is a good way to compare country’s economy between each other.

In this tutorial we will look at the GDP and using the NY.GDP.MKTP.CD indicator of GDP in current US$.

We can extract the data by using using the download function from the Pandas-datareader library.

from pandas_datareader import wb


gdp = wb.download(indicator='NY.GDP.MKTP.CD', country='US', start=1990, end=2019)

print(gdp)

Resulting in the following output.

                    NY.GDP.MKTP.CD
country       year                
United States 2019  21427700000000
              2018  20580223000000
              2017  19485393853000
              2016  18707188235000
              2015  18219297584000
              2014  17521746534000
              2013  16784849190000
              2012  16197007349000
              2011  15542581104000

Step 2: Gathering the stock index

Then we need to gather the data from the stock market. As we look at the US stock market, the S&P 500 index is a good indicator of the market.

The ticker of S&P 500 is ^GSPC (yes, with the ^).

The Yahoo! Finance api is a great place to collect this type of data.

import pandas_datareader as pdr
import datetime as dt


start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
print(sp500)

Resulting in the following output.

Date
1990-01-02     359.690002
1990-01-03     358.760010
1990-01-04     355.670013
1990-01-05     352.200012
1990-01-08     353.790009
                 ...     
2019-12-24    3223.379883
2019-12-26    3239.909912
2019-12-27    3240.020020
2019-12-30    3221.290039
2019-12-31    3230.780029

Step 3: Visualizing the data on one plot

A good way to see if there is a correlation is simply by visualizing it.

This can be done with a few tweaks.

import pandas_datareader as pdr
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
from pandas_datareader import wb


gdp = wb.download(indicator='NY.GDP.MKTP.CD', country='US', start=1990, end=2019)

gdp = gdp.unstack().T.reset_index(0)
gdp.index = pd.to_datetime(gdp.index, format='%Y')


start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']


data = sp500.to_frame().join(gdp, how='outer')
data = data.interpolate(method='linear')

ax = data['Adj Close'].plot()
ax = data['United States'].plot(ax=ax, secondary_y=True)

plt.show()

The GDP data needs to be formatted differently, by unstack’ing, transposing, and resetting the index. Then the index is converted from being strings of year to actually time series.

We use a outer join to get all the dates in the time series. Then we interpolate with a linear method to fill out the gab in the graph.

Finally, we make a plot af Adj Close of S&P 500 stock index and on of the GDP of United States, where we use the same graph, but using the secondary y-axis to plot. That means, the time series on the x-axis is the same.

The resulting graph is.

US GDP with S&P 500 index

It could look like a correlation, which is visible in the aftermath of 2008.

Step 4: Calculate a correlation

Let’s try to make some correlation calculations.

First, let’s not just rely on how US GDP correlates to the US stock market. Let us try to relate it to other countries GDP and see how they relate to the strongest economy in the world.

import pandas_datareader as pdr
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
from pandas_datareader import wb


gdp = wb.download(indicator='NY.GDP.MKTP.CD', country=['NO', 'FR', 'US', 'GB', 'DK', 'DE', 'SE'], start=1990, end=2019)

gdp = gdp.unstack().T.reset_index(0)
gdp.index = pd.to_datetime(gdp.index, format='%Y')


start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']

data = sp500.to_frame().join(gdp, how='outer')
data = data.interpolate(method='linear')

print(data.corr())

Where we compare it the the GDP for some more countries to verify our hypothesis.

                Adj Close   Denmark    France   Germany    Norway    Sweden  United Kingdom  United States
Adj Close        1.000000  0.729701  0.674506  0.727289  0.653507  0.718829        0.759239       0.914303
Denmark          0.729701  1.000000  0.996500  0.986769  0.975780  0.978550        0.955674       0.926139
France           0.674506  0.996500  1.000000  0.982225  0.979767  0.974825        0.945877       0.893780
Germany          0.727289  0.986769  0.982225  1.000000  0.953131  0.972542        0.913443       0.916239
Norway           0.653507  0.975780  0.979767  0.953131  1.000000  0.978784        0.933795       0.878704
Sweden           0.718829  0.978550  0.974825  0.972542  0.978784  1.000000        0.930621       0.916530
United Kingdom   0.759239  0.955674  0.945877  0.913443  0.933795  0.930621        1.000000       0.915859
United States    0.914303  0.926139  0.893780  0.916239  0.878704  0.916530        0.915859       1.000000

Now that is interesting. The US Stock market (Adj Close) correlates the strongest with the US GDP. Not surprising.

Of the chosen countries, the Danish GDP is the second most correlated to US stock market. The GDP of the countries correlate all strongly with the US GDP. There Norway correlates the least.

Continue the exploration of World Bank data.

Pandas: Read GDP per Capita From World Bank

What will we cover in this tutorial?

Introduction to the Pandas-Datareader, which is an invaluable way to get data from many sources, including the World Bank.

In this tutorial we will cover how to get the data from GDP per capita yearly data from countries and plot them.

Step 1: Get to know World Bank as a data source

The World Bank was founded in 1944 to make loans to low-income countries, with the purpose to decrease poverty in the world (see wikipedia.org for further history).

What you might not know, World Bank has an amazing sets of data that you can either browse on their webpage or get access to directly in Python using the Pandas-Datareader.

We will take a look at how to extract the NY.GDP.PCAP.KD indicator.

The what?

I know. The GDP per capita (constant 2010 US$), as it states on the webpage.

From World Bank.

On that page you can get the GDP per capita for each country in the world back to 1960.

That is what we are going to do.

Step 2: Get the data

Reading the Pandas-datareaders World Bank documentation you fall over the following function.

pandas_datareader.wb.download(country=Noneindicator=Nonestart=2003end=2005freq=Noneerrors=’warn’**kwargs)

Where you can set the country (or countries) and indicator your want:

  • country (string or list of strings.) – all downloads data for all countries 2 or 3 character ISO country codes select individual countries (e.g.“US“,“CA“) or (e.g.“USA“,“CAN“). The codes can be mixed.The two ISO lists of countries, provided by wikipedia, are hardcoded into pandas as of 11/10/2014.
  • indicator (string or list of strings) – taken from the id field in WDIsearch()

Luckily we already have our indicator from Step 1 (NY.GDP.PCAP.KD). Then we just need to find some countries of interest.

Let’s take United States, France, Great Britain, Denmark and Norway.

from pandas_datareader import wb


dat = wb.download(indicator='NY.GDP.PCAP.KD', country=['US', 'FR', 'GB', 'DK', 'NO'], start=1960, end=2019)

print(dat)

Resulting in the following output.

                    NY.GDP.PCAP.KD
country       year                
Denmark       2019    65147.427182
              2018    63915.468361
              2017    62733.019808
              2016    61877.976481
              2015    60402.129248
...                            ...
United States 1964    19824.587845
              1963    18999.888387
              1962    18462.935998
              1961    17671.150187
              1960    17562.592084

[300 rows x 1 columns]

Step 3: Visualize the data on a graph

We need to restructure the data in order to make a nice graph.

This can be done with unstack.

from pandas_datareader import wb
import matplotlib.pyplot as plt


dat = wb.download(indicator='NY.GDP.PCAP.KD', country=['US', 'FR', 'GB', 'DK', 'NO'], start=1960, end=2019)

print(dat.unstack())

Which result in this output.

               NY.GDP.PCAP.KD                ...                            
year                     1960          1961  ...          2018          2019
country                                      ...                            
Denmark          20537.549556  21695.609308  ...  63915.468361  65147.427182
France           12743.925100  13203.320855  ...  43720.026351  44317.392315
Norway           23167.441740  24426.011426  ...  92119.522964  92556.321645
United Kingdom   13934.029831  14198.673562  ...  43324.049759  43688.437455
United States    17562.592084  17671.150187  ...  54795.450086  55809.007792

If we transpose this and remove the double index frames that will come, then it should be good to make a plot with.

from pandas_datareader import wb
import matplotlib.pyplot as plt


dat = wb.download(indicator='NY.GDP.PCAP.KD', country=['US', 'FR', 'GB', 'DK', 'NO'], start=1960, end=2019)

print(dat.unstack().T.reset_index(0))

dat.unstack().T.reset_index(0).plot()
plt.title('GDP per capita')
plt.show()

Giving this output, where you can see what the Transpose (T) does.

country         level_0       Denmark  ...  United Kingdom  United States
year                                   ...                               
1960     NY.GDP.PCAP.KD  20537.549556  ...    13934.029831   17562.592084
1961     NY.GDP.PCAP.KD  21695.609308  ...    14198.673562   17671.150187
1962     NY.GDP.PCAP.KD  22747.292463  ...    14233.959944   18462.935998
1963     NY.GDP.PCAP.KD  22712.577808  ...    14816.480305   18999.888387
1964     NY.GDP.PCAP.KD  24620.461432  ...    15535.026991   19824.587845
1965     NY.GDP.PCAP.KD  25542.173921  ...    15766.195724   20831.299767
1966     NY.GDP.PCAP.KD  26032.378816  ...    15926.169851   21930.591173
1967     NY.GDP.PCAP.KD  27256.322071  ...    16282.026160   22235.415708

Giving the following output.

GDP per capita for 1960-2019.

Continue the exploration in the following tutorial.