What will we cover in this tutorial?
We will continue our exploration of the Pandas-datareader. In this tutorial we will retrieve data from the World Bank and correlate it with the S&P 500 stock index. We will do this both by plotting the three series on two different y-axes and by computing the correlation matrix.
Step 1: Get World Bank data
In this tutorial we will only look at data from the United States. If you are interested in other tutorials on World Bank data, you should read this one and this one.
To get World Bank data you can use the Pandas-datareader, which has a function to download a dataset once you know its indicator code.
pandas_datareader.wb.download(country=None, indicator=None, start=2003, end=2005, freq=None, errors='warn', **kwargs)
It takes the country, the indicator, and the start and end years as arguments.
You can find indicators on the webpage of World Bank.
In this tutorial we will use SL.UEM.TOTL.ZS, the unemployment, total (% of total labor force), and FR.INR.RINR, the real interest rate (%).
To inspect the data you can use the following code.
from pandas_datareader import wb
data = wb.download(indicator=['SL.UEM.TOTL.ZS', 'FR.INR.RINR'], country=['US'], start=1990, end=2019)
uem_data = data.unstack().T.loc['SL.UEM.TOTL.ZS']
uem_data.columns = ['US-unempl']
int_data = data.unstack().T.loc['FR.INR.RINR']
int_data.columns = ['US-int']
data = int_data.join(uem_data)
print(data)
Giving an output similar to this (some lines omitted).
US-int US-unempl
year
1990 6.039744 NaN
1991 4.915352 6.800
1992 3.884240 7.500
1993 3.546689 6.900
1994 4.898356 6.119
1995 6.594069 5.650
1996 6.324008 5.451
1997 6.603407 5.000
1998 7.148192 4.510
1999 6.457135 4.219
For details on the unstacking and transposing, see this tutorial.
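To see what the unstacking and transposing do in isolation, here is a minimal sketch on a toy frame shaped like the World Bank result; the values are made up for illustration.

```python
import pandas as pd

# wb.download returns a frame with a (country, year) MultiIndex and one
# column per indicator. We mimic that shape here with invented values.
idx = pd.MultiIndex.from_product([['United States'], ['1990', '1991']],
                                 names=['country', 'year'])
df = pd.DataFrame({'SL.UEM.TOTL.ZS': [None, 6.8],
                   'FR.INR.RINR': [6.04, 4.92]}, index=idx)

# unstack() pivots the year level into the columns, and .T flips the frame
# so the rows are labeled (indicator, year). .loc[indicator] then returns
# a year-indexed frame with one column per country.
uem = df.unstack().T.loc['SL.UEM.TOTL.ZS']
print(uem)
```

The same pattern is what the tutorial's `data.unstack().T.loc[...]` lines rely on, just with the real downloaded data.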
Step 2: Join the data from the S&P 500 index
First let's get the data for the S&P 500, which has the ticker ^GSPC.
You can use the Pandas-datareader for that.
import pandas_datareader as pdr
import datetime as dt
start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
sp500.name='S&P 500'
print(sp500)
Resulting in the following output.
Date
1990-01-02 359.690002
1990-01-03 358.760010
1990-01-04 355.670013
1990-01-05 352.200012
1990-01-08 353.790009
...
2019-12-24 3223.379883
2019-12-26 3239.909912
2019-12-27 3240.020020
2019-12-30 3221.290039
2019-12-31 3230.780029
Name: S&P 500, Length: 7559, dtype: float64
Problem! The date is a datetime object in the Series above, while it is a string containing the year in the DataFrame with the unemployment and interest rates. To join them successfully, we need to convert both indexes to the same type. The best option is datetime, which we can get with the pd.to_datetime() function.
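To see the conversion on its own, here is a minimal sketch with a toy year index (the values are made up):

```python
import pandas as pd

# The World Bank frame is indexed by year strings such as '1990'.
# pd.to_datetime with format='%Y' turns each year string into a timestamp
# pinned to January 1 of that year, matching the stock data's index type.
years = pd.Index(['1990', '1991', '1992'], name='year')
converted = pd.to_datetime(years, format='%Y')
print(converted)
# DatetimeIndex(['1990-01-01', '1991-01-01', '1992-01-01'], ...)
```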
import pandas_datareader as pdr
import pandas as pd
import datetime as dt
from pandas_datareader import wb
data = wb.download(indicator=['SL.UEM.TOTL.ZS', 'FR.INR.RINR'], country=['US'], start=1990, end=2019)
uem_data = data.unstack().T.loc['SL.UEM.TOTL.ZS']
uem_data.columns = ['US-unempl']
int_data = data.unstack().T.loc['FR.INR.RINR']
int_data.columns = ['US-int']
data = int_data.join(uem_data)
data.index = pd.to_datetime(data.index, format='%Y')
start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
sp500.name='S&P 500'
data = sp500.to_frame().join(data, how='outer')
print(data)
Resulting in the following output.
S&P 500 US-int US-unempl
1990-01-01 NaN 6.039744 NaN
1990-01-02 359.690002 NaN NaN
1990-01-03 358.760010 NaN NaN
1990-01-04 355.670013 NaN NaN
1990-01-05 352.200012 NaN NaN
... ... ... ...
2019-12-24 3223.379883 NaN NaN
2019-12-26 3239.909912 NaN NaN
2019-12-27 3240.020020 NaN NaN
2019-12-30 3221.290039 NaN NaN
2019-12-31 3230.780029 NaN NaN
The problem you see here is that US-int and US-unempl only have values on January 1 of each year. To fix that, we can linearly interpolate the data by applying the following.
data = data.interpolate(method='linear')
Resulting in the following output.
S&P 500 US-int US-unempl
1990-01-01 NaN 6.039744 NaN
1990-01-02 359.690002 6.035318 NaN
1990-01-03 358.760010 6.030891 NaN
1990-01-04 355.670013 6.026464 NaN
1990-01-05 352.200012 6.022037 NaN
... ... ... ...
2019-12-24 3223.379883 3.478200 3.682
2019-12-26 3239.909912 3.478200 3.682
2019-12-27 3240.020020 3.478200 3.682
2019-12-30 3221.290039 3.478200 3.682
2019-12-31 3230.780029 3.478200 3.682
Notice that since there is no US unemployment data for 1990, those entries remain NaN until the first observation: interpolation cannot fill values before the first data point.
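The behavior of the interpolation can be checked on a toy series that mimics the joined frame: yearly anchor points with NaNs in between, and a leading gap like the 1990 unemployment entries.

```python
import pandas as pd
import numpy as np

# Interior NaNs are filled on a straight line between the known points.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])
print(s.interpolate(method='linear').tolist())
# [1.0, 2.0, 3.0, 4.0, 5.0]

# Leading NaNs stay NaN: linear interpolation needs a known value on the
# left, which is why the 1990 unemployment rows remain empty.
t = pd.Series([np.nan, np.nan, 3.0, np.nan, 5.0])
print(t.interpolate(method='linear').tolist())
# [nan, nan, 3.0, 4.0, 5.0]
```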
Step 3: Visualize all three graphs on two y-axes
This is where Pandas is quite strong: by default it can plot against a secondary y-axis. The three series only need two y-axes, since the unemployment rate and the interest rate can share one.
import pandas_datareader as pdr
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
from pandas_datareader import wb
pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 10)
pd.set_option('display.width', 1000)
data = wb.download(indicator=['SL.UEM.TOTL.ZS', 'FR.INR.RINR'], country=['US'], start=1990, end=2019)
uem_data = data.unstack().T.loc['SL.UEM.TOTL.ZS']
uem_data.columns = ['US-unempl']
int_data = data.unstack().T.loc['FR.INR.RINR']
int_data.columns = ['US-int']
data = int_data.join(uem_data)
data.index = pd.to_datetime(data.index, format='%Y')
start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
sp500.name='S&P 500'
data = sp500.to_frame().join(data, how='outer')
data = data.interpolate(method='linear')
ax = data['S&P 500'].plot(legend=True)
ax = data[['US-int','US-unempl']].plot(ax=ax, secondary_y=True, legend=True)
print(data.corr())
plt.show()
The correlation matrix is printed here.
S&P 500 US-int US-unempl
S&P 500 1.000000 -0.408429 -0.453315
US-int -0.408429 1.000000 -0.470103
US-unempl -0.453315 -0.470103 1.000000
The correlations are surprisingly weak. Visually, you can see it here.
