
    Pandas: Does the Stock Market Correlate with the Unemployment Rate or the Bank Interest Rate?

    What will we cover in this tutorial?

    We will continue our exploration of the amazing Pandas-datareader. In this tutorial we will further investigate data from the World Bank and correlate it with the S&P 500 stock index. We will do this both by visualizing all three graphs on two different y-axes and by computing the correlation.

    Step 1: Get World Bank data

    In this tutorial we will only look at data from the United States. If you are interested in other tutorials on World Bank data, you should read this one and this one.

    To get data from the World Bank you can use the Pandas-datareader, which has a function to download the data if you know the indicator.

    pandas_datareader.wb.download(country=None, indicator=None, start=2003, end=2005, freq=None, errors='warn', **kwargs)

    It takes the country, indicator, start year, and end year as arguments.

    You can find indicators on the webpage of World Bank.

    In this tutorial we will use SL.UEM.TOTL.ZS, the unemployment, total (% of total labor force), and FR.INR.RINR, the real interest rate (%).

    To inspect the data you can use the following code.

    from pandas_datareader import wb
    
    # Download both indicators for the United States, 1990-2019
    data = wb.download(indicator=['SL.UEM.TOTL.ZS', 'FR.INR.RINR'], country=['US'], start=1990, end=2019)
    # Reshape so the rows are years and each indicator becomes its own DataFrame
    uem_data = data.unstack().T.loc['SL.UEM.TOTL.ZS']
    uem_data.columns = ['US-unempl']
    int_data = data.unstack().T.loc['FR.INR.RINR']
    int_data.columns = ['US-int']
    # Join the two indicators on the year index
    data = int_data.join(uem_data)
    print(data)
    

    Giving an output similar to this (some lines excluded).

            US-int  US-unempl
    year                     
    1990  6.039744        NaN
    1991  4.915352      6.800
    1992  3.884240      7.500
    1993  3.546689      6.900
    1994  4.898356      6.119
    1995  6.594069      5.650
    1996  6.324008      5.451
    1997  6.603407      5.000
    1998  7.148192      4.510
    1999  6.457135      4.219
    

    For details on the unstacking and transposing, see this tutorial.
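    To see what the unstacking and transposing do, here is a minimal sketch with made-up numbers that mimics the shape wb.download returns: rows indexed by (country, year), one column per indicator.

    ```python
    import pandas as pd

    # Mimic the shape of wb.download output: a (country, year) MultiIndex
    # on the rows and one column per indicator (values are made up).
    idx = pd.MultiIndex.from_product(
        [['United States'], ['2018', '2019']], names=['country', 'year'])
    data = pd.DataFrame(
        {'SL.UEM.TOTL.ZS': [3.9, 3.7], 'FR.INR.RINR': [2.4, 3.4]}, index=idx)

    # unstack() moves 'year' into the columns; .T swaps rows and columns,
    # so the rows become (indicator, year) and the columns become countries.
    flipped = data.unstack().T
    # Selecting one indicator leaves a frame indexed by year.
    uem = flipped.loc['SL.UEM.TOTL.ZS']
    print(uem)
    ```

    After the .loc selection, the index is the year and there is one column per country, which is exactly the shape needed for the join above.
    
    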

    Step 2: Join the data from the S&P 500 index

    First, let's get the data for the S&P 500, which has the ticker ^GSPC.

    You can use the Pandas-datareader for that.

    import pandas_datareader as pdr
    import datetime as dt
    
    start = dt.datetime(1990, 1, 1)
    end = dt.datetime(2019, 12, 31)
    sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
    sp500.name='S&P 500'
    print(sp500)
    

    Resulting in the following output.

    Date
    1990-01-02     359.690002
    1990-01-03     358.760010
    1990-01-04     355.670013
    1990-01-05     352.200012
    1990-01-08     353.790009
                     ...     
    2019-12-24    3223.379883
    2019-12-26    3239.909912
    2019-12-27    3240.020020
    2019-12-30    3221.290039
    2019-12-31    3230.780029
    Name: S&P 500, Length: 7559, dtype: float64
    

    Problem! The index is a datetime object in the Series above, while it is a string with the year in the DataFrame with the unemployment rate and interest rate.

    To join them successfully, we need to convert both indices into the same format. The best way is to convert them into datetimes, which we can do with the pd.to_datetime() function.
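    As a quick illustration, a year string like '1990' parsed with format='%Y' becomes a timestamp for January 1st of that year, which is why the yearly data lands on the first of January in the joined frame below.

    ```python
    import pandas as pd

    # Year strings parsed with format='%Y' become January 1st timestamps.
    years = pd.Index(['1990', '1991', '1992'])
    dates = pd.to_datetime(years, format='%Y')
    print(dates)
    ```
    
    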

    import pandas_datareader as pdr
    import pandas as pd
    import datetime as dt
    from pandas_datareader import wb
    
    data = wb.download(indicator=['SL.UEM.TOTL.ZS', 'FR.INR.RINR'], country=['US'], start=1990, end=2019)
    uem_data = data.unstack().T.loc['SL.UEM.TOTL.ZS']
    uem_data.columns = ['US-unempl']
    int_data = data.unstack().T.loc['FR.INR.RINR']
    int_data.columns = ['US-int']
    data = int_data.join(uem_data)
    data.index = pd.to_datetime(data.index, format='%Y')
    
    start = dt.datetime(1990, 1, 1)
    end = dt.datetime(2019, 12, 31)
    sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
    sp500.name='S&P 500'
    data = sp500.to_frame().join(data, how='outer')
    print(data)
    

    Resulting in the following output.

                    S&P 500    US-int  US-unempl
    1990-01-01          NaN  6.039744        NaN
    1990-01-02   359.690002       NaN        NaN
    1990-01-03   358.760010       NaN        NaN
    1990-01-04   355.670013       NaN        NaN
    1990-01-05   352.200012       NaN        NaN
    ...                 ...       ...        ...
    2019-12-24  3223.379883       NaN        NaN
    2019-12-26  3239.909912       NaN        NaN
    2019-12-27  3240.020020       NaN        NaN
    2019-12-30  3221.290039       NaN        NaN
    2019-12-31  3230.780029       NaN        NaN
    

    The problem you see there is that US-int and US-unempl only have values on the first of January each year. To fix that, we can fill the gaps with a linear interpolation by applying the following.

    data = data.interpolate(method='linear')
    

    Resulting in.

                    S&P 500    US-int  US-unempl
    1990-01-01          NaN  6.039744        NaN
    1990-01-02   359.690002  6.035318        NaN
    1990-01-03   358.760010  6.030891        NaN
    1990-01-04   355.670013  6.026464        NaN
    1990-01-05   352.200012  6.022037        NaN
    ...                 ...       ...        ...
    2019-12-24  3223.379883  3.478200      3.682
    2019-12-26  3239.909912  3.478200      3.682
    2019-12-27  3240.020020  3.478200      3.682
    2019-12-30  3221.290039  3.478200      3.682
    2019-12-31  3230.780029  3.478200      3.682
    

    Notice that since there is no US unemployment data for 1990, the values remain NaN until the first figure is available: linear interpolation fills gaps between observations but does not extrapolate backwards before the first one.
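    This behaviour is easy to verify on a tiny made-up Series: gaps between known values are filled, a trailing gap is held at the last value, but the leading NaN stays.

    ```python
    import numpy as np
    import pandas as pd

    # Leading NaN stays NaN; the gap between 6.8 and 6.9 is filled with
    # their midpoint; the trailing NaN is held at the last known value.
    s = pd.Series([np.nan, 6.8, np.nan, 6.9, np.nan])
    filled = s.interpolate(method='linear')
    print(filled)
    ```
    
    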

    Step 3: Visualize all three graphs on two different y-axes

    This is where Pandas is quite strong: plotting on a secondary y-axis is built in. The three datasets only need two y-axes, since the unemployment rate and the interest rate can share one.

    import pandas_datareader as pdr
    import pandas as pd
    import matplotlib.pyplot as plt
    import datetime as dt
    from pandas_datareader import wb
    
    pd.set_option('display.max_rows', 300)
    pd.set_option('display.max_columns', 10)
    pd.set_option('display.width', 1000)
    
    data = wb.download(indicator=['SL.UEM.TOTL.ZS', 'FR.INR.RINR'], country=['US'], start=1990, end=2019)
    uem_data = data.unstack().T.loc['SL.UEM.TOTL.ZS']
    uem_data.columns = ['US-unempl']
    int_data = data.unstack().T.loc['FR.INR.RINR']
    int_data.columns = ['US-int']
    data = int_data.join(uem_data)
    data.index = pd.to_datetime(data.index, format='%Y')
    
    start = dt.datetime(1990, 1, 1)
    end = dt.datetime(2019, 12, 31)
    sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
    sp500.name='S&P 500'
    data = sp500.to_frame().join(data, how='outer')
    data = data.interpolate(method='linear')
    ax = data['S&P 500'].plot(legend=True)
    ax = data[['US-int','US-unempl']].plot(ax=ax, secondary_y=True, legend=True)
    print(data.corr())
    plt.show()
    

    The correlation matrix is given here.

                S&P 500    US-int  US-unempl
    S&P 500    1.000000 -0.408429  -0.453315
    US-int    -0.408429  1.000000  -0.470103
    US-unempl -0.453315 -0.470103   1.000000
    

    The correlations are surprisingly weak. Visually, you can see it here.
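    To read the matrix above, recall what DataFrame.corr() computes: the pairwise Pearson correlation between columns, ignoring rows where either value is NaN. A toy example with made-up columns shows the two extremes, +1 for perfectly correlated and -1 for perfectly anti-correlated.

    ```python
    import pandas as pd

    # corr() computes pairwise Pearson correlations between columns.
    df = pd.DataFrame({
        'a': [1.0, 2.0, 3.0, 4.0],
        'b': [2.0, 4.0, 6.0, 8.0],   # b = 2*a: perfectly correlated
        'c': [4.0, 3.0, 2.0, 1.0],   # c = 5-a: perfectly anti-correlated
    })
    corr = df.corr()
    print(corr)
    ```

    Values near 0, like the roughly -0.4 to -0.5 seen above, indicate only a weak linear relationship.
    
    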

    The S&P 500 stock index, US interest rate and US unemployment rate
