
    Pandas: Determine Correlation Between GDP and Stock Market

    What will we cover in this tutorial?

    This tutorial explores part of Pandas-Datareader, a library for fetching data from many sources, including the World Bank and Yahoo! Finance.

    We use it to investigate whether the GDP of a country is correlated with its stock market.

    Step 1: Get GDP data from World Bank

    In the previous tutorial we looked at GDP per capita and compared it between countries. GDP per capita is a good way to compare the economies of different countries with each other.

    In this tutorial we look at total GDP instead, using the NY.GDP.MKTP.CD indicator, which is GDP in current US$.

    We can extract the data by using the download function from the Pandas-Datareader library.

    from pandas_datareader import wb
    
    gdp = wb.download(indicator='NY.GDP.MKTP.CD', country='US', start=1990, end=2019)
    print(gdp)
    

    This produces output like the following (only the most recent years are shown).

                        NY.GDP.MKTP.CD
    country       year                
    United States 2019  21427700000000
                  2018  20580223000000
                  2017  19485393853000
                  2016  18707188235000
                  2015  18219297584000
                  2014  17521746534000
                  2013  16784849190000
                  2012  16197007349000
                  2011  15542581104000
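
    The output above is indexed by both country and year. As a minimal sketch of how to work with that shape without calling the World Bank, the block below rebuilds three rows by hand from the values printed above and looks up a single year. It assumes the year labels are strings, which is also why they are converted with pd.to_datetime later in this tutorial; the variable names us and gdp_2019_trillions are only illustrative.

```python
import pandas as pd

# Rebuild a small slice of the printed output by hand.
# Assumption: the year labels are strings, as in the frame returned by wb.download.
index = pd.MultiIndex.from_product(
    [["United States"], ["2019", "2018", "2017"]],
    names=["country", "year"],
)
gdp = pd.DataFrame(
    {"NY.GDP.MKTP.CD": [21427700000000, 20580223000000, 19485393853000]},
    index=index,
)

# Select one country, leaving a frame indexed by year only.
us = gdp.xs("United States", level="country")

# Look up a single year and express it in trillions of US$.
gdp_2019_trillions = us.loc["2019", "NY.GDP.MKTP.CD"] / 1e12
print(round(gdp_2019_trillions, 2))
```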
    

    Step 2: Gathering the stock index

    Next we need data from the stock market. Since we are looking at the US stock market, the S&P 500 index is a good indicator of the market as a whole.

    The ticker of the S&P 500 is ^GSPC (yes, with the ^).

    The Yahoo! Finance API is a convenient place to collect this type of data.

    import pandas_datareader as pdr
    import datetime as dt
    
    start = dt.datetime(1990, 1, 1)
    end = dt.datetime(2019, 12, 31)
    sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
    print(sp500)
    

    This results in the following output.

    Date
    1990-01-02     359.690002
    1990-01-03     358.760010
    1990-01-04     355.670013
    1990-01-05     352.200012
    1990-01-08     353.790009
                     ...     
    2019-12-24    3223.379883
    2019-12-26    3239.909912
    2019-12-27    3240.020020
    2019-12-30    3221.290039
    2019-12-31    3230.780029
    

    Step 3: Visualizing the data on one plot

    A good way to see if there is a correlation is simply by visualizing it.

    This can be done with a few tweaks.

    import pandas_datareader as pdr
    import pandas as pd
    import matplotlib.pyplot as plt
    import datetime as dt
    from pandas_datareader import wb
    
    gdp = wb.download(indicator='NY.GDP.MKTP.CD', country='US', start=1990, end=2019)
    gdp = gdp.unstack().T.reset_index(0)
    gdp.index = pd.to_datetime(gdp.index, format='%Y')
    
    start = dt.datetime(1990, 1, 1)
    end = dt.datetime(2019, 12, 31)
    sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
    
    data = sp500.to_frame().join(gdp, how='outer')
    data = data.interpolate(method='linear')
    ax = data['Adj Close'].plot()
    ax = data['United States'].plot(ax=ax, secondary_y=True)
    plt.show()
    

    The GDP data needs to be reshaped first: we unstack it, transpose it, and reset the first index level, so that each country becomes a column and the years become the index. Then the index is converted from year strings to actual timestamps.
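
    To see what each reshaping call does, here is a small sketch on a hand-built frame with the same (country, year) layout. The numbers are placeholders, not real GDP figures, and the names wide and numeric are only illustrative. Note that reset_index(0) leaves an extra text column holding the indicator code, so the sketch also shows one way to keep just the numeric columns.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by wb.download.
# The values are placeholders, not real GDP figures.
index = pd.MultiIndex.from_product(
    [["Denmark", "United States"], ["2019", "2018"]],
    names=["country", "year"],
)
gdp = pd.DataFrame({"NY.GDP.MKTP.CD": [4.0, 3.0, 2.0, 1.0]}, index=index)

wide = gdp.unstack()        # rows: country, columns: (indicator, year)
wide = wide.T               # rows: (indicator, year), columns: country
wide = wide.reset_index(0)  # move the indicator level out of the index into a column
wide.index = pd.to_datetime(wide.index, format="%Y")  # '2019' -> 2019-01-01

# Keep only the numeric country columns, dropping the indicator text column.
numeric = wide.select_dtypes("number")
print(numeric)
```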

    We use an outer join to keep all the dates from both time series. The GDP has only one value per year, so we then interpolate linearly to fill the gaps in the graph.
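
    The effect of the outer join and the interpolation is easier to see on a tiny example. The sketch below uses made-up numbers and dates: three daily prices and two sparse GDP-style values. With method='linear', pandas ignores the actual dates and treats the rows as equally spaced.

```python
import pandas as pd

# Made-up daily prices (stand-in for the S&P 500 series).
prices = pd.Series(
    [10.0, 11.0, 12.0],
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    name="Adj Close",
)
# Made-up sparse values (stand-in for the yearly GDP).
gdp = pd.DataFrame(
    {"United States": [100.0, 200.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-07"]),
)

# The outer join keeps every date from both inputs; missing cells become NaN.
data = prices.to_frame().join(gdp, how="outer").sort_index()

# Linear interpolation fills the NaN cells between known values, row by row.
filled = data.interpolate(method="linear")
print(filled)
```

    The GDP column is filled in evenly between its two known values, while the first price stays missing because there is no earlier value to interpolate from.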

    Finally, we plot the Adj Close of the S&P 500 stock index and the GDP of the United States on the same graph, with the GDP on the secondary y-axis. That way both series share the same time series on the x-axis even though their scales differ.

    This produces the following graph.
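
    As a sketch of the secondary y-axis technique on its own, the block below plots two made-up series with very different scales on one graph. It assumes matplotlib is installed and selects the Agg backend so it can run without a display; the axis labels are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import pandas as pd

# Made-up data: two series on very different scales.
dates = pd.date_range("2020-01-01", periods=5, freq="D")
data = pd.DataFrame(
    {
        "Adj Close": [10.0, 11.0, 12.0, 11.5, 13.0],
        "United States": [100.0, 125.0, 150.0, 175.0, 200.0],
    },
    index=dates,
)

# The first plot creates the axes with the left y-axis.
left_ax = data["Adj Close"].plot()
# The second plot reuses those axes but draws against a new right y-axis.
data["United States"].plot(ax=left_ax, secondary_y=True)

left_ax.set_ylabel("S&P 500 (Adj Close)")
left_ax.right_ax.set_ylabel("GDP")
fig = left_ax.get_figure()
```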

    US GDP with S&P 500 index

    The graph suggests a correlation, most visibly in the aftermath of 2008.

    Step 4: Calculate a correlation

    Let’s calculate some actual correlations.

    First, let’s not rely only on how the US GDP correlates with the US stock market. Let us also include the GDP of other countries and see how they relate to the strongest economy in the world.

    import pandas_datareader as pdr
    import pandas as pd
    import matplotlib.pyplot as plt
    import datetime as dt
    from pandas_datareader import wb
    
    gdp = wb.download(indicator='NY.GDP.MKTP.CD', country=['NO', 'FR', 'US', 'GB', 'DK', 'DE', 'SE'], start=1990, end=2019)
    gdp = gdp.unstack().T.reset_index(0)
    gdp.index = pd.to_datetime(gdp.index, format='%Y')
    
    start = dt.datetime(1990, 1, 1)
    end = dt.datetime(2019, 12, 31)
    sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
    data = sp500.to_frame().join(gdp, how='outer')
    data = data.interpolate(method='linear')
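    # Correlate only the numeric columns; the reshape leaves a text column with the indicator code
    print(data.select_dtypes('number').corr())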
    

    Here we compare it with the GDP of several more countries to verify our hypothesis.

                    Adj Close   Denmark    France   Germany    Norway    Sweden  United Kingdom  United States
    Adj Close        1.000000  0.729701  0.674506  0.727289  0.653507  0.718829        0.759239       0.914303
    Denmark          0.729701  1.000000  0.996500  0.986769  0.975780  0.978550        0.955674       0.926139
    France           0.674506  0.996500  1.000000  0.982225  0.979767  0.974825        0.945877       0.893780
    Germany          0.727289  0.986769  0.982225  1.000000  0.953131  0.972542        0.913443       0.916239
    Norway           0.653507  0.975780  0.979767  0.953131  1.000000  0.978784        0.933795       0.878704
    Sweden           0.718829  0.978550  0.974825  0.972542  0.978784  1.000000        0.930621       0.916530
    United Kingdom   0.759239  0.955674  0.945877  0.913443  0.933795  0.930621        1.000000       0.915859
    United States    0.914303  0.926139  0.893780  0.916239  0.878704  0.916530        0.915859       1.000000
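
    Each cell in the table above is a pairwise correlation coefficient; by default, corr() computes the Pearson correlation. As a sketch of what that means, the block below compares the pandas result with the textbook formula on a small made-up data set (the columns x and y are placeholders).

```python
import numpy as np
import pandas as pd

# Made-up data, only to illustrate the calculation.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0], "y": [2.0, 4.0, 5.0, 4.0, 5.0]})

# Pairwise Pearson correlation of all columns, as a symmetric matrix.
matrix = df.corr()

# The same coefficient by hand: covariance divided by the product of the spreads.
x = df["x"].to_numpy()
y = df["y"].to_numpy()
dx = x - x.mean()
dy = y - y.mean()
manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(matrix.loc["x", "y"], manual)
```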
    

    Now that is interesting. The US stock market (Adj Close) correlates most strongly with the US GDP (0.914). Not surprising.

    Of the other chosen countries, the GDP of the United Kingdom is the most correlated with the US stock market (0.759), followed by Denmark (0.730) and Germany (0.727). The GDPs of all the countries correlate strongly with the US GDP; Norway correlates the least (0.879).
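
    Reading a ranking off a correlation matrix by eye is error-prone, so it can help to sort one column. The sketch below copies the Adj Close correlations from the table above into a Series and sorts them; the names corr_with_sp500 and ranked are only illustrative.

```python
import pandas as pd

# Correlation of each country's GDP with 'Adj Close', copied from the table above.
corr_with_sp500 = pd.Series(
    {
        "Denmark": 0.729701,
        "France": 0.674506,
        "Germany": 0.727289,
        "Norway": 0.653507,
        "Sweden": 0.718829,
        "United Kingdom": 0.759239,
        "United States": 0.914303,
    }
)

# Sort from the strongest to the weakest correlation.
ranked = corr_with_sp500.sort_values(ascending=False)
print(ranked)
```

    With the full matrix at hand, the same ranking could be taken directly from its 'Adj Close' column after dropping the 'Adj Close' row itself.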

    Continue the exploration of World Bank data.
