Master Markowitz Portfolio Optimization (Efficient Frontier) in Python using Pandas

What is Markowitz Portfolio Optimization (Efficient Frontier)?

The Efficient Frontier takes a portfolio of investments and optimizes the expected return with regard to the risk. That is, it finds the optimal return for a given risk.

According to investopedia.org, the return is based on the expected Compound Annual Growth Rate (CAGR) and the risk metric is the standard deviation of the return.

But what does all that mean? We will learn that in this tutorial.

Step 1: Get the time series of your stock portfolio

We will use the following portfolio of 4 stocks of Apple (AAPL), Microsoft (MSFT), IBM (IBM) and Nvidia (NVDA).

To get the time series we will use the Yahoo! Finance API through the Pandas-datareader.

We will look 5 years back.

import pandas_datareader as pdr
import pandas as pd
import datetime as dt
from dateutil.relativedelta import relativedelta

years = 5
end_date = dt.datetime.now()
start_date = end_date - relativedelta(years=years)
close_price = pd.DataFrame()
tickers = ['AAPL','MSFT','IBM','NVDA']
for ticker in tickers:
  tmp = pdr.get_data_yahoo(ticker, start_date, end_date)
  close_price[ticker] = tmp['Close']

print(close_price)

Resulting in the following output (or the first few lines).

                  AAPL        MSFT         IBM        NVDA
Date                                                      
2015-08-25  103.739998   40.470001  140.960007   20.280001
2015-08-26  109.690002   42.709999  146.699997   21.809999
2015-08-27  112.919998   43.900002  148.539993   22.629999
2015-08-28  113.290001   43.930000  147.979996   22.730000
2015-08-31  112.760002   43.520000  147.889999   22.480000

It will contain the daily time series for the last 5 years from the current date.

Step 2: Calculate the CAGR, returns, and covariance

To calculate the expected return, we use the Compound Annual Growth Rate (CAGR) based on the last 5 years. The CAGR is used as Investopedia suggests. An alternative that is also used is the mean of the returns. The key thing is to have some common measure of the return.

The CAGR is calculated as follows.

CAGR = (end-price/start-price)^(1/years) – 1
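
As a minimal sketch of the mean-of-returns alternative mentioned above (assuming the close_price DataFrame from Step 1; annualizing with 252 trading days per year is our own assumption):

# Alternative return measure: annualized mean of the daily returns.
# Assumes close_price from Step 1; 252 is the usual number of trading days per year.
mean_return = close_price.pct_change().mean() * 252
print(mean_return)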

We will also calculate the covariance, as we will use that to calculate the variance of a weighted portfolio. Remember that the standard deviation is given by the following.

sigma = sqrt(variance)

A portfolio is a vector w with the balances of each stock. For example, w = [0.2, 0.3, 0.4, 0.1] says that we have 20% in the first stock, 30% in the second, 40% in the third, and 10% in the final stock. It all sums up to 100%.

Given a weight w of the portfolio, you can calculate the variance of the stocks by using the covariance matrix.

variance = w^T Cov w

Where Cov is the covariance matrix.

This results in the following pre-computations.

returns = close_price/close_price.shift(1)  # daily growth factors (1 + daily return); the constant offset does not change the covariance
cagr = (close_price.iloc[-1]/close_price.iloc[0])**(1/years) - 1
cov = returns.cov()

print(cagr)
print(cov)

Where you can see the output here.

# CAGR:
AAPL    0.371509
MSFT    0.394859
IBM    -0.022686
NVDA    0.905011
dtype: float64

# Covariance
          AAPL      MSFT       IBM      NVDA
AAPL  0.000340  0.000227  0.000152  0.000297
MSFT  0.000227  0.000303  0.000164  0.000306
IBM   0.000152  0.000164  0.000260  0.000210
NVDA  0.000297  0.000306  0.000210  0.000879
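
As a small sketch, the variance formula from Step 2 can be checked in code, using the cov matrix above and an arbitrary example weight vector:

import numpy as np

w = np.array([0.2, 0.3, 0.4, 0.1])      # the example weights from earlier
variance = np.dot(np.dot(w.T, cov), w)  # w^T Cov w
sigma = np.sqrt(variance)               # standard deviation (the risk)
print(variance, sigma)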

Step 3: Plot the return and risk

This is where the power of computing comes into the picture. The idea is to just try a random portfolio and see how it rates with regard to expected return and risk.

It is that simple: make a random weighted distribution of your portfolio and plot the point of expected return (based on our CAGR) and the risk based on the standard deviation calculated from the covariance.

import matplotlib.pyplot as plt
import numpy as np

def random_weights(n):
    k = np.random.rand(n)
    return k / sum(k)

exp_return = []
sigma = []
for _ in range(20000):
  w = random_weights(len(tickers))
  exp_return.append(np.dot(w, cagr.T))
  sigma.append(np.sqrt(np.dot(np.dot(w.T, cov), w)))

plt.plot(sigma, exp_return, 'ro', alpha=0.1) 
plt.show()

We introduce a helper function random_weights, which returns a weighted portfolio. That is, it returns a vector with entries that sum up to one. This will give a way to distribute our portfolio of stocks.

Then we iterate 20,000 times (it could be any value; we just want enough points to plot our graph), where we make a random weight vector w, then calculate the expected return as the dot product of w and cagr transposed. This is done by using NumPy's dot-product function.

What the dot product np.dot(w, cagr.T) does is take elements pairwise from w and cagr, multiply them, and sum them up. The transpose only changes the orientation to make the shapes match.
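
A tiny illustration with made-up numbers (roughly the CAGR values above):

import numpy as np

w = np.array([0.25, 0.25, 0.25, 0.25])
r = np.array([0.37, 0.39, -0.02, 0.91])
# 0.25*0.37 + 0.25*0.39 + 0.25*(-0.02) + 0.25*0.91 = 0.4125
print(np.dot(w, r.T))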

The standard deviation (assigned to sigma) is calculated similarly, by the formula given in the last step: variance = w^T Cov w (which consists of dot products).

This results in the following graph.

Returns vs risks

This shows a graph which outlines a parabola. The optimal values lie along the upper half of the parabola. Hence, given a risk, the optimal portfolio is the one corresponding to a point on the upper border of the filled parabola.
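
A common next step is to pick one portfolio from the simulation, for example the one with the highest return-to-risk ratio. This is a sketch under the assumption of a zero risk-free rate, reusing random_weights, tickers, cagr, and cov from above:

import numpy as np

best = None
for _ in range(20000):
    w = random_weights(len(tickers))
    ret = np.dot(w, cagr)
    risk = np.sqrt(np.dot(np.dot(w.T, cov), w))
    ratio = ret / risk  # return per unit of risk (Sharpe-like, risk-free rate assumed 0)
    if best is None or ratio > best[0]:
        best = (ratio, ret, risk, w)

print("Weights:", dict(zip(tickers, best[3].round(3))))
print("Expected return:", round(best[1], 3), "Risk:", round(best[2], 3))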

Considerations

The Efficient Frontier gives you a way to balance your portfolio. The above code can, by trial and error, find such a portfolio, but it still leaves out some considerations.

How often should you re-balance? It has a cost to do that.

The theory behind it has some assumptions that may not match reality. As Investopedia points out, it assumes that asset returns follow a normal distribution, but in reality returns can be more than 3 standard deviations away from the mean. Also, the theory builds on the assumption that investors are rational in their investments, which most consider a flawed assumption, as more factors play into the investments.
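
A quick sanity check of the normality assumption, reusing the close_price DataFrame from Step 1 (the 3-standard-deviation threshold follows the remark above):

daily = close_price.pct_change().dropna()
z = (daily - daily.mean()) / daily.std()
print((z.abs() > 3).sum())  # days more than 3 standard deviations from the mean, per stock
print(len(daily))           # under a normal distribution only ~0.3% of days should qualify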

The full source code

Below here you find the full source code from the tutorial.

import pandas_datareader as pdr
import datetime as dt
import pandas as pd
from dateutil.relativedelta import relativedelta
import matplotlib.pyplot as plt
import numpy as np


years = 5
end_date = dt.datetime.now()
start_date = end_date - relativedelta(years=years)
close_price = pd.DataFrame()
tickers = ['AAPL', 'MSFT', 'IBM', 'NVDA']
for ticker in tickers:
    tmp = pdr.get_data_yahoo(ticker, start_date, end_date)
    close_price[ticker] = tmp['Close']

returns = close_price / close_price.shift(1)
cagr = (close_price.iloc[-1] / close_price.iloc[0]) ** (1 / years) - 1
cov = returns.cov()

def random_weights(n):
    k = np.random.rand(n)
    return k / sum(k)

exp_return = []
sigma = []
for _ in range(20000):
    w = random_weights(len(tickers))
    exp_return.append(np.dot(w, cagr.T))
    sigma.append(np.sqrt(np.dot(np.dot(w.T, cov), w)))

plt.plot(sigma, exp_return, 'ro', alpha=0.1)
plt.show()

Multiple Time Frame Analysis on a Stock using Pandas

What will we investigate in this tutorial?

A key element to success in trading is to understand the market and the trend of the stock before you buy it. In this tutorial we will not cover how to read the market, but take a top-down analysis approach to stock prices. We will use what is called Multiple Time Frame Analysis on a stock starting with a 1-month, 1-week, and 1-day perspective. Finally, we will compare that with a Simple Moving Average with a monthly view.

Step 1: Gather the data with different time frames

We will use the Pandas-datareader library to collect the time series of a stock. The library has an endpoint to read data from Yahoo! Finance, which we will use as it does not require registration and can deliver the data we need.

import pandas_datareader as pdr
import datetime as dt


ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')

Where the key is to set the interval to ‘d’ (Day), ‘wk’ (Week), and ‘mo’ (Month).

This will give us 3 DataFrames, each indexed with different intervals.

Daily.

                  High         Low  ...      Volume   Adj Close
Date                                ...                        
2019-01-02  101.750000   98.940002  ...  35329300.0   98.860214
2019-01-03  100.190002   97.199997  ...  42579100.0   95.223351
2019-01-04  102.510002   98.930000  ...  44060600.0   99.652115
2019-01-07  103.269997  100.980003  ...  35656100.0   99.779205
2019-01-08  103.970001  101.709999  ...  31514400.0  100.502670

Weekly.

                  High         Low  ...       Volume   Adj Close
Date                                ...                         
2019-01-01  103.269997   97.199997  ...  157625100.0   99.779205
2019-01-08  104.879997  101.260002  ...  150614100.0   99.769432
2019-01-15  107.900002  101.879997  ...  127262100.0  105.302940
2019-01-22  107.879997  104.660004  ...  142112700.0  102.731720
2019-01-29  106.379997  102.169998  ...  203449600.0  103.376968

Monthly.

                  High         Low  ...        Volume   Adj Close
Date                                ...                          
2019-01-01  107.900002   97.199997  ...  7.142128e+08  102.096245
2019-02-01  113.239998  102.349998  ...  4.690959e+08  109.526405
2019-03-01  120.820000  108.800003  ...  5.890958e+08  115.796768
2019-04-01  131.369995  118.099998  ...  4.331577e+08  128.226700
2019-05-01  130.649994  123.040001  ...  5.472188e+08  121.432449
2019-06-01  138.399994  119.010002  ...  5.083165e+08  132.012497

Step 2: Combine data and interpolate missing points

The challenge in connecting the DataFrames is that they have different index entries. If we add the data points from Weekly to Daily, there will be a lot of missing entries on dates that Daily has, but Weekly does not have.

                   day        week
Date                              
2019-01-02  101.120003         NaN
2019-01-03   97.400002         NaN
2019-01-04  101.930000         NaN
2019-01-07  102.059998         NaN
2019-01-08  102.800003  102.050003
...                ...         ...
2020-08-13  208.699997         NaN
2020-08-14  208.899994         NaN
2020-08-17  210.279999         NaN
2020-08-18  211.490005  209.699997
2020-08-19  209.699997  209.699997

To deal with that we can choose to interpolate by using the DataFrame interpolate function.

import pandas_datareader as pdr
import datetime as dt
import pandas as pd


ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')

data = pd.DataFrame()
data['day'] = day['Close']
data['week'] = week['Close']
data['week'] = data['week'].interpolate(method='linear')
print(data)

Which results in the following output.

                   day        week
Date                              
2019-01-02  101.120003         NaN
2019-01-03   97.400002         NaN
2019-01-04  101.930000         NaN
2019-01-07  102.059998         NaN
2019-01-08  102.800003  102.050003
...                ...         ...
2020-08-13  208.699997  210.047998
2020-08-14  208.899994  209.931998
2020-08-17  210.279999  209.815997
2020-08-18  211.490005  209.699997
2020-08-19  209.699997  209.699997

Where the missing points (except the first entries) are linearly interpolated. This can be done for months as well, but we need to be more careful because of three things. First, some dates (the 1st of the month) do not exist in the data DataFrame. To solve that we use an outer join, which will include them. Second, this introduces some extra dates, which are not trading dates. Hence, we need to delete them afterwards, which we can do by deleting the column (drop) and removing rows with NA values (dropna). Third, we also need to understand that the monthly view looks backwards. Hence, the value for the 1st of January is only finalized on the last day of January. Therefore we shift it back in the join.

import pandas_datareader as pdr
import datetime as dt
import pandas as pd


ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')


data = pd.DataFrame()
data['day'] = day['Close']
data['week'] = week['Close']
data['week'] = data['week'].interpolate(method='index')
data = data.join(month['Close'].shift(), how='outer')
data['month'] = data['Close'].interpolate(method='index')
data = data.drop(columns=['Close']).dropna()
data['SMA20'] = data['day'].rolling(20).mean()

Step 3: Visualize the output and take a look at it

Visualizing it is straightforward using matplotlib.

import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt
import pandas as pd


ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')


data = pd.DataFrame()
data['day'] = day['Close']
data['week'] = week['Close']
data['week'] = data['week'].interpolate(method='index')
data = data.join(month['Close'].shift(), how='outer')
data['month'] = data['Close'].interpolate(method='index')
data = data.drop(columns=['Close']).dropna()

data.plot()
plt.show()

Which results in the following graph.

As expected, the monthly price is adjusted to be the closing day-price of the day before. Hence, it looks like the monthly curve crosses the day curve on the 1st of every month (which is almost true).

To really appreciate Multiple Time Frame Analysis, it is better to keep the graphs separate and interpret each in isolation.

Step 4: How to use these different time frames

Given the picture it is a good idea to start top down. First look at the monthly picture, which shows the overall trend.

Month view of MSFT.

In the case of MSFT it is a clear growing trend, with the exception of two declines. But the overall impression is a company in growth that does not seem to slow down. Even the Dow theory (see this tutorial on it) suggests that there will be secondary movements in a general bull trend.

Secondly, we will look at the weekly view.

Weekly view of MSFT

Here your impression is a bit more volatile. It shows many smaller ups and downs, with a big one in March 2020. It could also indicate a small decline in the growth right at the end. Also, the Dow theory could suggest that it will turn. Though it is not certain.

Finally, the daily view gives a more volatile picture, which can be used to decide when to enter the market.

Day view of MSFT

Here you could also be a bit worried. Is this the start of a smaller bear market?

To sum up: in the month-view, we have concluded growth. The week-view shows signs of possible change. Finally, the day-view is also showing signs of possible decline.

As an investor, and based on the above, I would not enter the market right now. If both the month-view and week-view showed growth while the day-view showed a decline, that would be a good indicator. You want the top levels to show growth, while a day-view might show a small decline.

Finally, remember that you should not just use one way to interpret to enter the market or not.

Step 5: Is monthly the same as a Simple Moving Average?

Good question, I am glad you asked. The Simple Moving Average (SMA) can be calculated easily with DataFrames using the rolling and mean functions.

The best way is to just try it.

import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt
import pandas as pd


ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')


data = pd.DataFrame()
data['day'] = day['Close']
data['week'] = week['Close']
data['week'] = data['week'].interpolate(method='index')
data = data.join(month['Close'].shift(), how='outer')
data['month'] = data['Close'].interpolate(method='index')
data = data.drop(columns=['Close']).dropna()
data['SMA20'] = data['day'].rolling(20).mean()

data.plot()
plt.show()

As you see, the SMA is not as reactive to the crisis in March 2020 as the monthly view is. This shows a difference between them. The one does not exclude the other, but they react differently.

Comparing the month-view with a Simple Moving Average of a month (20 trade days)

Please remember that the monthly view is first updated at the end of a month, while the SMA is updated on a daily basis.

Another difference is that the SMA is an average of the last 20 days, while the monthly is the actual value of the last day of a month (as we look at Close). This implies that the monthly view can be much more volatile than the SMA.
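
To see this difference in numbers, here is a minimal sketch (resampling to month-end is our own choice here, not part of the code above):

import pandas_datareader as pdr
import datetime as dt
import pandas as pd


day = pdr.get_data_yahoo("MSFT", dt.datetime(2019, 1, 1), dt.datetime.now(), interval='d')

# Actual close on the last trading day of each month vs. the 20-day SMA on that day.
month_end = day['Close'].resample('M').last()
sma20 = day['Close'].rolling(20).mean().resample('M').last()
print(pd.concat([month_end, sma20], axis=1, keys=['month-close', 'SMA20']))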

Conclusion

It is advised to start the analysis from bigger time frames and zoom in. This way you first look at overall trends and get a bigger picture of the market. This should keep you from getting focused on a small detail in the market, and instead help you understand it on a higher level.

Master Dow Theory with Python Pandas

What will we cover in this tutorial?

Dow theory was proposed by Charles H. Dow and is not an exact science. It is more about how to identify trends in the market. In this tutorial we investigate the approach by testing it on data. Notice that there are various ways to interpret it, and often it is done by visual approximation, while we in this tutorial will make some rough assumptions to see if it beats the buy-and-hold approach on a stock.

First we will make our assumption on how to implement the Dow theory approach to make buy and sell indicators, which we will use as buy and sell markers in the market.

Step 1: Understand the Dow theory to make buy and sell indicators

The essence of Dow theory is that there are 3 types of trends in the market. The primary trend is a year or more long, like a bull market. On top of that, a secondary trend can move the market in the opposite direction for 3 weeks to 3 months. This can result in a pullback that can seem like a bear market within the bull market. Finally, there are micro trends (less than 3 weeks), which can be considered noise.

According to Dow theory, each market has 3 phases. Our objective as investors is to identify when a bear market turns into a bull market.

Some visual examples will help to understand the above. A general bull market with primary and secondary trends could look like this.

Primary bull market trend with secondary bear market trends.

Where you should notice that the temporary lows are all increasing along the way.

A similar picture for a bear market could be.

Primary bear market trend with secondary bull market trends.

Here you should notice how the secondary bull market peaks are also in a decreasing trend.

Step 2: Identify when a primary market trend changes

The key here is to identify when a primary stock trend goes from bull to bear or opposite.

Please also notice that Dow theory talks about the market, while we here are looking at a single stock. Hence, we assume that the market and the stock have a strong enough correlation to use the same theory.

Going from a primary bear to a primary bull market could look as follows.

From bear to bull market

We have added some markers in the diagram.

  • LL : Lower Low – meaning that the low is lower than the previous low.
  • LH : Lower High – meaning that the high is lower than the previous high.
  • HH : Higher High – meaning that the high is higher than the previous high.
  • HL : Higher Low – meaning that the low is higher than the previous low.

As you can see, the bear market consists of consecutive LL and LH, while a bull market consists of consecutive HH and HL. The market changes from bear to bull when we can confidently say that we will get a HH, which we can do when the price crosses from the last LL up over the last LH (before we reach the HH).

Hence, a buy signal can be set when we reach a stock price above last LH.

Similarly, we can investigate when a primary trend goes from a bull to a bear market.

From bull to a bear trend.

Where we have the same types of markers.

We see that the trend changes from bull to bear when we go from HL to LL. Hence, a sell indicator is when we are sure we will reach a LL, that is, when the price falls below the last HL (before the LL is confirmed).

Again, this is not an exact science and is just a way to interpret it. We will try it out on real stock data to see how it performs.

Step 3: Get some data and calculate points of lows and highs

We will use Pandas-datareader to get the time series data from Yahoo! Finance.

import pandas_datareader as pdr
import datetime as dt


ticker = pdr.get_data_yahoo("TWTR", dt.datetime(2020,1,1), dt.datetime.now())

print(ticker)

Resulting in a time series for Twitter, which has the ticker TWTR. You can find tickers for other companies by using the Yahoo! Finance ticker lookup.

                 High        Low       Open      Close    Volume  Adj Close
Date                                                                       
2020-01-02  32.500000  31.959999  32.310001  32.299999  10721100  32.299999
2020-01-03  32.099998  31.260000  31.709999  31.520000  14429500  31.520000
2020-01-06  31.709999  31.160000  31.230000  31.639999  12582500  31.639999
2020-01-07  32.700001  31.719999  31.799999  32.540001  13712900  32.540001
2020-01-08  33.400002  32.349998  32.349998  33.049999  14632400  33.049999
...               ...        ...        ...        ...       ...        ...
2020-08-12  38.000000  36.820000  37.500000  37.439999  11013300  37.439999
2020-08-13  38.270000  37.369999  37.430000  37.820000  13259400  37.820000
2020-08-14  37.959999  37.279999  37.740002  37.900002  10377300  37.900002
2020-08-17  38.090000  37.270000  37.950001  37.970001  10188500  37.970001
2020-08-18  38.459999  37.740002  38.279999  38.009998   8548300  38.009998

The first thing we need is to find the lows and highs. The first challenge is that the stock price goes up and down during the day. To simplify our investigation we will only use the Close price.

Taking that decision might limit the accuracy of the results, but it surely simplifies our work.

Next up, we need to identify highs and lows. This can be done by seeing when the daily difference changes from positive to negative, or vice versa.

import pandas_datareader as pdr
import datetime as dt


ticker = pdr.get_data_yahoo("TWTR", dt.datetime(2020,1,1), dt.datetime.now())

ticker['delta'] = ticker['Close'].diff()
growth = ticker['delta'] > 0
ticker['markers'] = growth.diff().shift(-1)

print(ticker)

Please notice the shift(-1), as it moves the indicator onto the day of the change.

2020-08-05  37.340000  36.410000  36.560001  36.790001   10052100  36.790001  0.440002   False
2020-08-06  37.810001  36.490002  36.849998  37.689999   10478900  37.689999  0.899998    True
2020-08-07  38.029999  36.730000  37.419998  37.139999   11335100  37.139999 -0.549999    True
2020-08-10  39.169998  37.310001  38.360001  37.439999   29298400  37.439999  0.299999    True
2020-08-11  39.000000  36.709999  37.590000  37.279999   20486000  37.279999 -0.160000    True
2020-08-12  38.000000  36.820000  37.500000  37.439999   11013300  37.439999  0.160000   False
2020-08-13  38.270000  37.369999  37.430000  37.820000   13259400  37.820000  0.380001   False
2020-08-14  37.959999  37.279999  37.740002  37.900002   10377300  37.900002  0.080002   False
2020-08-17  38.090000  37.270000  37.950001  37.970001   10188500  37.970001  0.070000   False
2020-08-18  38.459999  37.740002  38.279999  38.009998    8548300  38.009998  0.039997     NaN

Where we have the output above. The True values are where we reach highs or lows.

Now we have identified all the potential HH, HL, LH, and LL.

Step 4: Implement a simple trial of sell and buy

We continue our example on Twitter and see how we can perform.

Our strategy will be as follows.

  • We either have bought stocks for all our money or not. That is, either we have stocks or not.
  • If we do not have stocks, we buy if stock price is above last high, meaning that a HH is coming.
  • If we do have stocks, we sell if stock price is below last low, meaning that a LL is coming.

This can mean that we enter the market late in a bull market. If you were to follow the theory completely, it suggests waiting until a bear market changes into a bull market.

import pandas_datareader as pdr
import datetime as dt


ticker = pdr.get_data_yahoo("TWTR", dt.datetime(2020,1,1), dt.datetime.now())

ticker['delta'] = ticker['Close'].diff()
growth = ticker['delta'] > 0
ticker['markers'] = growth.diff().shift(-1)

# We want to remember the last_high and last_low
# Set to max value not to trigger false buy
last_high = ticker['Close'].max()
last_low = 0.0
# Then setup our account, we can only have stocks or not
# We have a start balance of 100000 $
has_stock = False
balance = 100000
stocks = 0
for index, row in ticker.iterrows():
  # Buy and sell orders
  if not has_stock and row['Close'] > last_high:
    has_stock = True
    stocks = balance//row['Close']
    balance -= row['Close']*stocks
  elif has_stock and row['Close'] < last_low:
    has_stock = False
    balance += row['Close']*stocks
    stocks = 0

  # Update the last_high and last_low
  if row['markers']:
    if row['delta'] > 0:
      last_high = row['Close']
    else:
      last_low = row['Close']


print("Dow returns", balance + stocks*ticker['Close'].iloc[-1])

# Compare this with a simple buy and hold approach.
buy_hold_stocks = 100000//ticker['Close'].iloc[0]
buy_hold = 100000 - buy_hold_stocks*ticker['Close'].iloc[0] + buy_hold_stocks*ticker['Close'].iloc[-1]
print("Buy-and-hold return", buy_hold)

Which gives the following results.

Dow returns 120302.0469455719
Buy-and-hold return 117672.44716644287

That looks promising, but it might just be luck. Hence, we want to validate with other examples. The results show a return on investment of 20.3% using our Dow theory approach, while a simple buy-and-hold strategy gave 17.7%. This is over a span of less than 8 months.

The thing you would like to achieve with a strategy is to avoid big losses while not losing out on gains. The above testing does not give any conclusions on that.

Step 5: Try out some other tickers to test it

A first investigation is to check how the algorithm performs on other stocks. We make one small adjustment, as comparing against buying on day 1 might be quite unfair: if the price is low, it is an advantage, while if the price is high, it is a big disadvantage. The code below runs on multiple stocks and lets the buy-and-hold approach buy on the same day as the first buy of the Dow approach (as outlined in this tutorial). The exit of the market might still be unfair.

import pandas_datareader as pdr
import datetime as dt

def dow_vs_hold_and_buy(ticker_name):
  ticker = pdr.get_data_yahoo(ticker_name, dt.datetime(2020,1,1), dt.datetime.now())

  ticker['delta'] = ticker['Close'].diff()
  growth = ticker['delta'] > 0
  ticker['markers'] = growth.diff().shift(-1)

  # We want to remember the last_high and last_low
  # Set to max value not to trigger false buy
  last_high = ticker['Close'].max()
  last_low = 0.0
  # Then setup our account, we can only have stocks or not
  # We have a start balance of 100000 $
  has_stock = False
  balance = 100000
  stocks = 0
  first_buy = None
  for index, row in ticker.iterrows():
    # Buy and sell orders
    if not has_stock and row['Close'] > last_high:
      has_stock = True
      stocks = balance//row['Close']
      balance -= row['Close']*stocks
      if first_buy is None:
        first_buy = index
    elif has_stock and row['Close'] < last_low:
      has_stock = False
      balance += row['Close']*stocks
      stocks = 0

    # Update the last_high and last_low
    if row['markers']:
      if row['delta'] > 0:
        last_high = row['Close']
      else:
        last_low = row['Close']

  dow_returns = balance + stocks*ticker['Close'].iloc[-1]

  # Compare this with a simple buy and hold approach.
  buy_hold_stocks = 100000//ticker['Close'].loc[first_buy]
  buy_hold_returns = 100000 - buy_hold_stocks*ticker['Close'].loc[first_buy] + buy_hold_stocks*ticker['Close'].iloc[-1]

  print(ticker_name, dow_returns > buy_hold_returns, round(dow_returns/1000 - 100, 1), round(buy_hold_returns/1000 - 100, 1))


tickers = ["TWTR", "AAPL", "TSLA", "BAC", "KO", "GM", "MSFT", "AMZN", "GOOG", "FB", "INTC", "T"]
for ticker in tickers:
  dow_vs_hold_and_buy(ticker)

Resulting in the following output.

TWTR   True  20.3  14.4
AAPL  False  26.4  52.3
TSLA   True 317.6 258.8
BAC    True -16.3 -27.2
KO     True  -8.2 -14.6
GM     True   8.9 -15.1
MSFT  False  26.2  32.1
AMZN  False  32.8  73.9
GOOG  False   7.1  11.0
FB     True  18.3  18.2
INTC  False -34.9 -18.4
T     False -25.3 -20.8

This paints a different picture. First, it seems more random whether it outperforms the buy-and-hold approach.

The one performing best is General Motors Company (GM), but it might be due to an unlucky entry into the market for buy-and-hold. The stock was high in the beginning of the year and then fell a lot. Hence, here the Dow approach helped to exit and enter the market correctly.

Intel Corporation (INTC) works a lot against us. While buy-and-hold gives a big loss (-18.4%), our Dow theory algorithm does not save us; it loses even more (-34.9%). There was a big loss in stock value on the 24th of July, with a 20% drop from the close the day before to the open. The Dow approach cannot save you from situations like that and will sell at the very bottom.

Apple (AAPL) is also missing out on a lot of gain. The stock grew strongly in 2020, with some challenges in March and after (when Corona hit). But looking at the buy and sell signals, it buys back higher than the previous sell and loses out on gains.

Amazon (AMZN) seems to be the same story: growth in general, buying back higher than the previous sell, and losing out on profit.

Next steps and considerations

We have made some broad simplifications in our algorithm.

  • We only consider the Close value, while a normal way to find the markers is on an OHLC candlestick diagram.
  • If we used the span of the day's prices, then we might limit our losses with an earlier stop-loss order.
  • This is not an exact science, and the trends might need a different way to identify them.

Hence, the above suggests the approach can be adjusted further towards real life.

Another thing to keep in mind is that you should never make your investment decision on only one indicator or algorithm choice.

Pandas: Calculate the Relative Strength Index (RSI) on a Stock

What is the Relative Strength Index?

The Relative Strength Index (RSI) on a stock is a technical indicator.

The relative strength index (RSI) is a momentum indicator used in technical analysis that measures the magnitude of recent price changes to evaluate overbought or oversold conditions in the price of a stock or other asset. 

https://www.investopedia.com/terms/r/rsi.asp

A technical indicator is a mathematical calculation based on past prices and volumes of a stock. The RSI has a value between 0 and 100. It is said to be overbought if above 70, and oversold if below 30.

Step 1: How to calculate the RSI

To be quite honest, I found the description on investopedia.org a bit confusing. Therefore I went with the Wikipedia description of it. It is done in a couple of steps, so let us do the same.

  1. If previous price is lower than current price, then set the values.
    • U = close_now – close_previous
    • D = 0
  2. While if the previous price is higher than current price, then set the values
    • U = 0
    • D = close_previous – close_now
  3. Calculate the Smoothed or Modified Moving Average (SMMA) or the Exponential Moving Average (EMA) of U and D. To be aligned with Yahoo! Finance, I have chosen to use the EMA.
  4. Calculate the relative strength (RS)
    • RS = EMA(U)/EMA(D)
  5. Then we end with the final calculation of the Relative Strength Index (RSI).
    • RSI = 100 – (100 / (1 + RS))

Notice that U is the price difference if positive, otherwise 0, while D is the absolute value of the price difference if negative, otherwise 0.

Step 2: Get a stock and calculate the RSI

We will use the Pandas-datareader to get some time series data of a stock. If you are new to using Pandas-datareader, we advise you to read this tutorial.

In this tutorial we will use Twitter as an example, which has the TWTR ticker. If you want to do it for some other stock, then you can look up the ticker on Yahoo! Finance here.

Then below we have the following calculations.

import pandas_datareader as pdr
import datetime as dt


ticker = pdr.get_data_yahoo("TWTR", dt.datetime(2020,1,1), dt.datetime.now())

delta = ticker['Close'].diff()
up = delta.clip(lower=0)
down = -1*delta.clip(upper=0)
ema_up = up.ewm(com=13, adjust=False).mean()
ema_down = down.ewm(com=13, adjust=False).mean()
rs = ema_up/ema_down
ticker['RSI'] = 100 - (100/(1 + rs))

print(ticker)

To have a naming that is close to the definition and also aligned with Python, we use up for U and down for D.

This results in the following output.

                 High        Low       Open  ...    Volume  Adj Close        RSI
Date                                         ...                                
2020-01-02  32.500000  31.959999  32.310001  ...  10721100  32.299999        NaN
2020-01-03  32.099998  31.260000  31.709999  ...  14429500  31.520000   0.000000
2020-01-06  31.709999  31.160000  31.230000  ...  12582500  31.639999   1.169582
2020-01-07  32.700001  31.719999  31.799999  ...  13712900  32.540001   9.699977
2020-01-08  33.400002  32.349998  32.349998  ...  14632400  33.049999  14.218360
...               ...        ...        ...  ...       ...        ...        ...
2020-08-11  39.000000  36.709999  37.590000  ...  20486000  37.279999  58.645030
2020-08-12  38.000000  36.820000  37.500000  ...  11013300  37.439999  59.532873
2020-08-13  38.270000  37.369999  37.430000  ...  13259400  37.820000  61.639293
2020-08-14  37.959999  37.279999  37.740002  ...  10377300  37.900002  62.086731
2020-08-17  38.090000  37.270000  37.950001  ...  10186900  37.970001  62.498897

This tutorial was written 2020-08-18, and we compare with the RSI for Twitter on Yahoo! Finance.

From Yahoo! Finance on Twitter with RSI

As you can see in the lower left corner, the RSI for the same ending day was 62.50, which fits the calculated value. Further checks reveal that the other values also match Yahoo's.

Step 3: Visualize the RSI with the daily stock price

We will use the matplotlib library to visualize the RSI together with the daily stock price. In this tutorial we will have two rows of graphs by using the subplots function. The function returns an array of axes along with a figure.

The axes can be passed to the Pandas DataFrame plot function.

import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt


ticker = pdr.get_data_yahoo("TWTR", dt.datetime(2019,1,1), dt.datetime.now())

delta = ticker['Close'].diff()
up = delta.clip(lower=0)
down = -1*delta.clip(upper=0)
ema_up = up.ewm(com=13, adjust=False).mean()
ema_down = down.ewm(com=13, adjust=False).mean()
rs = ema_up/ema_down

ticker['RSI'] = 100 - (100/(1 + rs))

# Skip first 14 days to have real values
ticker = ticker.iloc[14:]

print(ticker)
fig, (ax1, ax2) = plt.subplots(2)
ax1.get_xaxis().set_visible(False)
fig.suptitle('Twitter')

ticker['Close'].plot(ax=ax1)
ax1.set_ylabel('Price ($)')
ticker['RSI'].plot(ax=ax2)
ax2.set_ylim(0,100)
ax2.axhline(30, color='r', linestyle='--')
ax2.axhline(70, color='r', linestyle='--')
ax2.set_ylabel('RSI')

plt.show()

We also remove the x-axis of the first graph (ax1) and adjust the y-axis limits of the second graph (ax2). Further, we have set two horizontal lines to indicate overbought and oversold at 70 and 30, respectively. Notice that Yahoo! Finance uses 80 and 20 as indicators by default.

The resulting output.

Pandas: Calculate a Heatmap to Visualize Historical CAGR Sector Performance

What is CAGR and why not use AAGR?

Often you see financial advisors with statements showing awesome returns. These returns might be what is called the Average Annual Growth Rate (AAGR). Why should you be skeptical of AAGR?

A simple example will show you.

  • You start by investing $10,000.
  • The first year you get 100% in return, resulting in $20,000.
  • The year after you have a fall of 50%, which brings your value back to $10,000.

Using AAGR, your advisor will tell you that you have (100% – 50%)/2 = 25% AAGR, or call it the average annual return.

But wait a minute. You have the same amount of money after two years, so how can that be 25%?

With the Compound Annual Growth Rate the story is different, as it only considers the start and end value. Here the difference is a big $0, resulting in a 0% CAGR.

The formula for calculating CAGR is.

((end value)/(start value))^(1/years) – 1

As in the above example: (10,000/10,000)^(1/2) – 1 = 0
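
To make the formula concrete, here is a small sketch putting it into a Python function and checking it against the example above.

def cagr(start_value, end_value, years):
    return (end_value / start_value) ** (1 / years) - 1

print(cagr(10000, 20000, 1))  # 1.0, i.e. 100% the first year
print(cagr(10000, 10000, 2))  # 0.0, i.e. 0% over the two years, as in the example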

Step 1: Getting access to financial sector data

In this tutorial we will use Alpha Vantage. To connect to them you need to register to get an API key.

To claim your key go to: https://www.alphavantage.co/support/#api-key

There you select Software Developer in the drop-down Which of the following best describes you?, write your organization of choice, then your email address, and click that you are not a robot. Or are you?

It will then give you the API key on the screen (not in an email). The key is probably a 16-character string of upper case letters and digits.

Step 2: Get the sector data to play with

Looking at the Pandas-datareader API you will see you can use the get_sector_performance_av() function.

import pandas_datareader.data as web

API_KEY = "INSERT YOUR KEY HERE"

data = web.get_sector_performance_av(api_key=API_KEY)
print(data)

Remember to change API_KEY to the key you got from Step 1.

You should get an output similar to this one (not showing all columns).

                            RT      1D      5D  ...       3Y       5Y      10Y
Communication Services   0.38%   0.38%  -0.20%  ...   24.04%   29.92%   74.78%
Information Technology   0.04%   0.04%  -1.36%  ...  104.45%  183.51%  487.33%
Consumer Discretionary  -0.06%  -0.06%   1.36%  ...   66.06%   92.37%  384.71%
Materials               -0.07%  -0.07%   1.75%  ...   17.50%   37.64%  106.90%
Health Care             -0.16%  -0.17%   0.90%  ...   37.21%   43.20%  268.58%
Consumer Staples        -0.19%  -0.19%   1.42%  ...   15.96%   27.65%  137.66%
Utilities               -0.38%  -0.38%   0.60%  ...   13.39%   34.79%   99.63%
Financials              -0.61%  -0.61%   3.23%  ...    1.67%   23.89%  119.46%
Industrials             -0.65%  -0.65%   4.45%  ...   12.57%   40.05%  155.56%
Real Estate             -1.23%  -1.23%  -0.63%  ...   12.51%      NaN      NaN
Energy                  -1.99%  -1.99%   1.38%  ...  -39.45%  -44.69%  -29.07%

The columns we are interested in are the 1Y, 3Y, 5Y, and 10Y.

Step 3: Convert columns to floats

As you saw in the previous step, the columns all contain a %-sign, which tells you that the entries are strings, not floats, and need to be converted.

This can be done with some string magic. First we need to remove the %-sign before we convert to float.

import pandas_datareader.data as web

API_KEY = "INSERT YOUR KEY HERE"

data = web.get_sector_performance_av(api_key=API_KEY)

for column in data.columns:
  data[column] = data[column].str.rstrip('%').astype('float') / 100.0

print(data[['1Y', '3Y', '5Y' , '10Y']])

Where we convert all columns in the for-loop. Then we print only the columns we need.

                            1Y      3Y      5Y     10Y
Communication Services  0.1999  0.2404  0.2992  0.7478
Information Technology  0.4757  1.0445  1.8351  4.8733
Consumer Discretionary  0.2904  0.6606  0.9237  3.8471
Materials               0.1051  0.1750  0.3764  1.0690
Health Care             0.1908  0.3721  0.4320  2.6858
Consumer Staples        0.0858  0.1596  0.2765  1.3766
Utilities               0.0034  0.1339  0.3479  0.9963
Financials             -0.0566  0.0167  0.2389  1.1946
Industrials             0.0413  0.1257  0.4005  1.5556
Real Estate            -0.0658  0.1251     NaN     NaN
Energy                 -0.3383 -0.3945 -0.4469 -0.2907

All looking nice. Also, notice that we converted them to fractional float values and not %-values by dividing by 100.

Step 4: Calculate the CAGR

Now we need to use the formula on the columns.

import pandas_datareader.data as web

API_KEY = "INSERT YOUR KEY HERE"

data = web.get_sector_performance_av(api_key=API_KEY)

for column in data.columns:
  data[column] = data[column].str.rstrip('%').astype('float') / 100.0

data['1Y-CAGR'] = data['1Y']*100
data['3Y-CAGR'] = ((1 + data['3Y']) ** (1/3) - 1) * 100
data['5Y-CAGR'] = ((1 + data['5Y']) ** (1/5) - 1) * 100
data['10Y-CAGR'] = ((1 + data['10Y']) ** (1/10) - 1) * 100

cols = ['1Y-CAGR','3Y-CAGR', '5Y-CAGR', '10Y-CAGR']

print(data[cols])

This should result in something similar.

                        1Y-CAGR    3Y-CAGR    5Y-CAGR   10Y-CAGR
Communication Services    19.99   7.445258   5.374421   5.742403
Information Technology    47.57  26.919700  23.172477  19.368083
Consumer Discretionary    29.04  18.419079  13.979689  17.097655
Materials                 10.51   5.522715   6.597970   7.541490
Health Care               19.08  11.120773   7.445592  13.933956
Consumer Staples           8.58   5.059679   5.003594   9.042452
Utilities                  0.34   4.277734   6.152820   7.157502
Financials                -5.66   0.553596   4.377587   8.177151
Industrials                4.13   4.025758   6.968677   9.837158
Real Estate               -6.58   4.007273        NaN        NaN
Energy                   -33.83 -15.399801 -11.169781  -3.376449

Looks like the Information Technology sector is very lucrative.

But to make it more digestible we should visualize it.

Step 5: Create a heatmap

We will use the seaborn library to create it, which is a statistical data visualization library.

The heatmap endpoint is defined to simply take the DataFrame to visualize. It could not be easier.

import pandas_datareader.data as web
import seaborn as sns
import matplotlib.pyplot as plt

API_KEY = "INSERT YOUR KEY HERE"

data = web.get_sector_performance_av(api_key=API_KEY)

for column in data.columns:
  data[column] = data[column].str.rstrip('%').astype('float') / 100.0

data['1Y-CAGR'] = data['1Y']*100
data['3Y-CAGR'] = ((1 + data['3Y']) ** (1/3) - 1) * 100
data['5Y-CAGR'] = ((1 + data['5Y']) ** (1/5) - 1) * 100
data['10Y-CAGR'] = ((1 + data['10Y']) ** (1/10) - 1) * 100

cols = ['1Y-CAGR','3Y-CAGR', '5Y-CAGR', '10Y-CAGR']

sns.heatmap(data[cols], annot=True, cmap="YlGnBu")
plt.show()

Resulting in the following output.

Pandas: Calculate and plot the Bollinger Bands for a Stock

What are Bollinger Bands?

A Bollinger Band® is a technical analysis tool defined by a set of trendlines plotted two standard deviations (positively and negatively) away from a simple moving average (SMA) of a security’s price, but which can be adjusted to user preferences.

https://www.investopedia.com/terms/b/bollingerbands.asp

Bollinger Bands are used to discover if a stock is oversold or overbought. It is called a mean reversion indicator, which measures how far a price swing will stretch before a counter impulse triggers a retracement.

It is a lagging indicator, which looks at the historical background of the current price. This is opposed to a leading indicator, which tries to predict where the price is heading.

Step 1: Get some time series data on a stock

In this tutorial we will use the Apple stock as an example, which has the ticker AAPL. You can change to any other stock of your interest by changing the ticker below. To find the ticker of your favorite company/stock you can use the Yahoo! Finance ticker lookup.

To get some time series of stock data we will use the Pandas-datareader library to collect it from Yahoo! Finance.

import pandas_datareader as pdr
import datetime as dt


ticker = pdr.get_data_yahoo("AAPL", dt.datetime(2020, 1, 1), dt.datetime.now())[['Close', 'High', 'Low']]
print(ticker)

We will use the Close, High and Low columns to do the further calculations.

                 Close        High         Low
Date                                          
2020-01-02  300.350006  300.600006  295.190002
2020-01-03  297.429993  300.579987  296.500000
2020-01-06  299.799988  299.959991  292.750000
2020-01-07  298.390015  300.899994  297.480011
2020-01-08  303.190002  304.440002  297.160004
...                ...         ...         ...
2020-08-06  455.609985  457.649994  439.190002
2020-08-07  444.450012  454.700012  441.170013
2020-08-10  450.910004  455.100006  440.000000
2020-08-11  437.500000  449.929993  436.429993
2020-08-12  452.040009  453.100006  441.190002

Step 2: How are the Bollinger Bands calculated

Luckily, we can refer to Investopedia.org to get the answer, which states that the Bollinger Bands are calculated as follows.

BOLU=MA(TP,n)+mσ[TP,n]

BOLD=MA(TP,n)−mσ[TP,n]

Where BOLU is the Upper Bollinger Band and BOLD is the Lower Bollinger Band. MA is the Moving Average. TP and σ are calculated as follows.

TP (typical price)=(High+Low+Close)÷3

σ[TP,n] = Standard Deviation over last n periods of TP​

Where n is the number of days in smoothing period (typically 20), and m is the number of standard deviations (typically 2).

Step 3: Calculate the Bollinger Bands

This is straightforward. We start by calculating the typical price TP. Then we calculate the standard deviation and the simple moving average of TP over a rolling window of the last 20 days (the typical period). Then we have the values to calculate the upper and lower Bollinger Bands (BOLU and BOLD).

ticker['TP'] = (ticker['Close'] + ticker['Low'] + ticker['High'])/3
ticker['std'] = ticker['TP'].rolling(20).std(ddof=0)
ticker['MA-TP'] = ticker['TP'].rolling(20).mean()
ticker['BOLU'] = ticker['MA-TP'] + 2*ticker['std']
ticker['BOLD'] = ticker['MA-TP'] - 2*ticker['std']
print(ticker)

Resulting in the following output.

                 Close        High  ...        BOLU        BOLD
Date                                ...                        
2020-01-02  300.350006  300.600006  ...         NaN         NaN
2020-01-03  297.429993  300.579987  ...         NaN         NaN
2020-01-06  299.799988  299.959991  ...         NaN         NaN
2020-01-07  298.390015  300.899994  ...         NaN         NaN
2020-01-08  303.190002  304.440002  ...         NaN         NaN
...                ...         ...  ...         ...         ...
2020-08-06  455.609985  457.649994  ...  445.784036  346.919631
2020-08-07  444.450012  454.700012  ...  453.154374  346.012626
2020-08-10  450.910004  455.100006  ...  459.958160  345.317173
2020-08-11  437.500000  449.929993  ...  464.516981  346.461685
2020-08-12  452.040009  453.100006  ...  469.891271  346.836730

Note that if you compare your results with Yahoo! Finance for Apple, there will be some small differences. The reason is that they by default use the closing price as TP, and not the average of the Close, Low and High. If you change TP to equal Close only, you will get the same figures as they do.
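
If you want to match Yahoo! Finance's figures, the only change needed is this variant of the TP line (a sketch of the substitution described above):

# Variant matching Yahoo! Finance: use the closing price alone as the typical price.
ticker['TP'] = ticker['Close']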

Step 4: Plotting it on a graph

Plotting the three lines is straightforward by using plot() on the DataFrame. Making a filled area with color between BOLU and BOLD can be achieved by using fill_between().

This results in the full program to be.

import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt


ticker = pdr.get_data_yahoo("AAPL", dt.datetime(2020, 1, 1), dt.datetime.now())[['Close', 'High', 'Low']]

# Bollinger Band calculations
ticker['TP'] = (ticker['Close'] + ticker['Low'] + ticker['High'])/3
ticker['std'] = ticker['TP'].rolling(20).std(ddof=0)
ticker['MA-TP'] = ticker['TP'].rolling(20).mean()
ticker['BOLU'] = ticker['MA-TP'] + 2*ticker['std']
ticker['BOLD'] = ticker['MA-TP'] - 2*ticker['std']
ticker = ticker.dropna()
print(ticker)

# Plotting it all together
ax = ticker[['Close', 'BOLU', 'BOLD']].plot(color=['blue', 'orange', 'yellow'])
ax.fill_between(ticker.index, ticker['BOLD'], ticker['BOLU'], facecolor='orange', alpha=0.1)
plt.show()

Giving the following graph.

Apple Stock Closing price with Bollinger Band indicators

Step 5: How to use the Bollinger Band Indicator?

If the stock price continuously touches the upper Bollinger Band (BOLU), the market is thought to be overbought. While if the price continuously touches the lower Bollinger Band (BOLD), the market is thought to be oversold.

The more volatile the market is, the wider the upper and lower band will be. Hence, it also indicates how volatile the market is at a given period.

The volatility measured by the Bollinger Band is referred to as a squeeze when the upper and lower band are close. This is considered to be a sign that there will be more volatility in the coming future, which opens up for possible trading opportunities.
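
One simple way to put a number on this, reusing the ticker DataFrame from Step 3, is the relative band width (the exact measure is our own choice, not from the original text):

# Relative band width: small values indicate a squeeze, large values high volatility.
ticker['band-width'] = (ticker['BOLU'] - ticker['BOLD']) / ticker['MA-TP']
print(ticker['band-width'].tail())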

A common misconception about the bands is that when the price breaks out of the bounds of the upper or lower band, it is a trading signal. This is not the case.

As with all trading indicators, it should not be used alone to make trading decisions.

Pandas: Calculate the Stochastic Oscillator Indicator for Stocks

What is the Stochastic Oscillator Indicator for a stock?

A stochastic oscillator is a momentum indicator comparing a particular closing price of a security to a range of its prices over a certain period of time.

https://www.investopedia.com/terms/s/stochasticoscillator.asp

The stochastic oscillator is an indicator of the speed and momentum of the price. The indicator changes direction before the price does and is therefore a leading indicator.

Step 1: Get stock data to do the calculations on

In this tutorial we will use the Apple stock as an example, which has the ticker AAPL. You can change to any other stock of your interest by changing the ticker below. To find the ticker of your favorite company/stock you can use the Yahoo! Finance ticker lookup.

To get some time series of stock data we will use the Pandas-datareader library to collect it from Yahoo! Finance.

import pandas_datareader as pdr
import datetime as dt


ticker = pdr.get_data_yahoo("AAPL", dt.datetime(2020, 1, 1), dt.datetime.now())
print(ticker)

Where we only focus on data from 2020 until today.

                  High         Low  ...      Volume   Adj Close
Date                                ...                        
2020-01-02  300.600006  295.190002  ...  33870100.0  298.292145
2020-01-03  300.579987  296.500000  ...  36580700.0  295.392120
2020-01-06  299.959991  292.750000  ...  29596800.0  297.745880
2020-01-07  300.899994  297.480011  ...  27218000.0  296.345581
2020-01-08  304.440002  297.160004  ...  33019800.0  301.112640
...                ...         ...  ...         ...         ...
2020-08-05  441.570007  435.589996  ...  30498000.0  439.457642
2020-08-06  457.649994  439.190002  ...  50607200.0  454.790009
2020-08-07  454.700012  441.170013  ...  49453300.0  444.450012
2020-08-10  455.100006  440.000000  ...  53100900.0  450.910004
2020-08-11  449.929993  436.429993  ...  46871100.0  437.500000

[154 rows x 6 columns]

The output does not show all the columns, which are: High, Low, Open, Close, Volume, and Adj Close.

Step 2: Understand the calculation of Stochastic Oscillator Indicator

The Stochastic Oscillator Indicator consists of two values calculated as follows.

%K = (Last Close – Lowest low) / (Highest high – Lowest low)

%D = Simple Moving Average of %K

What %K looks at is the lowest low and highest high in a window of some days. The default is 14 days, but it can be changed. I've seen others use 20 days, as the stock market is open about 20 days per month. The original definition set it to 14 days, and the simple moving average to 3 days.

The numbers are converted to percentages, hence the indicators are in the range of 0% to 100%.

The idea is that if the indicator is above 80%, it is considered to be in the overbought range, while if it is below 20%, it is considered oversold.

Step 3: Calculate the Stochastic Oscillator Indicator

With the above description it is straightforward to do.

import pandas_datareader as pdr
import datetime as dt


ticker = pdr.get_data_yahoo("AAPL", dt.datetime(2020, 1, 1), dt.datetime.now())

ticker['14-high'] = ticker['High'].rolling(14).max()
ticker['14-low'] = ticker['Low'].rolling(14).min()
ticker['%K'] = (ticker['Close'] - ticker['14-low'])*100/(ticker['14-high'] - ticker['14-low'])
ticker['%D'] = ticker['%K'].rolling(3).mean()
print(ticker)

Resulting in the following output.

                  High         Low  ...         %K         %D
Date                                ...                      
2020-01-02  300.600006  295.190002  ...        NaN        NaN
2020-01-03  300.579987  296.500000  ...        NaN        NaN
2020-01-06  299.959991  292.750000  ...        NaN        NaN
2020-01-07  300.899994  297.480011  ...        NaN        NaN
2020-01-08  304.440002  297.160004  ...        NaN        NaN
...                ...         ...  ...        ...        ...
2020-08-05  441.570007  435.589996  ...  92.997680  90.741373
2020-08-06  457.649994  439.190002  ...  97.981589  94.069899
2020-08-07  454.700012  441.170013  ...  86.939764  92.639677
2020-08-10  455.100006  440.000000  ...  93.331365  92.750906
2020-08-11  449.929993  436.429993  ...  80.063330  86.778153

[154 rows x 10 columns]

Please notice that we have not included all columns here. Also, see that %K and %D are not available for the first days, as they need 14 days of data to be calculated.

Step 4: Plotting the data on a graph

We will combine two graphs in one. This can be easily obtained using the Pandas DataFrame plot function. The argument secondary_y can be used to plot against two y-axes.

The two lines %K and %D are both on the same scale 0-100, while the stock prices are on a different scale depending on the specific stock.

To keep things simple, we also want to plot line indicators for the 80% high line and the 20% low line. This can be done by using axhline on the Axes object that plot returns.

The full code results in the following.

import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt


ticker = pdr.get_data_yahoo("AAPL", dt.datetime(2020, 1, 1), dt.datetime.now())

ticker['14-high'] = ticker['High'].rolling(14).max()
ticker['14-low'] = ticker['Low'].rolling(14).min()
ticker['%K'] = (ticker['Close'] - ticker['14-low'])*100/(ticker['14-high'] - ticker['14-low'])
ticker['%D'] = ticker['%K'].rolling(3).mean()

ax = ticker[['%K', '%D']].plot()
ticker['Adj Close'].plot(ax=ax, secondary_y=True)
ax.axhline(20, linestyle='--', color="r")
ax.axhline(80, linestyle="--", color="r")
plt.show()

Resulting in the following graph.

Apple with Stochastic Oscillator Indicator %K and %D

Step 5: Interpreting the signals.

First a word of warning: most advise against using one indicator alone as a buy-sell signal. This also holds for the Stochastic Oscillator indicator. As the name suggests, it is only an indicator, not a predictor.

The indicator signals buy or sell when the two lines cross each other. If %K crosses above %D it signals buy, and when it crosses below, it signals sell.
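
A small sketch of how such crossings could be located, reusing the ticker DataFrame with %K and %D from Step 3 (the exact crossover test is our own formulation):

# Buy: %K crosses above %D. Sell: %K crosses below %D.
cross_up = (ticker['%K'] > ticker['%D']) & (ticker['%K'].shift(1) <= ticker['%D'].shift(1))
cross_down = (ticker['%K'] < ticker['%D']) & (ticker['%K'].shift(1) >= ticker['%D'].shift(1))
print(ticker.loc[cross_up, ['Close', '%K', '%D']].head())    # buy signals
print(ticker.loc[cross_down, ['Close', '%K', '%D']].head())  # sell signals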

Looking at the graph, it makes a lot of signals (every time the two lines cross each other). This is a good reason to have other indicators to rely on.

A common misconception is that it should only be used when it is in the regions below the 20% low or above the 80% high. But the indicator can often stay low or high for quite some time. Hence, selling when we reach the 80% high in this case would mean missing a great opportunity for a big gain.

Visualize Inflation for 2019 using Pandas-datareader and GeoPandas

What will we cover in this tutorial?

In this tutorial we will visualize the inflation on a map. This will be done by getting the inflation data directly from World Bank using the Pandas-datareader. This data will be joined with data from GeoPandas, which provides a world map we can use to create a Choropleth map.

The end result

Step 1: Retrieve the inflation data from World Bank

The Pandas-datareader has an interface to get data from World Bank. To find interesting data from World Bank you should explore data.worldbank.org, which contains various interesting indicators.

When you find one, like Inflation, consumer prices (annual %), which we will use, you can see that you can download it as CSV, XML, or Excel. But we are not old fashioned, hence we will use the direct API to get fresh data every time we run our program.

To use the API, we need the indicator, which you will find in the URL. In this case.

https://data.worldbank.org/indicator/FP.CPI.TOTL.ZG

Hence, the indicator is FP.CPI.TOTL.ZG.

Using the Pandas-datareader API you can get the data by running the following piece of code.

from pandas_datareader import wb

data = wb.download(indicator='FP.CPI.TOTL.ZG', country='all', start=2019, end=2019)
print(data)

If you inspect the output, you will see it is structured a bit inconveniently.

                                                         FP.CPI.TOTL.ZG
country                                            year                
Arab World                                         2019        1.336016
Caribbean small states                             2019             NaN
Central Europe and the Baltics                     2019        2.664561
Early-demographic dividend                         2019        3.030587
East Asia & Pacific                                2019        1.773102
East Asia & Pacific (excluding high income)        2019        2.779172
East Asia & Pacific (IDA & IBRD countries)         2019        2.779172

It has two indexes.

We want to reset index 1 (the year), which will turn the year into a column. Then, for convenience, we rename the columns.

from pandas_datareader import wb

data = wb.download(indicator='FP.CPI.TOTL.ZG', country='all', start=2019, end=2019)
data = data.reset_index(1)
data.columns = ['year', 'inflation']
print(data)

Resulting in the following.

                                                    year  inflation
country                                                            
Arab World                                          2019   1.336016
Caribbean small states                              2019        NaN
Central Europe and the Baltics                      2019   2.664561
Early-demographic dividend                          2019   3.030587
East Asia & Pacific                                 2019   1.773102
East Asia & Pacific (excluding high income)         2019   2.779172
East Asia & Pacific (IDA & IBRD countries)          2019   2.779172

Step 2: Retrieve the world map data

The world map data is available from GeoPandas. At first glance everything is easy.

import geopandas

map = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
map = map[map['name'] != 'Antarctica']
print(map)

Here I have excluded Antarctica for visual purposes. Let's inspect some of the output.

        pop_est                continent                      name iso_a3   gdp_md_est                                           geometry
0        920938                  Oceania                      Fiji    FJI      8374.00  MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1      53950935                   Africa                  Tanzania    TZA    150600.00  POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2        603253                   Africa                 W. Sahara    ESH       906.50  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3      35623680            North America                    Canada    CAN   1674000.00  MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4     326625791            North America  United States of America    USA  18560000.00  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
5      18556698                     Asia                Kazakhstan    KAZ    460700.00  POLYGON ((87.35997 49.21498, 86.59878 48.54918...
6      29748859                     Asia                Uzbekistan    UZB    202300.00  POLYGON ((55.96819 41.30864, 55.92892 44.99586...

It seems to be a good match to join the data on the name column.

To make the join easy, we can make the name column the index.

import geopandas

map = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
map = map[map['name'] != 'Antarctica']
map = map.set_index('name')

Step 3: Joining the datasets

This is the fun part of Data Science. Why? I am glad you asked. Well, that was irony. The challenge will become apparent in a moment. There are various ways to deal with it, but in this tutorial we will take a simplistic approach.

Let us do the join.

from pandas_datareader import wb
import geopandas
import pandas as pd

pd.set_option('display.width', 3000)
pd.set_option('display.max_columns', 300)
pd.set_option('display.max_rows', 500)

map = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
map = map[map['name'] != 'Antarctica']
map = map.set_index('name')

data = wb.download(indicator='FP.CPI.TOTL.ZG', country='all', start=2019, end=2019)
data = data.reset_index(1)
data.columns = ['year', 'inflation']

map = map.join(data, how='outer')
print(map)

Where I use an outer join to make all the “challenges” visible.

Russia                                              1.422575e+08                   Europe    RUS   3745000.00  MULTIPOLYGON (((178.72530 71.09880, 180.00000 ...   NaN        NaN
Russian Federation                                           NaN                      NaN    NaN          NaN                                               None  2019   4.470367
...
United States                                                NaN                      NaN    NaN          NaN                                               None  2019   1.812210
United States of America                            3.266258e+08            North America    USA  18560000.00  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...   NaN        NaN

Where I only show two snippets. The key thing here is that the data from GeoPandas (containing the map) and the data from World Bank (containing the inflation rates we want to color the map with) are not joined.

Hence, we need to join United States together with United States of America. And Russia with Russian Federation.

One approach would be to use a location service that maps country names to country codes, and map each dataset's country names to such codes (note that GeoPandas already has 3-letter country codes, but some are missing, like Norway). This approach can still leave some missing pieces, as some country names are not known by the mapping.
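As a hedged sketch of that idea, assuming the third-party pycountry package (pip install pycountry, not used elsewhere in this tutorial), the name-to-code mapping could look like this. Names the lookup does not recognize are left as None.

import pycountry

def to_iso3(name):
    # Map a country name to its 3-letter ISO code, or None if unknown.
    try:
        return pycountry.countries.lookup(name).alpha_3
    except LookupError:
        return None

# Applied to the World Bank data from above.
data['iso_a3'] = [to_iso3(name) for name in data.index]
print(data[data['iso_a3'].isna()])  # names the lookup could not resolve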

Another approach is to find all the entries that did not match and rename them in one of the datasets. This can take some time, but I did most of them in the full code below.
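To find the names that need renaming in the first place, a quick check like the following sketch can help. It assumes the map and data frames from above, before the join.

# Names in the GeoPandas map that are missing from the World Bank data.
print(map.index.difference(data.index))
# Names in the World Bank data that are missing from the map.
print(data.index.difference(map.index))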

from pandas_datareader import wb
import geopandas

map = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
map = map[map['name'] != 'Antarctica']
map = map.set_index('name')
index_change = {
    'United States of America': 'United States',
    'Yemen': 'Yemen, Rep.',
    'Venezuela': 'Venezuela, RB',
    'Syria': 'Syrian Arab Republic',
    'Solomon Is.': 'Solomon Islands',
    'Russia': 'Russian Federation',
    'Iran': 'Iran, Islamic Rep.',
    'Gambia': 'Gambia, The',
    'Kyrgyzstan': 'Kyrgyz Republic',
    'Mauritania': 'Mauritius',
    'Egypt': 'Egypt, Arab Rep.'
}
map = map.rename(index=index_change)

data = wb.download(indicator='FP.CPI.TOTL.ZG', country='all', start=2019, end=2019)
data = data.reset_index(1)
data.columns = ['year', 'inflation']

map = map.join(data, how='outer')

Step 4: Making a Choropleth map based on our dataset

The simple plot of the data will not be very insightful. But let’s try that first.

import matplotlib.pyplot as plt

map.plot('inflation')
plt.title("Inflation 2019")
plt.show()

Resulting in the following.

The default result.

A good way to get inspiration is to check out the documentation with examples.

From the GeoPandas documentation

Where you see a cool color map with scheme='quantiles'. Let's try that. Note that the scheme argument requires the mapclassify package to be installed.

map.plot('inflation', cmap='OrRd', scheme='quantiles')
plt.title("Inflation 2019")
plt.show()

Resulting in the following.

Closer

Adding a grey tone to countries without data, adding a legend, and setting the figure size, and we are done. The full source code is here.

from pandas_datareader import wb
import geopandas
import pandas as pd
import matplotlib.pyplot as plt

map = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
map = map[map['name'] != 'Antarctica']
map = map.set_index('name')
index_change = {
    'United States of America': 'United States',
    'Yemen': 'Yemen, Rep.',
    'Venezuela': 'Venezuela, RB',
    'Syria': 'Syrian Arab Republic',
    'Solomon Is.': 'Solomon Islands',
    'Russia': 'Russian Federation',
    'Iran': 'Iran, Islamic Rep.',
    'Gambia': 'Gambia, The',
    'Kyrgyzstan': 'Kyrgyz Republic',
    'Mauritania': 'Mauritius',
    'Egypt': 'Egypt, Arab Rep.'
}
map = map.rename(index=index_change)

data = wb.download(indicator='FP.CPI.TOTL.ZG', country='all', start=2019, end=2019)
data = data.reset_index(1)
data.columns = ['year', 'inflation']

map = map.join(data, how='outer')

map.plot('inflation', cmap='OrRd', scheme='quantiles', missing_kwds={"color": "lightgrey"}, legend=True, figsize=(14,5))
plt.title("Inflation 2019")
plt.show()

Resulting in the following output.

Inflation data from World Bank mapped on a Choropleth map using GeoPandas and Matplotlib.

Pandas: Does Stock Market Correlate to Unemployment Rate or Bank Interest Rate?

What will we cover in this tutorial?

We will continue our exploration of the amazing Pandas-datareader. In this tutorial we will further investigate data from the World Bank and correlate it with the S&P 500 stock index. We will do this both by visualizing the 3 time series on 2 different y-axes and by computing the correlations.

Step 1: Get World Bank data

In this tutorial we will only look at data from the United States. If you are interested in other tutorials on World Bank data, you should read this one and this one.

To get the data from the World Bank you can use the Pandas-datareader, which has a function to download data given an indicator.

pandas_datareader.wb.download(country=None, indicator=None, start=2003, end=2005, freq=None, errors='warn', **kwargs)

It takes the country, indicator, start year, and end year as arguments.

You can find indicators on the webpage of World Bank.

In this tutorial we will use SL.UEM.TOTL.ZS, the unemployment, total (% of total labor force), and FR.INR.RINR, the real interest rate (%).

To inspect the data you can use the following code.

from pandas_datareader import wb


data = wb.download(indicator=['SL.UEM.TOTL.ZS', 'FR.INR.RINR'], country=['US'], start=1990, end=2019)

# Reshape: unstack the year, transpose, and select one indicator at a time.
uem_data = data.unstack().T.loc['SL.UEM.TOTL.ZS']
uem_data.columns = ['US-unempl']
int_data = data.unstack().T.loc['FR.INR.RINR']
int_data.columns = ['US-int']

data = int_data.join(uem_data)

print(data)

Giving an output similar to this (lines excluded).

        US-int  US-unempl
year                     
1990  6.039744        NaN
1991  4.915352      6.800
1992  3.884240      7.500
1993  3.546689      6.900
1994  4.898356      6.119
1995  6.594069      5.650
1996  6.324008      5.451
1997  6.603407      5.000
1998  7.148192      4.510
1999  6.457135      4.219

For details on the unstacking and transposing, see this tutorial.
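As a quick, hedged illustration of that reshaping (assuming the (country, year) MultiIndex that wb.download returns), the steps look like this.

# data has MultiIndex (country, year) rows and one column per indicator.
step1 = data.unstack()  # one row per country; columns become (indicator, year)
step2 = step1.T         # rows become (indicator, year); one column per country
print(step2.loc['SL.UEM.TOTL.ZS'].head())  # year-indexed unemployment for the US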

Step 2: Join the data from the S&P 500 index

First let’s get the data from S&P 500, which has ticker ^GSPC.

You can use the Pandas-datareader for that.

import pandas_datareader as pdr
import datetime as dt


start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
sp500.name='S&P 500'
print(sp500)

Resulting in the following output.

Date
1990-01-02     359.690002
1990-01-03     358.760010
1990-01-04     355.670013
1990-01-05     352.200012
1990-01-08     353.790009
                 ...     
2019-12-24    3223.379883
2019-12-26    3239.909912
2019-12-27    3240.020020
2019-12-30    3221.290039
2019-12-31    3230.780029
Name: S&P 500, Length: 7559, dtype: float64

Problem! The date is a datetime object in the Series above, while it is a string containing a year in the DataFrame with the unemployment and interest rates.

To join them successfully, we need to convert them to the same format. The best way is to convert both indexes to datetime, which we can do with the pd.to_datetime() function.

import pandas_datareader as pdr
import pandas as pd
import datetime as dt
from pandas_datareader import wb


data = wb.download(indicator=['SL.UEM.TOTL.ZS', 'FR.INR.RINR'], country=['US'], start=1990, end=2019)

uem_data = data.unstack().T.loc['SL.UEM.TOTL.ZS']
uem_data.columns = ['US-unempl']
int_data = data.unstack().T.loc['FR.INR.RINR']
int_data.columns = ['US-int']

data = int_data.join(uem_data)
data.index = pd.to_datetime(data.index, format='%Y')


start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
sp500.name='S&P 500'

data = sp500.to_frame().join(data, how='outer')
print(data)

Resulting in the following output.

                S&P 500    US-int  US-unempl
1990-01-01          NaN  6.039744        NaN
1990-01-02   359.690002       NaN        NaN
1990-01-03   358.760010       NaN        NaN
1990-01-04   355.670013       NaN        NaN
1990-01-05   352.200012       NaN        NaN
...                 ...       ...        ...
2019-12-24  3223.379883       NaN        NaN
2019-12-26  3239.909912       NaN        NaN
2019-12-27  3240.020020       NaN        NaN
2019-12-30  3221.290039       NaN        NaN
2019-12-31  3230.780029       NaN        NaN

The problem you see there is that US-int and US-unempl only have data on the first of January every year. To fix that, we can make a linear interpolation of the data by applying the following.

data = data.interpolate(method='linear')

Resulting in.

                S&P 500    US-int  US-unempl
1990-01-01          NaN  6.039744        NaN
1990-01-02   359.690002  6.035318        NaN
1990-01-03   358.760010  6.030891        NaN
1990-01-04   355.670013  6.026464        NaN
1990-01-05   352.200012  6.022037        NaN
...                 ...       ...        ...
2019-12-24  3223.379883  3.478200      3.682
2019-12-26  3239.909912  3.478200      3.682
2019-12-27  3240.020020  3.478200      3.682
2019-12-30  3221.290039  3.478200      3.682
2019-12-31  3230.780029  3.478200      3.682

Notice that since there is no unemployment data for the US in 1990, the values stay NaN until the first rate is given.
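If you would rather not have those leading NaN values, one option is to let interpolate fill in both directions. Note that for the leading rows this simply carries the first available value backwards, which is an assumption, not real data.

# Also fill leading NaNs (backfills the first valid value; an assumption).
data = data.interpolate(method='linear', limit_direction='both')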

Step 3: Visualize all three time series on two y-axes

This is where Pandas is quite strong, as you can easily create a secondary y-axis. Two y-axes are enough for the 3 datasets, since the unemployment rate and the interest rate can share the same scale.

import pandas_datareader as pdr
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
from pandas_datareader import wb


pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 10)
pd.set_option('display.width', 1000)


data = wb.download(indicator=['SL.UEM.TOTL.ZS', 'FR.INR.RINR'], country=['US'], start=1990, end=2019)

uem_data = data.unstack().T.loc['SL.UEM.TOTL.ZS']
uem_data.columns = ['US-unempl']
int_data = data.unstack().T.loc['FR.INR.RINR']
int_data.columns = ['US-int']

data = int_data.join(uem_data)
data.index = pd.to_datetime(data.index, format='%Y')


start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
sp500.name='S&P 500'

data = sp500.to_frame().join(data, how='outer')
data = data.interpolate(method='linear')

ax = data['S&P 500'].plot(legend=True)
ax = data[['US-int','US-unempl']].plot(ax=ax, secondary_y=True, legend=True)

print(data.corr())

plt.show()

Where the correlation is given here.

            S&P 500    US-int  US-unempl
S&P 500    1.000000 -0.408429  -0.453315
US-int    -0.408429  1.000000  -0.470103
US-unempl -0.453315 -0.470103   1.000000

Which is surprisingly low. Visually, you can see it here.

The S&P 500 stock index, US interest rate and US unemployment rate

Pandas: Determine Correlation Between GDP and Stock Market

What will we cover in this tutorial?

In this tutorial we will explore some aspects of the Pandas-Datareader, which is an invaluable way to get data from many sources, including the World Bank and Yahoo! Finance.

In this tutorial we will investigate if the GDP of a country is correlated to the stock market.

Step 1: Get GDP data from World Bank

In the previous tutorial we looked at the GDP per capita and compared it between countries. GDP per capita is a good way to compare countries' economies with each other.

In this tutorial we will look at the total GDP, using the NY.GDP.MKTP.CD indicator of GDP in current US$.

We can extract the data by using the download function from the Pandas-datareader library.

from pandas_datareader import wb


gdp = wb.download(indicator='NY.GDP.MKTP.CD', country='US', start=1990, end=2019)

print(gdp)

Resulting in the following output.

                    NY.GDP.MKTP.CD
country       year                
United States 2019  21427700000000
              2018  20580223000000
              2017  19485393853000
              2016  18707188235000
              2015  18219297584000
              2014  17521746534000
              2013  16784849190000
              2012  16197007349000
              2011  15542581104000

Step 2: Gathering the stock index

Then we need to gather the data from the stock market. As we look at the US stock market, the S&P 500 index is a good indicator of the market.

The ticker of S&P 500 is ^GSPC (yes, with the ^).

The Yahoo! Finance API is a great place to collect this type of data.

import pandas_datareader as pdr
import datetime as dt


start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']
print(sp500)

Resulting in the following output.

Date
1990-01-02     359.690002
1990-01-03     358.760010
1990-01-04     355.670013
1990-01-05     352.200012
1990-01-08     353.790009
                 ...     
2019-12-24    3223.379883
2019-12-26    3239.909912
2019-12-27    3240.020020
2019-12-30    3221.290039
2019-12-31    3230.780029

Step 3: Visualizing the data on one plot

A good way to see if there is a correlation is simply by visualizing it.

This can be done with a few tweaks.

import pandas_datareader as pdr
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
from pandas_datareader import wb


gdp = wb.download(indicator='NY.GDP.MKTP.CD', country='US', start=1990, end=2019)

gdp = gdp.unstack().T.reset_index(0)
gdp.index = pd.to_datetime(gdp.index, format='%Y')


start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']


data = sp500.to_frame().join(gdp, how='outer')
data = data.interpolate(method='linear')

ax = data['Adj Close'].plot()
ax = data['United States'].plot(ax=ax, secondary_y=True)

plt.show()

The GDP data needs to be reformatted: by unstacking, transposing, and resetting the index. Then the index is converted from strings of years to actual time series dates.

We use an outer join to get all the dates in the time series. Then we interpolate with a linear method to fill out the gaps in the graph.

Finally, we plot the Adj Close of the S&P 500 stock index and the GDP of the United States on the same graph, using the secondary y-axis for the GDP. That means the time series on the x-axis is shared.

The resulting graph is.

US GDP with S&P 500 index

It does look like there is a correlation, which is especially visible in the aftermath of 2008.

Step 4: Calculate a correlation

Let’s try to make some correlation calculations.

First, let’s not just rely on how the US GDP correlates to the US stock market. Let us relate it to other countries’ GDP and see how they correlate with the strongest economy in the world.

import pandas_datareader as pdr
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
from pandas_datareader import wb


gdp = wb.download(indicator='NY.GDP.MKTP.CD', country=['NO', 'FR', 'US', 'GB', 'DK', 'DE', 'SE'], start=1990, end=2019)

gdp = gdp.unstack().T.reset_index(0)
gdp.index = pd.to_datetime(gdp.index, format='%Y')


start = dt.datetime(1990, 1, 1)
end = dt.datetime(2019, 12, 31)
sp500 = pdr.get_data_yahoo("^GSPC", start, end)['Adj Close']

data = sp500.to_frame().join(gdp, how='outer')
data = data.interpolate(method='linear')

print(data.corr())

Where we compare it to the GDP of some more countries to verify our hypothesis.

                Adj Close   Denmark    France   Germany    Norway    Sweden  United Kingdom  United States
Adj Close        1.000000  0.729701  0.674506  0.727289  0.653507  0.718829        0.759239       0.914303
Denmark          0.729701  1.000000  0.996500  0.986769  0.975780  0.978550        0.955674       0.926139
France           0.674506  0.996500  1.000000  0.982225  0.979767  0.974825        0.945877       0.893780
Germany          0.727289  0.986769  0.982225  1.000000  0.953131  0.972542        0.913443       0.916239
Norway           0.653507  0.975780  0.979767  0.953131  1.000000  0.978784        0.933795       0.878704
Sweden           0.718829  0.978550  0.974825  0.972542  0.978784  1.000000        0.930621       0.916530
United Kingdom   0.759239  0.955674  0.945877  0.913443  0.933795  0.930621        1.000000       0.915859
United States    0.914303  0.926139  0.893780  0.916239  0.878704  0.916530        0.915859       1.000000

Now that is interesting. The US stock market (Adj Close) correlates the strongest with the US GDP. Not surprising.

Of the chosen countries, the Danish GDP is the second most correlated to the US stock market. The GDPs of all the countries correlate strongly with the US GDP, with Norway correlating the least.

Continue the exploration of World Bank data.