We will get financial data from FMP Cloud (a free stock API) for the last few years and generate a three-sheet Excel workbook with charts similar to this. All done from Python.
In this tutorial we are only going to use the example data on Apple, which is freely available without registering on FMP Cloud. If you want to do it for another stock, you will need to register on their site.
What we need is the income statement and the cash flow statement. They are available as JSON on their page (income statement and cash flow statement).
Notice that we set the index to be the date column. This makes the further work easier.
Step 2: Prepare the data
The next step is to make the necessary calculations and prepare the data.
We are only interested in Revenue, Earnings Per Share (EPS), and Free Cash Flow (FCF). So let’s take that data and keep it in a DataFrame (the main Pandas data structure).
data = income_statement[['revenue', 'eps']].copy()
data['fcf'] = cash_flow['freeCashFlow']
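The `fcf` assignment lines up correctly because both tables use the date as their index, as set in the previous step. Pandas matches the two tables on their date labels, not on row position. The minimal sketch below shows this with two made-up tables. The numbers are placeholders, not FMP data. Only the column names come from the code above.

```python
import pandas as pd

# Placeholder records standing in for the JSON rows (made-up numbers)
income_records = [
    {"date": "2020-09-26", "revenue": 300, "eps": 3.0},
    {"date": "2019-09-28", "revenue": 250, "eps": 2.5},
]
cash_flow_records = [
    # Deliberately listed in the opposite order
    {"date": "2019-09-28", "freeCashFlow": 50},
    {"date": "2020-09-26", "freeCashFlow": 70},
]

# Index both tables by their date column, as in the tutorial
income_statement = pd.DataFrame(income_records).set_index("date")
cash_flow = pd.DataFrame(cash_flow_records).set_index("date")

data = income_statement[["revenue", "eps"]].copy()
# The assignment aligns on the date index, not on row position
data["fcf"] = cash_flow["freeCashFlow"]
print(data)
```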
The details of creating the charts in Excel can be found in the XlsxWriter documentation. Basically, you send information to the XlsxWriter engine through dictionaries, using the same values you would set in Excel if you were working there directly.
Again, the result will be in the Excel document financial.xlsx.
In this tutorial we will read historic stock prices, calculate the moving average, export it to an Excel sheet, and insert a chart with the prices and the moving average. All of it is done from Python using Pandas and Pandas-datareader.
Step 1: Get the historic stock prices
A great way to get historic stock prices is Pandas-datareader, which provides an interface to various data sources. In this tutorial we will use the Yahoo API through Pandas-datareader.
It doesn’t require any registration to use the API. It works straight out of the box.
To get stock prices in time series you need to find the ticker of your favorite stock. In this tutorial we will use Apple, which has ticker AAPL.
import pandas_datareader as pdr
import datetime as dt
start = dt.datetime(2020, 1, 1)
ticker = pdr.get_data_yahoo("AAPL", start)
print(ticker.head())
You also need to set a start date to decide how far back you want historic stock prices. By default, you get data up to the most recent date, with one row for each trading day. You can use the interval argument if you want weekly or monthly prices instead, and you can also set an end date if you like.
The above code should give output similar to the following.
High Low Open Close Volume Adj Close
Date
2020-01-02 75.150002 73.797501 74.059998 75.087502 135480400.0 73.840042
2020-01-03 75.144997 74.125000 74.287498 74.357498 146322800.0 73.122154
2020-01-06 74.989998 73.187500 73.447502 74.949997 118387200.0 73.704819
2020-01-07 75.224998 74.370003 74.959999 74.597504 108872000.0 73.358185
2020-01-08 76.110001 74.290001 74.290001 75.797501 132079200.0 74.538239
Step 2: Calculate the Moving Average
To calculate the moving average (also called the simple moving average), we can use the rolling method on a DataFrame.
The rolling method takes the window size as its main argument. The window size sets how many consecutive rows the function is applied to. In this case we want to apply the mean function on a window of size 50.
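The small sketch below shows what `rolling` does on a made-up series, using a window of 3 instead of 50. With a window of n, the first n - 1 values are NaN because there are not yet enough prices to average. With a window of 50, that means 49 rows.

```python
import pandas as pd

# A short made-up price series
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Window of 3: each value is the mean of the current and the two previous prices
ma = close.rolling(3).mean()
print(ma.tolist())  # [nan, nan, 11.0, 12.0, 13.0]
```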
import pandas_datareader as pdr
import pandas as pd
import datetime as dt
start = dt.datetime(2020, 1, 1)
ticker = pdr.get_data_yahoo("AAPL", start)
ticker['MA'] = ticker['Close'].rolling(50).mean()
print(ticker[50:].head())
This calculates the simple moving average with a window size of 50, which should match the 50-day moving average shown on Yahoo! Finance.
The output will be as follows.
High Low Open ... Volume Adj Close MA
Date ...
2020-03-16 64.769997 60.000000 60.487499 ... 322423600.0 59.687832 76.16100
2020-03-17 64.402496 59.599998 61.877499 ... 324056000.0 62.312309 75.93815
2020-03-18 62.500000 59.279999 59.942501 ... 300233600.0 60.786911 75.67250
2020-03-19 63.209999 60.652500 61.847500 ... 271857200.0 60.321156 75.40445
2020-03-20 62.957500 57.000000 61.794998 ... 401693200.0 56.491634 75.03470
Notice that we removed the first 50 rows (indices 0 to 49). With a window of 50, the MA column gets its first value on the 50th row (index 49), so strictly only the first 49 rows lack a moving average.
Step 3: Export data to Excel and create a chart with close prices and moving average
Now this is where it all gets a bit more complicated. It takes some reading in the XlsxWriter documentation to figure all this out.
The code is commented to explain what happens.
import pandas_datareader as pdr
import pandas as pd
import datetime as dt
# Read the stock prices from Yahoo! Finance
start = dt.datetime(2020, 1, 1)
ticker = pdr.get_data_yahoo("AAPL", start)
# Calculate the moving average with window size 50
ticker['MA'] = ticker['Close'].rolling(50).mean()
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('apple.xlsx', engine='xlsxwriter')
# Name the sheet
sheet_name = "Apple"
# We convert the index from datetime to date
# This makes the data in Excel only have the date and
# not the date with time: 00:00:00:0000
ticker.index = ticker.index.date
# Skip the first 50 rows (strictly, only the first 49 rows lack an MA value)
ticker = ticker[50:]
# Convert the dataframe to an XlsxWriter Excel object.
ticker.to_excel(writer, sheet_name=sheet_name)
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets[sheet_name]
# Widen the first column to display the dates.
worksheet.set_column('A:A', 12)
# Get the number of rows and column index
max_row = len(ticker)
col_ma = ticker.columns.get_loc('MA') + 1
col_close = ticker.columns.get_loc('Close') + 1
# Create a chart object of type line
chart = workbook.add_chart({'type': 'line'})
# Insert the first dataset into chart
chart.add_series({
'name': "MA",
'categories': [sheet_name, 1, 0, max_row, 0],
'values': [sheet_name, 1, col_ma, max_row, col_ma],
})
# Insert the second dataset in the same chart
chart.add_series({
'name': "Close",
'values': [sheet_name, 1, col_close, max_row, col_close],
})
# Configure axis
chart.set_x_axis({
'name': 'Date',
'date_axis': True,
})
chart.set_y_axis({
'name': '$',
'major_gridlines': {'visible': False}
})
# Insert the chart into the worksheet.
worksheet.insert_chart('I2', chart)
# Close the Pandas Excel writer and output the Excel file.
# (writer.save() was removed in pandas 2.0; close() also works in older versions)
writer.close()
The above code will create an Excel sheet looking like this.
What is Markowitz Portfolio Optimization (Efficient Frontier)?
The Efficient Frontier takes a portfolio of investments and optimizes the expected return with regard to the risk. That is, it finds the optimal expected return for a given level of risk.
It will contain the time series for the last 5 years up to the current date.
Step 2: Calculate the CAGR, returns, and covariance
To calculate the expected return, we use the Compound Annual Growth Rate (CAGR) over the last 5 years, as Investopedia suggests. An alternative that is also used is the mean of the returns. The key thing is to have a common measure of the return.
The CAGR is calculated as follows.
CAGR = (end-price / start-price)^(1 / years) - 1
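As a worked example of the formula, a price that doubles over 5 years has a CAGR of 2^(1/5) - 1, or about 14.87% per year. The sketch below uses a hypothetical helper, `compute_cagr`. The tutorial's full code does the same calculation for all tickers at once with `close_price.iloc[-1] / close_price.iloc[0]`.

```python
def compute_cagr(start_price: float, end_price: float, years: float) -> float:
    """Compound Annual Growth Rate: (end/start)^(1/years) - 1."""
    return (end_price / start_price) ** (1 / years) - 1

# A price that doubles over 5 years
rate = compute_cagr(100.0, 200.0, 5)
print(f"{rate:.4f}")  # 0.1487
```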
We will also calculate the covariance, because we need it to calculate the variance of a weighted portfolio. For a portfolio with weights w, the variance is w^T Cov w. Remember that the standard deviation is given by the following.
sigma = sqrt(variance)
A portfolio is a vector w with the balances of each stock. For example, w = [0.2, 0.3, 0.4, 0.1] says that we have 20% in the first stock, 30% in the second, 40% in the third, and 10% in the final stock. It all sums up to 100%.
This is where the power of computing comes into the picture. The idea is to just try a random portfolio and see how it rates with regard to expected return and risk.
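For a single portfolio w, two NumPy dot products give the expected return and the risk. The CAGRs and covariance matrix below are made-up numbers for two stocks, chosen so the result is easy to check by hand. With no covariance between the stocks, the variance is 0.5^2 * 0.04 + 0.5^2 * 0.09 = 0.0325.

```python
import numpy as np

# Hypothetical annual CAGRs and covariance matrix for two stocks (made-up numbers)
cagr = np.array([0.10, 0.20])
cov = np.array([[0.04, 0.00],
                [0.00, 0.09]])

# 50% in each stock; the weights sum to 1
w = np.array([0.5, 0.5])

exp_return = np.dot(w, cagr)              # sum of w_i * cagr_i
variance = np.dot(np.dot(w.T, cov), w)    # w^T Cov w
sigma = np.sqrt(variance)                 # standard deviation (the risk)

print(exp_return, variance, sigma)
```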
It is that simple. Make a random weighted distribution of your portfolio and plot the point of expected return (based on our CAGR) and the risk based on the standard deviation calculated by the covariance.
import matplotlib.pyplot as plt
import numpy as np
def random_weights(n):
k = np.random.rand(n)
return k / sum(k)
exp_return = []
sigma = []
for _ in range(20000):
w = random_weights(len(tickers))
exp_return.append(np.dot(w, cagr.T))
sigma.append(np.sqrt(np.dot(np.dot(w.T, cov), w)))
plt.plot(sigma, exp_return, 'ro', alpha=0.1)
plt.show()
We introduce a helper function random_weights, which returns a weighted portfolio. That is, it returns a vector with entries that sum up to one. This will give a way to distribute our portfolio of stocks.
Then we iterate 20,000 times. Any number would do; we just want enough points to plot our graph. In each iteration we make a random weight w. We then calculate the expected return as the dot product of w and cagr-transposed, using NumPy’s dot function.
The dot product np.dot(w, cagr.T) takes the elements pairwise from w and cagr, multiplies them, and sums up the results. For a one-dimensional array or Series, the transpose has no effect; it only makes the intended orientation explicit.
The standard deviation (assigned to sigma) is calculated in a similar way. It is the square root of the variance from the formula in the last step: variance = w^T Cov w, computed with two dot products.
This results in the following graph.
Returns vs risks
The points in this graph fill a region with a parabola-like outline. The optimal portfolios lie along the upper half of this curve. Hence, for a given risk, the optimal portfolio is the one on the upper border of the filled region.
Considerations
The Efficient Frontier gives you a way to balance your portfolio. The above code can find such a portfolio by trial and error, but it still leaves out some considerations.
How often should you rebalance? Rebalancing has a cost.
The theory also rests on assumptions that may not hold in reality. As Investopedia points out, it assumes that asset returns follow a normal distribution. In reality, returns can be more than 3 standard deviations away from the mean. The theory also assumes that investors are rational, which most people consider a flawed assumption, since more factors play into investment decisions.
The full source code
Below you will find the full source code from the tutorial.
import pandas_datareader as pdr
import datetime as dt
import pandas as pd
from dateutil.relativedelta import relativedelta
import matplotlib.pyplot as plt
import numpy as np
years = 5
end_date = dt.datetime.now()
start_date = end_date - relativedelta(years=years)
close_price = pd.DataFrame()
tickers = ['AAPL', 'MSFT', 'IBM', 'NVDA']
for ticker in tickers:
tmp = pdr.get_data_yahoo(ticker, start_date, end_date)
close_price[ticker] = tmp['Close']
returns = close_price / close_price.shift(1)
cagr = (close_price.iloc[-1] / close_price.iloc[0]) ** (1 / years) - 1
cov = returns.cov()
def random_weights(n):
k = np.random.rand(n)
return k / sum(k)
exp_return = []
sigma = []
for _ in range(20000):
w = random_weights(len(tickers))
exp_return.append(np.dot(w, cagr.T))
sigma.append(np.sqrt(np.dot(np.dot(w.T, cov), w)))
plt.plot(sigma, exp_return, 'ro', alpha=0.1)
plt.show()
A Forest Classifier is an approach to reduce the overfitting (high variance) that a single Decision Tree is prone to. A forest classifier simply contains a set of decision trees and uses majority voting to make the prediction.
In this tutorial we will try it on the stock market by creating a few indicators. The tutorial gives a framework to explore whether it can predict the direction of a stock: given a set of indicators, will the stock go up or down the next trading day?
This is a simplification of the problem of predicting the actual stock value the next day.
Step 1: Getting data and calculate some indicators
If you are new to stock indicators, we highly recommend reading about the MACD, the RSI, and the Stochastic Oscillator. The MACD tutorial also covers how to calculate the EMA. Here we assume familiarity with those indicators, as well as with Pandas DataFrames and Pandas-datareader.
import pandas_datareader as pdr
import datetime as dt
import numpy as np
ticker = "^GSPC" # The S&P 500 index
data = pdr.get_data_yahoo(ticker, dt.datetime(2010,1,1), dt.datetime.now(), interval='d')
# Calculate the EMA10 > EMA30 signal
ema10 = data['Close'].ewm(span=10).mean()
ema30 = data['Close'].ewm(span=30).mean()
data['EMA10gtEMA30'] = np.where(ema10 > ema30, 1, -1)
# Calculate where Close is > EMA10
data['ClGtEMA10'] = np.where(data['Close'] > ema10, 1, -1)
# Calculate the MACD signal
exp1 = data['Close'].ewm(span=12).mean()
exp2 = data['Close'].ewm(span=26).mean()
macd = exp1 - exp2
macd_signal = macd.ewm(span=9).mean()
data['MACD'] = macd_signal - macd
# Calculate RSI
delta = data['Close'].diff()
up = delta.clip(lower=0)
down = -1*delta.clip(upper=0)
ema_up = up.ewm(com=13, adjust=False).mean()
ema_down = down.ewm(com=13, adjust=False).mean()
rs = ema_up/ema_down
data['RSI'] = 100 - (100/(1 + rs))
# Stochastic Oscillator
high14= data['High'].rolling(14).max()
low14 = data['Low'].rolling(14).min()
data['%K'] = (data['Close'] - low14)*100/(high14 - low14)
# Williams Percentage Range
data['%R'] = -100*(high14 - data['Close'])/(high14 - low14)
days = 6
# Price Rate of Change
ct_n = data['Close'].shift(days)
data['PROC'] = (data['Close'] - ct_n)/ct_n
print(data)
The choice of indicators is arbitrary, although they are among the popular ones. Feel free to swap them for other indicators and experiment.
Step 2: Understand how the Decision Tree works
Decision Trees are the foundation of a Forest Classifier, just as trees make up a forest. Hence, it is a good starting point to understand how a Decision Tree works. Luckily, they are quite easy to understand.
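If you want to experiment with indicators, it can help to wrap them in functions. Below is a sketch of two indicators from the code above as hypothetical helpers, `stochastic_k` and `williams_r`. They use the same formulas with a configurable window and are tried on made-up prices. The sketch also shows that %R = %K - 100 algebraically, so the two carry the same information. A decision tree only splits on thresholds, so one of them is redundant as a predictor.

```python
import pandas as pd

def stochastic_k(high, low, close, window=14):
    """%K: where the close sits in the high-low range of the last `window` days (0 to 100)."""
    high_n = high.rolling(window).max()
    low_n = low.rolling(window).min()
    return (close - low_n) * 100 / (high_n - low_n)

def williams_r(high, low, close, window=14):
    """Williams %R: the same range, measured from the top (-100 to 0)."""
    high_n = high.rolling(window).max()
    low_n = low.rolling(window).min()
    return -100 * (high_n - close) / (high_n - low_n)

# Made-up prices, with a window of 3 to keep the example short
high = pd.Series([11.0, 12.0, 13.0, 12.5, 14.0])
low = pd.Series([9.0, 10.0, 11.0, 10.5, 12.0])
close = pd.Series([10.0, 11.5, 12.0, 11.0, 13.5])

k = stochastic_k(high, low, close, window=3)
r = williams_r(high, low, close, window=3)
print(pd.DataFrame({"%K": k, "%R": r}))
```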
Let’s try to investigate a Decision Tree that is based on two of the indicators above. We take the RSI (Relative Strength Index) and %K (Stochastic Oscillator). A Decision Tree could look like this (depending on the training data).
Decision Tree for %K and RSI
When we get a new data row with %K and RSI indicators, it will start at the top of the Decision Tree.
At the first node it checks if %K <= 4.615. If so, it takes the left child, otherwise the right child.
The gini tells us how often a randomly chosen element would be incorrectly labeled if it were labeled at random according to the class distribution at the node. Hence, a low value close to 0 is good.
Samples tells us how many of the samples of the training set reached this node.
Finally, value tells us how the samples are distributed over the classes. In the final decision nodes (the leaves), the majority category is the prediction.
Looking at the above Decision Tree, it does not seem to be very good. The majority of samples end up in the fifth node with a gini of 0.498, which is close to random, right? And it will label them 1, growth.
But this is the idea behind Forest Classifiers: take a bunch of Decision Trees that might not be good on their own, and use the majority vote among them to classify.
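You can compute the gini value of a node from how its samples are distributed over the classes: it is one minus the sum of the squared class proportions. The sketch below uses a hypothetical `gini` helper. For two classes the maximum is 0.5 (an even split), which is why a gini of 0.498 is close to random.

```python
def gini(counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([50, 50]))   # 0.5, an even split (the worst case for two classes)
print(gini([100, 0]))   # 0.0, a pure node
print(gini([30, 70]))   # about 0.42
```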
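Majority voting itself is simple. The sketch below takes made-up predictions from five trees for four data rows. It predicts 1 wherever more than half of the trees vote 1. Note that scikit-learn's RandomForestClassifier, used in the next step, averages the trees' probability estimates rather than counting hard votes, but the idea is the same.

```python
import numpy as np

# Hypothetical predictions from five trees (rows) for four data rows (columns); 1 = up, 0 = down
tree_predictions = np.array([
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
])

# Majority vote per data row: predict 1 if more than half of the trees say 1
votes_for_up = tree_predictions.sum(axis=0)
forest_prediction = (votes_for_up > tree_predictions.shape[0] / 2).astype(int)
print(forest_prediction)  # [1 0 1 1]
```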
Step 3: Create the Forest Classifier
Now that we understand how the Decision Tree and the Forest Classifier work, we just need to run the magic, which is done by calling a library function.
import pandas_datareader as pdr
import datetime as dt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
from sklearn.ensemble import RandomForestClassifier
ticker = "^GSPC"
data = pdr.get_data_yahoo(ticker, dt.datetime(2010,1,1), dt.datetime.now(), interval='d')
# Calculate the EMA10 > EMA30 signal
ema10 = data['Close'].ewm(span=10).mean()
ema30 = data['Close'].ewm(span=30).mean()
data['EMA10gtEMA30'] = np.where(ema10 > ema30, 1, -1)
# Calculate where Close is > EMA10
data['ClGtEMA10'] = np.where(data['Close'] > ema10, 1, -1)
# Calculate the MACD signal
exp1 = data['Close'].ewm(span=12).mean()
exp2 = data['Close'].ewm(span=26).mean()
macd = exp1 - exp2
macd_signal = macd.ewm(span=9).mean()
data['MACD'] = macd_signal - macd
# Calculate RSI
delta = data['Close'].diff()
up = delta.clip(lower=0)
down = -1*delta.clip(upper=0)
ema_up = up.ewm(com=13, adjust=False).mean()
ema_down = down.ewm(com=13, adjust=False).mean()
rs = ema_up/ema_down
data['RSI'] = 100 - (100/(1 + rs))
# Stochastic Oscillator
high14= data['High'].rolling(14).max()
low14 = data['Low'].rolling(14).min()
data['%K'] = (data['Close'] - low14)*100/(high14 - low14)
# Williams Percentage Range
data['%R'] = -100*(high14 - data['Close'])/(high14 - low14)
days = 6
# Price Rate of Change
ct_n = data['Close'].shift(days)
data['PROC'] = (data['Close'] - ct_n)/ct_n
# Set class labels to classify
data['Return'] = data['Close'].pct_change(1).shift(-1)
data['class'] = np.where(data['Return'] > 0, 1, 0)
# Clean for NAN rows
data = data.dropna()
# Minimize dataset
data = data.iloc[-200:]
# Data to predict
predictors = ['EMA10gtEMA30', 'ClGtEMA10', 'MACD', 'RSI', '%K', '%R', 'PROC']
X = data[predictors]
y = data['class']
# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
# Train the model
rfc = RandomForestClassifier(random_state=0)
rfc = rfc.fit(X_train, y_train)
# Test the model by doing some predictions
y_pred = rfc.predict(X_test)
# See how accurate the predictions are
report = classification_report(y_test, y_pred)
print('Model accuracy', accuracy_score(y_test, y_pred, normalize=True))
print(report)
First, some notes on a few lines. The train_test_split function divides the data into a training set and a test set, with the test set being 30% of the data. The split is randomized, so you get different sets each time you run it.
The model accuracy is 0.63, which seems quite good. It is better than random, at least. You can also see that the precision of 1 (growth) is higher than 0 (loss, or negative growth), with 0.66 and 0.56, respectively.
Does that mean it is all good and we can beat the market?
No, far from it. Also, notice that I chose to use only the last 200 trading days in my experiment, out of the 2,500+ available.
A few experiments showed that the accuracy was close to 50% when all days were used. So, in practice, it could not predict the direction.
Step 4: A few more tests on stocks
I have run a few experiments on different stocks and also varying the number of days used.
Stock      100 days   200 days   400 days
S&P 500    0.53       0.63       0.52
AAPL       0.53       0.62       0.54
F          0.67       0.57       0.54
KO         0.47       0.52       0.53
IBM        0.57       0.52       0.57
MSFT       0.50       0.50       0.48
AMZN       0.57       0.47       0.58
TSLA       0.50       0.60       0.53
NVDA       0.57       0.53       0.54
The accuracy
Looking at the table above, I am not convinced by my hypothesis. First, the better result with 200 days might be specific to the stock. Also, if you re-run the tests you get new numbers, as the training and test datasets differ from run to run.
I did try a few with the full dataset, and I still think it performed worse (all close to 0.50).
On the whole, it mostly predicts better than just guessing. But there are still a few cases where it does not.
Next steps
A few things to remember here.
Firstly, the indicators were picked at random from among the common ones. Investigating this further could be worthwhile, because indicators that do not help the prediction can bias the results.
Secondly, I might have been wrong to hypothesize that it was more accurate when we limited the data to a smaller set than the original set.
Thirdly, the stocks themselves may be biased in one direction. If we limit to a smaller period, a bull market will mostly have growth days, so always guessing growth will score better than 0.50. This factor should be investigated further, to see if it favors the predictions.
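One way to check this is to compare the model's accuracy with a baseline that always guesses the most common class. The sketch below uses a hypothetical helper and made-up labels. In the code above, you could call it as majority_baseline(y_test) and compare the result with accuracy_score(y_test, y_pred).

```python
import pandas as pd

def majority_baseline(classes: pd.Series) -> float:
    """Accuracy of always guessing the most common class (hypothetical helper)."""
    return float(classes.value_counts(normalize=True).max())

# Made-up labels: 7 up days (1) and 3 down days (0)
labels = pd.Series([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])
print(majority_baseline(labels))  # 0.7
```

If the model's accuracy is not clearly above this baseline, the result may simply reflect the trend of the period.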
Yes, you can do it manually: copy from an HTML table and paste into an Excel spreadsheet. Or you can dive into how to pull data directly from the internet into Excel. But sometimes that is not convenient, as some data needs to be transformed and you need to do it often.
In this tutorial we will show how this can be easily automated with Python using Pandas.
That is, we go from data that needs to be transformed, like $102,000, into 102000. We also show how to join (or merge) different data sources before we create an Excel spreadsheet.
Step 1: The first data source: Revenue of Microsoft
There are many sources where you can get this data, but Macrotrends has it nicely in a table, going back more than 10 years.
First things first, let’s take a look at the data. You can use Pandas read_html to get the data from the tables at a given URL.
We know the data is in the first table on the page. The first few lines of the output are given here.
Microsoft Annual Revenue(Millions of US $) Microsoft Annual Revenue(Millions of US $).1
0 2020 $143,015
1 2019 $125,843
2 2018 $110,360
3 2017 $96,571
4 2016 $91,154
The first things to handle are the column names and setting the year as the index.
That helped. But then we need to convert the Revenue column to integers. This is a bit tricky and can be done in various ways. We first need to remove the $-sign, then the commas, before we convert it.
We also reorder it, to have the earliest years at the top. Notice the copy(), which is not strictly necessary, but makes an independent copy of the data rather than just a view.
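The steps above can be sketched as follows: rename the columns, index by year, clean the revenue strings, and sort. The sketch uses the first rows of the read_html output shown earlier. The name `table` is a stand-in for the first table returned by pd.read_html (an assumption about how the data is loaded). The cleaning is one possible approach using str.replace and pd.to_numeric.

```python
import pandas as pd

# Stand-in for pd.read_html(url)[0]: the first rows shown in the output above
table = pd.DataFrame({
    "Microsoft Annual Revenue(Millions of US $)": [2020, 2019, 2018, 2017, 2016],
    "Microsoft Annual Revenue(Millions of US $).1": [
        "$143,015", "$125,843", "$110,360", "$96,571", "$91,154",
    ],
})

# Rename the columns and use the year as the index
revenue = table.copy()
revenue.columns = ["Year", "Revenue"]
revenue = revenue.set_index("Year")

# Remove the $-sign and the commas, then convert to integers (millions of US $)
revenue["Revenue"] = pd.to_numeric(
    revenue["Revenue"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
)

# Earliest year at the top
revenue = revenue.sort_index()
print(revenue)
```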
This will result in an Excel spreadsheet called Output.xlsx.
The Excel spreadsheet. I added the graph.
There are many things you might find easier in Excel, like playing around with different types of visualization. On the other hand, there might be many aspects you find easier in Python. I know I do. Almost all of them. Not kidding. Still, Excel is a powerful tool used by many specialists, and Python skills seem to be in demand in connection with Excel.
A key element to success in trading is to understand the market and the trend of the stock before you buy it. In this tutorial we will not cover how to read the market, but take a top-down analysis approach to stock prices. We will use what is called Multiple Time Frame Analysis on a stock starting with a 1-month, 1-week, and 1-day perspective. Finally, we will compare that with a Simple Moving Average with a monthly view.
Step 1: Gather the data with different time frames
We will use the Pandas-datareader library to collect the time series of a stock. The library has an endpoint to read data from Yahoo! Finance, which we will use as it does not require registration and can deliver the data we need.
import pandas_datareader as pdr
import datetime as dt
ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')
The key is to set the interval to ‘d’ (Day), ‘wk’ (Week), and ‘mo’ (Month).
This will give us 3 DataFrames, each indexed with different intervals.
Step 2: Combine data and interpolate missing points
The challenge in combining the DataFrames is that they have different index entries. If we combine the Daily data with the Weekly data, there will be many entries that Daily has but Weekly does not.
day week
Date
2019-01-02 101.120003 NaN
2019-01-03 97.400002 NaN
2019-01-04 101.930000 NaN
2019-01-07 102.059998 NaN
2019-01-08 102.800003 102.050003
... ... ...
2020-08-13 208.699997 NaN
2020-08-14 208.899994 NaN
2020-08-17 210.279999 NaN
2020-08-18 211.490005 209.699997
2020-08-19 209.699997 209.699997
import pandas_datareader as pdr
import datetime as dt
import pandas as pd
ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')
data = pd.DataFrame()
data['day'] = day['Close']
data['week'] = week['Close']
data['week'] = data['week'].interpolate(method='linear')
print(data)
Which results in the following output.
day week
Date
2019-01-02 101.120003 NaN
2019-01-03 97.400002 NaN
2019-01-04 101.930000 NaN
2019-01-07 102.059998 NaN
2019-01-08 102.800003 102.050003
... ... ...
2020-08-13 208.699997 210.047998
2020-08-14 208.899994 209.931998
2020-08-17 210.279999 209.815997
2020-08-18 211.490005 209.699997
2020-08-19 209.699997 209.699997
The missing points (except those before the first weekly value) are filled in by linear interpolation. This can be done for months as well, but we need to be more careful for three reasons.

First, some dates (the 1st of the month) do not exist in the data DataFrame. To solve that, we use an outer join, which includes them.

Second, the join introduces some extra dates that are not trading dates, so we need to delete them afterwards. We do this by dropping the joined Close column (drop) and removing rows with NA values (dropna).

Third, the monthly view looks backwards: the value dated the 1st of January is only finalized on the last trading day of January. Therefore we shift the monthly values by one row in the join, so that January’s close is placed on the 1st of February.

Note also that the code below interpolates with method='index' rather than method='linear'. The 'index' method uses the actual dates, while 'linear' treats all rows as equally spaced.
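The choice of interpolation method matters because trading days are not evenly spaced; weekends and holidays are missing. The small sketch below uses made-up values to compare the two methods. method='linear' treats the rows as equally spaced, while method='index' uses the actual dates.

```python
import numpy as np
import pandas as pd

# Made-up values on an irregular daily index (2020-01-03 is missing)
dates = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-05"])
week = pd.Series([1.0, np.nan, np.nan, 4.0], index=dates)

# 'linear' ignores the index and treats the rows as equally spaced
linear = week.interpolate(method="linear")
# 'index' uses the actual dates, so the missing day is taken into account
by_date = week.interpolate(method="index")

print(linear.tolist())   # [1.0, 2.0, 3.0, 4.0]
print(by_date.tolist())  # approximately [1.0, 1.75, 3.25, 4.0]
```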
import pandas_datareader as pdr
import datetime as dt
import pandas as pd
ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')
data = pd.DataFrame()
data['day'] = day['Close']
data['week'] = week['Close']
data['week'] = data['week'].interpolate(method='index')
data = data.join(month['Close'].shift(), how='outer')
data['month'] = data['Close'].interpolate(method='index')
data = data.drop(columns=['Close']).dropna()
data['SMA20'] = data['day'].rolling(20).mean()
Step 3: Visualize the output and take a look at it
Visualizing it is straightforward using matplotlib.
import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt
import pandas as pd
ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')
data = pd.DataFrame()
data['day'] = day['Close']
data['week'] = week['Close']
data['week'] = data['week'].interpolate(method='index')
data = data.join(month['Close'].shift(), how='outer')
data['month'] = data['Close'].interpolate(method='index')
data = data.drop(columns=['Close']).dropna()
data.plot()
plt.show()
Which results in the following graph.
As expected, the monthly value on the 1st of each month equals the daily closing price of the previous trading day. Hence, the monthly curve appears to cross the daily curve on the 1st of every month, which is almost true.
To really appreciate Multiple Time Frame Analysis, it is better to keep the graphs separate and interpret each of them in isolation.
Step 4: How to use these different Multiple Time Frame Analysis
Given the picture, it is a good idea to start top-down. First look at the monthly picture, which shows the overall trend.
Month view of MSFT.
In the case of MSFT there is a clear growth trend, with the exception of two declines. But the overall impression is a company in growth that does not seem to slow down. Dow theory (see this tutorial on it) also suggests that there will be secondary movements within a general bull trend.
Secondly, we will look at the weekly view.
Weekly view of MSFT
Here the picture is a bit more volatile. It shows many smaller ups and downs, with a big one in March 2020. It could also indicate a small decline in the growth right at the end, and Dow theory could suggest that the trend will turn, though that is not certain.
Finally, the daily view gives an even more volatile picture, which can be used to decide when to enter the market.
Day view of MSFT
Here you could also be a bit worried. Is this the start of a smaller bear market?
To sum up: in the month-view, we concluded growth. The week-view shows signs of a possible change. Finally, the day-view also shows signs of a possible decline.
As an investor, and based on the above, I would not enter the market right now. If both the month-view and week-view showed growth while the day-view showed a decline, that would be a good indicator. You want the higher time frames to show growth, while the day-view might show a small decline.
Finally, remember that you should not rely on just one method when deciding whether to enter the market.
Step 5: Is monthly the same as a Simple Moving Average?
Good question, I am glad you asked. The Simple Moving Average (SMA) can be calculated easily on DataFrames using the rolling and mean functions.
Best way is to just try it.
import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt
import pandas as pd
ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')
data = pd.DataFrame()
data['day'] = day['Close']
data['week'] = week['Close']
data['week'] = data['week'].interpolate(method='index')
data = data.join(month['Close'].shift(), how='outer')
data['month'] = data['Close'].interpolate(method='index')
data = data.drop(columns=['Close']).dropna()
data['SMA20'] = data['day'].rolling(20).mean()
data.plot()
plt.show()
As you see, the SMA does not react as strongly to the crisis in March 2020 as the monthly view does. This does not mean one excludes the other, but it shows a difference in how they react.
Comparing the month-view with a Simple Moving Average of a month (20 trade days)
Please remember that the monthly view is only updated at the end of a month, while the SMA is updated on a daily basis.
Another difference is that the SMA is an average of the last 20 days, while the monthly value is the actual close on the last day of a month (as we look at Close). This implies that the monthly view can be much more volatile than the SMA.
Conclusion
It is advised to start the analysis from bigger time frames and zoom in. This way you first look at overall trends and get a bigger picture of the market. This helps you avoid focusing on a small detail in the market, and instead understand it on a higher level.
Dow theory was proposed by Charles H. Dow and is not an exact science; it is more about how to identify trends in the market. In this tutorial we investigate the approach by testing it on data. Notice that there are various ways to interpret it, and it is often applied by visual approximation. In this tutorial, we instead make some rough assumptions to see if it beats the buy-and-hold approach for a stock.
First we will make our assumptions on how to implement the Dow theory approach to make buy and sell indicators, which we will use as buy and sell markers in the market.
Step 1: Understand the Dow theory to make buy and sell indicators
The essence of Dow theory is that there are 3 types of trends in the market. The primary trend lasts a year or more, like a bull market. Then, in a secondary trend, the market can move in the opposite direction for 3 weeks to 3 months. This can result in a pullback that can seem like a bear market within the bull market. Finally, there are micro trends (less than 3 weeks), which can be considered noise.
According to Dow theory, each market has 3 phases. Our objective as investors is to identify when a bear market turns into a bull market.
Some visual examples will help to understand the above. A general bull market with primary and secondary trends could look like this.
Primary bull market trend with secondary bear market trends.
Where you should notice that the temporary lows are all increasing along the way.
A similar picture for a bear market could be.
Primary bear market trend with secondary bull market trends.
Here you should notice how the secondary bull market peaks are also in a decreasing trend.
Step 2: Identify when a primary market trend changes
The key here is to identify when a primary stock trend goes from bull to bear or vice versa.
Please also notice that Dow theory talks about the market, while here we are looking at a stock. Hence, we assume that the market and the stock are correlated strongly enough to use the same theory.
The change from a primary bear to a primary bull market could look as follows.
From bear to bull market
We have added some markers in the diagram.
LL: Low-Low, meaning that the low is lower than the previous low.
LH: Low-High, meaning that the high is lower than the previous high.
HH: High-High, meaning that the high is higher than the previous high.
HL: High-Low, meaning that the low is higher than the previous low.
As you see, the bear market consists of consecutive LL and LH, while a bull market consists of consecutive HH and HL. The market changes from bear to bull when we can confidently say that we will get a HH, which is when the price crosses from the last LL above the last LH (before we reach the HH).
Hence, a buy signal can be set when the stock price rises above the last LH.
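The marker definitions above can be sketched as a small labeling function. This is a minimal illustration, assuming we already have the prices of consecutive swing highs and swing lows as plain lists. The names `label_swings` and `_label` are hypothetical helpers, and treating an equal price as "lower" is my own assumption.

```python
def _label(points, higher_label, lower_label):
    """Compare each swing point with the previous one of the same kind."""
    labels = []
    for i, price in enumerate(points):
        if i == 0:
            labels.append(None)  # nothing earlier to compare with
        elif price > points[i - 1]:
            labels.append(higher_label)
        else:
            labels.append(lower_label)  # ties count as "lower" (an assumption)
    return labels


def label_swings(swing_highs, swing_lows):
    """Return (labels for highs as HH/LH, labels for lows as HL/LL)."""
    return _label(swing_highs, "HH", "LH"), _label(swing_lows, "HL", "LL")


# Made-up swing points for a bear-to-bull turn
highs_labels, lows_labels = label_swings([10.0, 9.0, 11.0], [8.0, 7.0, 8.5])
print(highs_labels)  # [None, 'LH', 'HH']
print(lows_labels)   # [None, 'LL', 'HL']
```

In this made-up example, a buy signal in the sense above would fire as soon as the price rises above the LH at 9.0, before the HH at 11.0 is confirmed.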
Similarly, we can investigate when a primary trend goes from a bull to a bear market.
From bull to a bear trend.
Here we have the same types of markers.
We see that the trend changes from bull to bear when we go from HL to LL. Hence, a sell signal is when we are sure that we will reach a LL, that is, when the price falls below the last HL (before the LL is formed).
Again, this is not an exact science, just one way to interpret the theory. We will try it out on real stock data to see how it performs.
Step 3: Get some data and calculate points of lows and highs
import pandas_datareader as pdr
import datetime as dt
ticker = pdr.get_data_yahoo("TWTR", dt.datetime(2020,1,1), dt.datetime.now())
print(ticker)
This results in a time series for Twitter, which has the ticker TWTR. You can find tickers for other companies by using the Yahoo! Finance ticker lookup.
The first thing we need is to find the lows and highs. The first challenge is that the stock price moves up and down during the day. To simplify our investigation, we will only use the Close price.
This decision might make the results less accurate, but it certainly simplifies our work.
Next up, we need to identify highs and lows. This can be done by detecting when the daily difference changes sign: from positive to negative at a high, and from negative to positive at a low.
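Here is a minimal sketch of this sign-change idea on a few made-up closing prices, so it runs without the Yahoo! API. It compares each day's direction with the next day's using `shift`. This is an illustration, not necessarily the exact code behind the output discussed here. Flat days count as "down" in this sketch.

```python
import pandas as pd

# Made-up closing prices: up, up, down, down, up
close = pd.Series([10.0, 11.0, 12.0, 11.5, 11.0, 12.5])

growth = close.diff() > 0  # True on days the price went up (flat days count as down)
# Compare each day's direction with the next day's; a change means a turning point
next_growth = growth.shift(-1, fill_value=growth.iloc[-1])
is_turn = growth != next_growth
is_turn.iloc[0] = False  # day 0 has no previous price, so no direction yet

highs = close[is_turn & growth]   # was rising, turns down: local high
lows = close[is_turn & ~growth]   # was falling, turns up: local low
print(highs)  # one entry: index 2, price 12.0
print(lows)   # one entry: index 4, price 11.0
```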
The True values in the resulting markers are the days where we reach highs or lows.
Now we have identified all the potential HH, HL, LH, and LL.
Step 4: Implement a simple trial of sell and buy
We continue our example with Twitter and see how it performs.
Our strategy will be as follows.
Either we have invested all our money in the stock, or we hold no stock at all.
If we do not hold the stock, we buy when the stock price is above the last high, meaning that a HH is coming.
If we do hold the stock, we sell when the stock price is below the last low, meaning that a LL is coming.
This can mean that we enter the market late in a bull market. If you were to follow the theory completely, it suggests waiting until a bear market changes into a bull market.
import pandas_datareader as pdr
import datetime as dt
ticker = pdr.get_data_yahoo("TWTR", dt.datetime(2020,1,1), dt.datetime.now())
ticker['delta'] = ticker['Close'].diff()
growth = ticker['delta'] > 0
ticker['markers'] = growth.diff().shift(-1)
# We want to remember the last_high and last_low
# Set to max value not to trigger false buy
last_high = ticker['Close'].max()
last_low = 0.0
# Then setup our account, we can only have stocks or not
# We have a start balance of 100000 $
has_stock = False
balance = 100000
stocks = 0
for index, row in ticker.iterrows():
    # Buy and sell orders
    if not has_stock and row['Close'] > last_high:
        has_stock = True
        stocks = balance//row['Close']
        balance -= row['Close']*stocks
    elif has_stock and row['Close'] < last_low:
        has_stock = False
        balance += row['Close']*stocks
        stocks = 0
    # Update the last_high and last_low
    if row['markers']:
        if row['delta'] > 0:
            last_high = row['Close']
        else:
            last_low = row['Close']
print("Dow returns", balance + stocks*ticker['Close'].iloc[-1])
# Compare this with a simple buy and hold approach.
buy_hold_stocks = 100000//ticker['Close'].iloc[0]
buy_hold = 100000 - buy_hold_stocks*ticker['Close'].iloc[0] + buy_hold_stocks*ticker['Close'].iloc[-1]
print("Buy-and-hold return", buy_hold)
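The loop above can also be wrapped in a function that takes any pandas Series of closing prices. That makes it possible to try the rules on made-up data without the Yahoo! API. This is a sketch that mirrors the loop above. The name `dow_backtest` is hypothetical, and turning points are found by comparing each day's direction with the next day's instead of using `growth.diff()`.

```python
import pandas as pd


def dow_backtest(close, start_balance=100000):
    """Replay the buy/sell rules above on a Series of closing prices.

    Returns the final account value: cash plus the stock valued at the last close.
    """
    delta = close.diff()
    growth = delta > 0
    # True on a day if the direction changes the next day, i.e. a local high or low
    turns = growth != growth.shift(-1, fill_value=growth.iloc[-1])

    last_high = close.max()  # start at the max so no buy happens before a real high
    last_low = 0.0
    has_stock = False
    balance = start_balance
    stocks = 0
    for price, change, turn in zip(close, delta, turns):
        # Buy and sell orders
        if not has_stock and price > last_high:
            has_stock = True
            stocks = balance // price
            balance -= price * stocks
        elif has_stock and price < last_low:
            has_stock = False
            balance += price * stocks
            stocks = 0
        # Update the last high or low on turning points
        if turn:
            if change > 0:
                last_high = price
            else:
                last_low = price
    return balance + stocks * close.iloc[-1]
```

For example, `dow_backtest(ticker['Close'])` should reproduce the Dow figure printed above, and `dow_backtest(pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 14.0]))` buys at 13.0 and ends at 107692.0.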
The results show a return on investment of 20.3% using our Dow theory approach, while a simple buy-and-hold strategy gave 17.7%, over a span of less than 8 months. That looks promising, but it might just be luck. Hence, we want to validate it with other examples.
What you want to achieve with a strategy is to avoid big losses without missing out on gains. The test above does not allow any conclusion on that.
Step 5: Try out some other tickers to test it
A first investigation is to check how the algorithm performs on other stocks. We make one small adjustment, as comparing with a buy on day 1 might be quite unfair: if the price is low that day, it is an advantage, while if the price is high, it is a big disadvantage. The code below runs on multiple stocks and compares the Dow approach (as outlined in this tutorial) with a buy-and-hold approach that buys on the same day as the first Dow buy. The exit from the market might also be unfair.
import pandas_datareader as pdr
import datetime as dt
def dow_vs_hold_and_buy(ticker_name):
    ticker = pdr.get_data_yahoo(ticker_name, dt.datetime(2020,1,1), dt.datetime.now())
    ticker['delta'] = ticker['Close'].diff()
    growth = ticker['delta'] > 0
    ticker['markers'] = growth.diff().shift(-1)
    # We want to remember the last_high and last_low
    # Set to max value not to trigger false buy
    last_high = ticker['Close'].max()
    last_low = 0.0
    # Then set up our account, we can only have stocks or not
    # We have a start balance of 100000 $
    has_stock = False
    balance = 100000
    stocks = 0
    first_buy = None
    for index, row in ticker.iterrows():
        # Buy and sell orders
        if not has_stock and row['Close'] > last_high:
            has_stock = True
            stocks = balance//row['Close']
            balance -= row['Close']*stocks
            if first_buy is None:
                first_buy = index
        elif has_stock and row['Close'] < last_low:
            has_stock = False
            balance += row['Close']*stocks
            stocks = 0
        # Update the last_high and last_low
        if row['markers']:
            if row['delta'] > 0:
                last_high = row['Close']
            else:
                last_low = row['Close']
    # Without any buy signal there is no entry day to compare against
    if first_buy is None:
        print(ticker_name, "no buy signal in the period")
        return
    dow_returns = balance + stocks*ticker['Close'].iloc[-1]
    # Compare this with a simple buy and hold approach, entering on the day of the first Dow buy.
    buy_hold_stocks = 100000//ticker['Close'].loc[first_buy]
    buy_hold_returns = 100000 - buy_hold_stocks*ticker['Close'].loc[first_buy] + buy_hold_stocks*ticker['Close'].iloc[-1]
    # Ticker, whether Dow beats buy-and-hold, Dow return (%), buy-and-hold return (%)
    print(ticker_name, dow_returns > buy_hold_returns, round(dow_returns/1000 - 100, 1), round(buy_hold_returns/1000 - 100, 1))
tickers = ["TWTR", "AAPL", "TSLA", "BAC", "KO", "GM", "MSFT", "AMZN", "GOOG", "FB", "INTC", "T"]
for ticker in tickers:
    dow_vs_hold_and_buy(ticker)
This paints a different picture. First, whether it outperforms the buy-and-hold approach seems rather random.
The best performer is General Motors Company (GM), but this might be due to an unlucky market entry. The stock was high at the beginning of the year and then fell a lot. Here, the Dow approach helped to exit and re-enter the market at the right times.
Intel Corporation (INTC) works strongly against us. There is a big loss (-18.4%), and our Dow theory algorithm does not save us from it. On July 24, the stock lost 20% of its value from the previous day's close to the open. The Dow approach cannot save you in situations like that and will sell near the very bottom.
Apple (AAPL) also misses out on a lot of gains. The stock grew strongly in 2020, with some setbacks in March and after (when the coronavirus hit). But looking at the buy and sell signals, the following buy happens at a higher price than the sell, so it loses out on gains.
Amazon (AMZN) seems to be the same story: general growth, buying at a higher price than the previous sell, and losing out on profit.
Next steps and considerations
We have made some broad simplifications in our algorithm.
We only consider the Close value, while the usual way to find the markers is on an OHLC candlestick chart.
If we used the daily price range, we might limit our losses earlier with a stop-loss order.
This is not an exact science, and the trends might need a different way to identify them.
Hence, the above suggests that the approach can be adjusted further to real life.
Another thing to keep in mind is that you should never base your investment decisions on a single indicator or algorithm.
The Relative Strength Index (RSI) on a stock is a technical indicator.
The relative strength index (RSI) is a momentum indicator used in technical analysis that measures the magnitude of recent price changes to evaluate overbought or oversold conditions in the price of a stock or other asset.
A technical indicator is a mathematical calculation based on past prices and volumes of a stock. The RSI has a value between 0 and 100. It is said to be overbought if above 70, and oversold if below 30.
To be quite honest, I found the description on Investopedia a bit confusing. Therefore, I went for the Wikipedia description instead. It is done in a couple of steps, so let us do the same.
If the previous price is lower than the current price, then set the values:
U = close_now - close_previous
D = 0
If the previous price is higher than the current price, then set the values:
U = 0
D = close_previous - close_now
Calculate the smoothed or modified moving average (SMMA) or the exponential moving average (EMA) of U and D. To be aligned with Yahoo! Finance, I have chosen to use the EMA.
Calculate the relative strength (RS):
RS = EMA(U)/EMA(D)
Then we end with the final calculation of the Relative Strength Index (RSI):
RSI = 100 - (100 / (1 + RS))
Notice that U is the price difference if it is positive and 0 otherwise, while D is the absolute value of the price difference if it is negative and 0 otherwise.
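The steps above can be collected in one small function. This is a sketch, and the name `rsi` and the `period` parameter are my own. It uses pandas' `ewm` with `alpha = 1/period`. Since pandas defines `alpha = 1/(1 + com)`, `period=14` gives the same smoothing as `ewm(com=13)` used later in this tutorial. That weighting is also what the smoothed moving average (SMMA) of length 14 uses.

```python
import pandas as pd


def rsi(close, period=14):
    """RSI following the steps above: U, D, their moving averages, RS, then RSI."""
    delta = close.diff()
    up = delta.clip(lower=0)            # U: positive price changes, else 0
    down = delta.clip(upper=0).abs()    # D: size of negative price changes, else 0
    # alpha = 1/period; with period=14 this matches ewm(com=13)
    ema_up = up.ewm(alpha=1 / period, adjust=False).mean()
    ema_down = down.ewm(alpha=1 / period, adjust=False).mean()
    rs = ema_up / ema_down
    return 100 - (100 / (1 + rs))
```

Usage would be `ticker['RSI'] = rsi(ticker['Close'])`. The first value is NaN because the first price has no previous price to compare with.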
Step 2: Get a stock and calculate the RSI
We will use Pandas-datareader to get time series data for a stock. If you are new to using Pandas-datareader, we advise you to read this tutorial.
In this tutorial we will use Twitter as an example, which has the ticker TWTR. If you want to do it on some other stock, you can look up the ticker on Yahoo! Finance here.
Below are the calculations.
import pandas_datareader as pdr
import datetime as dt
ticker = pdr.get_data_yahoo("TWTR", dt.datetime(2020,1,1), dt.datetime.now())
delta = ticker['Close'].diff()
up = delta.clip(lower=0)
down = -1*delta.clip(upper=0)
ema_up = up.ewm(com=13, adjust=False).mean()
ema_down = down.ewm(com=13, adjust=False).mean()
rs = ema_up/ema_down
# Final step: RSI = 100 - (100 / (1 + RS))
ticker['RSI'] = 100 - (100/(1 + rs))
print(ticker)
To keep the naming close to the definition while following Python conventions, we use up for U and down for D.
This tutorial was written on 2020-08-18, and we compared the result with the RSI for Twitter on Yahoo! Finance.
From Yahoo! Finance on Twitter with RSI
As you can see in the lower left corner, the RSI for the same ending day was 62.50, which matches the calculated value. Further checks show that the other values also match those of Yahoo! Finance.
Step 3: Visualize the RSI with the daily stock price
We will use the matplotlib library to visualize the RSI together with the stock price. We create two rows of graphs with the subplots function, which returns a figure (used here for the title) and an array of axes (one per graph).
import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt
ticker = pdr.get_data_yahoo("TWTR", dt.datetime(2019,1,1), dt.datetime.now())
delta = ticker['Close'].diff()
up = delta.clip(lower=0)
down = -1*delta.clip(upper=0)
ema_up = up.ewm(com=13, adjust=False).mean()
ema_down = down.ewm(com=13, adjust=False).mean()
rs = ema_up/ema_down
ticker['RSI'] = 100 - (100/(1 + rs))
# Skip first 14 days to have real values
ticker = ticker.iloc[14:]
print(ticker)
fig, (ax1, ax2) = plt.subplots(2)
ax1.get_xaxis().set_visible(False)
fig.suptitle('Twitter')
ticker['Close'].plot(ax=ax1)
ax1.set_ylabel('Price ($)')
ticker['RSI'].plot(ax=ax2)
ax2.set_ylim(0,100)
ax2.axhline(30, color='r', linestyle='--')
ax2.axhline(70, color='r', linestyle='--')
ax2.set_ylabel('RSI')
plt.show()
We also hide the x-axis of the first graph (ax1) and fix the y-axis of the second graph (ax2) to the range 0 to 100. In addition, we draw two horizontal lines to indicate the overbought and oversold levels at 70 and 30, respectively. Notice that Yahoo! Finance uses 80 and 20 by default.
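To find the days where the RSI is beyond these lines instead of reading them off the chart, a simple threshold check on the RSI column is enough. This is a minimal sketch; the function name `rsi_zones` is hypothetical, and the defaults follow the 30/70 lines drawn above.

```python
import pandas as pd


def rsi_zones(rsi, lower=30, upper=70):
    """Label each RSI value as 'overbought', 'oversold' or 'neutral'.

    lower/upper default to the 30/70 lines above; pass 20/80 for the
    Yahoo! Finance default. NaN values (e.g. the first days) stay 'neutral'.
    """
    zones = pd.Series("neutral", index=rsi.index)
    zones[rsi > upper] = "overbought"
    zones[rsi < lower] = "oversold"
    return zones
```

With the data above, `ticker[rsi_zones(ticker['RSI']) == 'overbought']` would list the overbought days.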
Plotting a candlestick chart is straightforward with the new matplotlib finance API, mplfinance. The data can be collected by using Pandas-datareader with the open Yahoo! Finance API.
import pandas_datareader as pdr
import datetime as dt
import mplfinance as mpf
df = pdr.get_data_yahoo("AAPL", dt.datetime(2020,6,1), dt.datetime.now())
mpf.plot(df, type='candle', style='charles',
         title='Apple',
         ylabel='Price',
         ylabel_lower='Volume',
         volume=True,
         mav=(1,3,6))
The mav argument sets the moving averages to plot, given as window lengths in trading days. I have also included 1, which is simply the price itself.
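If you want the moving-average values themselves, for example to print them or put them in a table, you can compute them with pandas. This sketch assumes that mplfinance's `mav` draws simple moving averages of the closing price, which is my understanding rather than something shown in this tutorial. The function name `moving_averages` is hypothetical.

```python
import pandas as pd


def moving_averages(close, windows=(1, 3, 6)):
    """Simple moving average of the close for each window length (in rows/days)."""
    return pd.DataFrame({f"MA{w}": close.rolling(window=w).mean() for w in windows})
```

For example, `moving_averages(df['Close'])` gives one column per window. The first `w - 1` rows of each column are NaN, and `MA1` equals the close, which is why including 1 simply shows the price.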
Financial advisors often present statements with awesome returns. These returns might be what is called the Average Annual Growth Rate (AAGR). Why should you be skeptical of AAGR?
A simple example will show you.
You start by investing 10.000$.
The first year you get 100% in return, resulting in 20.000$.
The year after, you have a fall of 50%, which brings your value back to 10.000$.
Using AAGR, your advisor will tell you that you have (100% - 50%)/2 = 25% AAGR, or call it the average annual return.
But wait a minute! You have the same amount of money after two years, so how can that be 25%?
With the Compound Annual Growth Rate (CAGR), the story is different, as it only considers the start and end values. Here the difference is a big 0$, resulting in a 0% CAGR.
The formula for calculating CAGR is:
((end value)/(start value))^(1/years) - 1
For the example above: (10.000/10.000)^(1/2) - 1 = 0
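The difference between the two measures is easy to check in a few lines of Python. This is a minimal sketch of the two formulas as stated above; the function names `aagr` and `cagr` are my own.

```python
def aagr(yearly_returns):
    """Average annual growth rate: the plain arithmetic mean of the yearly returns."""
    return sum(yearly_returns) / len(yearly_returns)


def cagr(start_value, end_value, years):
    """Compound annual growth rate: ((end/start)^(1/years)) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# The example above: +100% then -50%, from 10.000$ back to 10.000$
print(aagr([1.0, -0.5]))       # 0.25
print(cagr(10000, 10000, 2))   # 0.0
```

CAGR is the constant yearly return that turns the start value into the end value. That is why it reports 0% when the money has not grown.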
Step 1: Getting access to financial sector data
In this tutorial we will use Alpha Vantage. To connect to it, you need to register to get an API key.
Select Software Developer in the drop-down "Which of the following best describes you?", write your organization of choice, then enter your email address and confirm that you are not a robot. Or are you?
It will then show you your API key on the screen (not in an email). The key is probably a 16-character string of uppercase letters and digits.
Step 2: Get the sector data to play with
Looking at the Pandas-datareader API, you will see that you can use the get_sector_performance_av() function.
import pandas_datareader.data as web
API_KEY = "INSERT YOUR KEY HERE"
data = web.get_sector_performance_av(api_key=API_KEY)
print(data)
Remember to change API_KEY to the key you got from Step 1.
You should get an output similar to this one (not showing all columns).
The columns we are interested in are the 1Y, 3Y, 5Y, and 10Y.
Step 3: Convert columns to floats
As you saw in the previous step, the entries all contain a %-sign, which tells you that they are strings and not floats, so they need to be converted.
This can be done with some string magic: first we remove the %-sign, and then we convert the result to a float.
import pandas_datareader.data as web
API_KEY = "INSERT YOUR KEY HERE"
data = web.get_sector_performance_av(api_key=API_KEY)
for column in data.columns:
data[column] = data[column].str.rstrip('%').astype('float') / 100.0
print(data[['1Y', '3Y', '5Y' , '10Y']])
We convert all columns in the for-loop and then print only the columns we need.
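You can try the same conversion offline on a small made-up DataFrame in the same string format. This sketch uses invented values and includes a missing entry, like the Real Estate row below. It shows that missing values simply become NaN instead of raising an error.

```python
import pandas as pd

# Made-up values in the same "%"-string format as the Alpha Vantage columns
sample = pd.DataFrame({"1Y": ["19.99%", "-5.66%"], "10Y": ["74.78%", None]},
                      index=["Communication Services", "Financials"])

for column in sample.columns:
    # Strip the trailing %-sign, convert to float, and scale to a fraction
    sample[column] = sample[column].str.rstrip('%').astype('float') / 100.0

print(sample)
```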
                             1Y       3Y       5Y      10Y
Communication Services   0.1999   0.2404   0.2992   0.7478
Information Technology   0.4757   1.0445   1.8351   4.8733
Consumer Discretionary   0.2904   0.6606   0.9237   3.8471
Materials                0.1051   0.1750   0.3764   1.0690
Health Care              0.1908   0.3721   0.4320   2.6858
Consumer Staples         0.0858   0.1596   0.2765   1.3766
Utilities                0.0034   0.1339   0.3479   0.9963
Financials              -0.0566   0.0167   0.2389   1.1946
Industrials              0.0413   0.1257   0.4005   1.5556
Real Estate             -0.0658   0.1251      NaN      NaN
Energy                  -0.3383  -0.3945  -0.4469  -0.2907
All looks nice. Also, notice that by dividing by 100 we converted the values to fractions rather than percentages.
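The columns cover different lengths of time, so they are hard to compare directly. Assuming the 3Y, 5Y and 10Y figures are cumulative returns over the whole period (an assumption; check Alpha Vantage's documentation), the CAGR formula ((end value)/(start value))^(1/years) - 1 can annualize them, with end/start = 1 + cumulative return. The numbers below are made up to keep the check simple.

```python
import pandas as pd

# Hypothetical cumulative returns as fractions, in the same layout as the table above
cumulative = pd.DataFrame({"1Y": [0.10], "3Y": [0.331], "10Y": [1.0]}, index=["Example"])

# Number of years behind each column (assumed from the column names)
years = {"1Y": 1, "3Y": 3, "10Y": 10}
# CAGR: (1 + cumulative return)^(1/years) - 1
annualized = pd.DataFrame({col: (1 + cumulative[col]) ** (1 / n) - 1
                           for col, n in years.items()})
print(annualized)
```

Here a 33.1% gain over 3 years and a 10% gain over 1 year both come out as 10% per year. Doubling over 10 years comes out as roughly 7.2% per year.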