Calucate MACD with Pandas DataFrames

What will we cover?

In this tutorial we will calculate and visualize the MACD for a stock price.

Step 1: Retrieve stock prices into a DataFrame (Pandas)

Let’s get started. You can get the CSV file from here or get your own from Yahoo! Finance.

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib notebook
 
data = pd.read_csv("AAPL.csv", index_col=0, parse_dates=True)

Step 2: Calculate the MACD indicator with Pandas DataFrame

First we want to calcite the MACD.

The calculation (12-26-9 MACD (default)) is defined as follows.

  • MACD=12-Period EMA − 26-Period EMA
  • Singal line 9-Perioed EMA of MACD

Where EMA is the Exponential Moving Average we learned about in the last lesson.

exp1 = data['Close'].ewm(span=12, adjust=False).mean()exp2 = data['Close'].ewm(span=26, adjust=False).mean()data['MACD'] = exp1 - exp2data['Signal line'] = data['MACD'].ewm(span=9, adjust=False).mean()

Now that was simple, right?

Step 3: Visualize the MACD with matplotlib

To visualize it you can use the following with Matplotlib.

fig, ax = plt.subplots()
data[['MACD', 'Signal line']].plot(ax=ax)
data['Close'].plot(ax=ax, alpha=0.25, secondary_y=True)

Resulting in an output similar to this one.

Next Steps?

Want to learn more?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

Follow the link and read more.

DataFrame Calculations for Financial Analysis

What will we cover?

In this lesson we will learn how to add new columns calculated from values in other columns in our DataFrame. This is similar to calculate in Excel on data from different columns.

Then we will demonstrate some useful function when working with financial data.

Finally, we will show how to remove (or drop) columns from our DataFrame.

Step 1: Load the data.

As usual we need to load the data into our DataFrame. You can get the CSV file from here.

import pandas as pd
data = pd.read_csv("AAPL.csv", index_col=0, parse_dates=True)

It is always a good habit to inspect the data with data.head() (see the video lesson or the in the Notebook link below the video for expected output).

Step 2: Create new columns in a DataFrame (Pandas)

To create a new column in our data set simply write as follows.

data['Daily chg'] = data['Close'] - data['Open']

The above statement will create a new column named Daily chg with the difference between column Close and Open.

Similarly, you can create a column with the normalized data as follows.

data['Normalized'] = data['Close'] / data['Close'].iloc[0]

This is how easy it is to work with.

Step 3: Get min and max in DataFrame columns

To find the minimum of a column.

data['Close'].min()

This will find the minimal value of the column Close.

To find the index of the minimum value use the following.

data['Close'].argmin()

You can do similar things as the following shows.

data['Normalized'].min()
data['Normalized'].argmin()
data['Close'].max()
data['Close'].argmax()

Step 4: Get the mean value of a column in a DataFrame (Pandas)

To get the mean value of a column, simply use mean().

data['Close'].mean()

Step 5: Remove / Delete columns in a DataFrame

It is always good practice to remove the columns of data we do not intend to use anymore. This can be done by using drop().

data.drop(labels=['High', 'Low', 'Adj Close', 'Volume'], axis=1, inplace=True)

Where we use the following arguments.

  • labels=[‘High’, ‘Low’, ‘Adj Close’, ‘Volume’] sets the labels of the columns we want to remove.
  • axis=1 sets the axis of the labels. Default is 0, and will look for the labels on the index. While axis 1 is the column names.
  • inplace=True says it should actually remove the columns on the DataFrame we work on. Otherwise it will return a new DataFrame without the columns.

What is next?

Want to learn more?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

Follow the link and read more.

DataFrame Columns and Series for Financial Analysis

What will we cover?

In the first lesson we learnt how to load data into a DataFrame. This part will show how to work with each column in the DataFrame. The columns are represented by a different data type, called Series.

n this lesson we will learn how to make calculations on the columns. The columns are represented by a data type called Series.

Each column in a DataFrame is a Series and can be easily accessed. Also, it is easy to calculate new Series of data. This is similar to calculate now columns of data in an Excel sheet.

We will explore that and more in this lesson.

Step 1: Load the data

We will start by importing the data (CSV file available here).

import pandas as pd
data = pd.read_csv("AAPL.csv", index_col=0, parse_dates=True)

Step 2: Explore the data and data type

In the video we explore the data to ensure it is correct. You can do that by using data.head().

Then we investigate the data type of the columns of the DataFrame data.

data.dtypes

Which results in the following.

Open         float64
High         float64
Low          float64
Close        float64
Adj Close    float64
Volume         int64
dtype: object

This means shows that each column has one data type. Here Open is float64. This is one difference from Excel sheets, where each cell has a data type. The advantage of restricting a data type per column is speed.

The data type of data is DataFrame.

type(data)

The build in function type(…) gives you the type. It is handy to use it when exploring data.

pandas.core.frame.DataFrame

Notice that it is given by a long string pandas.core.frame.DataFrame, this is the structure of the library Pandas.

The data type of a column in a DataFrame can be found by.

type(data['Close'])

Where data[‘Close’] gives access to column Close in the DataFramedata.

pandas.core.series.Series

Where we see a column is represented as a Series. The is similar to a DataFrame that it has an index. E.g. the Series data[‘Close’] has the same index as the DataFrame data. This is handy when you need to work with the data as you will see in a moment.

Step 3: Calculating with Series

To keep it simple, we will start by the daily difference from open and close.

daily_chg = data['Open'] - data['Close']

This calculates a Series daily_chg with the opening price minus the closing price. 

Please explore the full data in daily_chg with the data in data.

A more advanced calculation is this one.

daily_pct_chg = (data['Close'] - data['Open'])/data['Open']*100

Where we calculate the daily percentage change. In the calculation above we have limited us to only use data on the same rows (same dates). Later we will learn how to do it with data from previous day (the row above).

Step 4: Normalize stock data

Now we will normalize the data by using the iloc we learned about in previous lesson.

norm = data['Close']/data['Close'].iloc[0]

The above statements calculates a Series norm where the Close price is normalized by dividing by the first available Close price, accessed by using iloc[0].

This results in that norm.iloc[0] will be 1.0000 and norm.iloc[-1] we show the return of this particular stock if invested in on day 1 (index 0) and sold on the day of the last index (index -1), in the case of the video: 1.839521.

Next step?

Want to learn more?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

Follow the link and read more.

Pandas for Financial Stock Analysis

What will we cover?

In this tutorial we will get familiar to work with DataFrames – the primary data structure in Pandas.

We will learn how to read a historical stock price data from Yahoo! Finance and load it into a DataFrame. This will be done by exporting a CSV file from Yahoo! Finance and load the data. Later we will learn how to read the data directly from the Yahoo! Finance API.

DataFrame is similar to an Excel sheet. DataFrames can contain data in a similar way as we will see in this lesson.

Then we will learn how to use the index of the dates. This will be necessary later when we make calculations later on.

The first part of the tutorial will give the foundation of what you need to know about DataFrames for financial analysis.

Step 1: Read the stock prices from Yahoo! Finance as CSV

In this first lesson we will download historical stock prices from Yahoo! Finance as CSV file and import them into our Jupyter notebook environment in a DataFrame.

If you are new to CSV files and DataFrames. Don’t worry, that is what we will cover here.

Let’s start by going to Yahoo! Finance and download the CVS file. In this course we have used Apple, but feel free to make similar calculation on a stock of your choice.

Go to Yahoo! Finance write AAPL (ticker for Apple) and press Historical Data and download the CSV data file.

The CSV data file will contain Comma Separated Values (CSV) similar to this.

Date,Open,High,Low,Close,Adj Close,Volume
2020-03-02,70.570000,75.360001,69.430000,74.702499,74.127892,341397200
2020-03-03,75.917503,76.000000,71.449997,72.330002,71.773636,319475600
2020-03-04,74.110001,75.849998,73.282501,75.684998,75.102829,219178400
2020-03-05,73.879997,74.887497,72.852501,73.230003,72.666725,187572800
2020-03-06,70.500000,72.705002,70.307503,72.257500,71.701706,226176800

The first line shows the column names (Date, Open, High, Low, Close, Adj Close, Volume). Then each line contains a data entry for a given day.

Step 2: Read the stock prices from CSV to Pandas DataFrame

n Jupyter Notebook start by importing the Pandas library. This is needed in order to load the data into a DataFrame.

import pandas as pd
data = pd.read_csv("AAPL.csv", index_col=0, parse_dates=True)
data.head()

The read_csv(…) does all the magic for us. It will read the CSV file AAPL.csv. The AAPL.csv file is the one you downloaded from Yahoo! Finance (or from the zip-file downloaded above) and needs to be located in the same folder you are working from in your Jupyter notebook.

The arguments in read_csv(…) are the following.

  • index_col=0 this sets the first column of the CSV file to be the index. In this case, it is the Date column.
  • parse_dates=True this ensures that dates in the CSV file are interpreted as dates. This is important if you want to take advantage of the index being a time.

Step 3: Explore data types of columns and index

In the video lesson we explore the type of columns and index.

data.dtypes
data.index

Which will reveal the data type and index of the DataFrame. Notice, that each column has its own data type.

Step 4: Indexing and slicing with DataFrames

We can use loc to lookup an index with a date.

data.loc['2020-01-27']

This will show the data for that specific date. If you get an error it might be because your dataset does not contain the above date. Choose another one to see something similar to this.

Open         7.751500e+01
High         7.794250e+01
Low          7.622000e+01
Close        7.723750e+01
Adj Close    7.657619e+01
Volume       1.619400e+08
Name: 2020-01-27 00:00:00, dtype: float64

A more advanced option is to use an interval (or slice as it is called). Slicing with loc on a DataFrame is done by using a starting and ending index .loc[start:end] or an open ended index .loc[start:], which will take data beginning from start to the last data.

data.loc['2021-01-01':]

This will give all the data starting from 2020-01-01. Notice, that there is no data on January 1st, but since the index is interpreted as a datetime, it can figure out the first date after.

            Open        High        Low         Close    Adj Close       Volume
Date                        
2021-01-04  133.520004  133.610001  126.760002  129.410004  129.410004  143301900
2021-01-05  128.889999  131.740005  128.429993  131.009995  131.009995  97664900
2021-01-06  127.720001  131.050003  126.379997  126.599998  126.599998  155088000
2021-01-07  128.360001  131.630005  127.860001  130.919998  130.919998  109578200
2021-01-08  132.429993  132.630005  130.229996  132.050003  132.050003  105158200
2021-01-11  129.190002  130.169998  128.500000  128.979996  128.979996  100620900

Similarly, you can create slicing with an open-ended start.

data.loc[:'2020-07-01']

Another important way to index into DataFrames is by iloc[], which does it with index.

data.iloc[0]
data.iloc[-1]

Where you can index from the start with index 0, 1, 2, 3, … Or from the end -1, -2, -3, -4, …

What is next?

Want to learn more?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

Follow the link and read more.

Calculate the Volatility of Historic Stock Prices with Pandas and Python

What will we cover in this tutorial?

We will calculate the volatility of historic stock prices with Python library Pandas.

Step 1: Read Historic Stock Prices with Pandas Datareader

We will use Pandas Datareader to read some historic stock prices. See this tutorial for details.

import pandas_datareader as pdr
import datetime as dt

ticker = "AAPL"
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)

data = pdr.get_data_yahoo(ticker, start, end)

print(data.head())

Resulting in this.

                 High        Low       Open      Close       Volume  Adj Close
Date                                                                          
2019-01-02  39.712502  38.557499  38.722500  39.480000  148158800.0  38.505024
2019-01-03  36.430000  35.500000  35.994999  35.547501  365248800.0  34.669640
2019-01-04  37.137501  35.950001  36.132500  37.064999  234428400.0  36.149662
2019-01-07  37.207500  36.474998  37.174999  36.982498  219111200.0  36.069202
2019-01-08  37.955002  37.130001  37.389999  37.687500  164101200.0  36.756794

Step 2: Calculate the Volatility of an Asset

Let’s explore the difference between daily simple returns and daily log returns. Shortly explained, the log returns have the advantage that you can add them together, while this is not the case for simple returns. Therefore the log returns are used in most financial analysis.

To calculate the daily log returns we need the NumPy library. For the purpose here, we will not explore the depths of NumPy, all we need is to apply the log-function on a full column in our DataFrame (see my other FREE course for more details on NumPy).

import numpy as np

data['Log returns'] = np.log(data['Close']/data['Close'].shift())

This creates a column called Log returns with the daily log return of the Close price.

We need the standard deviation for the volatility of the stock.

This can be calculated from our Log returns as follows.

data['Log returns'].std()

The above gives the daily standard deviation. The volatility is defined as the annualized standard deviation. Using the above formula we can calculate it as follows.

volatility = data['Log returns'].std()*252**.5

Notice that square root is the same as **.5, which is the power of 1/2.

Step 3: Visualize the Volatility of Historic Stock Prices

This can be visualized with Matplotlib.

str_vol = str(round(volatility, 4)*100)

fig, ax = plt.subplots()
data[‘Log returns’].hist(ax=ax, bins=50, alpha=0.6, color=’b’)
ax.set_xlabel(“Log return”)
ax.set_ylabel(“Freq of log return”)
ax.set_title(“AAPL volatility: ” + str_vol + “%”)

Resulting in the following output.

Next steps?

Want to learn more?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

Follow the link and read more.

Simple and Exponential Moving Average with Python and Pandas

What will we cover in this tutorial?

In this tutorial we will cover how to calculate the Simple Moving Average (MA) and the Exponential Moving Average (EMA) of a Time Series using the Pandas library in Python.

Step 1: Read some Financial Historic Time Series Stock Prices

We will use Pandas Datareader to read some historic stock prices. See this tutorial for details.

import pandas_datareader as pdr
import datetime as dt

ticker = "AAPL"
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)

data = pdr.get_data_yahoo(ticker, start, end)

print(data.head())

Resulting in this.

                 High        Low       Open      Close       Volume  Adj Close
Date                                                                          
2019-01-02  39.712502  38.557499  38.722500  39.480000  148158800.0  38.505024
2019-01-03  36.430000  35.500000  35.994999  35.547501  365248800.0  34.669640
2019-01-04  37.137501  35.950001  36.132500  37.064999  234428400.0  36.149662
2019-01-07  37.207500  36.474998  37.174999  36.982498  219111200.0  36.069202
2019-01-08  37.955002  37.130001  37.389999  37.687500  164101200.0  36.756794

Step 2: Calculate the Simple Moving Average with Python and Pandas

To calculate the Simple Moving Average (MA) of the data can be done using the rolling and mean methods.

data['MA10'] = data['Close'].rolling(10).mean()

Where here we calculate the Simple Moving Average of 10 days. You can change it to fit your needs.

Step 3: Calculate the Exponential Moving Average with Python and Pandas

It is a bit more involved to calculate the Exponential Moving Average.

data['EMA10'] = data['Close'].ewm(span=10, adjust=False).mean()

There you need to set the span and adjust to False. This is needed to get the same numbers as on Yahoo! Finance.

Next steps?

Want to learn more?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

Follow the link and read more.

Read Historical Prices from Yahoo! Finance with Python

What will we cover in this tutorial?

In this tutorial we will cover the following.

  • How to use Pandas Datareader to read historical stock prices from Yahoo! Finance.
  • Learn how to read weekly and monthly data.
  • Also how to read multiple tickers at once.

Step 1: What is Pandas Datareader?

Pandas-Datareader is an up to date remote data access for pandas.

This leads to the next question. What is pandas?

Pandas is a data analysis and manipulation tool containing a great data structure for the purpose.

Shortly said, pandas can be thought of as a data structure in Python, which is similar to working with data in a spreadsheet.

Pandas-datareader reads data from various sources and puts the data into a pandas data structures.

Pandas-datareader has a call to return historic stock price data from Yahoo! Finance.

To use Pandas-datareader you need to import the library.

Step 2: Example reading data from Yahoo! Finance with Pandas-Datareader

Let’s break the following example down.

import pandas_datareader as pdr
import datetime as dt
 
ticker = "AAPL"
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)
 
data = pdr.get_data_yahoo(ticker, start, end)

print(data)

Where we first import two libraries.

  • pandas_datareader The Pandas Datareader. If you do not have it installed already in your Jupyter Notebook you can do that by entering this in a cell !pip install pandas_datareader and execute it.
  • datetime This is a default library and represents a date and time. We only use it for the date aspects.

The the following lines.

  • ticker = “AAPL” The ticker we want data from. You can use any ticker you want. In this course we have used the ticker for Apple (AAPL).
  • start = dt.datetime(2019, 1, 1) Is the starting day we want historic stock price data.
  • end = dt.datetime(2020, 12, 31) The end day.
  • data = pdr.get_data_yahoo(ticker, start, end) This is the magic that uses Pandas Datareader (pdr) to get data from the Yahoo! Finance API. It returns a DataFrame as we know it from previous lessons.

The output of the code is as follows.

                  High         Low  ...       Volume   Adj Close
Date                                ...                         
2019-01-02   39.712502   38.557499  ...  148158800.0   38.505024
2019-01-03   36.430000   35.500000  ...  365248800.0   34.669640
2019-01-04   37.137501   35.950001  ...  234428400.0   36.149662
2019-01-07   37.207500   36.474998  ...  219111200.0   36.069202
2019-01-08   37.955002   37.130001  ...  164101200.0   36.756794
...                ...         ...  ...          ...         ...
2020-12-24  133.460007  131.100006  ...   54930100.0  131.773087
2020-12-28  137.339996  133.509995  ...  124486200.0  136.486053
2020-12-29  138.789993  134.339996  ...  121047300.0  134.668762
2020-12-30  135.990005  133.399994  ...   96452100.0  133.520477
2020-12-31  134.740005  131.720001  ...   99116600.0  132.492020

[505 rows x 6 columns]

Step 3: A few parameters to set

You can get multiple tickers at once by parsing a list of them.

import pandas_datareader as pdr
import datetime as dt

ticker = ["AAPL", "IBM", "TSLA"]
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)

data = pdr.get_data_yahoo(ticker, start, end)

print(data)

You can get the weekly or monthly data by using the argument as follows.

import datetime as dt

ticker = ["AAPL", "IBM", "TSLA"]
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)

data = pdr.get_data_yahoo(ticker, start, end, interval='w')

print(data)

Set interval=’m’ to get monthly data instead of weekly with ‘w’.

Next steps?

Want to learn more?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

Follow the link and read more.

Matplotlib Visualization for DataFrame Time Series Data

What will we cover in this tutorial?

We will learn how to visualization time series data in a DataFrame with Matplotlib.

This tutorial will show you.

  • How to use Matplotlib with DataFrames.
  • Use Matplotlib with subplots (the object-oriented way).
  • How to make multiple plots in one figure.
  • How to create bar-plots

Want to access the code directly in Jupyter Notebook?

You can get the Jupyter Notebooks from the GitHub here, where there are also direct links to Colab for an interactive experience.

Step 1: Read time series data into a DataFrame

A DataFrame is a two-dimensional tabular data. It is the primary data structure of Pandas. The data structure contains labeled axes (rows and columns).

To get access to a DataFrame data structure, you need to import the Pandas library.

import pandas as pd

Then we need some time series data. You con download your own CSV file from financial pages like Yahoo! Finance.

For this tutorial we will use a dataset available from the GitHub.

remote_file = "https://raw.githubusercontent.com/LearnPythonWithRune/FinancialDataAnalysisWithPython/main/AAPL.csv"
data = pd.read_csv(remote_file, index_col=0, parse_dates=True)

The pd.read_csv(…) does all the magic. We set the index_col=0, which sets the first column of the CSV data file to be the index. This is the dates.

Then we set parse_dates=True, to ensure that dates are actually parsed as dates and not as strings. This is necessary to take advantage of being time series and index with time intervals.

Step 2: Import Matplotlib in Jupyter Notebook

When you import Matplotlib in Jupyter Notebook, you need to set a rendering mode.

import matplotlib.pyplot as plt
%matplotlib notebook

We will use the notebook mode, which is interactive. This enables you to zoom in on interval, move around, and save the figure.

It is common to use inline mode for rendering in Jupyter Notebook. The inline mode creates a static image, which is not interactive.

Step 3: Use Matplotlib the Object-Oriente way

Matplotlib can be used in a functional way and an object-oriented way. Most use it in a functional way, which often creates more confusion, as it is not always intuitive how it works.

The object-oriented way leads to less confusion for the cost of one extra line of code and parsing one argument. Hence, the price is low for the gain.

fig, ax = plt.subplots()
data['Close'].plot(ax=ax)
ax.set_ylabel("Price")
ax.set_title("AAPL")

The first line returns a figure and axis (fig and ax). The figure is where we put the axis, and the axis is the chart.

The actually plot is made by calling the DataFrame, actually, we access the column Close in this case, which is the Series of the time series of the historic Close prices.

Confused? Don’t worry about the details.

Notice, that we parse ax=ax to the plot. This ensures that we render the chart on the returned axis ax.

Finally, we add a y-label and a title to our axis.

Step 4: Creating multiple charts in one Matplotlib figure

How can we create multiple charts (or axes) in one Matplotlib figure?

Luckily, this is quite easy.

fig, ax = plt.subplots(2, 2)
data['Open'].plot(ax=ax[0, 0], title="Open")
data['High'].plot(ax=ax[0, 1], title="High")
data['Low'].plot(ax=ax[1, 0], title="Low")
data['Close'].plot(ax=ax[1, 1], title="Close")
plt.tight_layout()

Here we see a few differences. First, notice plt.subplots(2, 2), which will return a figure fig, and a list of lists with 2-by-2 axes. Hence, ax is a two dimensional list of axes.

We can access the first axis with ax[0, 0,], and parse it as an argument to plot.

This continues for all the 4 plots we make, as you see.

Finally, we use plt.tight_layout(), which will ensures that the layout of the axes does not overlap. You can try without to see the difference.

Step 5: Create a bar-chart with Matplotlib

Finally, we will make a bar-chart with Matplotlib.

Actually, we will render a horizontal bar-chart.

fig, ax = plt.subplots()
data['Volume'].loc['2020-07-01':'2020-08-15'].plot.barh(ax=ax)

We do it for the volume and only on a limited interval of time. This shows you how to take advantage of the time series aspect of the DataFrame.

Next step

The above is part of the FREE 2h Video course.

Backtesting a Trading Strategy with Pandas and Python

What will we cover?

In this tutorial we will cover.

  • What is a backtesting strategy?
  • How to measure the performance of a backtesting strategy?
  • How to implement a backtesting strategy with Pandas?

What is a backtesting strategy?

In a trading strategy backtesting seeks to estimate the performance of a strategy or model if it had been employed during a past period (source).

The way to analyze the performance of a strategy is to compare it with return, volatility, and max drawdown. Other metrics can also be used, but for this tutorial we will use these.

Step 1: Read data from Yahoo! Finance API with Pandas Datareader

Let’s get started by importing a few libraries and retrieve some data from Yahoo! Finance API with Pandas Datareader.

import pandas as pd
import pandas_datareader as pdr
import datetime as dt
import numpy as np

start = dt.datetime(2010, 1, 1)

data = pdr.get_data_yahoo("AAPL", start)

Which will read data for the Apple ticker (AAPL) since 2010.

Below is shown the head of data.

                High       Low      Open     Close       Volume  Adj Close
Date                                                                      
2010-01-04  7.660714  7.585000  7.622500  7.643214  493729600.0   6.583586
2010-01-05  7.699643  7.616071  7.664286  7.656429  601904800.0   6.594968
2010-01-06  7.686786  7.526786  7.656429  7.534643  552160000.0   6.490066
2010-01-07  7.571429  7.466071  7.562500  7.520714  477131200.0   6.478067
2010-01-08  7.571429  7.466429  7.510714  7.570714  447610800.0   6.521136

Step 2: Calculate signals for a simple strategy

The simple strategy we will use is moving average of period 5 and 20.

When the moving average of the Adj Close price of 5 days is above the moving average of 20 days, we go long (buy and hold) otherwise short (sell).

This can be calculated as follows.

data['Signal'] = data['Adj Close'].rolling(5).mean() - data['Adj Close'].rolling(20).mean()
data['Position'] = (data['Signal'].apply(np.sign) + 1)/2

This results in a Signal line, which is the differences of the two moving averages. When the signal line is positive our position is 1 (buy and hold) otherwise 0 (sell).

                  High         Low        Open  ...   Adj Close    Signal  Position
Date                                            ...                                
2021-02-26  124.849998  121.199997  122.589996  ...  121.260002 -7.610835       0.0
2021-03-01  127.930000  122.790001  123.750000  ...  127.790001 -7.054179       0.0
2021-03-02  128.720001  125.010002  128.410004  ...  125.120003 -6.761187       0.0
2021-03-03  125.709999  121.839996  124.809998  ...  122.059998 -6.782757       0.0
2021-03-04  123.599998  118.620003  121.750000  ...  120.129997 -6.274249       0.0

The reason why we want long to 1 and short to be 0 is for computational reasons, which will be clear soon.

Step 3: Remove unnecessary data columns and rows

To have a cleaner dataset we will clean it up.

data.drop(['High', 'Low', 'Open', 'Volume', 'Close'], axis=1, inplace=True)
data.dropna(inplace=True)

Where drop removes columns not needed and dropna removes rows with NaN.

The inplace=True is simply to apply it on the DataFrame.

            Adj Close    Signal  Position
Date                                     
2010-02-01   5.990476 -0.217986       0.0
2010-02-02   6.025239 -0.252087       0.0
2010-02-03   6.128909 -0.282004       0.0
2010-02-04   5.908031 -0.297447       0.0
2010-02-05   6.012933 -0.253271       0.0

Step 4: Calculate the return of the strategy

To calculate the return we will use log returns as we will see is an advantage.

Then we use the Position, but we shift it by 1, as we assume we first react on a position the day after the signal.

data['Log return'] = np.log(data['Adj Close']/data['Adj Close'].shift())
data['Return'] = data['Position'].shift(1)*data['Log return']

This result in the following.

             Adj Close    Signal  Position  Log return  Return
Date                                                          
2021-02-26  121.260002 -7.610835       0.0    0.002229     0.0
2021-03-01  127.790001 -7.054179       0.0    0.052451     0.0
2021-03-02  125.120003 -6.761187       0.0   -0.021115    -0.0
2021-03-03  122.059998 -6.782757       0.0   -0.024761    -0.0
2021-03-04  120.129997 -6.274249       0.0   -0.015938    -0.0

Now the additive advance of log returns comes in handy.

Remember that that we can add up log returns to calculate the final return. For details I refer to this.

Hence, we get that the return can be calculated as follows.

data[['Log return', 'Return']].cumsum().apply(np.exp)

Resulting in the following.

            Log return   Return
Date                           
2021-02-26   20.242133  7.29214
2021-03-01   21.332196  7.29214
2021-03-02   20.886489  7.29214
2021-03-03   20.375677  7.29214
2021-03-04   20.053499  7.29214

Using a bit calculations.

np.exp(data[['Log return', 'Return']].mean()*252)

We get.

Log return    1.310917
Return        1.196485
dtype: float64

Which tells us that the annualized return of our strategy giver 19.6485% return. A buy and hold strategy would give 31.0917%

The natural question is: What did we gain with our strategy?

Step 5: Evaluating our strategy

If we compute the volatility comparing the buy-and-hold strategy with ours. The volatility of a stock can be calculated in many ways. Here we will use the standard deviation. For other measures refer to Investpedia.

data[['Log return', 'Return']].std()*252**.5

Which gives the annualized standard deviation.

Log return    0.283467
Return        0.188044
dtype: float64

Hence, the gain from our strategy is a less volatile strategy.

Calculate the Average True Range (ATR) Easy with Pandas DataFrames

What will we cover in this tutorial?

In this tutorial we will cover the following

Step 1: Read historic stock prices from Yahoo! Finance API

To read data from Yahoo! Finance API we use Pandas-Datareader, which has a direct method.

This requires that we give a start date on how old data we want to retrieve.

import pandas_datareader as pdr
import datetime as dt

start = dt.datetime(2020, 1, 1)
data = pdr.get_data_yahoo("NFLX", start)

print(data.tail())

This we result in similar output.

	High	Low	Open	Close	Volume	Adj Close
Date						
2021-02-12	561.250000	550.849976	556.940002	556.520020	2195900	556.520020
2021-02-16	563.630005	552.729980	557.289978	557.280029	2622400	557.280029
2021-02-17	555.250000	543.030029	550.989990	551.340027	2069600	551.340027
2021-02-18	550.000000	538.229980	549.000000	548.219971	2456200	548.219971
2021-02-19	548.989990	538.809998	548.000000	540.219971	2838600	540.219971

Calculate the Average True Range (ATR)

The Average True Range (ATR) is calculated as follows, as investopedia.org defines it.

This can be calculated as follows.

import numpy as np
import pandas_datareader as pdr
import datetime as dt

start = dt.datetime(2020, 1, 1)
data = pdr.get_data_yahoo("NFLX", start)

high_low = data['High'] - data['Low']
high_close = np.abs(data['High'] - data['Close'].shift())
low_close = np.abs(data['Low'] - data['Close'].shift())

ranges = pd.concat([high_low, high_close, low_close], axis=1)
true_range = np.max(ranges, axis=1)

atr = true_range.rolling(14).sum()/14

Where we use the 14 days standard.

Visualize the ATR and the stock price

We will use Matplotlib to visualize it as it integrates well with DataFrames from Pandas.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
atr.plot(ax=ax)
data['Close'].plot(ax=ax, secondary_y=True, alpha=0.3)
plt.show()

This will result in a chart similar to this one.

Want to learn more?

Check out this full FREE course learning about Risk and Return with Pandas DataFrames (Python).