Matplotlib Visualization for DataFrame Time Series Data

What will we cover in this tutorial?

We will learn how to visualization time series data in a DataFrame with Matplotlib.

This tutorial will show you.

  • How to use Matplotlib with DataFrames.
  • Use Matplotlib with subplots (the object-oriented way).
  • How to make multiple plots in one figure.
  • How to create bar-plots

Want to access the code directly in Jupyter Notebook?

You can get the Jupyter Notebooks from the GitHub here, where there are also direct links to Colab for an interactive experience.

Step 1: Read time series data into a DataFrame

A DataFrame is a two-dimensional tabular data. It is the primary data structure of Pandas. The data structure contains labeled axes (rows and columns).

To get access to a DataFrame data structure, you need to import the Pandas library.

import pandas as pd

Then we need some time series data. You con download your own CSV file from financial pages like Yahoo! Finance.

For this tutorial we will use a dataset available from the GitHub.

remote_file = ""
data = pd.read_csv(remote_file, index_col=0, parse_dates=True)

The pd.read_csv(…) does all the magic. We set the index_col=0, which sets the first column of the CSV data file to be the index. This is the dates.

Then we set parse_dates=True, to ensure that dates are actually parsed as dates and not as strings. This is necessary to take advantage of being time series and index with time intervals.

Step 2: Import Matplotlib in Jupyter Notebook

When you import Matplotlib in Jupyter Notebook, you need to set a rendering mode.

import matplotlib.pyplot as plt
%matplotlib notebook

We will use the notebook mode, which is interactive. This enables you to zoom in on interval, move around, and save the figure.

It is common to use inline mode for rendering in Jupyter Notebook. The inline mode creates a static image, which is not interactive.

Step 3: Use Matplotlib the Object-Oriente way

Matplotlib can be used in a functional way and an object-oriented way. Most use it in a functional way, which often creates more confusion, as it is not always intuitive how it works.

The object-oriented way leads to less confusion for the cost of one extra line of code and parsing one argument. Hence, the price is low for the gain.

fig, ax = plt.subplots()

The first line returns a figure and axis (fig and ax). The figure is where we put the axis, and the axis is the chart.

The actually plot is made by calling the DataFrame, actually, we access the column Close in this case, which is the Series of the time series of the historic Close prices.

Confused? Don’t worry about the details.

Notice, that we parse ax=ax to the plot. This ensures that we render the chart on the returned axis ax.

Finally, we add a y-label and a title to our axis.

Step 4: Creating multiple charts in one Matplotlib figure

How can we create multiple charts (or axes) in one Matplotlib figure?

Luckily, this is quite easy.

fig, ax = plt.subplots(2, 2)
data['Open'].plot(ax=ax[0, 0], title="Open")
data['High'].plot(ax=ax[0, 1], title="High")
data['Low'].plot(ax=ax[1, 0], title="Low")
data['Close'].plot(ax=ax[1, 1], title="Close")

Here we see a few differences. First, notice plt.subplots(2, 2), which will return a figure fig, and a list of lists with 2-by-2 axes. Hence, ax is a two dimensional list of axes.

We can access the first axis with ax[0, 0,], and parse it as an argument to plot.

This continues for all the 4 plots we make, as you see.

Finally, we use plt.tight_layout(), which will ensures that the layout of the axes does not overlap. You can try without to see the difference.

Step 5: Create a bar-chart with Matplotlib

Finally, we will make a bar-chart with Matplotlib.

Actually, we will render a horizontal bar-chart.

fig, ax = plt.subplots()

We do it for the volume and only on a limited interval of time. This shows you how to take advantage of the time series aspect of the DataFrame.

Next step

The above is part of the FREE 2h Video course.

Backtesting a Trading Strategy with Pandas and Python

What will we cover?

In this tutorial we will cover.

  • What is a backtesting strategy?
  • How to measure the performance of a backtesting strategy?
  • How to implement a backtesting strategy with Pandas?

What is a backtesting strategy?

In a trading strategy backtesting seeks to estimate the performance of a strategy or model if it had been employed during a past period (source).

The way to analyze the performance of a strategy is to compare it with return, volatility, and max drawdown. Other metrics can also be used, but for this tutorial we will use these.

Step 1: Read data from Yahoo! Finance API with Pandas Datareader

Let’s get started by importing a few libraries and retrieve some data from Yahoo! Finance API with Pandas Datareader.

import pandas as pd
import pandas_datareader as pdr
import datetime as dt
import numpy as np
start = dt.datetime(2010, 1, 1)
data = pdr.get_data_yahoo("AAPL", start)

Which will read data for the Apple ticker (AAPL) since 2010.

Below is shown the head of data.

                High       Low      Open     Close       Volume  Adj Close
2010-01-04  7.660714  7.585000  7.622500  7.643214  493729600.0   6.583586
2010-01-05  7.699643  7.616071  7.664286  7.656429  601904800.0   6.594968
2010-01-06  7.686786  7.526786  7.656429  7.534643  552160000.0   6.490066
2010-01-07  7.571429  7.466071  7.562500  7.520714  477131200.0   6.478067
2010-01-08  7.571429  7.466429  7.510714  7.570714  447610800.0   6.521136

Step 2: Calculate signals for a simple strategy

The simple strategy we will use is moving average of period 5 and 20.

When the moving average of the Adj Close price of 5 days is above the moving average of 20 days, we go long (buy and hold) otherwise short (sell).

This can be calculated as follows.

data['Signal'] = data['Adj Close'].rolling(5).mean() - data['Adj Close'].rolling(20).mean()
data['Position'] = (data['Signal'].apply(np.sign) + 1)/2

This results in a Signal line, which is the differences of the two moving averages. When the signal line is positive our position is 1 (buy and hold) otherwise 0 (sell).

                  High         Low        Open  ...   Adj Close    Signal  Position
Date                                            ...                                
2021-02-26  124.849998  121.199997  122.589996  ...  121.260002 -7.610835       0.0
2021-03-01  127.930000  122.790001  123.750000  ...  127.790001 -7.054179       0.0
2021-03-02  128.720001  125.010002  128.410004  ...  125.120003 -6.761187       0.0
2021-03-03  125.709999  121.839996  124.809998  ...  122.059998 -6.782757       0.0
2021-03-04  123.599998  118.620003  121.750000  ...  120.129997 -6.274249       0.0

The reason why we want long to 1 and short to be 0 is for computational reasons, which will be clear soon.

Step 3: Remove unnecessary data columns and rows

To have a cleaner dataset we will clean it up.

data.drop(['High', 'Low', 'Open', 'Volume', 'Close'], axis=1, inplace=True)

Where drop removes columns not needed and dropna removes rows with NaN.

The inplace=True is simply to apply it on the DataFrame.

            Adj Close    Signal  Position
2010-02-01   5.990476 -0.217986       0.0
2010-02-02   6.025239 -0.252087       0.0
2010-02-03   6.128909 -0.282004       0.0
2010-02-04   5.908031 -0.297447       0.0
2010-02-05   6.012933 -0.253271       0.0

Step 4: Calculate the return of the strategy

To calculate the return we will use log returns as we will see is an advantage.

Then we use the Position, but we shift it by 1, as we assume we first react on a position the day after the signal.

data['Log return'] = np.log(data['Adj Close']/data['Adj Close'].shift())
data['Return'] = data['Position'].shift(1)*data['Log return']

This result in the following.

             Adj Close    Signal  Position  Log return  Return
2021-02-26  121.260002 -7.610835       0.0    0.002229     0.0
2021-03-01  127.790001 -7.054179       0.0    0.052451     0.0
2021-03-02  125.120003 -6.761187       0.0   -0.021115    -0.0
2021-03-03  122.059998 -6.782757       0.0   -0.024761    -0.0
2021-03-04  120.129997 -6.274249       0.0   -0.015938    -0.0

Now the additive advance of log returns comes in handy.

Remember that that we can add up log returns to calculate the final return. For details I refer to this.

Hence, we get that the return can be calculated as follows.

data[['Log return', 'Return']].cumsum().apply(np.exp)

Resulting in the following.

            Log return   Return
2021-02-26   20.242133  7.29214
2021-03-01   21.332196  7.29214
2021-03-02   20.886489  7.29214
2021-03-03   20.375677  7.29214
2021-03-04   20.053499  7.29214

Using a bit calculations.

np.exp(data[['Log return', 'Return']].mean()*252)

We get.

Log return    1.310917
Return        1.196485
dtype: float64

Which tells us that the annualized return of our strategy giver 19.6485% return. A buy and hold strategy would give 31.0917%

The natural question is: What did we gain with our strategy?

Step 5: Evaluating our strategy

If we compute the volatility comparing the buy-and-hold strategy with ours. The volatility of a stock can be calculated in many ways. Here we will use the standard deviation. For other measures refer to Investpedia.

data[['Log return', 'Return']].std()*252**.5

Which gives the annualized standard deviation.

Log return    0.283467
Return        0.188044
dtype: float64

Hence, the gain from our strategy is a less volatile strategy.