## What will we cover?

In this tutorial we will cover.

- What is a
**backtesting strategy**? - How to measure the
**performance**of a backtesting strategy? - How to implement a backtesting strategy with
**Pandas**?

## What is a backtesting strategy?

In a trading strategy backtesting seeks to estimate the performance of a strategy or model if it had been employed during a past period (source).

The way to analyze the performance of a strategy is to compare it with **return**, **volatility**, and **max drawdown**. Other metrics can also be used, but for this tutorial we will use these.

## Step 1: Read data from Yahoo! Finance API with Pandas Datareader

Let’s get started by importing a few libraries and retrieve some data from **Yahoo! Finance API** with **Pandas Datareader**.

```
import pandas as pd
import pandas_datareader as pdr
import datetime as dt
import numpy as np
start = dt.datetime(2010, 1, 1)
data = pdr.get_data_yahoo("AAPL", start)
```

Which will read data for the Apple ticker (AAPL) since 2010.

Below is shown the head of data.

```
High Low Open Close Volume Adj Close
Date
2010-01-04 7.660714 7.585000 7.622500 7.643214 493729600.0 6.583586
2010-01-05 7.699643 7.616071 7.664286 7.656429 601904800.0 6.594968
2010-01-06 7.686786 7.526786 7.656429 7.534643 552160000.0 6.490066
2010-01-07 7.571429 7.466071 7.562500 7.520714 477131200.0 6.478067
2010-01-08 7.571429 7.466429 7.510714 7.570714 447610800.0 6.521136
```

## Step 2: Calculate signals for a simple strategy

The simple strategy we will use is moving average of period 5 and 20.

When the moving average of the Adj Close price of 5 days is above the moving average of 20 days, we go long (buy and hold) otherwise short (sell).

This can be calculated as follows.

```
data['Signal'] = data['Adj Close'].rolling(5).mean() - data['Adj Close'].rolling(20).mean()
data['Position'] = (data['Signal'].apply(np.sign) + 1)/2
```

This results in a Signal line, which is the differences of the two moving averages. When the signal line is positive our position is 1 (buy and hold) otherwise 0 (sell).

```
High Low Open ... Adj Close Signal Position
Date ...
2021-02-26 124.849998 121.199997 122.589996 ... 121.260002 -7.610835 0.0
2021-03-01 127.930000 122.790001 123.750000 ... 127.790001 -7.054179 0.0
2021-03-02 128.720001 125.010002 128.410004 ... 125.120003 -6.761187 0.0
2021-03-03 125.709999 121.839996 124.809998 ... 122.059998 -6.782757 0.0
2021-03-04 123.599998 118.620003 121.750000 ... 120.129997 -6.274249 0.0
```

The reason why we want long to 1 and short to be 0 is for computational reasons, which will be clear soon.

## Step 3: Remove unnecessary data columns and rows

To have a cleaner dataset we will clean it up.

```
data.drop(['High', 'Low', 'Open', 'Volume', 'Close'], axis=1, inplace=True)
data.dropna(inplace=True)
```

Where **drop** removes columns not needed and **dropna** removes rows with **NaN**.

The **inplace=True** is simply to apply it on the DataFrame.

```
Adj Close Signal Position
Date
2010-02-01 5.990476 -0.217986 0.0
2010-02-02 6.025239 -0.252087 0.0
2010-02-03 6.128909 -0.282004 0.0
2010-02-04 5.908031 -0.297447 0.0
2010-02-05 6.012933 -0.253271 0.0
```

## Step 4: Calculate the return of the strategy

To calculate the return we will use log returns as we will see is an advantage.

Then we use the Position, but we shift it by 1, as we assume we first react on a position the day after the signal.

```
data['Log return'] = np.log(data['Adj Close']/data['Adj Close'].shift())
data['Return'] = data['Position'].shift(1)*data['Log return']
```

This result in the following.

```
Adj Close Signal Position Log return Return
Date
2021-02-26 121.260002 -7.610835 0.0 0.002229 0.0
2021-03-01 127.790001 -7.054179 0.0 0.052451 0.0
2021-03-02 125.120003 -6.761187 0.0 -0.021115 -0.0
2021-03-03 122.059998 -6.782757 0.0 -0.024761 -0.0
2021-03-04 120.129997 -6.274249 0.0 -0.015938 -0.0
```

Now the additive advance of log returns comes in handy.

Remember that that we can add up log returns to calculate the final return. For details I refer to this.

Hence, we get that the return can be calculated as follows.

```
data[['Log return', 'Return']].cumsum().apply(np.exp)
```

Resulting in the following.

```
Log return Return
Date
2021-02-26 20.242133 7.29214
2021-03-01 21.332196 7.29214
2021-03-02 20.886489 7.29214
2021-03-03 20.375677 7.29214
2021-03-04 20.053499 7.29214
```

Using a bit calculations.

```
np.exp(data[['Log return', 'Return']].mean()*252)
```

We get.

```
Log return 1.310917
Return 1.196485
dtype: float64
```

Which tells us that the annualized return of our strategy giver 19.6485% return. A buy and hold strategy would give 31.0917%

The natural question is: What did we gain with our strategy?

## Step 5: Evaluating our strategy

If we compute the volatility comparing the buy-and-hold strategy with ours. The volatility of a stock can be calculated in many ways. Here we will use the standard deviation. For other measures refer to **Investpedia**.

```
data[['Log return', 'Return']].std()*252**.5
```

Which gives the annualized standard deviation.

```
Log return 0.283467
Return 0.188044
dtype: float64
```

Hence, the gain from our strategy is a less volatile strategy.

Hey Rune, thanks for your great work! Have a question about log returns and the „additive advantage“. I don‘t quite get it: Couldn‘t I use returns and cumprod() instead of log returns and cumsum(), and get the same effect? Thx!

Hey Volker.

Sure it is the same. The difference is that sum is faster than product.

If the difference in these problems is big I didn’t try. But it is custom to use sum for the above reason.