Learn how you can become a Python programmer in just 12 weeks.

    We respect your privacy. Unsubscribe at anytime.

    Backtesting a Trading Strategy with Pandas and Python

    What will we cover?

    In this tutorial we will cover.

    • What is a backtesting strategy?
    • How to measure the performance of a backtesting strategy?
    • How to implement a backtesting strategy with Pandas?

    What is a backtesting strategy?

    In a trading strategy backtesting seeks to estimate the performance of a strategy or model if it had been employed during a past period (source).

    The way to analyze the performance of a strategy is to compare it with return, volatility, and max drawdown. Other metrics can also be used, but for this tutorial we will use these.

    Step 1: Read data from Yahoo! Finance API with Pandas Datareader

    Let’s get started by importing a few libraries and retrieve some data from Yahoo! Finance API with Pandas Datareader.

    import pandas as pd
    import pandas_datareader as pdr
    import datetime as dt
    import numpy as np
    start = dt.datetime(2010, 1, 1)
    data = pdr.get_data_yahoo("AAPL", start)

    Which will read data for the Apple ticker (AAPL) since 2010.

    Below is shown the head of data.

                    High       Low      Open     Close       Volume  Adj Close
    2010-01-04  7.660714  7.585000  7.622500  7.643214  493729600.0   6.583586
    2010-01-05  7.699643  7.616071  7.664286  7.656429  601904800.0   6.594968
    2010-01-06  7.686786  7.526786  7.656429  7.534643  552160000.0   6.490066
    2010-01-07  7.571429  7.466071  7.562500  7.520714  477131200.0   6.478067
    2010-01-08  7.571429  7.466429  7.510714  7.570714  447610800.0   6.521136

    Step 2: Calculate signals for a simple strategy

    The simple strategy we will use is moving average of period 5 and 20.

    When the moving average of the Adj Close price of 5 days is above the moving average of 20 days, we go long (buy and hold) otherwise short (sell).

    This can be calculated as follows.

    data['Signal'] = data['Adj Close'].rolling(5).mean() - data['Adj Close'].rolling(20).mean()
    data['Position'] = (data['Signal'].apply(np.sign) + 1)/2

    This results in a Signal line, which is the differences of the two moving averages. When the signal line is positive our position is 1 (buy and hold) otherwise 0 (sell).

                      High         Low        Open  ...   Adj Close    Signal  Position
    Date                                            ...                                
    2021-02-26  124.849998  121.199997  122.589996  ...  121.260002 -7.610835       0.0
    2021-03-01  127.930000  122.790001  123.750000  ...  127.790001 -7.054179       0.0
    2021-03-02  128.720001  125.010002  128.410004  ...  125.120003 -6.761187       0.0
    2021-03-03  125.709999  121.839996  124.809998  ...  122.059998 -6.782757       0.0
    2021-03-04  123.599998  118.620003  121.750000  ...  120.129997 -6.274249       0.0

    The reason why we want long to 1 and short to be 0 is for computational reasons, which will be clear soon.

    Step 3: Remove unnecessary data columns and rows

    To have a cleaner dataset we will clean it up.

    data.drop(['High', 'Low', 'Open', 'Volume', 'Close'], axis=1, inplace=True)

    Where drop removes columns not needed and dropna removes rows with NaN.

    The inplace=True is simply to apply it on the DataFrame.

                Adj Close    Signal  Position
    2010-02-01   5.990476 -0.217986       0.0
    2010-02-02   6.025239 -0.252087       0.0
    2010-02-03   6.128909 -0.282004       0.0
    2010-02-04   5.908031 -0.297447       0.0
    2010-02-05   6.012933 -0.253271       0.0

    Step 4: Calculate the return of the strategy

    To calculate the return we will use log returns as we will see is an advantage.

    Then we use the Position, but we shift it by 1, as we assume we first react on a position the day after the signal.

    data['Log return'] = np.log(data['Adj Close']/data['Adj Close'].shift())
    data['Return'] = data['Position'].shift(1)*data['Log return']

    This result in the following.

                 Adj Close    Signal  Position  Log return  Return
    2021-02-26  121.260002 -7.610835       0.0    0.002229     0.0
    2021-03-01  127.790001 -7.054179       0.0    0.052451     0.0
    2021-03-02  125.120003 -6.761187       0.0   -0.021115    -0.0
    2021-03-03  122.059998 -6.782757       0.0   -0.024761    -0.0
    2021-03-04  120.129997 -6.274249       0.0   -0.015938    -0.0

    Now the additive advance of log returns comes in handy.

    Remember that that we can add up log returns to calculate the final return. For details I refer to this.

    Hence, we get that the return can be calculated as follows.

    data[['Log return', 'Return']].cumsum().apply(np.exp)

    Resulting in the following.

                Log return   Return
    2021-02-26   20.242133  7.29214
    2021-03-01   21.332196  7.29214
    2021-03-02   20.886489  7.29214
    2021-03-03   20.375677  7.29214
    2021-03-04   20.053499  7.29214

    Using a bit calculations.

    np.exp(data[['Log return', 'Return']].mean()*252)

    We get.

    Log return    1.310917
    Return        1.196485
    dtype: float64

    Which tells us that the annualized return of our strategy giver 19.6485% return. A buy and hold strategy would give 31.0917%

    The natural question is: What did we gain with our strategy?

    Step 5: Evaluating our strategy

    If we compute the volatility comparing the buy-and-hold strategy with ours. The volatility of a stock can be calculated in many ways. Here we will use the standard deviation. For other measures refer to Investpedia.

    data[['Log return', 'Return']].std()*252**.5

    Which gives the annualized standard deviation.

    Log return    0.283467
    Return        0.188044
    dtype: float64

    Hence, the gain from our strategy is a less volatile strategy.

    12% Investment Solution

    Would you like to get 12% in return of your investments?

    D. A. Carter promises and shows how his simple investment strategy will deliver that in the book The 12% Solution. The book shows how to test this statement by using backtesting.

    Did Carter find a strategy that will consistently beat the market?

    Actually, it is not that hard to use Python to validate his calculations. But we can do better than that. If you want to work smarter than traditional investors then continue to read here.

    Python for Finance: Unlock Financial Freedom and Build Your Dream Life

    Discover the key to financial freedom and secure your dream life with Python for Finance!

    Say goodbye to financial anxiety and embrace a future filled with confidence and success. If you’re tired of struggling to pay bills and longing for a life of leisure, it’s time to take action.

    Imagine breaking free from that dead-end job and opening doors to endless opportunities. With Python for Finance, you can acquire the invaluable skill of financial analysis that will revolutionize your life.

    Make informed investment decisions, unlock the secrets of business financial performance, and maximize your money like never before. Gain the knowledge sought after by companies worldwide and become an indispensable asset in today’s competitive market.

    Don’t let your dreams slip away. Master Python for Finance and pave your way to a profitable and fulfilling career. Start building the future you deserve today!

    Python for Finance a 21 hours course that teaches investing with Python.

    Learn pandas, NumPy, Matplotlib for Financial Analysis & learn how to Automate Value Investing.

    “Excellent course for anyone trying to learn coding and investing.” – Lorenzo B.

    17 thoughts on “Backtesting a Trading Strategy with Pandas and Python”

    1. Hey Rune, thanks for your great work! Have a question about log returns and the „additive advantage“. I don‘t quite get it: Couldn‘t I use returns and cumprod() instead of log returns and cumsum(), and get the same effect? Thx!

      • Hey Volker.
        Sure it is the same. The difference is that sum is faster than product.
        If the difference in these problems is big I didn’t try. But it is custom to use sum for the above reason.

    2. Dear Rune. Thanks you for this excelent work. I have a doubt with this sentence

      data[‘Position’] = (data[‘Signal’].apply(np.sign) + 1)/2

      I undestand that the fucntion np.sign(), return the next:

      np.sign(x) -1 if signal >= 0, for so much “data[‘Position’]” will be (-1+1)/2 = 0
      np.sign(x) 0 if signal == 0 , for so much “data[‘Position’]” will be (0+1)/2 = 0.5 ¿?
      np.sign(x) 1 if signal <= 0, for so much "data['Position']"will be (1+1)/2 = 1

      If we are looking for data['Position'] 0 y 1, please explain this sentence to us. Cordial greetings

      • Let’s try to break it down.

        data['Signal'] = data['Adj Close'].rolling(5).mean() - data['Adj Close'].rolling(20).mean()

        In signal we have the rolling average of 5 days minus rolling average of 20 days.

        We want to use that as a signal.

        data['Position'] = (data['Signal'].apply(np.sign) + 1)/2

        Let’s break that down too.


        This gives the sign of the signal: -1 if negative, 1 if positive (0 if 0).

        + 1

        Says, if -1 we get 0 and 2 if 1.

        Then /2

        Ensure we have 0 if short and 1 if long.

        That is what we try to do in step 2.

        This results in a Signal line, which is the differences of the two moving averages. When the signal line is positive our position is 1 (buy and hold) otherwise 0 (sell).

    3. Dear Rune
      I keep trying to understand these sentences. In this case, I do not understand :

      data[[‘Log return’, ‘Return’]].cumsum().apply(np.exp)
      np.exp(data[[‘Log return’, ‘Return’]].mean()*252)

      ¿The last row of the DataFrame data[[‘Log return’, ‘Return’]].cumsum()

      Isn’t the final result of the purchase/sales strategy and the strategy of maintaining the position?. ¿For what is necessary to apply the np.exp() function.

      I need to understand it to be able to trust the result. Thanks in advance. cordial greetings

      • Let’s try to break it down.

        data['Return'] = data['Position'].shift(1)*data['Log return']

        Keeps the return of our investment. It keeps the log-return for each day we have invested. The shift is because we need one day to react. We decide after closing hours to hold, buy, or sell. If we hold – we have a 1 in our position. Then we get the log return. otherwise, we have 0 and get nothing (0).

        data[[‘Log return’, ‘Return’]].cumsum().apply(np.exp)
        This then shows what we get if we had the investment all the time (log return) vs our investment strategy (return).
        The cumsum() is summing up all log-returns (in both log-return and return) and the we apply np.exp to get the actual return. This is the beauty of calculating with log returns.

        The mean()*252 is just used to get an annualized return.

        Please feel free to ask any clarifying questions.


    4. Thanks Rune

      In yours book “Python for Financial Analisys”, this question is is perfectly explained.

      In the chapter “Log returns ” we can see 𝑎×𝑏=exp(𝑙𝑜𝑔(𝑎×𝑏))=exp(log(𝑎)+log(𝑏))

    5. Dear Rune
      I keep learning with your script. I have another question that I will thank you for clarifying.

      In the statement “data [[‘Log return’, ‘Return’]]. Cumsum (). Apply (np.exp)”, you apply the function np.exp () to the two columns.

      I understand that we may use: “cumsum” with “return” and “cumprod” with “log return”, and that “np.exp()”, It is only applicable to logarithms .

      Can you confirm that your sentence is correct and what is my mistake? I appreciate your help in advance.

      • I think you are right.
        Let us see.
        The log returns are used to make it possible to add returns together.
        The np.exp is to transform log-returns to %-returns.
        Hence the cumsum().apply(np.exp) is adding log-returns and transforms them to %-return.
        If you had %-returns it would be:
        cumprod() directly. But this is slower and the standard in finance is to use log-returns in these cases.
        Hope it helps,

    6. Dear Rune:

      Computers, Artificial Intelligence, Machine Learning with its algorithms, geopolitics, etc., have so complicated the financial system, which sometimes can’t understand today.

      For this reason, to the people who cannot trust in a long way for their investments, the classic strategies do not serve. In my opinion, it is necessary to look for stock portfolio management strategies that allow us to preserve it in the worst circumstance that exists and achieve a benefit that even surpasses the cost of life index.

      As I learn this illusive language, I am committed to searching for some strategy with this objective. I would like to count with your authorization to use, adapting them, some sentences of this script.

      cordial greetings

      • I believe you are right. It is very difficult to make a bullet proof strategy. In this tutorial we only focus on backtesting in a classical way.
        I am interested to add ML to the strategy in new ways and test it.

    7. Dear Rune.
      In the statement ‘data[‘Position’] = (data[‘Signal’].apply(np.sign) + 1)/2′, I see the following problem. When ‘signal’ is equal to 0, this statement will return 0.5 in ‘position’.
      To make these 0.5 values 0, I have done the following.

      # Order number of records with 0.5 in position
      lista = np.where(data[‘position’] == 0.5)
      # Change 0.5 to 0
      for each_value in lista:
      data.loc[each_value, ‘position’] = 0

      What do you think about this? Am I wrong?


    Leave a Comment