This is part of the FREE online course on my page. No signup required, and there are 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

In this lesson we will learn how to add new columns calculated from values in other columns in our DataFrame. This is similar to calculating on data from different columns in Excel.

Then we will demonstrate some useful functions when working with financial data.

Finally, we will show how to remove (or drop) columns from our DataFrame.

Step 1: Load the data.

As usual we need to load the data into our DataFrame. You can get the CSV file from here.

import pandas as pd
data = pd.read_csv("AAPL.csv", index_col=0, parse_dates=True)

It is always a good habit to inspect the data with data.head() (see the video lesson or the Notebook linked below the video for the expected output).

Step 2: Create new columns in a DataFrame (Pandas)

To create a new column in our data set, simply write the following.

data['Daily chg'] = data['Close'] - data['Open']

The above statement creates a new column named Daily chg containing the difference between the Close and Open columns.

Similarly, you can create a column with the normalized data as follows.
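A minimal sketch of that normalization (the column name Normalized is my assumption; the tiny stand-in data lets the sketch run on its own, in the lesson you would use the loaded AAPL data):

```python
import pandas as pd

# Small stand-in for the AAPL DataFrame loaded in Step 1
data = pd.DataFrame({
    'Open': [100.0, 102.0, 101.0],
    'Close': [102.0, 101.0, 104.0],
})

# Divide every Close price by the first Close price
data['Normalized'] = data['Close'] / data['Close'].iloc[0]
```

The first entry of the new column is then 1.0, and every later entry shows the growth relative to day one.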

To remove columns we use the drop(…) method with the following arguments.

labels=['High', 'Low', 'Adj Close', 'Volume'] sets the labels of the columns we want to remove.

axis=1 sets the axis of the labels. The default is 0, which looks for the labels on the index, while axis=1 refers to the column names.

inplace=True means it actually removes the columns from the DataFrame we work on. Otherwise, it returns a new DataFrame without the columns.
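Putting the arguments together, the call might look like this (a sketch with tiny stand-in data; in the lesson you would call it on the loaded AAPL DataFrame):

```python
import pandas as pd

# Small stand-in for the AAPL DataFrame loaded in Step 1
data = pd.DataFrame({
    'Open': [100.0], 'High': [105.0], 'Low': [99.0],
    'Close': [102.0], 'Adj Close': [101.5], 'Volume': [1000000],
})

# Remove the columns we no longer need, in place
data.drop(labels=['High', 'Low', 'Adj Close', 'Volume'], axis=1, inplace=True)
```

Afterwards only the Open and Close columns remain.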

What is next?

Want to learn more?


In the first lesson we learned how to load data into a DataFrame. This part shows how to work with the individual columns in the DataFrame, which are represented by a different data type, called Series.

In this lesson we will learn how to make calculations on the columns.

Each column in a DataFrame is a Series and can be easily accessed. It is also easy to calculate new Series of data. This is similar to calculating new columns of data in an Excel sheet.

import pandas as pd
data = pd.read_csv("AAPL.csv", index_col=0, parse_dates=True)

Step 2: Explore the data and data type

In the video we explore the data to ensure it is correct. You can do that by using data.head().

Then we investigate the data type of the columns of the DataFrame data.

data.dtypes

Which results in the following.

Open float64
High float64
Low float64
Close float64
Adj Close float64
Volume int64
dtype: object

This shows that each column has one data type. Here Open is float64. This is one difference from Excel sheets, where each cell can have its own data type. The advantage of restricting each column to a single data type is speed.

The data type of data is DataFrame.

type(data)

The built-in function type(…) gives you the type of an object. It is handy to use when exploring data.

pandas.core.frame.DataFrame

Notice that it is given by the long name pandas.core.frame.DataFrame; this reflects the module structure of the Pandas library.

The data type of a column in a DataFrame can be found as follows.

type(data['Close'])

Where data['Close'] gives access to the column Close in the DataFrame data.

pandas.core.series.Series

Where we see that a column is represented as a Series. It is similar to a DataFrame in that it has an index. E.g., the Series data['Close'] has the same index as the DataFrame data. This is handy when you need to work with the data, as you will see in a moment.

Step 3: Calculating with Series

To keep it simple, we will start with the daily difference between the opening and closing price.

daily_chg = data['Open'] - data['Close']

This calculates a Series daily_chg with the opening price minus the closing price.

Please compare the full data in daily_chg with the data in data.

Similarly, we can calculate the daily percentage change. In the calculations above we have limited ourselves to data on the same rows (same dates). Later we will learn how to use data from the previous day (the row above).
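One plausible reading of a same-row daily percentage change is the Open-to-Close change; a sketch with stand-in data (the exact formula used in the video may differ):

```python
import pandas as pd

# Small stand-in for the AAPL DataFrame loaded in Step 1
data = pd.DataFrame({
    'Open': [100.0, 102.0],
    'Close': [102.0, 101.0],
})

# Daily percentage change using only values from the same row (same date)
daily_pct = (data['Close'] - data['Open']) / data['Open'] * 100
```

For the first stand-in row this gives a 2% change from open to close.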

Step 4: Normalize stock data

Now we will normalize the data by using the iloc we learned about in previous lesson.

norm = data['Close']/data['Close'].iloc[0]

The above statement calculates a Series norm, where the Close price is normalized by dividing by the first available Close price, accessed using iloc[0].

As a result, norm.iloc[0] will be 1.0000, and norm.iloc[-1] shows the return of this particular stock if bought on day 1 (index 0) and sold on the day of the last index (index -1); in the case of the video: 1.839521.

Next step?

Want to learn more?


In this tutorial we will get familiar with working with DataFrames – the primary data structure in Pandas.

We will learn how to read historical stock price data from Yahoo! Finance and load it into a DataFrame. This will be done by exporting a CSV file from Yahoo! Finance and loading the data. Later we will learn how to read the data directly from the Yahoo! Finance API.

A DataFrame is similar to an Excel sheet. DataFrames can contain data in a similar way as we will see in this lesson.

Then we will learn how to use the index of dates. This will be necessary when we make calculations later on.

The first part of the tutorial will give the foundation of what you need to know about DataFrames for financial analysis.

Step 1: Read the stock prices from Yahoo! Finance as CSV

In this first lesson we will download historical stock prices from Yahoo! Finance as CSV file and import them into our Jupyter notebook environment in a DataFrame.

If you are new to CSV files and DataFrames, don't worry – that is what we will cover here.

Let's start by going to Yahoo! Finance and downloading the CSV file. In this course we have used Apple, but feel free to make similar calculations on a stock of your choice.

Go to Yahoo! Finance, search for AAPL (the ticker for Apple), go to Historical Data, and download the CSV data file.

The CSV data file will contain Comma Separated Values (CSV) similar to this.
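For example, using rows that appear later in this tutorial:

```
Date,Open,High,Low,Close,Adj Close,Volume
2021-01-04,133.520004,133.610001,126.760002,129.410004,129.410004,143301900
2021-01-05,128.889999,131.740005,128.429993,131.009995,131.009995,97664900
```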

The first line shows the column names (Date, Open, High, Low, Close, Adj Close, Volume). Then each line contains a data entry for a given day.

Step 2: Read the stock prices from CSV to Pandas DataFrame

In Jupyter Notebook, start by importing the Pandas library. This is needed in order to load the data into a DataFrame.

import pandas as pd
data = pd.read_csv("AAPL.csv", index_col=0, parse_dates=True)
data.head()

The read_csv(…) does all the magic for us. It will read the CSV file AAPL.csv. The AAPL.csv file is the one you downloaded from Yahoo! Finance (or from the zip-file downloaded above) and needs to be located in the same folder you are working from in your Jupyter notebook.

The arguments in read_csv(…) are the following.

index_col=0 this sets the first column of the CSV file to be the index. In this case, it is the Date column.

parse_dates=True this ensures that dates in the CSV file are interpreted as dates. This is important if you want to take advantage of the index being a time.

Step 3: Explore data types of columns and index

In the video lesson we explore the type of columns and index.

data.dtypes
data.index

Which will reveal the data type and index of the DataFrame. Notice, that each column has its own data type.

Step 4: Indexing and slicing with DataFrames

We can use loc to lookup an index with a date.

data.loc['2020-01-27']

This will show the data for that specific date. If you get an error it might be because your dataset does not contain the above date. Choose another one to see something similar to this.

Open 7.751500e+01
High 7.794250e+01
Low 7.622000e+01
Close 7.723750e+01
Adj Close 7.657619e+01
Volume 1.619400e+08
Name: 2020-01-27 00:00:00, dtype: float64

A more advanced option is to use an interval (or slice as it is called). Slicing with loc on a DataFrame is done by using a starting and ending index .loc[start:end] or an open ended index .loc[start:], which will take data beginning from start to the last data.

data.loc['2021-01-01':]

This will give all the data starting from 2021-01-01. Notice that there is no data on January 1st, but since the index is interpreted as datetime, it can figure out the first date after.

Open High Low Close Adj Close Volume
Date
2021-01-04 133.520004 133.610001 126.760002 129.410004 129.410004 143301900
2021-01-05 128.889999 131.740005 128.429993 131.009995 131.009995 97664900
2021-01-06 127.720001 131.050003 126.379997 126.599998 126.599998 155088000
2021-01-07 128.360001 131.630005 127.860001 130.919998 130.919998 109578200
2021-01-08 132.429993 132.630005 130.229996 132.050003 132.050003 105158200
2021-01-11 129.190002 130.169998 128.500000 128.979996 128.979996 100620900

Similarly, you can create slicing with an open-ended start.

data.loc[:'2020-07-01']

Another important way to index into DataFrames is iloc[], which uses the integer position.

data.iloc[0]
data.iloc[-1]

Where you can index from the start with index 0, 1, 2, 3, … Or from the end -1, -2, -3, -4, …

What is next?

Want to learn more?


import pandas_datareader as pdr
import datetime as dt
ticker = "AAPL"
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)
data = pdr.get_data_yahoo(ticker, start, end)
print(data.head())

Resulting in this.

High Low Open Close Volume Adj Close
Date
2019-01-02 39.712502 38.557499 38.722500 39.480000 148158800.0 38.505024
2019-01-03 36.430000 35.500000 35.994999 35.547501 365248800.0 34.669640
2019-01-04 37.137501 35.950001 36.132500 37.064999 234428400.0 36.149662
2019-01-07 37.207500 36.474998 37.174999 36.982498 219111200.0 36.069202
2019-01-08 37.955002 37.130001 37.389999 37.687500 164101200.0 36.756794

Step 2: Calculate the Volatility of an Asset

Let's explore the difference between daily simple returns and daily log returns. Briefly explained, log returns have the advantage that you can add them together, while this is not the case for simple returns. Therefore, log returns are used in most financial analysis.

To calculate the daily log returns we need the NumPy library. For the purpose here, we will not explore the depths of NumPy; all we need is to apply the log function to a full column in our DataFrame (see my other FREE course for more details on NumPy).

import numpy as np
data['Log returns'] = np.log(data['Close']/data['Close'].shift())

This creates a column called Log returns with the daily log return of the Close price.

We need the standard deviation for the volatility of the stock.

This can be calculated from our Log returns as follows.

data['Log returns'].std()

The above gives the daily standard deviation. The volatility is defined as the annualized standard deviation. Assuming 252 trading days per year, we can calculate it as follows.

volatility = data['Log returns'].std()*252**.5

Notice that the square root is the same as **.5, i.e. the power of 1/2.

Step 3: Visualize the Volatility of Historic Stock Prices
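The video shows the exact chart; as one possible sketch, a histogram of the daily log returns (the stand-in prices and the choice of a histogram are my assumptions):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Small stand-in for the Close prices; in the lesson, data comes from Yahoo! Finance
data = pd.DataFrame({'Close': [100.0, 101.0, 99.5, 102.0, 103.5, 102.5]})
data['Log returns'] = np.log(data['Close'] / data['Close'].shift())

# Histogram of the daily log returns on a Matplotlib axis
fig, ax = plt.subplots()
data['Log returns'].dropna().hist(ax=ax, bins=50)
ax.set_xlabel('Daily log return')
ax.set_title('Distribution of daily log returns')
```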


There you need to set the span and adjust to False. This is needed to get the same numbers as on Yahoo! Finance.
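As a sketch, an exponential moving average with those settings might look like this (the span of 12 is an example value, not necessarily the one used in the lesson):

```python
import pandas as pd

# Small stand-in for the Close prices
close = pd.Series([100.0, 101.0, 99.5, 102.0, 103.5])

# adjust=False uses the recursive EMA formula, which matches Yahoo! Finance
ema = close.ewm(span=12, adjust=False).mean()
```

With adjust=False, the first EMA value equals the first price, and each following value is a weighted blend of the new price and the previous EMA.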

Next steps?

Want to learn more?


Pandas is a data analysis and manipulation tool containing a great data structure for the purpose.

Shortly said, pandas can be thought of as a data structure in Python, which is similar to working with data in a spreadsheet.

Pandas-datareader reads data from various sources and puts the data into pandas data structures.

Pandas-datareader has a call that returns historic stock price data from Yahoo! Finance.

To use Pandas-datareader you need to import the library.

Step 2: Example reading data from Yahoo! Finance with Pandas-Datareader

Let’s break the following example down.

import pandas_datareader as pdr
import datetime as dt
ticker = "AAPL"
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)
data = pdr.get_data_yahoo(ticker, start, end)
print(data)

Where we first import two libraries.

pandas_datareader The Pandas Datareader. If you do not already have it installed in your Jupyter Notebook, you can install it by entering !pip install pandas_datareader in a cell and executing it.

datetime This is a default library and represents a date and time. We only use it for the date aspects.

Then come the following lines.

ticker = "AAPL" is the ticker we want data for. You can use any ticker you want. In this course we have used the ticker for Apple (AAPL).

start = dt.datetime(2019, 1, 1) is the first day we want historic stock price data from.

end = dt.datetime(2020, 12, 31) is the last day.

data = pdr.get_data_yahoo(ticker, start, end) This is the magic that uses Pandas Datareader (pdr) to get data from the Yahoo! Finance API. It returns a DataFrame as we know it from previous lessons.

You can get multiple tickers at once by passing a list of them.

import pandas_datareader as pdr
import datetime as dt
ticker = ["AAPL", "IBM", "TSLA"]
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)
data = pdr.get_data_yahoo(ticker, start, end)
print(data)

You can get weekly or monthly data by using the interval argument as follows.

import pandas_datareader as pdr
import datetime as dt
ticker = ["AAPL", "IBM", "TSLA"]
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)
data = pdr.get_data_yahoo(ticker, start, end, interval='w')
print(data)

Set interval='m' to get monthly data instead of weekly data with 'w'.

Next steps?

Want to learn more?


Use Matplotlib with subplots (the object-oriented way).

How to make multiple plots in one figure.

How to create bar-plots

Want to access the code directly in Jupyter Notebook?

You can get the Jupyter Notebooks from the GitHub here, where there are also direct links to Colab for an interactive experience.

Step 1: Read time series data into a DataFrame

A DataFrame is a two-dimensional tabular data structure. It is the primary data structure of Pandas and contains labeled axes (rows and columns).

To get access to a DataFrame data structure, you need to import the Pandas library.

import pandas as pd

Then we need some time series data. You can download your own CSV file from financial pages like Yahoo! Finance.

For this tutorial we will use a dataset available from the GitHub.

remote_file = "https://raw.githubusercontent.com/LearnPythonWithRune/FinancialDataAnalysisWithPython/main/AAPL.csv"
data = pd.read_csv(remote_file, index_col=0, parse_dates=True)

The pd.read_csv(…) does all the magic. We set the index_col=0, which sets the first column of the CSV data file to be the index. This is the dates.

Then we set parse_dates=True to ensure that dates are actually parsed as dates and not as strings. This is necessary to take advantage of the time series index and slice with time intervals.

Step 2: Import Matplotlib in Jupyter Notebook

When you import Matplotlib in Jupyter Notebook, you need to set a rendering mode.

import matplotlib.pyplot as plt
%matplotlib notebook

We will use the notebook mode, which is interactive. This enables you to zoom in on an interval, move around, and save the figure.

It is common to use inline mode for rendering in Jupyter Notebook. The inline mode creates a static image, which is not interactive.

Step 3: Use Matplotlib the Object-Oriented way

Matplotlib can be used in a functional way and an object-oriented way. Most use it in a functional way, which often creates more confusion, as it is not always intuitive how it works.

The object-oriented way leads to less confusion at the cost of one extra line of code and passing one argument. Hence, the price is low for the gain.
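The pattern described below can be sketched as follows (the stand-in prices and the label and title texts are illustrative; in the lesson, data is the DataFrame loaded in Step 1):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small stand-in for the AAPL DataFrame loaded in Step 1
data = pd.DataFrame({'Close': [100.0, 101.0, 99.5, 102.0]})

fig, ax = plt.subplots()      # one figure containing one axis (the chart)
data['Close'].plot(ax=ax)     # render the Close Series on that axis
ax.set_ylabel('Price')
ax.set_title('AAPL close price')
```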

The first line returns a figure and axis (fig and ax). The figure is where we put the axis, and the axis is the chart.

The actual plot is made by calling plot on the DataFrame; more precisely, we access the column Close in this case, which is the Series of the historic Close prices.

Confused? Don’t worry about the details.

Notice that we pass ax=ax to the plot. This ensures that we render the chart on the returned axis ax.

Finally, we add a y-label and a title to our axis.

Step 4: Creating multiple charts in one Matplotlib figure

How can we create multiple charts (or axes) in one Matplotlib figure?
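A sketch of the pattern described below (the choice of which columns to plot is illustrative, and tiny stand-in data replaces the loaded AAPL DataFrame):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small stand-in for the AAPL DataFrame
data = pd.DataFrame({
    'Open': [100.0, 102.0], 'High': [103.0, 104.0],
    'Low': [99.0, 101.0], 'Close': [102.0, 103.0],
})

fig, ax = plt.subplots(2, 2)              # a 2-by-2 grid of axes
data['Open'].plot(ax=ax[0, 0], title='Open')
data['High'].plot(ax=ax[0, 1], title='High')
data['Low'].plot(ax=ax[1, 0], title='Low')
data['Close'].plot(ax=ax[1, 1], title='Close')
plt.tight_layout()                        # keep the four charts from overlapping
```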

Here we see a few differences. First, notice plt.subplots(2, 2), which returns a figure fig and a 2-by-2 grid of axes. Hence, ax is a two-dimensional array of axes.

We can access the first axis with ax[0, 0] and pass it as an argument to plot.

This continues for all 4 plots we make, as you can see.

Finally, we use plt.tight_layout(), which ensures that the axes do not overlap. You can try without it to see the difference.

Step 5: Create a bar-chart with Matplotlib

Finally, we will make a bar-chart with Matplotlib.
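As one possible sketch (the data plotted here is illustrative; the lesson may chart something else):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative yearly returns to put in a bar-chart
returns = pd.Series([0.31, 0.12, 0.25], index=['2019', '2020', '2021'])

fig, ax = plt.subplots()
returns.plot.bar(ax=ax)       # bar-plot rendered directly from the Series
ax.set_ylabel('Return')
```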

How to measure the performance of a backtesting strategy?

How to implement a backtesting strategy with Pandas?

What is a backtesting strategy?

In a trading strategy backtesting seeks to estimate the performance of a strategy or model if it had been employed during a past period (source).

The way to analyze the performance of a strategy is to measure its return, volatility, and max drawdown. Other metrics can also be used, but for this tutorial we will use these.

Step 1: Read data from Yahoo! Finance API with Pandas Datareader

Let’s get started by importing a few libraries and retrieve some data from Yahoo! Finance API with Pandas Datareader.

import pandas as pd
import pandas_datareader as pdr
import datetime as dt
import numpy as np
start = dt.datetime(2010, 1, 1)
data = pdr.get_data_yahoo("AAPL", start)

Which will read data for the Apple ticker (AAPL) since 2010.

Below is shown the head of data.

High Low Open Close Volume Adj Close
Date
2010-01-04 7.660714 7.585000 7.622500 7.643214 493729600.0 6.583586
2010-01-05 7.699643 7.616071 7.664286 7.656429 601904800.0 6.594968
2010-01-06 7.686786 7.526786 7.656429 7.534643 552160000.0 6.490066
2010-01-07 7.571429 7.466071 7.562500 7.520714 477131200.0 6.478067
2010-01-08 7.571429 7.466429 7.510714 7.570714 447610800.0 6.521136

Step 2: Calculate signals for a simple strategy

The simple strategy we will use is based on moving averages of periods 5 and 20.

When the 5-day moving average of the Adj Close price is above the 20-day moving average, we go long (buy and hold), otherwise we go short (sell).

This results in a Signal line, which is the difference of the two moving averages. When the signal line is positive, our position is 1 (buy and hold), otherwise 0 (sell).
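Based on that description, a sketch of the signal calculation (the simple rolling means and the stand-in prices are my assumptions; the Signal and Position column names match the table below):

```python
import numpy as np
import pandas as pd

# Stand-in prices; in the lesson, data comes from the Yahoo! Finance API
data = pd.DataFrame({'Adj Close': np.linspace(100, 130, 40)})

ma5 = data['Adj Close'].rolling(5).mean()
ma20 = data['Adj Close'].rolling(20).mean()
data['Signal'] = ma5 - ma20                            # difference of the two MAs
data['Position'] = (data['Signal'] > 0).astype(float)  # 1 = long, 0 = sold
```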

High Low Open ... Adj Close Signal Position
Date ...
2021-02-26 124.849998 121.199997 122.589996 ... 121.260002 -7.610835 0.0
2021-03-01 127.930000 122.790001 123.750000 ... 127.790001 -7.054179 0.0
2021-03-02 128.720001 125.010002 128.410004 ... 125.120003 -6.761187 0.0
2021-03-03 125.709999 121.839996 124.809998 ... 122.059998 -6.782757 0.0
2021-03-04 123.599998 118.620003 121.750000 ... 120.129997 -6.274249 0.0

The reason why we want long to be 1 and short to be 0 is computational, as will become clear soon.
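The return numbers quoted below come from the video's data; a sketch of how such annualized returns can be computed with log returns (my variable names; 252 trading days assumed, and the stand-in data is illustrative):

```python
import numpy as np
import pandas as pd

# Stand-in data with a Position column as computed in Step 2
data = pd.DataFrame({'Adj Close': np.linspace(100, 130, 40)})
data['Position'] = 1.0  # always long, for illustration

data['Log returns'] = np.log(data['Adj Close'] / data['Adj Close'].shift())

# Multiply by yesterday's position: we only earn returns while we are long
strategy = data['Log returns'] * data['Position'].shift()

# Annualize: average daily log return times 252 trading days
annualized_strategy = strategy.mean() * 252
annualized_hold = data['Log returns'].mean() * 252
```

Because Position is 1 throughout this stand-in, the two annualized figures coincide here; on real data they differ whenever the strategy is out of the market.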

Which tells us that our strategy gives an annualized return of 19.6485%. A buy and hold strategy would give 31.0917%.

The natural question is: What did we gain with our strategy?

Step 5: Evaluating our strategy

Let's compute the volatility to compare the buy-and-hold strategy with ours. The volatility of a stock can be calculated in many ways; here we will use the standard deviation. For other measures, refer to Investopedia.
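A sketch of that comparison, reusing the annualized standard deviation from the earlier lesson (variable names are mine; the stand-in prices and positions are illustrative):

```python
import numpy as np
import pandas as pd

# Stand-in data with a Position column as computed in Step 2
data = pd.DataFrame({'Adj Close': [100.0, 102.0, 101.0, 104.0, 103.0, 106.0]})
data['Position'] = [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]

data['Log returns'] = np.log(data['Adj Close'] / data['Adj Close'].shift())
strategy = data['Log returns'] * data['Position'].shift()

# Annualized standard deviation (volatility), assuming 252 trading days
vol_hold = data['Log returns'].std() * 252 ** 0.5
vol_strategy = strategy.std() * 252 ** 0.5
```

Days out of the market contribute zero return to the strategy, which typically dampens its volatility relative to buy-and-hold.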