DataFrame Columns and Series for Financial Analysis

What will we cover?

In the first lesson we learned how to load data into a DataFrame. This part will show how to work with each column in the DataFrame and how to make calculations on the columns. The columns are represented by a different data type, called Series.

Each column in a DataFrame is a Series and can be easily accessed. It is also easy to calculate new Series of data. This is similar to calculating new columns of data in an Excel sheet.

We will explore that and more in this lesson.

Step 1: Load the data

We will start by importing the data (CSV file available here).

import pandas as pd
data = pd.read_csv("AAPL.csv", index_col=0, parse_dates=True)

Step 2: Explore the data and data type

In the video we explore the data to ensure it is correct. You can do that by using data.head().

Then we investigate the data type of the columns of the DataFrame data.

data.dtypes

Which results in the following.

Open         float64
High         float64
Low          float64
Close        float64
Adj Close    float64
Volume         int64
dtype: object

This shows that each column has one data type. Here Open is float64. This is one difference from Excel sheets, where each cell can have its own data type. The advantage of restricting a column to a single data type is speed.

The data type of data is DataFrame.

type(data)

The built-in function type(…) gives you the type. It is handy to use when exploring data.

pandas.core.frame.DataFrame

Notice that it is given by the long name pandas.core.frame.DataFrame, which reflects the module structure of the Pandas library.

The data type of a column in a DataFrame can be found as follows.

type(data['Close'])

Where data['Close'] gives access to the column Close in the DataFrame data.

pandas.core.series.Series

Here we see that a column is represented as a Series. A Series is similar to a DataFrame in that it has an index. E.g., the Series data['Close'] has the same index as the DataFrame data. This is handy when you need to work with the data, as you will see in a moment.
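
A quick way to convince yourself of this is to compare the indices directly. The following is only a small sketch, assuming the data DataFrame loaded above.

close = data['Close']
print(type(close))                     # pandas.core.series.Series
print(close.index.equals(data.index))  # True: the Series shares the DataFrame's index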

Step 3: Calculating with Series

To keep it simple, we will start by calculating the daily difference between the open and close prices.

daily_chg = data['Open'] - data['Close']

This calculates a Series daily_chg with the opening price minus the closing price. 

Please compare the data in daily_chg with the corresponding rows in data.

A more advanced calculation is this one.

daily_pct_chg = (data['Close'] - data['Open'])/data['Open']*100

Here we calculate the daily percentage change. In the calculation above we have limited ourselves to data on the same rows (same dates). Later we will learn how to do it with data from the previous day (the row above).
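
As a small preview of working with the previous day's data, the sketch below uses shift(), which moves every value one row down; shift() is covered properly in a later lesson, and the column names are the ones from the CSV above.

# Percentage change from the previous day's close (the first row has no previous day, so NaN)
daily_pct_chg_prev = (data['Close'] - data['Close'].shift(1)) / data['Close'].shift(1) * 100
print(daily_pct_chg_prev.head())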

Step 4: Normalize stock data

Now we will normalize the data by using iloc, which we learned about in the previous lesson.

norm = data['Close']/data['Close'].iloc[0]

The above statement calculates a Series norm where the Close price is normalized by dividing by the first available Close price, accessed with iloc[0].

As a result, norm.iloc[0] will be 1.0000, and norm.iloc[-1] shows the return of this particular stock if bought on day 1 (index 0) and sold on the day of the last index (index -1); in the case of the video: 1.839521.
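
As a quick sanity check of the normalization, and assuming data and norm from the steps above, the last normalized value is simply the last Close divided by the first Close.

total_return = data['Close'].iloc[-1] / data['Close'].iloc[0]
print(norm.iloc[0])                  # 1.0 by construction
print(norm.iloc[-1], total_return)   # both print the same number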

Next step?

Want to learn more?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

Follow the link and read more.

Pandas for Financial Stock Analysis

What will we cover?

In this tutorial we will get familiar with working with DataFrames, the primary data structure in Pandas.

We will learn how to read historical stock price data from Yahoo! Finance and load it into a DataFrame. This will be done by exporting a CSV file from Yahoo! Finance and loading the data. Later we will learn how to read the data directly from the Yahoo! Finance API.

A DataFrame is similar to an Excel sheet and can contain data in a similar way, as we will see in this lesson.

Then we will learn how to use the date index. This will be necessary when we make calculations later on.

The first part of the tutorial will give the foundation of what you need to know about DataFrames for financial analysis.

Step 1: Read the stock prices from Yahoo! Finance as CSV

In this first lesson we will download historical stock prices from Yahoo! Finance as CSV file and import them into our Jupyter notebook environment in a DataFrame.

If you are new to CSV files and DataFrames, don't worry: that is what we will cover here.

Let's start by going to Yahoo! Finance and downloading the CSV file. In this course we have used Apple, but feel free to make similar calculations on a stock of your choice.

Go to Yahoo! Finance, search for AAPL (the ticker for Apple), go to Historical Data, and download the CSV data file.

The CSV data file will contain Comma Separated Values (CSV) similar to this.

Date,Open,High,Low,Close,Adj Close,Volume
2020-03-02,70.570000,75.360001,69.430000,74.702499,74.127892,341397200
2020-03-03,75.917503,76.000000,71.449997,72.330002,71.773636,319475600
2020-03-04,74.110001,75.849998,73.282501,75.684998,75.102829,219178400
2020-03-05,73.879997,74.887497,72.852501,73.230003,72.666725,187572800
2020-03-06,70.500000,72.705002,70.307503,72.257500,71.701706,226176800

The first line shows the column names (Date, Open, High, Low, Close, Adj Close, Volume). Then each line contains a data entry for a given day.

Step 2: Read the stock prices from CSV to Pandas DataFrame

In Jupyter Notebook, start by importing the Pandas library. This is needed in order to load the data into a DataFrame.

import pandas as pd
data = pd.read_csv("AAPL.csv", index_col=0, parse_dates=True)
data.head()

The read_csv(…) does all the magic for us. It will read the CSV file AAPL.csv. The AAPL.csv file is the one you downloaded from Yahoo! Finance (or from the zip-file downloaded above) and needs to be located in the same folder you are working from in your Jupyter notebook.

The arguments in read_csv(…) are the following.

  • index_col=0 this sets the first column of the CSV file to be the index. In this case, it is the Date column.
  • parse_dates=True this ensures that dates in the CSV file are interpreted as dates. This is important if you want to take advantage of the index being a time.

Step 3: Explore data types of columns and index

In the video lesson we explore the type of columns and index.

data.dtypes
data.index

Which will reveal the data types and index of the DataFrame. Notice that each column has its own data type.
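
If you want to verify that parse_dates=True did its job, a minimal check like the sketch below (assuming the data DataFrame from above) should show a DatetimeIndex rather than plain strings.

print(type(data.index))                     # pandas.core.indexes.datetimes.DatetimeIndex
print(data.index.min(), data.index.max())   # the first and last date in the dataset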

Step 4: Indexing and slicing with DataFrames

We can use loc to lookup an index with a date.

data.loc['2020-01-27']

This will show the data for that specific date. If you get an error it might be because your dataset does not contain the above date. Choose another one to see something similar to this.

Open         7.751500e+01
High         7.794250e+01
Low          7.622000e+01
Close        7.723750e+01
Adj Close    7.657619e+01
Volume       1.619400e+08
Name: 2020-01-27 00:00:00, dtype: float64

A more advanced option is to use an interval (or slice, as it is called). Slicing with loc on a DataFrame is done by using a starting and ending index, .loc[start:end], or an open-ended index, .loc[start:], which takes all data from start to the last row.

data.loc['2021-01-01':]

This will give all the data starting from 2021-01-01. Notice that there is no data on January 1st, but since the index is interpreted as a datetime, Pandas can figure out the first date after it.

            Open        High        Low         Close    Adj Close       Volume
Date                        
2021-01-04  133.520004  133.610001  126.760002  129.410004  129.410004  143301900
2021-01-05  128.889999  131.740005  128.429993  131.009995  131.009995  97664900
2021-01-06  127.720001  131.050003  126.379997  126.599998  126.599998  155088000
2021-01-07  128.360001  131.630005  127.860001  130.919998  130.919998  109578200
2021-01-08  132.429993  132.630005  130.229996  132.050003  132.050003  105158200
2021-01-11  129.190002  130.169998  128.500000  128.979996  128.979996  100620900

Similarly, you can slice with an open-ended start.

data.loc[:'2020-07-01']

Another important way to index into DataFrames is iloc[], which indexes by integer position.

data.iloc[0]
data.iloc[-1]

Here you can index from the start with index 0, 1, 2, 3, … or from the end with -1, -2, -3, -4, …
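
iloc also accepts slices of positions, which can be handy when exploring the data. A small sketch, assuming the data DataFrame from above:

print(data.iloc[0:3])    # the first three rows
print(data.iloc[-3:])    # the last three rows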

What is next?

Want to learn more?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

Follow the link and read more.

Calculate the Volatility of Historic Stock Prices with Pandas and Python

What will we cover in this tutorial?

We will calculate the volatility of historic stock prices with the Python library Pandas.

Step 1: Read Historic Stock Prices with Pandas Datareader

We will use Pandas Datareader to read some historic stock prices. See this tutorial for details.

import pandas_datareader as pdr
import datetime as dt

ticker = "AAPL"
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)

data = pdr.get_data_yahoo(ticker, start, end)

print(data.head())

Resulting in this.

                 High        Low       Open      Close       Volume  Adj Close
Date                                                                          
2019-01-02  39.712502  38.557499  38.722500  39.480000  148158800.0  38.505024
2019-01-03  36.430000  35.500000  35.994999  35.547501  365248800.0  34.669640
2019-01-04  37.137501  35.950001  36.132500  37.064999  234428400.0  36.149662
2019-01-07  37.207500  36.474998  37.174999  36.982498  219111200.0  36.069202
2019-01-08  37.955002  37.130001  37.389999  37.687500  164101200.0  36.756794

Step 2: Calculate the Volatility of an Asset

Let's explore the difference between daily simple returns and daily log returns. Briefly explained, log returns have the advantage that you can add them together, while this is not the case for simple returns. Therefore log returns are used in most financial analysis.
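
To see why this matters, here is a tiny illustration with made-up prices: two simple returns of +10% and -10% sum to zero, even though the price ends up 1% lower, while the log returns add up correctly. NumPy is imported here just for this small example and is introduced properly below.

import numpy as np

prices = np.array([100.0, 110.0, 99.0])
simple = prices[1:] / prices[:-1] - 1            # [0.1, -0.1]
log_ret = np.log(prices[1:] / prices[:-1])
print(simple.sum())                  # about 0.0 -- misleading
print(np.exp(log_ret.sum()) - 1)     # about -0.01 -- the true total return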

To calculate the daily log returns we need the NumPy library. For the purpose here, we will not explore the depths of NumPy; all we need is to apply the log function to a full column in our DataFrame (see my other FREE course for more details on NumPy).

import numpy as np

data['Log returns'] = np.log(data['Close']/data['Close'].shift())

This creates a column called Log returns with the daily log return of the Close price.

We need the standard deviation for the volatility of the stock.

This can be calculated from our Log returns as follows.

data['Log returns'].std()

The above gives the daily standard deviation. The volatility is defined as the annualized standard deviation, which we can calculate as follows.

volatility = data['Log returns'].std()*252**.5

Notice that the square root is the same as **.5, which is raising to the power of 1/2.
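
If you calculate volatility often, it can be convenient to wrap the two steps in a small helper. This is just a sketch; the function name and the default of 252 trading days are our own choices.

def annualized_volatility(log_returns, trading_days=252):
    """Annualized volatility: daily std of log returns times sqrt(trading days)."""
    return log_returns.std() * trading_days**.5

print(annualized_volatility(data['Log returns']))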

Step 3: Visualize the Volatility of Historic Stock Prices

This can be visualized with Matplotlib.

import matplotlib.pyplot as plt

str_vol = str(round(volatility, 4)*100)

fig, ax = plt.subplots()
data['Log returns'].hist(ax=ax, bins=50, alpha=0.6, color='b')
ax.set_xlabel("Log return")
ax.set_ylabel("Freq of log return")
ax.set_title("AAPL volatility: " + str_vol + "%")

Resulting in the following output.

Next steps?

Want to learn more?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

Follow the link and read more.

Simple and Exponential Moving Average with Python and Pandas

What will we cover in this tutorial?

In this tutorial we will cover how to calculate the Simple Moving Average (MA) and the Exponential Moving Average (EMA) of a Time Series using the Pandas library in Python.

Step 1: Read some Financial Historic Time Series Stock Prices

We will use Pandas Datareader to read some historic stock prices. See this tutorial for details.

import pandas_datareader as pdr
import datetime as dt

ticker = "AAPL"
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)

data = pdr.get_data_yahoo(ticker, start, end)

print(data.head())

Resulting in this.

                 High        Low       Open      Close       Volume  Adj Close
Date                                                                          
2019-01-02  39.712502  38.557499  38.722500  39.480000  148158800.0  38.505024
2019-01-03  36.430000  35.500000  35.994999  35.547501  365248800.0  34.669640
2019-01-04  37.137501  35.950001  36.132500  37.064999  234428400.0  36.149662
2019-01-07  37.207500  36.474998  37.174999  36.982498  219111200.0  36.069202
2019-01-08  37.955002  37.130001  37.389999  37.687500  164101200.0  36.756794

Step 2: Calculate the Simple Moving Average with Python and Pandas

The Simple Moving Average (MA) of the data can be calculated using the rolling and mean methods.

data['MA10'] = data['Close'].rolling(10).mean()

Here we calculate the Simple Moving Average over 10 days. You can change it to fit your needs.

Step 3: Calculate the Exponential Moving Average with Python and Pandas

It is a bit more involved to calculate the Exponential Moving Average.

data['EMA10'] = data['Close'].ewm(span=10, adjust=False).mean()

Here you need to set the span and set adjust to False. This is needed to get the same numbers as on Yahoo! Finance.
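
To get a feel for how the two averages differ, you can plot them next to the close price. A minimal sketch, assuming the data DataFrame with the MA10 and EMA10 columns calculated above:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
data[['Close', 'MA10', 'EMA10']].plot(ax=ax)   # the EMA10 reacts faster to recent prices than the MA10
ax.set_title("Close vs. MA10 and EMA10")
plt.show()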

Next steps?

Want to learn more?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

Follow the link and read more.

Read Historical Prices from Yahoo! Finance with Python

What will we cover in this tutorial?

In this tutorial we will cover the following.

  • How to use Pandas Datareader to read historical stock prices from Yahoo! Finance.
  • Learn how to read weekly and monthly data.
  • Also how to read multiple tickers at once.

Step 1: What is Pandas Datareader?

Pandas-datareader provides up-to-date remote data access for pandas.

This leads to the next question. What is pandas?

Pandas is a data analysis and manipulation tool containing a great data structure for the purpose.

Shortly said, pandas can be thought of as a data structure in Python, which is similar to working with data in a spreadsheet.

Pandas-datareader reads data from various sources and puts the data into pandas data structures.

Pandas-datareader has a call to return historic stock price data from Yahoo! Finance.

To use Pandas-datareader you need to import the library.

Step 2: Example reading data from Yahoo! Finance with Pandas-Datareader

Let’s break the following example down.

import pandas_datareader as pdr
import datetime as dt
 
ticker = "AAPL"
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)
 
data = pdr.get_data_yahoo(ticker, start, end)

print(data)

Where we first import two libraries.

  • pandas_datareader The Pandas Datareader. If you do not have it installed already in your Jupyter Notebook you can do that by entering this in a cell !pip install pandas_datareader and execute it.
  • datetime This is a default library and represents a date and time. We only use it for the date aspects.

Then the following lines.

  • ticker = "AAPL" The ticker we want data for. You can use any ticker you want. In this course we have used the ticker for Apple (AAPL).
  • start = dt.datetime(2019, 1, 1) The starting day we want historic stock price data from.
  • end = dt.datetime(2020, 12, 31) The end day.
  • data = pdr.get_data_yahoo(ticker, start, end) This is the magic that uses Pandas Datareader (pdr) to get data from the Yahoo! Finance API. It returns a DataFrame as we know it from previous lessons.

The output of the code is as follows.

                  High         Low  ...       Volume   Adj Close
Date                                ...                         
2019-01-02   39.712502   38.557499  ...  148158800.0   38.505024
2019-01-03   36.430000   35.500000  ...  365248800.0   34.669640
2019-01-04   37.137501   35.950001  ...  234428400.0   36.149662
2019-01-07   37.207500   36.474998  ...  219111200.0   36.069202
2019-01-08   37.955002   37.130001  ...  164101200.0   36.756794
...                ...         ...  ...          ...         ...
2020-12-24  133.460007  131.100006  ...   54930100.0  131.773087
2020-12-28  137.339996  133.509995  ...  124486200.0  136.486053
2020-12-29  138.789993  134.339996  ...  121047300.0  134.668762
2020-12-30  135.990005  133.399994  ...   96452100.0  133.520477
2020-12-31  134.740005  131.720001  ...   99116600.0  132.492020

[505 rows x 6 columns]

Step 3: A few parameters to set

You can get multiple tickers at once by passing a list of them.

import pandas_datareader as pdr
import datetime as dt

ticker = ["AAPL", "IBM", "TSLA"]
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)

data = pdr.get_data_yahoo(ticker, start, end)

print(data)

You can get weekly or monthly data by using the interval argument as follows.

import pandas_datareader as pdr
import datetime as dt

ticker = ["AAPL", "IBM", "TSLA"]
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)

data = pdr.get_data_yahoo(ticker, start, end, interval='w')

print(data)

Set interval='m' to get monthly data instead of weekly data with 'w'.
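
For completeness, a minimal sketch of the monthly variant, assuming the imports and variables from the example above:

data = pdr.get_data_yahoo(ticker, start, end, interval='m')
print(data.head())   # one row per month instead of per week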

Next steps?

Want to learn more?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

Follow the link and read more.

Matplotlib Visualization for DataFrame Time Series Data

What will we cover in this tutorial?

We will learn how to visualize time series data in a DataFrame with Matplotlib.

This tutorial will show you.

  • How to use Matplotlib with DataFrames.
  • Use Matplotlib with subplots (the object-oriented way).
  • How to make multiple plots in one figure.
  • How to create bar-plots.

Want to access the code directly in Jupyter Notebook?

You can get the Jupyter Notebooks from GitHub here, where there are also direct links to Colab for an interactive experience.

Step 1: Read time series data into a DataFrame

A DataFrame is a two-dimensional tabular data structure. It is the primary data structure of Pandas and contains labeled axes (rows and columns).

To get access to a DataFrame data structure, you need to import the Pandas library.

import pandas as pd

Then we need some time series data. You can download your own CSV file from financial pages like Yahoo! Finance.

For this tutorial we will use a dataset available on GitHub.

remote_file = "https://raw.githubusercontent.com/LearnPythonWithRune/FinancialDataAnalysisWithPython/main/AAPL.csv"
data = pd.read_csv(remote_file, index_col=0, parse_dates=True)

The pd.read_csv(…) does all the magic. We set index_col=0, which sets the first column of the CSV data file to be the index. This is the column with the dates.

Then we set parse_dates=True to ensure that dates are actually parsed as dates and not as strings. This is necessary to take advantage of the time series nature of the data and to index with time intervals.

Step 2: Import Matplotlib in Jupyter Notebook

When you import Matplotlib in Jupyter Notebook, you need to set a rendering mode.

import matplotlib.pyplot as plt
%matplotlib notebook

We will use the notebook mode, which is interactive. This enables you to zoom in on an interval, move around, and save the figure.

It is common to use inline mode for rendering in Jupyter Notebook. The inline mode creates a static image, which is not interactive.

Step 3: Use Matplotlib the Object-Oriented way

Matplotlib can be used in a functional way and an object-oriented way. Most use it in a functional way, which often creates more confusion, as it is not always intuitive how it works.

The object-oriented way leads to less confusion, at the cost of one extra line of code and passing one argument. Hence, the price is low for the gain.

fig, ax = plt.subplots()
data['Close'].plot(ax=ax)
ax.set_ylabel("Price")
ax.set_title("AAPL")

The first line returns a figure and axis (fig and ax). The figure is where we put the axis, and the axis is the chart.

The actual plot is made by calling plot on the DataFrame column; here we access the column Close, which is the Series containing the time series of historic Close prices.

Confused? Don’t worry about the details.

Notice that we pass ax=ax to the plot. This ensures that we render the chart on the returned axis ax.

Finally, we add a y-label and a title to our axis.
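
If you also want to keep the chart as an image file, the figure object returned by plt.subplots() can be saved (saving figures is shown again in a later section); the filename below is just an example.

fig.savefig("aapl-close.png")   # writes the figure to a file next to your notebook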

Step 4: Creating multiple charts in one Matplotlib figure

How can we create multiple charts (or axes) in one Matplotlib figure?

Luckily, this is quite easy.

fig, ax = plt.subplots(2, 2)
data['Open'].plot(ax=ax[0, 0], title="Open")
data['High'].plot(ax=ax[0, 1], title="High")
data['Low'].plot(ax=ax[1, 0], title="Low")
data['Close'].plot(ax=ax[1, 1], title="Close")
plt.tight_layout()

Here we see a few differences. First, notice plt.subplots(2, 2), which returns a figure fig and a 2-by-2 array of axes. Hence, ax is a two-dimensional array of axes.

We can access the first axis with ax[0, 0] and pass it as an argument to plot.

This continues for all 4 plots we make, as you can see.

Finally, we use plt.tight_layout(), which ensures that the axes do not overlap. You can try without it to see the difference.

Step 5: Create a bar-chart with Matplotlib

Finally, we will make a bar-chart with Matplotlib.

Actually, we will render a horizontal bar-chart.

fig, ax = plt.subplots()
data['Volume'].loc['2020-07-01':'2020-08-15'].plot.barh(ax=ax)

We do it for the volume and only on a limited interval of time. This shows you how to take advantage of the time series aspect of the DataFrame.

Next step

The above is part of the FREE 2h Video course.

Backtesting a Trading Strategy with Pandas and Python

What will we cover?

In this tutorial we will cover the following.

  • What is a backtesting strategy?
  • How to measure the performance of a backtesting strategy?
  • How to implement a backtesting strategy with Pandas?

What is a backtesting strategy?

In trading, backtesting seeks to estimate the performance of a strategy or model if it had been employed during a past period (source).

The way to analyze the performance of a strategy is to look at its return, volatility, and max drawdown. Other metrics can also be used, but for this tutorial we will use these.

Step 1: Read data from Yahoo! Finance API with Pandas Datareader

Let’s get started by importing a few libraries and retrieve some data from Yahoo! Finance API with Pandas Datareader.

import pandas as pd
import pandas_datareader as pdr
import datetime as dt
import numpy as np

start = dt.datetime(2010, 1, 1)

data = pdr.get_data_yahoo("AAPL", start)

Which will read data for the Apple ticker (AAPL) since 2010.

Below is shown the head of data.

                High       Low      Open     Close       Volume  Adj Close
Date                                                                      
2010-01-04  7.660714  7.585000  7.622500  7.643214  493729600.0   6.583586
2010-01-05  7.699643  7.616071  7.664286  7.656429  601904800.0   6.594968
2010-01-06  7.686786  7.526786  7.656429  7.534643  552160000.0   6.490066
2010-01-07  7.571429  7.466071  7.562500  7.520714  477131200.0   6.478067
2010-01-08  7.571429  7.466429  7.510714  7.570714  447610800.0   6.521136

Step 2: Calculate signals for a simple strategy

The simple strategy we will use is moving average of period 5 and 20.

When the 5-day moving average of the Adj Close price is above the 20-day moving average, we go long (buy and hold), otherwise we go short (sell).

This can be calculated as follows.

data['Signal'] = data['Adj Close'].rolling(5).mean() - data['Adj Close'].rolling(20).mean()
data['Position'] = (data['Signal'].apply(np.sign) + 1)/2

This results in a Signal line, which is the difference of the two moving averages. When the signal line is positive, our position is 1 (buy and hold), otherwise 0 (sell).

                  High         Low        Open  ...   Adj Close    Signal  Position
Date                                            ...                                
2021-02-26  124.849998  121.199997  122.589996  ...  121.260002 -7.610835       0.0
2021-03-01  127.930000  122.790001  123.750000  ...  127.790001 -7.054179       0.0
2021-03-02  128.720001  125.010002  128.410004  ...  125.120003 -6.761187       0.0
2021-03-03  125.709999  121.839996  124.809998  ...  122.059998 -6.782757       0.0
2021-03-04  123.599998  118.620003  121.750000  ...  120.129997 -6.274249       0.0

The reason we want long to be 1 and short to be 0 is computational, as will become clear soon.
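
To make the mapping concrete, here is a tiny illustration with made-up signal values: np.sign gives -1 or +1, and adding 1 and dividing by 2 turns that into 0 (out of the market) or 1 (long).

import numpy as np
import pandas as pd

signal = pd.Series([-7.6, 0.5, 2.3, -0.1])
position = (signal.apply(np.sign) + 1) / 2
print(position.tolist())   # [0.0, 1.0, 1.0, 0.0]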

Step 3: Remove unnecessary data columns and rows

To get a cleaner dataset, we will remove the columns and rows we do not need.

data.drop(['High', 'Low', 'Open', 'Volume', 'Close'], axis=1, inplace=True)
data.dropna(inplace=True)

Where drop removes columns not needed and dropna removes rows with NaN.

The inplace=True simply applies the change to the DataFrame itself.

            Adj Close    Signal  Position
Date                                     
2010-02-01   5.990476 -0.217986       0.0
2010-02-02   6.025239 -0.252087       0.0
2010-02-03   6.128909 -0.282004       0.0
2010-02-04   5.908031 -0.297447       0.0
2010-02-05   6.012933 -0.253271       0.0

Step 4: Calculate the return of the strategy

To calculate the return we will use log returns, which, as we will see, is an advantage.

Then we use the Position, but shifted by 1, as we assume we only act on a position the day after the signal.

data['Log return'] = np.log(data['Adj Close']/data['Adj Close'].shift())
data['Return'] = data['Position'].shift(1)*data['Log return']

This results in the following.

             Adj Close    Signal  Position  Log return  Return
Date                                                          
2021-02-26  121.260002 -7.610835       0.0    0.002229     0.0
2021-03-01  127.790001 -7.054179       0.0    0.052451     0.0
2021-03-02  125.120003 -6.761187       0.0   -0.021115    -0.0
2021-03-03  122.059998 -6.782757       0.0   -0.024761    -0.0
2021-03-04  120.129997 -6.274249       0.0   -0.015938    -0.0

Now the additive advantage of log returns comes in handy.

Remember that we can add up log returns to calculate the final return. For details I refer to this.

Hence, the return can be calculated as follows.

data[['Log return', 'Return']].cumsum().apply(np.exp)

Resulting in the following.

            Log return   Return
Date                           
2021-02-26   20.242133  7.29214
2021-03-01   21.332196  7.29214
2021-03-02   20.886489  7.29214
2021-03-03   20.375677  7.29214
2021-03-04   20.053499  7.29214

Using a bit of calculation.

np.exp(data[['Log return', 'Return']].mean()*252)

We get.

Log return    1.310917
Return        1.196485
dtype: float64

This tells us that our strategy gives an annualized return of 19.6485%, while a buy-and-hold strategy would give 31.0917%.

The natural question is: What did we gain with our strategy?

Step 5: Evaluating our strategy

Let's compute the volatility of the buy-and-hold strategy and compare it with ours. The volatility of a stock can be calculated in many ways. Here we will use the standard deviation. For other measures, refer to Investopedia.

data[['Log return', 'Return']].std()*252**.5

Which gives the annualized standard deviation.

Log return    0.283467
Return        0.188044
dtype: float64

Hence, the gain from our strategy is lower volatility.

Calculate the Average True Range (ATR) Easy with Pandas DataFrames

What will we cover in this tutorial?

In this tutorial we will cover how to read historic stock prices, calculate the Average True Range (ATR), and visualize it together with the stock price.

Step 1: Read historic stock prices from Yahoo! Finance API

To read data from Yahoo! Finance API we use Pandas-Datareader, which has a direct method.

This requires that we give a start date for how far back we want to retrieve data.

import pandas_datareader as pdr
import datetime as dt

start = dt.datetime(2020, 1, 1)
data = pdr.get_data_yahoo("NFLX", start)

print(data.tail())

This will result in output similar to the following.

                  High         Low        Open       Close   Volume   Adj Close
Date
2021-02-12  561.250000  550.849976  556.940002  556.520020  2195900  556.520020
2021-02-16  563.630005  552.729980  557.289978  557.280029  2622400  557.280029
2021-02-17  555.250000  543.030029  550.989990  551.340027  2069600  551.340027
2021-02-18  550.000000  538.229980  549.000000  548.219971  2456200  548.219971
2021-02-19  548.989990  538.809998  548.000000  540.219971  2838600  540.219971

Calculate the Average True Range (ATR)

The Average True Range (ATR), as investopedia.org defines it, is a moving average of the True Range, where the True Range of a day is the greatest of: the current High minus the current Low, the absolute value of the current High minus the previous Close, and the absolute value of the current Low minus the previous Close.

This can be calculated as follows.

import numpy as np
import pandas as pd
import pandas_datareader as pdr
import datetime as dt

start = dt.datetime(2020, 1, 1)
data = pdr.get_data_yahoo("NFLX", start)

high_low = data['High'] - data['Low']
high_close = np.abs(data['High'] - data['Close'].shift())
low_close = np.abs(data['Low'] - data['Close'].shift())

ranges = pd.concat([high_low, high_close, low_close], axis=1)
true_range = np.max(ranges, axis=1)

atr = true_range.rolling(14).sum()/14

Here we use the standard 14-day period.
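
A rolling sum divided by 14 is the same as a 14-day rolling mean, so the line above can also be written that way. A quick check, assuming true_range and atr from the code above:

atr_alt = true_range.rolling(14).mean()
print((atr - atr_alt).abs().max())   # effectively zero, up to floating point noise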

Visualize the ATR and the stock price

We will use Matplotlib to visualize it as it integrates well with DataFrames from Pandas.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
atr.plot(ax=ax)
data['Close'].plot(ax=ax, secondary_y=True, alpha=0.3)
plt.show()

This will result in a chart similar to this one.

Excel Automation with Simple Moving Average from Python

What will we cover in this tutorial?

We will retrieve the historic stock prices and calculate the moving average. Then we will export the data to Excel and insert a chart, all done from Python.

See the in-depth explanation in the YouTube video. It also gives advice on how to interpret the Simple Moving Averages (SMA).

Step 1: Read historic stock prices

We will use Pandas-datareader to get the historic prices of NFLX (the ticker for Netflix).

import pandas_datareader as pdr
import datetime as dt

ticker = "NFLX"
start = dt.datetime(2019, 1, 1)

data = pdr.get_data_yahoo(ticker, start)
print(data.head())

And you will get the historic data for Netflix from January 1st, 2019.

                  High         Low        Open       Close    Volume   Adj Close
Date
2019-01-02  269.750000  256.579987  259.279999  267.660004  11679500  267.660004
2019-01-03  275.790009  264.429993  270.200012  271.200012  14969600  271.200012
2019-01-04  297.799988  278.540009  281.880005  297.570007  19330100  297.570007
2019-01-07  316.799988  301.649994  302.100006  315.339996  18620100  315.339996
2019-01-08  320.589996  308.010010  319.980011  320.269989  15359200  320.269989

Step 2: Understand Moving Average

We will calculate the Simple Moving Average as defined on Investopedia.

Simple Moving Average

The Simple Moving Average (from now on just referred to as Moving Average or MA) is defined by a period of days.

That is, the MA of a period of 10 (MA10) takes the average value of the last 10 close prices. This is done in a rolling way, hence we will get an MA10 for every trading day in our historic data, except the first 9 days in our dataset.

We can similarly calculate a MA50 and MA200, which is a Moving Average of the last 50 and 200 days, respectively.
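
If the rolling behavior is new to you, here is a tiny worked example with made-up prices: the first window-1 values are NaN, and each later value is the mean of the last 3 entries.

import pandas as pd

prices = pd.Series([10, 11, 12, 13, 14])
print(prices.rolling(3).mean().tolist())   # [nan, nan, 11.0, 12.0, 13.0]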

Step 3: Calculating the Moving Averages

We can do that by using rolling and mean.

And it is magic.

data['MA10'] = data['Close'].rolling(10).mean()
data['MA50'] = data['Close'].rolling(50).mean()
data['MA200'] = data['Close'].rolling(200).mean()

print(data.tail())

That was easy, right?

                  High         Low        Open       Close    Volume   Adj Close        MA10        MA50      MA200
Date
2021-01-12  501.089996  485.670013  500.000000  494.250000   5990400  494.250000  515.297998  502.918599  477.08175
2021-01-13  512.349976  493.010010  495.500000  507.790009   5032100  507.790009  512.989999  503.559600  477.76590
2021-01-14  514.500000  499.579987  507.350006  500.859985   4177400  500.859985  510.616995  503.894399  478.39270
2021-01-15  506.320007  495.100006  500.000000  497.980011   5890200  497.980011  506.341998  504.109600  479.06220
2021-01-19  509.250000  493.540009  501.000000  501.769989  11996900  501.769989  504.232999  504.205999  479.72065

Step 4: Visualize it with Matplotlib

We can see the data with Matplotlib.

import matplotlib.pyplot as plt

data[['Close', 'MA10', 'MA50']].loc['2020-01-01':].plot()
plt.show()

Resulting in the following plot.

Where you can see how the MA10 and MA50 move according to the price.

Step 5: Export to Excel

Now we will export the data to Excel.

For this we need to import Pandas and use the XlsxWriter engine; the details are in the code below.

The code can be found here.

import pandas as pd

data = data.loc['2020-01-01':]
data = data.iloc[::-1]
writer = pd.ExcelWriter("technical.xlsx", 
                        engine='xlsxwriter', 
                        date_format = 'yyyy-mm-dd', 
                        datetime_format='yyyy-mm-dd')

sheet_name = 'Moving Average'
data[['Close', 'MA10', 'MA50']].to_excel(writer, sheet_name=sheet_name)


worksheet = writer.sheets[sheet_name]
workbook = writer.book

# Create a format for a green cell
green_cell = workbook.add_format({
    'bg_color': '#C6EFCE',
    'font_color': '#006100'
})

# Create a format for a red cell
red_cell = workbook.add_format({
    'bg_color': '#FFC7CE',                            
    'font_color': '#9C0006'
})


# Set column width of Date
worksheet.set_column(0, 0, 15)


for col in range(1, 4):
    # Create a conditional formatted of type formula
    worksheet.conditional_format(1, col, len(data), col, {
        'type': 'formula',                                    
        'criteria': '=C2>=D2',
        'format': green_cell
    })

    # Create a conditional formatted of type formula
    worksheet.conditional_format(1, col, len(data), col, {
        'type': 'formula',                                    
        'criteria': '=C2<D2',
        'format': red_cell
    })

# Create a new chart object.
chart1 = workbook.add_chart({'type': 'line'})

# Add a series to the chart.
chart1.add_series({
        'name': "MA10",
        'categories': [sheet_name, 1, 0, len(data), 0],
        'values': [sheet_name, 1, 2, len(data), 2],
})

# Create a new chart object.
chart2 = workbook.add_chart({'type': 'line'})

# Add a series to the chart.
chart2.add_series({
        'name': 'MA50',
        'categories': [sheet_name, 1, 0, len(data), 0],
        'values': [sheet_name, 1, 3, len(data), 3],
})

# Combine and insert title, axis names
chart1.combine(chart2)
chart1.set_title({'name': sheet_name + " " + ticker})
chart1.set_x_axis({'name': 'Date'})
chart1.set_y_axis({'name': 'Price'})

# Insert the chart into the worksheet.
worksheet.insert_chart('F2', chart1)

writer.close()

Where the output will be something similar to this.

Generated Excel sheet

How to Plot Time Series with Matplotlib

What will we cover in this tutorial?

In this tutorial we will show how to visualize time series with Matplotlib. We will do that using Jupyter notebook and you can download the resources (the notebook and data used) from here.

Step 1: What is a time series?

I am happy you asked.

The easiest way to understand it is to show it. If you downloaded the resources and started the Jupyter notebook, execute the following lines.

import pandas as pd

data = pd.read_csv("stock_data.csv", index_col=0, parse_dates=True)

data.head()

This will produce the following output.

                 High        Low       Open      Close       Volume  Adj Close
Date
2020-01-02  86.139999  84.342003  84.900002  86.052002   47660500.0  86.052002
2020-01-03  90.800003  87.384003  88.099998  88.601997   88892500.0  88.601997
2020-01-06  90.311996  88.000000  88.094002  90.307999   50665000.0  90.307999
2020-01-07  94.325996  90.671997  92.279999  93.811996   89410500.0  93.811996
2020-01-08  99.697998  93.646004  94.739998  98.428001  155721500.0  98.428001

You notice that the far-left column is called Date and that it is the index. This index has a time value, in this case a date.

Time series data is data "stamped" with a time. In this case, it is indexed by dates.

The data you see is historic stock prices.
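
Because the index holds dates, you can select rows directly by date strings. A small sketch, assuming the data DataFrame from the stock_data.csv file above (adjust the dates if your data covers a different period):

print(data.loc['2020-01'].head())              # all rows from January 2020
print(data.loc['2020-01-02':'2020-01-08'])     # an explicit date range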

Step 2: How to visualize data with Matplotlib

The above data is kept in a DataFrame (a Pandas data object), which makes it straightforward to visualize.

import matplotlib.pyplot as plt
%matplotlib notebook

data.plot()

Which will result in a chart similar to this one.

Result

This is not impressive. It seems like something is wrong.

Actually, there is nothing wrong. It just does what you ask for. It plots all 6 columns together in one chart. Because the Volume values are so large, all the other columns are squeezed into what looks like a single flat line.

Step 3: Matplotlib has a functional and object oriented interface

This is often a bit confusing at first.

Matplotlib has both a functional and an object-oriented interface. We used the functional one.

If you try to execute the following in your Jupyter notebook.

data['My col'] = data['Volume']*0.5
data['My col'].plot()

It would seem like nothing happened.

But then investigate your previous plot.

Previous plot

It got updated with a new line. Hence, instead of creating a new chart (or figure) it just added it to the existing one.

If you want to learn more about functional and object oriented way of using Matplotlib we recommend this tutorial.

Step 4: How to make a new figure

What to do?

Well, you need to use the object oriented interface of Matplotlib.

You can do that as follows.

fig1, ax1 = plt.subplots()
data['My col'].plot(ax=ax1)

Which will produce what you are looking for. A new figure.

The new figure

Step 5: Make multiple plots in one figure

This is getting fun.

How can you create multiple plots in one figure?

You actually do that when creating the subplots.

fig2, ax2 = plt.subplots(2, 2)

data['Open'].plot(ax=ax2[0, 0])
data['High'].plot(ax=ax2[0, 1])
data['Low'].plot(ax=ax2[1, 0])
data['Close'].plot(ax=ax2[1, 1])
plt.tight_layout()

Notice that subplots(2, 2) creates a 2-by-2 array of axes you can use to create plots.

This should result in this chart.

Result

Step 6: Make a bar-chart

This can be done as follows.

fig3, ax3 = plt.subplots()

data.loc[:'2020-01-31', 'Volume'].plot.bar(ax=ax3)

Notice that we only take the first month of the Volume data here (data.loc[:'2020-01-31', 'Volume']).

This should result in this figure.

Step 7: Save the figures

This is straight forward.

fig1.savefig("figure-1.png")
fig2.savefig("figure-2.png")
fig3.savefig("figure-3.png")

And the above figures should be available in the same location where you are running your Jupyter notebook.

Next step

If you want to learn more about functional and object oriented way of using Matplotlib we recommend this tutorial.