DataFrame Columns and Series for Financial Analysis

What will we cover?

In the first lesson we learnt how to load data into a DataFrame. This part will show how to work with each column in the DataFrame. The columns are represented by a different data type, called Series.

n this lesson we will learn how to make calculations on the columns. The columns are represented by a data type called Series.

Each column in a DataFrame is a Series and can be easily accessed. Also, it is easy to calculate new Series of data. This is similar to calculate now columns of data in an Excel sheet.

We will explore that and more in this lesson.

Step 1: Load the data

We will start by importing the data (CSV file available here).

import pandas as pd
data = pd.read_csv("AAPL.csv", index_col=0, parse_dates=True)

Step 2: Explore the data and data type

In the video we explore the data to ensure it is correct. You can do that by using data.head().

Then we investigate the data type of the columns of the DataFrame data.

data.dtypes

Which results in the following.

Open         float64
High         float64
Low          float64
Close        float64
Adj Close    float64
Volume         int64
dtype: object

This means shows that each column has one data type. Here Open is float64. This is one difference from Excel sheets, where each cell has a data type. The advantage of restricting a data type per column is speed.

The data type of data is DataFrame.

type(data)

The build in function type(…) gives you the type. It is handy to use it when exploring data.

pandas.core.frame.DataFrame

Notice that it is given by a long string pandas.core.frame.DataFrame, this is the structure of the library Pandas.

The data type of a column in a DataFrame can be found by.

type(data['Close'])

Where data[‘Close’] gives access to column Close in the DataFramedata.

pandas.core.series.Series

Where we see a column is represented as a Series. The is similar to a DataFrame that it has an index. E.g. the Series data[‘Close’] has the same index as the DataFrame data. This is handy when you need to work with the data as you will see in a moment.

Step 3: Calculating with Series

To keep it simple, we will start by the daily difference from open and close.

daily_chg = data['Open'] - data['Close']

This calculates a Series daily_chg with the opening price minus the closing price. 

Please explore the full data in daily_chg with the data in data.

A more advanced calculation is this one.

daily_pct_chg = (data['Close'] - data['Open'])/data['Open']*100

Where we calculate the daily percentage change. In the calculation above we have limited us to only use data on the same rows (same dates). Later we will learn how to do it with data from previous day (the row above).

Step 4: Normalize stock data

Now we will normalize the data by using the iloc we learned about in previous lesson.

norm = data['Close']/data['Close'].iloc[0]

The above statements calculates a Series norm where the Close price is normalized by dividing by the first available Close price, accessed by using iloc[0].

This results in that norm.iloc[0] will be 1.0000 and norm.iloc[-1] we show the return of this particular stock if invested in on day 1 (index 0) and sold on the day of the last index (index -1), in the case of the video: 1.839521.

Next step?

Want to learn more?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

Follow the link and read more.

Leave a Reply