What will we cover?
In the first lesson we learned how to load data into a DataFrame. In this lesson we will learn how to work with the individual columns of the DataFrame and how to make calculations on them.
Each column in a DataFrame is represented by a data type called Series and can be easily accessed. It is also easy to calculate new Series of data, much like calculating new columns of data in an Excel sheet.
We will explore that and more in this lesson.
Step 1: Load the data
We will start by importing the data (CSV file available here).
import pandas as pd
data = pd.read_csv("AAPL.csv", index_col=0, parse_dates=True)
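If the AAPL.csv file is not at hand, the same call can be tried on a small in-memory CSV. The sketch below is only an illustration: the prices are made up, io.StringIO stands in for the file, and the names csv_text and sample are hypothetical.

```python
import io
import pandas as pd

# Made-up rows in the same column layout as AAPL.csv (illustrative values only)
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,10.0,11.0,9.5,10.5,10.4,1000
2020-01-03,10.5,12.0,10.0,11.5,11.4,1500
2020-01-06,11.5,11.8,10.8,11.0,10.9,1200
"""

# index_col=0 uses the first column (Date) as the index;
# parse_dates=True converts that index to datetime values
sample = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(sample.head())
print(type(sample.index))
```

With these two arguments the dates become the index of the DataFrame rather than an ordinary column, which is what the later steps rely on.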
Step 2: Explore the data and data type
In the video we explore the data to ensure it is correct. You can do that by using data.head().
Then we investigate the data type of the columns of the DataFrame data.
data.dtypes
Which results in the following.
Open float64
High float64
Low float64
Close float64
Adj Close float64
Volume int64
dtype: object
This shows that each column has one data type. Here Open is float64. This is one difference from Excel sheets, where each cell has its own data type. The advantage of restricting each column to one data type is speed.
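To see the one-type-per-column rule in action, here is a minimal sketch on a hand-made DataFrame (the values and the names sample and mixed are made up for illustration). It also shows what happens when integers and floats are put in the same column: pandas picks a single type that fits all values.

```python
import pandas as pd

# A tiny hand-made DataFrame (illustrative values only)
sample = pd.DataFrame({"Open": [10.0, 10.5], "Volume": [1000, 1500]})

# One data type per column
print(sample.dtypes)

# Mixing an integer and a float in one column gives a single float type
mixed = pd.Series([1, 2.5])
print(mixed.dtype)
```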
The data type of data is DataFrame.
type(data)
The built-in function type(…) gives you the type of an object. It is handy to use when exploring data.
pandas.core.frame.DataFrame
Notice that the type is given by the long name pandas.core.frame.DataFrame, which reflects the module structure of the pandas library.
The data type of a column in a DataFrame can be found as follows.
type(data['Close'])
Here data['Close'] gives access to the column Close in the DataFrame data.
pandas.core.series.Series
Here we see that a column is represented as a Series. A Series is similar to a DataFrame in that it has an index. For example, the Series data['Close'] has the same index as the DataFrame data. This is handy when you need to work with the data, as you will see in a moment.
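The shared index can be checked directly. The following is a small sketch on made-up data (the dates, prices, and the names sample and close are assumptions for illustration, not the AAPL data).

```python
import pandas as pd

# Hand-made DataFrame with a date index (illustrative values only)
dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
sample = pd.DataFrame(
    {"Open": [10.0, 10.5, 11.5], "Close": [10.5, 11.5, 11.0]},
    index=dates,
)

close = sample["Close"]

# The column is a Series
print(type(close))

# The Series carries the same index as the DataFrame it came from
print(close.index.equals(sample.index))

# So a value can be looked up by date, just as in the DataFrame
print(close.loc[pd.Timestamp("2020-01-03")])
```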
Step 3: Calculating with Series
To keep it simple, we will start with the daily difference between the open and close prices.
daily_chg = data['Open'] - data['Close']
This calculates a Series daily_chg with the opening price minus the closing price for each day.
Please compare the values in daily_chg with the Open and Close columns in data.
A more advanced calculation is this one.
daily_pct_chg = (data['Close'] - data['Open'])/data['Open']*100
Here we calculate the daily percentage change from open to close. In the calculation above we have limited ourselves to data on the same row (same date). Later we will learn how to use data from the previous day (the row above).
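To make the row-by-row nature of these calculations concrete, here is a small worked sketch. The prices are made up and chosen so the results are easy to check by hand; sample is a hypothetical stand-in for data.

```python
import pandas as pd

# Made-up prices chosen so the arithmetic is easy to verify by hand
dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
sample = pd.DataFrame(
    {"Open": [8.0, 16.0, 50.0], "Close": [10.0, 12.0, 50.0]},
    index=dates,
)

# Each operation is applied row by row (date by date)
daily_chg = sample["Open"] - sample["Close"]
daily_pct_chg = (sample["Close"] - sample["Open"]) / sample["Open"] * 100

print(daily_chg)      # -2.0, 4.0, 0.0
print(daily_pct_chg)  # 25.0, -25.0, 0.0
```

Both results are new Series with the same date index as sample, so each value can be traced back to the row it was calculated from.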
Step 4: Normalize stock data
Now we will normalize the data using iloc, which we learned about in the previous lesson.
norm = data['Close']/data['Close'].iloc[0]
The above statement calculates a Series norm in which the Close price is normalized by dividing by the first available Close price, accessed with iloc[0].
As a result, norm.iloc[0] will be 1.0000, and norm.iloc[-1] shows the return of this particular stock if it was bought on day 1 (index 0) and sold on the day of the last index (index -1). In the case of the video this value is 1.839521, which corresponds to a gain of about 84%.
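The normalization can be tried on a few made-up closing prices to see how the first and last values behave. This is a minimal sketch; the prices and the names close and total_return_pct are assumptions for illustration.

```python
import pandas as pd

# Made-up closing prices (illustrative values only)
dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"])
close = pd.Series([50.0, 55.0, 45.0, 75.0], index=dates, name="Close")

# Divide every value by the first value
norm = close / close.iloc[0]

print(norm.iloc[0])   # 1.0, the starting point
print(norm.iloc[-1])  # 1.5, the price grew by a factor of 1.5

# The same result expressed as a percentage return
total_return_pct = (norm.iloc[-1] - 1) * 100
print(total_return_pct)  # 50.0
```

Because every normalized Series starts at 1.0, stocks with very different price levels can be compared on the same scale.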
Next step?
Want to learn more?
This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.
Follow the link and read more.