What will we cover?
In this lesson we will learn how to add new columns calculated from values in other columns in our DataFrame. This is similar to writing a formula in Excel that combines data from different columns.
Then we will demonstrate some useful functions for working with financial data.
Finally, we will show how to remove (or drop) columns from our DataFrame.
Step 1: Load the data.
As usual we need to load the data into our DataFrame. You can get the CSV file from here.
import pandas as pd
data = pd.read_csv("AAPL.csv", index_col=0, parse_dates=True)
It is always a good habit to inspect the data with data.head() (see the video lesson or the Notebook link below the video for the expected output).
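If you do not have the CSV file at hand, the loading step can be tried on a small in-memory sample. This is only a sketch: the rows below are made-up placeholder values, not real AAPL prices, and the column names are assumed to match the ones used later in this lesson.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for AAPL.csv (values are made up).
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,74.0,75.2,73.8,75.0,74.5,1000
2020-01-03,74.3,75.1,74.1,74.4,73.9,1200
2020-01-06,73.4,75.0,73.2,74.9,74.4,900
"""

# index_col=0 uses the first column (Date) as the index.
# parse_dates=True converts that index to datetime values.
data = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(data.head())        # shows the first rows (up to five)
print(type(data.index))   # a DatetimeIndex when the dates were parsed
```

With a datetime index in place, rows can later be selected by date, which is why we parse the dates when loading.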
Step 2: Create new columns in a DataFrame (Pandas)
To create a new column in our data set simply write as follows.
data['Daily chg'] = data['Close'] - data['Open']
The above statement will create a new column named Daily chg holding the difference between the Close and Open columns.
Similarly, you can create a column with the closing price normalized to its first value, so that the series starts at 1.
data['Normalized'] = data['Close'] / data['Close'].iloc[0]
This is how easy it is to work with: the calculation is applied to every row without writing a loop.
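To see what the two calculated columns contain, here is a minimal sketch on a tiny DataFrame. The name prices and the numbers in it are made up for illustration; only the column names follow the lesson.

```python
import pandas as pd

# Hypothetical prices, chosen so the results are easy to check by hand.
prices = pd.DataFrame(
    {"Open": [9.5, 12.0, 15.5], "Close": [10.0, 12.5, 15.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

# Row-by-row difference between two existing columns.
prices["Daily chg"] = prices["Close"] - prices["Open"]

# Divide every close by the first close, so the column starts at 1.0.
prices["Normalized"] = prices["Close"] / prices["Close"].iloc[0]

print(prices)
```

Here iloc[0] picks the first row by position, so Normalized shows how the price has moved relative to the first day in the data.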
Step 3: Get min and max in DataFrame columns
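To find the minimum of a column, call min() on it.
data['Close'].min()
This will find the minimal value of the column Close.
To find the index of the minimum value, use argmin().
data['Close'].argmin()
Note that argmin() returns the integer position of the row holding the minimum. To get the index label (here, the date) instead, use idxmin().
You can do similar things with other columns and with the maximum, as the following shows.
data['Normalized'].min()
data['Normalized'].argmin()
data['Close'].max()
data['Close'].argmax()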
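The difference between the value, its position, and its index label is easiest to see on a small example. This is a sketch with made-up numbers; idxmin() and idxmax() are not used in the lesson itself, but they are the pandas counterparts that return the label rather than the position.

```python
import pandas as pd

# Hypothetical closing prices indexed by date.
close = pd.Series(
    [10.0, 8.0, 12.0],
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    name="Close",
)

lowest = close.min()          # the smallest value
lowest_pos = close.argmin()   # integer position of that value
lowest_day = close.idxmin()   # index label (a date) of that value

highest = close.max()
highest_pos = close.argmax()
highest_day = close.idxmax()

# A position is used with iloc, a label with loc.
print(close.iloc[lowest_pos], close.loc[lowest_day])
```

For financial data the label is often what we want, since it tells us on which date the low or high occurred.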
Step 4: Get the mean value of a column in a DataFrame (Pandas)
To get the mean value of a column, simply use mean().
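As a quick sketch, mean() can be called on a single column or on the whole DataFrame. The name prices and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical prices for three days.
prices = pd.DataFrame({"Open": [9.0, 8.5, 11.0], "Close": [10.0, 8.0, 12.0]})

# Mean of a single column: one number.
close_mean = prices["Close"].mean()

# Mean of every numeric column: a Series indexed by column name.
column_means = prices.mean()

print(close_mean)
print(column_means)
```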
Step 5: Remove / Delete columns in a DataFrame
It is always good practice to remove the columns of data we do not intend to use anymore. This can be done by using drop().
data.drop(labels=['High', 'Low', 'Adj Close', 'Volume'], axis=1, inplace=True)
Here we use the following arguments.
- labels=['High', 'Low', 'Adj Close', 'Volume'] lists the labels of the columns we want to remove.
- axis=1 tells drop() to look for the labels among the column names. The default, axis=0, looks for them in the index (the rows).
- inplace=True modifies the DataFrame we are working on directly. Without it, drop() leaves the original untouched and returns a new DataFrame without the columns.
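The effect of inplace can be checked with a one-row DataFrame. This is a minimal sketch with made-up values; the names frame, trimmed and result are only for illustration. It also uses the columns= keyword, which pandas accepts as an alternative to labels= together with axis=1.

```python
import pandas as pd

# Hypothetical single row with the same column names as the lesson's data.
frame = pd.DataFrame(
    {"Open": [1.0], "High": [2.0], "Low": [0.5],
     "Close": [1.5], "Adj Close": [1.4], "Volume": [100]}
)

# Without inplace: frame is unchanged, a new DataFrame is returned.
trimmed = frame.drop(labels=["High", "Low"], axis=1)
print(list(frame.columns))    # still all six columns
print(list(trimmed.columns))  # High and Low are gone

# With inplace=True: frame itself is modified and drop() returns None.
result = frame.drop(columns=["Adj Close", "Volume"], inplace=True)
print(result)
print(list(frame.columns))
```

Because the in-place call returns None, do not assign its result back to data, or you will lose the DataFrame.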
What is next?
Want to learn more?
This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.
Follow the link and read more.