In this tutorial we will get familiar to work with DataFrames – the primary data structure in Pandas.
We will learn how to read a historical stock price data from Yahoo! Finance and load it into a DataFrame. This will be done by exporting a CSV file from Yahoo! Finance and load the data. Later we will learn how to read the data directly from the Yahoo! Finance API.
A DataFrame is similar to an Excel sheet. DataFrames can contain data in a similar way as we will see in this lesson.
Then we will learn how to use the index of the dates. This will be necessary later when we make calculations later on.
The first part of the tutorial will give the foundation of what you need to know about DataFrames for financial analysis.
In this first lesson we will download historical stock prices from Yahoo! Finance as CSV file and import them into our Jupyter notebook environment in a DataFrame.
If you are new to CSV files and DataFrames. Don’t worry, that is what we will cover here.
Let’s start by going to Yahoo! Finance and download the CVS file. In this course we have used Apple, but feel free to make similar calculation on a stock of your choice.
Go to Yahoo! Finance write AAPL (ticker for Apple) and press Historical Data and download the CSV data file.
The CSV data file will contain Comma Separated Values (CSV) similar to this.
Date,Open,High,Low,Close,Adj Close,Volume
2020-03-02,70.570000,75.360001,69.430000,74.702499,74.127892,341397200
2020-03-03,75.917503,76.000000,71.449997,72.330002,71.773636,319475600
2020-03-04,74.110001,75.849998,73.282501,75.684998,75.102829,219178400
2020-03-05,73.879997,74.887497,72.852501,73.230003,72.666725,187572800
2020-03-06,70.500000,72.705002,70.307503,72.257500,71.701706,226176800
The first line shows the column names (Date, Open, High, Low, Close, Adj Close, Volume). Then each line contains a data entry for a given day.
n Jupyter Notebook start by importing the Pandas library. This is needed in order to load the data into a DataFrame.
import pandas as pd
data = pd.read_csv("AAPL.csv", index_col=0, parse_dates=True)
data.head()
The read_csv(…) does all the magic for us. It will read the CSV file AAPL.csv. The AAPL.csv file is the one you downloaded from Yahoo! Finance (or from the zip-file downloaded above) and needs to be located in the same folder you are working from in your Jupyter notebook.
The arguments in read_csv(…) are the following.
In the video lesson we explore the type of columns and index.
data.dtypes
data.index
Which will reveal the data type and index of the DataFrame. Notice, that each column has its own data type.
We can use loc to lookup an index with a date.
data.loc['2020-01-27']
This will show the data for that specific date. If you get an error it might be because your dataset does not contain the above date. Choose another one to see something similar to this.
Open 7.751500e+01
High 7.794250e+01
Low 7.622000e+01
Close 7.723750e+01
Adj Close 7.657619e+01
Volume 1.619400e+08
Name: 2020-01-27 00:00:00, dtype: float64
A more advanced option is to use an interval (or slice as it is called). Slicing with loc on a DataFrame is done by using a starting and ending index .loc[start:end] or an open ended index .loc[start:], which will take data beginning from start to the last data.
data.loc['2021-01-01':]
This will give all the data starting from 2020-01-01. Notice, that there is no data on January 1st, but since the index is interpreted as a datetime, it can figure out the first date after.
Open High Low Close Adj Close Volume
Date
2021-01-04 133.520004 133.610001 126.760002 129.410004 129.410004 143301900
2021-01-05 128.889999 131.740005 128.429993 131.009995 131.009995 97664900
2021-01-06 127.720001 131.050003 126.379997 126.599998 126.599998 155088000
2021-01-07 128.360001 131.630005 127.860001 130.919998 130.919998 109578200
2021-01-08 132.429993 132.630005 130.229996 132.050003 132.050003 105158200
2021-01-11 129.190002 130.169998 128.500000 128.979996 128.979996 100620900
Similarly, you can create slicing with an open-ended start.
data.loc[:'2020-07-01']
Another important way to index into DataFrames is by iloc[], which does it with index.
data.iloc[0]
data.iloc[-1]
Where you can index from the start with index 0, 1, 2, 3, … Or from the end -1, -2, -3, -4, …
This is part of the course of Master Technical Analysis with pandas.
In the next lesson you will learn about DataFrame Columns and Series for Financial Analysis.
Would you like to get 12% in return of your investments?
D. A. Carter promises and shows how his simple investment strategy will deliver that in the book The 12% Solution. The book shows how to test this statement by using backtesting.
Did Carter find a strategy that will consistently beat the market?
Actually, it is not that hard to use Python to validate his calculations. But we can do better than that. If you want to work smarter than traditional investors then continue to read here.
Build and Deploy an AI App with Python Flask, OpenAI API, and Google Cloud: In…
Python REST APIs with gcloud Serverless In the fast-paced world of application development, building robust…
App Development with Python using Docker Are you an aspiring app developer looking to level…
Why Value-driven Data Science is the Key to Your Success In the world of data…
Harnessing the Power of Project-Based Learning and Python for Machine Learning Mastery In today's data-driven…
Is Python the right choice for Machine Learning? Should you learn Python for Machine Learning?…