    Get Started with pandas for Data Science

    Unlock the Power of pandas for Tabular Data Manipulation in Data Science

    Why It’s Great to Master pandas for Tabular Data:

    • Comprehensive Toolset: pandas is a versatile and powerful library that offers a wide range of functionality for working with tabular data. By mastering pandas, you gain access to a toolkit that covers most steps of the data science workflow, from data acquisition to analysis and visualization.
    • Efficient Data Acquisition: pandas provides intuitive functions to acquire data from various sources, including spreadsheets, databases, and other file formats. You can effortlessly import data into pandas DataFrames, making it easy to access and work with diverse datasets.
    • Seamless Data Exploration: With pandas, you can explore your data efficiently. It offers a rich set of functions for data filtering, sorting, aggregation, and descriptive statistics. By leveraging pandas’ exploratory data analysis capabilities, you can gain valuable insights into your data’s structure, patterns, and relationships.
    • Reliable Data Cleaning: Data cleaning is a crucial aspect of the data science workflow, and pandas excels in this area. It offers a wide range of tools for handling missing data, removing duplicates, and performing data transformations. By mastering pandas’ data cleaning capabilities, you can ensure the quality and integrity of your datasets.
    • Streamlined Data Processing: pandas simplifies data processing tasks through its intuitive and expressive syntax. You can perform various operations like merging, joining, grouping, and reshaping data with ease. pandas’ powerful data manipulation capabilities enable you to preprocess and transform your data efficiently.
    • Robust Data Analysis: With pandas, you can perform a broad range of data analysis tasks. It provides extensive statistical and analytical functions that enable you to derive meaningful insights from your data. By mastering pandas’ analysis tools, you can uncover patterns, trends, and correlations in your datasets.
    • Enhanced Data Visualization: pandas seamlessly integrates with popular visualization libraries such as Matplotlib and Seaborn. You can create stunning visualizations, including plots, charts, and graphs, to effectively communicate your data findings. By mastering pandas’ visualization capabilities, you can present your insights in a visually appealing and impactful manner.

    Topics Covered in This Tutorial and Later in the Course

    • Data Acquisition: Learn how to acquire data from different sources using pandas.
    • Data Exploration: Discover pandas’ powerful functions for exploring and understanding your data, such as data filtering, sorting, and descriptive statistics.
    • Data Cleaning: Master pandas’ techniques for handling missing data, removing duplicates, and performing data transformations to ensure data quality.
    • Data Processing: Explore pandas’ functionalities for merging, joining, grouping, and reshaping data to prepare it for analysis.
    • Data Analysis: Utilize pandas’ statistical and analytical tools to perform comprehensive data analysis and derive valuable insights.
    • Data Visualization: Learn how to visualize your data using pandas in conjunction with visualization libraries like Matplotlib and Seaborn.

    pandas covers most of the Data Science Workflow.

    [Figure: Data Science Workflow]

    Great community

    pandas has a big community with a lot of help, which is essential when you choose your main library to handle data.

    It is no secret that pandas is a large tool and can seem complex at times. pandas can do (almost) everything with data: if you can do it in Excel, you can most likely do it in pandas, and automate it as well.

    Getting started with pandas

    If you use Jupyter Notebook from an Anaconda installation, pandas is already installed. Otherwise, you can install pandas by running the following in a terminal.

    pip install pandas
    

    Now let’s get started with pandas.

    import pandas as pd
    data = pd.read_csv('https://raw.githubusercontent.com/LearnPythonWithRune/DataScienceWithPython/main/files/aapl.csv', parse_dates=True, index_col=0)
    data.head()
    

    This should display the first five rows of the DataFrame in your Notebook.

    If you use another environment and nothing shows up, change the last line to the following.

    print(data.head())
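
To see what the parse_dates=True and index_col=0 arguments do without downloading anything, here is a minimal sketch that reads a tiny CSV from a string. The values and the name sample are made up for illustration and are not taken from aapl.csv.

```python
import io

import pandas as pd

# Made-up sample in a layout similar to a price CSV (not the real aapl.csv).
csv_text = """Date,Open,Close
2021-05-03,132.0,132.5
2021-05-04,131.2,127.9
2021-05-05,129.2,128.1
"""

# index_col=0 uses the first column (Date) as the index;
# parse_dates=True asks pandas to parse that index as dates.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col=0)

print(type(sample.index).__name__)  # the index type after parsing
print(sample.head(2))               # only the first two rows
```

Because the first column holds dates and is used as the index, the result has a DatetimeIndex, which is what makes the date-based slicing later in this tutorial possible.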
    

    Index and Columns

    You can get the index of the DataFrame by using .index. This returns the row labels, which here are the dates.

    data.index
    

    Also, you can get all the column names as follows.

    data.columns
    

    Column data type of DataFrame

    You can get the data types of the columns of a DataFrame as follows.

    data.dtypes
    

    The Size and Shape of Data

    You can get the number of rows in data by using len(data).

    len(data)
    

    Here it will print 472.

    You can also get the shape of data, which is a tuple with the number of rows and the number of columns.

    data.shape
    

    Which would give (472, 6).
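
The inspection attributes above can be tried together on a small DataFrame. This is a sketch on made-up data (the name sample and its values are assumptions), so the numbers differ from the 472 rows in the tutorial's dataset.

```python
import pandas as pd

# Made-up data: three rows, two columns, dates as the index.
sample = pd.DataFrame(
    {"Open": [132.0, 131.2, 129.2], "Volume": [100, 200, 300]},
    index=pd.to_datetime(["2021-05-03", "2021-05-04", "2021-05-05"]),
)

print(sample.index)    # the row labels (a DatetimeIndex)
print(sample.columns)  # the column names
print(sample.dtypes)   # one data type per column

# shape is (rows, columns), and len() matches the first entry.
n_rows, n_cols = sample.shape
print(n_rows, n_cols, len(sample) == n_rows)
```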

    Slicing rows and columns

    A DataFrame can be used to select (or filter) data in many ways. This is often called slicing. Below we give a few examples, which cover most common cases.

    • data['Close']: Select one column (Series)
    • data[['Open', 'Close']]: Select multiple columns with specific names
    • data.loc['2020-05-01':'2021-05-01']: Select all rows between the dates (including 2021-05-01)
    • data.iloc[50:55]: Select rows 50 to 54 by position (row 55 is excluded)

    First, let’s select one column of the DataFrame. This returns a Series, which is another data structure in pandas. A Series is a one-dimensional sequence of values that uses the same index as the DataFrame.

    data['Close']
    

    You can also select two or more columns by passing a list of column names (the inner square brackets []). Here is an example.

    data[['Close', 'Open']]
    

    If you want the data from a range of rows, you can use .loc with index labels. Notice that both the start and the end label are included.

    data.loc['2021-05-03':'2021-05-14']
    

    Because the index is a DatetimeIndex, we can also select all data for a given day, month, or year as follows.

    data.loc['2021-05']
    

    Sometimes we do not want to select by index label. In that case, we can use .iloc to select rows by integer position, where the end position is excluded.

    data.iloc[50:55]
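
The difference between label-based and position-based slicing is easiest to see side by side. The following sketch uses six made-up daily rows (the name sample and all values are assumptions for illustration).

```python
import pandas as pd

# Six consecutive made-up days, 2021-04-28 through 2021-05-03.
dates = pd.date_range("2021-04-28", periods=6, freq="D")
sample = pd.DataFrame(
    {"Open": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     "Close": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]},
    index=dates,
)

# .loc slices by label: both end points are included (3 rows).
by_label = sample.loc["2021-04-29":"2021-05-01"]

# .iloc slices by position: the end is excluded (positions 1, 2, 3).
by_position = sample.iloc[1:4]

# A partial date string selects every row in that month.
may_only = sample.loc["2021-05"]

print(by_label.equals(by_position))
print(len(may_only))
```

Here both slices pick out the same three rows, even though the .loc slice names its last row explicitly and the .iloc slice stops one position before its end.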
    

    Arithmetic operations

    As with an Excel sheet, you often want to make calculations on columns of data. This can be done simply, as the following example shows. Notice that you can easily create a new column in your DataFrame.

    • Calculating with columns on all rows
      • Example: data['Close'] - data['Open']
    • Creating new columns
      • Example: data['New'] = data['Open'] - data['Close']
    data['New'] = data['Open'] - data['Close']
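
Column arithmetic is applied row by row, and a derived column can be used in further calculations. Here is a small sketch on made-up prices; the column names Change and ChangePct are hypothetical and not part of the tutorial's dataset.

```python
import pandas as pd

# Made-up opening and closing prices.
sample = pd.DataFrame(
    {"Open": [100.0, 102.0, 101.0], "Close": [102.0, 101.0, 104.0]}
)

# Subtract two columns; the result has one value per row.
sample["Change"] = sample["Close"] - sample["Open"]

# A new column can build on another new column.
sample["ChangePct"] = sample["Change"] / sample["Open"] * 100

print(sample)
```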
    

    Select data

    When you want to filter data based on value, you can do it as the following example shows.

    • Select data based on boolean expressions
      • Example: data['New'] > 0
      • Example: data[data['New'] > 0]
    data[data['New'] > 0]
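
The filter works in two steps: the comparison produces a Series of True/False values, and indexing with that Series keeps only the rows marked True. The sketch below shows both steps on made-up data, and also how conditions can be combined with & (each condition needs its own parentheses).

```python
import pandas as pd

# Made-up prices; New is defined as in the text (Open minus Close).
sample = pd.DataFrame(
    {"Open": [100.0, 102.0, 101.0, 105.0, 100.5],
     "Close": [102.0, 101.0, 104.0, 105.5, 100.0]}
)
sample["New"] = sample["Open"] - sample["Close"]

# Step 1: a boolean Series, one True/False per row.
mask = sample["New"] > 0

# Step 2: keep only the rows where the mask is True.
falling = sample[mask]

# Combine conditions with & (and) or | (or), wrapping each in parentheses.
falling_high_open = sample[(sample["New"] > 0) & (sample["Open"] > 101)]

print(falling)
print(falling_high_open)
```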
    

    Groupby and value_counts

    A great thing about DataFrames is how easy it is to group data. The example below is somewhat artificial, but it illustrates the idea.

    • Example: data['Category'] = data['New'] > 0 followed by data.groupby('Category').mean()
    • Example: data['Category'].value_counts() or (data['New'] > 0).value_counts()
    data['Category'] = data['New'] > 0
    data.groupby('Category').mean()
    
    data['Category'].value_counts()
    
    (data['New'] > 0).value_counts()
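
To make the result of the grouping concrete, here is a sketch on five made-up rows where the group means and counts are easy to check by hand. The name sample and all values are assumptions for illustration.

```python
import pandas as pd

# Made-up prices; New and Category are defined as in the text.
sample = pd.DataFrame(
    {"Open": [100.0, 102.0, 101.0, 105.0, 99.0],
     "Close": [102.0, 101.0, 104.0, 103.0, 100.0]}
)
sample["New"] = sample["Open"] - sample["Close"]
sample["Category"] = sample["New"] > 0

# One row per Category value, with the mean of every other column.
means = sample.groupby("Category").mean()

# How many rows fall into each Category value.
counts = sample["Category"].value_counts()

print(means)
print(counts)
```

In this sample, three rows have New at or below zero and two rows have New above zero, so the False group averages three rows and the True group averages two.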
    

    Want to learn more?

    Want to learn more about Data Science to become a successful Data Scientist?

    In the next lesson you will learn Web Scraping and Data Wrangling with pandas in this Data Science course.

    This is one lesson of a 15-part Expert Data Science Blueprint course with the following resources.

    • 15 video lessons – cover the Data Science Workflow and concepts, demonstrate everything on real data, introduce projects, and show a solution (YouTube video).
    • 30 Jupyter Notebooks – with the full code and explanations from the lectures and projects (GitHub).
    • 15 projects – structured around the Data Science Workflow, with a solution explained at the end of each video lesson (GitHub).
