CSV GroupBy Processing to Excel with Charts using Pandas (Python)

What will we cover?

We will demonstrate how to read CSV data from a GitHub. How to group the data by unique values in a column and sum it. Then how to group and sum data on a monthly basis. Finally, how to export this into a multiple sheet Excel document with chart.

Step 1: Get and inspect the data

We can use pandas to read the CSV data (see more about CSV files here).

import pandas as pd

url = 'https://raw.githubusercontent.com/LearnPythonWithRune/LearnPython/main/files/SalesData.csv'
data = pd.read_csv(url, delimiter=';', parse_dates=True, index_col='Date')

print(data.head())

This will read our data directly from GitHub and show the first few lines.

            Sales rep        Item  Price  Quantity  Sale
Date                                                    
2020-05-31        Mia     Markers      4         1     4
2020-02-01        Mia  Desk chair    199         2   398
2020-09-21     Oliver       Table   1099         2  2198
2020-07-15  Charlotte    Desk pad      9         2    18
2020-05-27       Emma        Book     12         1    12

This data shows different sales represents and a list over their sales in 2020.

Step 2: Use GroupBy to get sales of each represent and monthly sales

It is easy to group data by columns. The below code will first group all the Sales reps and sum their sales. Second, it will group the data in months and sum it.

repr_sales = data.groupby("Sales rep").sum()['Sale']
print(repr_sales)

monthly_sale = data.groupby(pd.Grouper(freq='M')).sum()['Sale']
monthly_sale.index = monthly_sale.index.month_name()
print(monthly_sale)

This gives.

Sales rep
Charlotte     74599
Emma          65867
Ethan         40970
Liam          66989
Mia           88199
Noah          78575
Oliver        89355
Sophia       103480
William       80400
Name: Sale, dtype: int64
Date
January      69990
February     51847
March        67500
April        58401
May          40319
June         59397
July         64251
August       51571
September    55666
October      50093
November     57458
December     61941
Name: Sale, dtype: int64

Step 3: Create a multiple sheet Excel document with charts

Now for the export magic.

workbook = pd.ExcelWriter("SalesReport.xlsx")
repr_sales.to_excel(workbook, sheet_name='Sales per rep')
monthly_sale.to_excel(workbook, sheet_name='Monthly')

chart1 = workbook.book.add_chart({'type': 'column'})

# Configure the first series.
chart1.add_series({
    'name':       'Sales per rep',
    'categories': '=\'Sales per rep\'!$A$2:$A$10',
    'values':     '=\'Sales per rep\'!$B$2:$B$10',
})

workbook.sheets['Sales per rep'].insert_chart('D2', chart1)

chart1 = workbook.book.add_chart({'type': 'column'})

# Configure the first series.
chart1.add_series({
    'name':       'Monthly sales',
    'categories': '=Monthly!$A$2:$A$13',
    'values':     '=Monthly!$B$2:$B$13',
})

workbook.sheets['Monthly'].insert_chart('D2', chart1)

workbook.close()

This will create an Excel document called SalesReport.xlsx in your working directory.

To get a detailed explanation see the video in the top of the post.

Want to learn more?

Want to learn more Python, then this is part of a 8 hours FREE video course with full explanations, projects on each levels, and guided solutions.

The course is structured with the following resources to improve your learning experience.

  • 17 video lessons teaching you everything you need to know to get started with Python.
  • 34 Jupyter Notebooks with lesson code and projects.
  • A FREE 70+ pages eBook with all the learnings from the lessons.

See the full FREE course page here.

If you instead want to learn more about Machine Learning. Do not worry.

Then check out my Machine Learning with Python course.

  • 15 video lessons teaching you all aspects of Machine Learning
  • 30 JuPyter Notebooks with lesson code and projects
  • 10 hours FREE video content to support your learning journey.

Go to the course page for details.

3 Replies to “CSV GroupBy Processing to Excel with Charts using Pandas (Python)”

  1. in windows 10 i get error when trying the example above:
    AttributeError: ‘OpenpyxlWriter’ object has no attribute ‘add_chart’

    using pytho3.10 64bit

    Nevertheless, I like your style of presenting python, and the hard work put into all the tutorials. A big thank you! I have a long way to go, but I am enjoying it so far!

  2. further to my earlier message where i noted this doesn’t work in win doze environ. I am glad to advise that it does work fine under macos big sur (arm64), also running python3.10 (but natively under arm64)

    1. Hi,
      There can be two things.
      Most likely it just needs to install the XlsxWriter.
      This is done as follows:

      pip install xlsxwriter

      This should be run in a terminal in the environment you use.

      If this does not solve the problem, try to update the pandas package

      pip install –upgrade pandas

      This will update it to the newest version.

      Hope it helps
      Rune

Leave a Reply