How To use Matplotlib Object Oriented with NumPy and Pandas

What will we cover in this tutorial?

If you like data visualization with NumPy and Pandas, then you must have encountered Matplotlib.

And if you also, like to program in an object oriented fashion, then most tutorial will make you feel wondering if no one loves the art of beautiful code?

Let me elaborate. The integration and interaction with Matplotlib is done in a functional way with a lot of side effects. Not nice.

Not sure what I talk about? We will cover that too.

Step 1: How NumPy is demonstrated to make plots with Matplotlib and what is wrong with it

Let’s make a simple example.

import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 5, 11)
y = x ** 2
plt.plot(x, y)
plt.xlabel("X Label")
plt.ylabel("Y Label")
plt.title("Title")
plt.show()

This will result in the following chart.

That is nice and easy! So what is wrong with it?

Side effects!

What is a side effect in programming?

…that is to say has an observable effect besides returning a value (the main effect) to the invoker of the operation.

https://en.wikipedia.org/wiki/Side_effect_(computer_science)

What does that mean?

Well, let’s examine the above example.

We call plt.plt(x, y) and what happens? Actually we don’t know. We do not get anything in return.

Continue to call plt.xlabel(…), plt.ylabel(…), and plt.title(…). Then we call plt.show() to see the result. Hence, we change the state of the plt library we imported. See, we did not create an object. We call the library directly.

This is difficult as a programmer to understand without having deep knowledge of the library used.

So how to do it in more understandable way?

Step 2: How to create a chart with Matplotlib with NumPy in an object oriented way and why it is better

Let’s look at this code and examine it.

import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 5, 11)
y = x ** 2
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("X Label")
ax.set_ylabel("Y Label")
ax.set_title("Title")
fig.show()
plt.waitforbuttonpress()

Here we do it differently but get the same result. It is more understandable that when we call a method on object ax, that the state of ax is changing and not something in the library hidden in some side effect.

You can also show the the figure fig by calling show() and not the library. This requires that we add waitforbuttonpress() on plt, otherwise it will destroy the window immediately.

Note, that you do not have these challenges in JuPyter notebook – the plots are shown without the call to show.

You could keep the plt.show() instead of fig.show() and plt.waitforbuttonpress(). But the above code is more intuitive and easier to understand.

How to create a chart with Matplotlib of a Pandas DataFrame in an object oriented way

This is straight forward as Matplotlib is well integrated with Pandas.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
x = np.linspace(0, 5, 11)
y = x ** 2
df = pd.DataFrame(data=y, index=x)
fig, ax = plt.subplots()
ax.plot(df)
ax.set_xlabel("X Label")
ax.set_ylabel("Y Label")
ax.set_title("Title")
fig.show()
plt.waitforbuttonpress()

Notice, that the DataFrame is created from the NumPy arrays. Hence, here we do not gain anything from using it. This is just to exemplify how easy it is to use s in an object oriented way with Pandas.

Final thoughts

I have found that programmer either hate or love Matplotlib. I do not always know why, but I have discovered that this non-object oriented way of using Matplotlib is annoying some programmers.

This is a good reason to hate it, but I would say that there are no good alternative to Matplotlib – or at least, they are build upon Matplotlib.

I like the power and ease using Matplotlib. I do like that the option of using it object oriented, which makes the code more intuitive and easier to understand for other programmers.

How to Create Excel Report Automation with Python

What will we cover in this tutorial?

How to create excel report automation with Python. This will be done on the data similar to this.

The data we work with – 1000 lines like the above – get Excel sheet form link below.

And we will create Excel reports from Python like these ones.

The first sheet we will create in Excel from Python.
The second Excel sheet we create from Python.

Get direct Download link to JuPyter Notebook and Excel sheet used in YouTube video and follow along.

Step 1: Load the data in Pandas DataFrame

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Pandas has great integration with Excel – both reading and writing Excel files. We will use that in this tutorial.

To read the data simply use the read_excel(…).

import pandas as pd
data = pd.read_excel("SalesData.xlsx", index_col="Date")
print(data.head())

Which will result in output similar to this.

	Sales rep	Item	Price	Quantity	Sale
Date					
2020-05-31	Mia	Markers	4	1	4
2020-02-01	Mia	Desk chair	199	2	398
2020-09-21	Oliver	Table	1099	2	2198
2020-07-15	Charlotte	Desk pad	9	2	18
2020-05-27	Emma	Book	12	1	12

Step 2: Group the data by Sales rep

To identify how much each Sales rep has sold, we will group it and calculate the sum.

repr = data.groupby("Sales rep")
repr_sales = repr['Sale'].sum()

Now that is too simple, right?

The data in rep_sales looks like this.

Sales rep
Charlotte     74599
Emma          65867
Ethan         40970
Liam          66989
Mia           88199
Noah          78575
Oliver        89355
Sophia       103480
William       80400
Name: Sale, dtype: int64

Step 3: Group the data by month, change to month names and sum it

This is a bit more challenging, as we need to group the dates in months and rename the index to month names.

Except if you know how to do it, as you will in a moment.

months = data.groupby(pd.Grouper(freq="M"))
months_sales = months['Sale'].sum()
months_sales.index = months_sales.index.month_name()

You are awesome. See, you use groupby on a Grouper with frequency month (“M”). To use the month name on index and not the last day of month, you reassign the index to use index.month_name().

This creates the following output where you can see the month names are used.

Date
January      69990
February     51847
March        67500
April        58401
May          40319
June         59397
July         64251
August       51571
September    55666
October      50093
November     57458
December     61941
Name: Sale, dtype: int64

Step 4: Create Excel sheet with the data from Step 2 and 3

This is done by creating an Excel writer.

writer = pd.ExcelWriter("SalesReport.xlsx")
repr_sales.to_excel(writer, sheet_name="Sale per rep")
months_sales.to_excel(writer, sheet_name="Sale per month")
writer.close()

This will create an Excel sheet of name SalesReport.xlsx, with two sheets Sale per rep and Sale per month, respectively.

But let’s try to add the two charts.

Step 5: Adding the charts

This is a bit more involved. As you see, the above steps have been quite straight forward and not required any high level Python programming skills (or Excel, except you need to know how to open an Excel sheet).

writer = pd.ExcelWriter("SalesReport.xlsx")
repr_sales.to_excel(writer, sheet_name="Sale per rep")
months_sales.to_excel(writer, sheet_name="Sale per month")
chart = writer.book.add_chart({'type': 'column'})
chart.add_series({
    'values': '=\'Sale per rep\'!$B$2:$B$10',
    'categories': '=\'Sale per rep\'!$A$2:$A$10',
    'name': "Sale"
})
writer.sheets['Sale per rep'].insert_chart("D2", chart)
chart = writer.book.add_chart({'type': 'column'})
chart.add_series({
    'values': '=\'Sale per month\'!$B$2:$B$13',
    'categories': '=\'Sale per month\'!$A$2:$A$13',
    'name': "Sale"
})
writer.sheets['Sale per month'].insert_chart("D2", chart)
writer.close()

This will create the sheets we have been waiting for. The charts are dynamic, meaning, that if you change the values the charts will change accordingly.

First sheet from our results
Second sheet from out results

Final remarks

Remember to see the YouTube video, which explains the material in more depth.

If you like the above content, please share it on a social media or point a friend in this direction.

How To Color Cells Dynamically in Excel Based of Formula done from Python

What will we cover in this tutorial?

We will learn how you can dynamically format the cells in Excel to have a specific color based on the value of a function – all done from Python.

As an example we will color the cells of two columns depending on which is larger.

If the value in MA50 is larger than value in MA200 it will color green and red otherwise.

In this concrete example we will collect historical stock prices and calculate the Moving Averages of 50 and 200 days on the closing price. In a crossover way to utilize moving averages, you would send a buy-signal when the 50-day moving average crosses above the 200-day moving average and a sell-signal if the other way around.

Step 1: Collect the data from Yahoo! Finance using Pandas-datareader

A great library to use to collect financial data is the Pandas-datareader.

We will collect data from Yahoo! Finance. As an example we will use General Motors historical stock data.

This can be done as follows.

import pandas_datareader as pdr
import datetime as dt

start = dt.datetime(2019, 1, 1)
ticker = pdr.get_data_yahoo("GM", start)

This will result in a Pandas DataFrame with data similar to this.

                 High        Low       Open      Close      Volume  Adj Close
Date                                                                         
2019-01-02  33.810001  32.410000  32.849998  33.639999   7662300.0  31.893360
2019-01-03  33.680000  32.200001  33.340000  32.250000  11618300.0  30.575533
2019-01-04  33.570000  32.430000  32.799999  33.330002  10784000.0  31.599455
2019-01-07  34.970001  33.480000  33.709999  34.360001  10722300.0  32.575977
2019-01-08  35.680000  34.490002  35.000000  34.810001  10532900.0  33.002617

Step 2: Calculate the Moving Averages

The moving average can be calculated by applying the rolling() and mean() methods on the DataFrame.

The rolling() creates a window size which is the number of observations used for calculating the statistic.

To calculate the Moving Average of window size 50 will use the last 50 data points to calculate the mean. To be consistent with Yahoo! Finance way of calculate the value, they use the Close price.

import pandas_datareader as pdr
import datetime as dt
import xlsxwriter
import pandas as pd

start = dt.datetime(2019, 1, 1)
ticker = pdr.get_data_yahoo("GM", start)
ticker['MA50'] = ticker['Close'].rolling(50).mean()
ticker['MA200'] = ticker['Close'].rolling(200).mean()

Which will give a tail similar to this output.

                 High        Low       Open  ...  Adj Close     MA50     MA200
Date                                         ...                              
2020-12-14  42.540001  41.419998  42.490002  ...  41.619999  38.5632  29.03155
2020-12-15  42.160000  41.310001  41.990002  ...  41.660000  38.7772  29.08725
2020-12-16  41.869999  40.810001  41.790001  ...  41.419998  38.9976  29.13670
2020-12-17  42.029999  41.430000  41.709999  ...  42.029999  39.2058  29.19635
2020-12-18  42.042801  41.139999  42.020000  ...  41.389999  39.3894  29.25985

Where the Moving Averages are added to the DataFrame.

Step 3: Exporting to Excel using XlsxWriter

We will use the powerful XlsxWriter to create our Excel sheet.

You can write the DataFrame directly by using to_excel(…), but we want to do more than that. Hence, we use the XlsxWriter directly.

import pandas_datareader as pdr
import datetime as dt
import pandas as pd

start = dt.datetime(2019, 1, 1)
ticker = pdr.get_data_yahoo("GM", start)
ticker['MA50'] = ticker['Close'].rolling(50).mean()
ticker['MA200'] = ticker['Close'].rolling(200).mean()
# We only want the index as a date and not datetime object
ticker.index = ticker.index.date
# We skip the first 200 entries, as it they do not have the MA200
ticker = ticker.iloc[200:]
# We reverse the DataFrame to have newest first
ticker = ticker.iloc[::-1]
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('moving_average.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
ticker.to_excel(writer, sheet_name='Moving Averages')
# Remember to close the writer to write the Excel sheet
writer.close()

This will create an Excel sheet similar to this one.

Step 4: Adding formatting to the cells

This is a bit more complex as we need to do the magic here.

We first need to create a reference to the sheet (worksheet) we work on to access the functionality.

The first thing we do is to resize column A, such that it opens in the correct size to see the date. This is done by using set_column(…).

Then we create a format, which can be applied on Excel cells. This is how we color them. Hence, we create one for green cells and later one for red cells.

We insert them by using conditional_format(…), which a range of cells. This can be applied in different ways, we use the A1 notation.

The options parameter in conditional_format(…) must be a dictionary containing the parameters that describe the type and style of the conditional format. To see details of the format please refer to the manual.

import pandas_datareader as pdr
import datetime as dt
import pandas as pd

start = dt.datetime(2019, 1, 1)
ticker = pdr.get_data_yahoo("GM", start)
ticker['MA50'] = ticker['Close'].rolling(50).mean()
ticker['MA200'] = ticker['Close'].rolling(200).mean()
# We only want the index as a date and not datetime object
ticker.index = ticker.index.date
# We skip the first 200 entries, as it they do not have the MA200
ticker = ticker.iloc[200:]
# We reverse the DataFrame to have newest first
ticker = ticker.iloc[::-1]
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('moving_average.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
ticker.to_excel(writer, sheet_name='Moving Averages')
# Get the xlsxwriter objects from the dataframe writer object.
workbook = writer.book
worksheet = writer.sheets['Moving Averages']
# Resize the column A
worksheet.set_column("A:A", 12)
# Calculate the last row number (we insert first DataFrame row in row 2)
last_row = len(ticker) + 1
# Create a format for a green cell
cell_format_green = workbook.add_format()
cell_format_green.set_bg_color('green')
# Create a conditional formatted of type formula
worksheet.conditional_format('H2:I{}'.format(last_row), {'type':     'formula',
                                                         'criteria': '=$H2>=$I2',
                                                         'format':   cell_format_green})
# Create a format for a red cell
cell_format_red = workbook.add_format()
cell_format_red.set_bg_color('red')
# Create a conditional formatted of type formula
worksheet.conditional_format('H2:I{}'.format(last_row), {'type':     'formula',
                                                         'criteria': '=$H2<$I2',
                                                         'format':   cell_format_red})
# Remember to close the writer
writer.close()

This will result in the expected Excel sheet.

Step 5: Validating it works as expected

Now for the final test.

The colors should be updated dynamically. Let’s try to change the value and see what happens.

It does. If we change the value of H2 to 10, it turns the cells to red.

How to Create Awesome Mosaic Picture in Excel with Python

What will we do in this tutorial?

Create a mosaic in Excel using Python.

See the tutorial on YouTube and hear how you can use it as prank!

Step 1: How to create a mosaic from a photo

If you want a deeper description of how to create a mosaic you should read the following tutorial, which shows the code on how to do it.

Step 2: Install the necessary libraries

You need to install OpenCV. If you use PyCharm you can follow this tutorial.

Otherwise you can install the libraries as follows.

pip install opencv-python
pip install numpy
pip install xlsxwriter

Please ask if you have troubles with it.

Step 3: The code used to create the Mosaic in Excel

This will just provide the code for you to enjoy. The process code is used from the tutorial linked above, where it is described.

import cv2
import numpy as np
import xlsxwriter

def create_mosaic_in_excel(photo, box_height, box_width, col_width=2, row_height=15):
    # Get the height and width of the photo
    height, width, _ = photo.shape
    # Create Excel workbook and worksheet
    workbook = xlsxwriter.Workbook('mosaic.xlsx')
    worksheet = workbook.add_worksheet("Urgent")
    # Resize columns and rows
    worksheet.set_column(0, width//box_width - 1, col_width)
    for i in range(height//box_height):
        worksheet.set_row(i, row_height)
    # Create mosaic
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            # Create region of interest (ROI)
            roi = photo[i:i + box_height, j:j + box_width]
            # Use numpy to calculate mean in ROI of color channels
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            # Convert mean to int
            b_mean_int = b_mean.astype(int).item()
            g_mean_int = g_mean.astype(int).item()
            r_mean_int = r_mean.astype(int).item()
            # Create color code
            color = '#{:02x}{:02x}{:02x}'.format(r_mean_int, g_mean_int, b_mean_int)
            # Add color code to cell
            cell_format = workbook.add_format()
            cell_format.set_bg_color(color)
            worksheet.write(i//box_height, j//box_width, "", cell_format)
    # Close and write the Excel sheet
    workbook.close()

def main():
    photo = cv2.imread("rune.png")
    number_cols = 50
    number_rows = 45
    # Get height and width of photo
    height, width, _ = photo.shape
    box_width = width // number_cols
    box_height = height // number_rows
    # To make sure that it we can slice the photo in box-sizes
    width = (width // box_width) * box_width
    height = (height // box_height) * box_height
    photo = cv2.resize(photo, (width, height))
    # Create the Excel mosaic
    create_mosaic_in_excel(photo.copy(), box_height, box_width, col_width=2, row_height=15)

main()

Step 4: What to modify

The above tutorial assumes a photo of me in rune.png. I used the one taken from this page. You should obviously change it to something else.

You can change how many columns and rows in the Excel sheet the mosaic should be. This is done by changing the values of number_cols and number_rows.

Then you can change the values of col_width=2 and row_height=15.

In the YouTube video I use this free picture from Pexels (download) and modify number_cols = 100 and number_rows = 90, and col_width=1 and row_height=6.