Matplotlib Visualization for DataFrame Time Series Data

What will we cover in this tutorial?

We will learn how to visualization time series data in a DataFrame with Matplotlib.

This tutorial will show you.

  • How to use Matplotlib with DataFrames.
  • Use Matplotlib with subplots (the object-oriented way).
  • How to make multiple plots in one figure.
  • How to create bar-plots

Want to access the code directly in Jupyter Notebook?

You can get the Jupyter Notebooks from the GitHub here, where there are also direct links to Colab for an interactive experience.

Step 1: Read time series data into a DataFrame

A DataFrame is a two-dimensional tabular data. It is the primary data structure of Pandas. The data structure contains labeled axes (rows and columns).

To get access to a DataFrame data structure, you need to import the Pandas library.

import pandas as pd

Then we need some time series data. You con download your own CSV file from financial pages like Yahoo! Finance.

For this tutorial we will use a dataset available from the GitHub.

remote_file = "https://raw.githubusercontent.com/LearnPythonWithRune/FinancialDataAnalysisWithPython/main/AAPL.csv"
data = pd.read_csv(remote_file, index_col=0, parse_dates=True)

The pd.read_csv(…) does all the magic. We set the index_col=0, which sets the first column of the CSV data file to be the index. This is the dates.

Then we set parse_dates=True, to ensure that dates are actually parsed as dates and not as strings. This is necessary to take advantage of being time series and index with time intervals.

Step 2: Import Matplotlib in Jupyter Notebook

When you import Matplotlib in Jupyter Notebook, you need to set a rendering mode.

import matplotlib.pyplot as plt
%matplotlib notebook

We will use the notebook mode, which is interactive. This enables you to zoom in on interval, move around, and save the figure.

It is common to use inline mode for rendering in Jupyter Notebook. The inline mode creates a static image, which is not interactive.

Step 3: Use Matplotlib the Object-Oriente way

Matplotlib can be used in a functional way and an object-oriented way. Most use it in a functional way, which often creates more confusion, as it is not always intuitive how it works.

The object-oriented way leads to less confusion for the cost of one extra line of code and parsing one argument. Hence, the price is low for the gain.

fig, ax = plt.subplots()
data['Close'].plot(ax=ax)
ax.set_ylabel("Price")
ax.set_title("AAPL")

The first line returns a figure and axis (fig and ax). The figure is where we put the axis, and the axis is the chart.

The actually plot is made by calling the DataFrame, actually, we access the column Close in this case, which is the Series of the time series of the historic Close prices.

Confused? Don’t worry about the details.

Notice, that we parse ax=ax to the plot. This ensures that we render the chart on the returned axis ax.

Finally, we add a y-label and a title to our axis.

Step 4: Creating multiple charts in one Matplotlib figure

How can we create multiple charts (or axes) in one Matplotlib figure?

Luckily, this is quite easy.

fig, ax = plt.subplots(2, 2)
data['Open'].plot(ax=ax[0, 0], title="Open")
data['High'].plot(ax=ax[0, 1], title="High")
data['Low'].plot(ax=ax[1, 0], title="Low")
data['Close'].plot(ax=ax[1, 1], title="Close")
plt.tight_layout()

Here we see a few differences. First, notice plt.subplots(2, 2), which will return a figure fig, and a list of lists with 2-by-2 axes. Hence, ax is a two dimensional list of axes.

We can access the first axis with ax[0, 0,], and parse it as an argument to plot.

This continues for all the 4 plots we make, as you see.

Finally, we use plt.tight_layout(), which will ensures that the layout of the axes does not overlap. You can try without to see the difference.

Step 5: Create a bar-chart with Matplotlib

Finally, we will make a bar-chart with Matplotlib.

Actually, we will render a horizontal bar-chart.

fig, ax = plt.subplots()
data['Volume'].loc['2020-07-01':'2020-08-15'].plot.barh(ax=ax)

We do it for the volume and only on a limited interval of time. This shows you how to take advantage of the time series aspect of the DataFrame.

Next step

The above is part of the FREE 2h Video course.

Backtesting a Trading Strategy with Pandas and Python

What will we cover?

In this tutorial we will cover.

  • What is a backtesting strategy?
  • How to measure the performance of a backtesting strategy?
  • How to implement a backtesting strategy with Pandas?

What is a backtesting strategy?

In a trading strategy backtesting seeks to estimate the performance of a strategy or model if it had been employed during a past period (source).

The way to analyze the performance of a strategy is to compare it with return, volatility, and max drawdown. Other metrics can also be used, but for this tutorial we will use these.

Step 1: Read data from Yahoo! Finance API with Pandas Datareader

Let’s get started by importing a few libraries and retrieve some data from Yahoo! Finance API with Pandas Datareader.

import pandas as pd
import pandas_datareader as pdr
import datetime as dt
import numpy as np

start = dt.datetime(2010, 1, 1)

data = pdr.get_data_yahoo("AAPL", start)

Which will read data for the Apple ticker (AAPL) since 2010.

Below is shown the head of data.

                High       Low      Open     Close       Volume  Adj Close
Date                                                                      
2010-01-04  7.660714  7.585000  7.622500  7.643214  493729600.0   6.583586
2010-01-05  7.699643  7.616071  7.664286  7.656429  601904800.0   6.594968
2010-01-06  7.686786  7.526786  7.656429  7.534643  552160000.0   6.490066
2010-01-07  7.571429  7.466071  7.562500  7.520714  477131200.0   6.478067
2010-01-08  7.571429  7.466429  7.510714  7.570714  447610800.0   6.521136

Step 2: Calculate signals for a simple strategy

The simple strategy we will use is moving average of period 5 and 20.

When the moving average of the Adj Close price of 5 days is above the moving average of 20 days, we go long (buy and hold) otherwise short (sell).

This can be calculated as follows.

data['Signal'] = data['Adj Close'].rolling(5).mean() - data['Adj Close'].rolling(20).mean()
data['Position'] = (data['Signal'].apply(np.sign) + 1)/2

This results in a Signal line, which is the differences of the two moving averages. When the signal line is positive our position is 1 (buy and hold) otherwise 0 (sell).

                  High         Low        Open  ...   Adj Close    Signal  Position
Date                                            ...                                
2021-02-26  124.849998  121.199997  122.589996  ...  121.260002 -7.610835       0.0
2021-03-01  127.930000  122.790001  123.750000  ...  127.790001 -7.054179       0.0
2021-03-02  128.720001  125.010002  128.410004  ...  125.120003 -6.761187       0.0
2021-03-03  125.709999  121.839996  124.809998  ...  122.059998 -6.782757       0.0
2021-03-04  123.599998  118.620003  121.750000  ...  120.129997 -6.274249       0.0

The reason why we want long to 1 and short to be 0 is for computational reasons, which will be clear soon.

Step 3: Remove unnecessary data columns and rows

To have a cleaner dataset we will clean it up.

data.drop(['High', 'Low', 'Open', 'Volume', 'Close'], axis=1, inplace=True)
data.dropna(inplace=True)

Where drop removes columns not needed and dropna removes rows with NaN.

The inplace=True is simply to apply it on the DataFrame.

            Adj Close    Signal  Position
Date                                     
2010-02-01   5.990476 -0.217986       0.0
2010-02-02   6.025239 -0.252087       0.0
2010-02-03   6.128909 -0.282004       0.0
2010-02-04   5.908031 -0.297447       0.0
2010-02-05   6.012933 -0.253271       0.0

Step 4: Calculate the return of the strategy

To calculate the return we will use log returns as we will see is an advantage.

Then we use the Position, but we shift it by 1, as we assume we first react on a position the day after the signal.

data['Log return'] = np.log(data['Adj Close']/data['Adj Close'].shift())
data['Return'] = data['Position'].shift(1)*data['Log return']

This result in the following.

             Adj Close    Signal  Position  Log return  Return
Date                                                          
2021-02-26  121.260002 -7.610835       0.0    0.002229     0.0
2021-03-01  127.790001 -7.054179       0.0    0.052451     0.0
2021-03-02  125.120003 -6.761187       0.0   -0.021115    -0.0
2021-03-03  122.059998 -6.782757       0.0   -0.024761    -0.0
2021-03-04  120.129997 -6.274249       0.0   -0.015938    -0.0

Now the additive advance of log returns comes in handy.

Remember that that we can add up log returns to calculate the final return. For details I refer to this.

Hence, we get that the return can be calculated as follows.

data[['Log return', 'Return']].cumsum().apply(np.exp)

Resulting in the following.

            Log return   Return
Date                           
2021-02-26   20.242133  7.29214
2021-03-01   21.332196  7.29214
2021-03-02   20.886489  7.29214
2021-03-03   20.375677  7.29214
2021-03-04   20.053499  7.29214

Using a bit calculations.

np.exp(data[['Log return', 'Return']].mean()*252)

We get.

Log return    1.310917
Return        1.196485
dtype: float64

Which tells us that the annualized return of our strategy giver 19.6485% return. A buy and hold strategy would give 31.0917%

The natural question is: What did we gain with our strategy?

Step 5: Evaluating our strategy

If we compute the volatility comparing the buy-and-hold strategy with ours. The volatility of a stock can be calculated in many ways. Here we will use the standard deviation. For other measures refer to Investpedia.

data[['Log return', 'Return']].std()*252**.5

Which gives the annualized standard deviation.

Log return    0.283467
Return        0.188044
dtype: float64

Hence, the gain from our strategy is a less volatile strategy.

Calculate the Average True Range (ATR) Easy with Pandas DataFrames

What will we cover in this tutorial?

In this tutorial we will cover the following

Step 1: Read historic stock prices from Yahoo! Finance API

To read data from Yahoo! Finance API we use Pandas-Datareader, which has a direct method.

This requires that we give a start date on how old data we want to retrieve.

import pandas_datareader as pdr
import datetime as dt

start = dt.datetime(2020, 1, 1)
data = pdr.get_data_yahoo("NFLX", start)

print(data.tail())

This we result in similar output.

	High	Low	Open	Close	Volume	Adj Close
Date						
2021-02-12	561.250000	550.849976	556.940002	556.520020	2195900	556.520020
2021-02-16	563.630005	552.729980	557.289978	557.280029	2622400	557.280029
2021-02-17	555.250000	543.030029	550.989990	551.340027	2069600	551.340027
2021-02-18	550.000000	538.229980	549.000000	548.219971	2456200	548.219971
2021-02-19	548.989990	538.809998	548.000000	540.219971	2838600	540.219971

Calculate the Average True Range (ATR)

The Average True Range (ATR) is calculated as follows, as investopedia.org defines it.

This can be calculated as follows.

import numpy as np
import pandas_datareader as pdr
import datetime as dt

start = dt.datetime(2020, 1, 1)
data = pdr.get_data_yahoo("NFLX", start)

high_low = data['High'] - data['Low']
high_close = np.abs(data['High'] - data['Close'].shift())
low_close = np.abs(data['Low'] - data['Close'].shift())

ranges = pd.concat([high_low, high_close, low_close], axis=1)
true_range = np.max(ranges, axis=1)

atr = true_range.rolling(14).sum()/14

Where we use the 14 days standard.

Visualize the ATR and the stock price

We will use Matplotlib to visualize it as it integrates well with DataFrames from Pandas.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
atr.plot(ax=ax)
data['Close'].plot(ax=ax, secondary_y=True, alpha=0.3)
plt.show()

This will result in a chart similar to this one.

Excel Automation with Simple Moving Average from Python

What will we cover in this tutorial?

We will retrieve the historic stock prices and calculate the moving average. Then we will export the data to Excel and insert a chart, but all done from Python.

See the in depth explanation in the YouTube video. It also gives advice on how to interpret the Simple Moving Averages (SMA).

Step 1: Read historic stock prices

We will use the Pandas-datarader to get the historic prices of NFLX (the ticker for Netflix).

import pandas_datareader as pdr
import datetime as dt

ticker = "NFLX"
start = dt.datetime(2019, 1, 1)

data = pdr.get_data_yahoo(ticker, start)
print(data.head())

And you will get the historic data for Netflix from January 1st, 2019.

	High	Low	Open	Close	Volume	Adj Close
Date						
2019-01-02	269.750000	256.579987	259.279999	267.660004	11679500	267.660004
2019-01-03	275.790009	264.429993	270.200012	271.200012	14969600	271.200012
2019-01-04	297.799988	278.540009	281.880005	297.570007	19330100	297.570007
2019-01-07	316.799988	301.649994	302.100006	315.339996	18620100	315.339996
2019-01-08	320.589996	308.010010	319.980011	320.269989	15359200	320.269989

Step 2: Understand Moving Average

We will calculate the Simple Moving Average as defined on Investopedia.

Simple Moving Average

The Simple Moving Average (Now just referred to as Moving Average or MA) is defined by a period of days.

That is, the MA of a period of 10 (MA10) will take the average value of the last 10 close prices. This is done in a rolling way, hence, we will get a MA10 for every trading day in our historic data, except the first 9 days in our dataset.

We can similarly calculate a MA50 and MA200, which is a Moving Average of the last 50 and 200 days, respectively.

Step 3: Calculating the Moving Averages

We can do that by using rolling and mean.

And it is magic.

data['MA10'] = data['Close'].rolling(10).mean()
data['MA50'] = data['Close'].rolling(50).mean()
data['MA200'] = data['Close'].rolling(200).mean()

print(data.tail())

That was easy, right?

	High	Low	Open	Close	Volume	Adj Close	MA10	MA50	MA200
Date									
2021-01-12	501.089996	485.670013	500.000000	494.250000	5990400	494.250000	515.297998	502.918599	477.08175
2021-01-13	512.349976	493.010010	495.500000	507.790009	5032100	507.790009	512.989999	503.559600	477.76590
2021-01-14	514.500000	499.579987	507.350006	500.859985	4177400	500.859985	510.616995	503.894399	478.39270
2021-01-15	506.320007	495.100006	500.000000	497.980011	5890200	497.980011	506.341998	504.109600	479.06220
2021-01-19	509.250000	493.540009	501.000000	501.769989	11996900	501.769989	504.232999	504.205999	479.72065

Step 4: Visualize it with Matplotlib

We can see the data with Matplotlib.

import matplotlib.pyplot as plt

data[['Close', 'MA10', 'MA50']].loc['2020-01-01':].plot()
plt.show()

Resulting in the following plot.

The output

Where you can see how the MA10 and MA50 move according to the price.

Step 5: Export to Excel

Now we will export the data to Excel.

For this we need to import Pandas and use the XlsxWriter engine, where you can find the details of the code.

The code can be found here.

import pandas as pd

data = data.loc['2020-01-01':]
data = data.iloc[::-1]
writer = pd.ExcelWriter("technical.xlsx", 
                        engine='xlsxwriter', 
                        date_format = 'yyyy-mm-dd', 
                        datetime_format='yyyy-mm-dd')

sheet_name = 'Moving Average'
data[['Close', 'MA10', 'MA50']].to_excel(writer, sheet_name=sheet_name)


worksheet = writer.sheets[sheet_name]
workbook = writer.book

# Create a format for a green cell
green_cell = workbook.add_format({
    'bg_color': '#C6EFCE',
    'font_color': '#006100'
})

# Create a format for a red cell
red_cell = workbook.add_format({
    'bg_color': '#FFC7CE',                            
    'font_color': '#9C0006'
})


# Set column width of Date
worksheet.set_column(0, 0, 15)


for col in range(1, 4):
    # Create a conditional formatted of type formula
    worksheet.conditional_format(1, col, len(data), col, {
        'type': 'formula',                                    
        'criteria': '=C2>=D2',
        'format': green_cell
    })

    # Create a conditional formatted of type formula
    worksheet.conditional_format(1, col, len(data), col, {
        'type': 'formula',                                    
        'criteria': '=C2<D2',
        'format': red_cell
    })

# Create a new chart object.
chart1 = workbook.add_chart({'type': 'line'})

# Add a series to the chart.
chart1.add_series({
        'name': "MA10",
        'categories': [sheet_name, 1, 0, len(data), 0],
        'values': [sheet_name, 1, 2, len(data), 2],
})

# Create a new chart object.
chart2 = workbook.add_chart({'type': 'line'})

# Add a series to the chart.
chart2.add_series({
        'name': 'MA50',
        'categories': [sheet_name, 1, 0, len(data), 0],
        'values': [sheet_name, 1, 3, len(data), 3],
})

# Combine and insert title, axis names
chart1.combine(chart2)
chart1.set_title({'name': sheet_name + " " + ticker})
chart1.set_x_axis({'name': 'Date'})
chart1.set_y_axis({'name': 'Price'})

# Insert the chart into the worksheet.
worksheet.insert_chart('F2', chart1)

writer.close()

Where the output will be something similar to this.

Generated Excel sheet

How to Plot Time Series with Matplotlib

What will we cover in this tutorial?

In this tutorial we will show how to visualize time series with Matplotlib. We will do that using Jupyter notebook and you can download the resources (the notebook and data used) from here.

Step 1: What is a time series?

I am happy you asked.

The easiest way to understand it, is to show it. If you downloaded the resources and started the Jupyter notebook execute the following lines.

import pandas as pd

data = pd.read_csv("stock_data.csv", index_col=0, parse_dates=True)

data.head()

This will produce the following output.

	High	Low	Open	Close	Volume	Adj Close
Date						
2020-01-02	86.139999	84.342003	84.900002	86.052002	47660500.0	86.052002
2020-01-03	90.800003	87.384003	88.099998	88.601997	88892500.0	88.601997
2020-01-06	90.311996	88.000000	88.094002	90.307999	50665000.0	90.307999
2020-01-07	94.325996	90.671997	92.279999	93.811996	89410500.0	93.811996
2020-01-08	99.697998	93.646004	94.739998	98.428001	155721500.0	98.428001

You notice the the far left column is called Date and that is the index. This index has a time value, in this case, a date.

Time series data is data “stamped” by a time. In this case, it is time indexed by dates.

The data you see is historic stock prices.

Step 2: How to visualize data with Matplotlib

The above data is kept in a DataFrame (Pandas data object), this makes it straight forward to visualize it.

import matplotlib.pyplot as plt
%matplotlib notebook

data.plot()

Which will result in a chart similar to this one.

Result

This is not impressive. It seems like something is wrong.

Actually, there is not. It just does what you ask for. It plots all the 6 columns all together in one chart. Because the Volume is such a high number, all the other columns are in the same brown line (the one that looks straight).

Step 3: Matplotlib has a functional and object oriented interface

This is often a bit confusing at first.

But Matplotlib has a functional and object oriented interface. We used the functional.

If you try to execute the following in your Jupyter notebook.

data['My col'] = data['Volume']*0.5
data['My col'].plot()

It would seem like nothing happened.

But then investigate your previous plot.

Previous plot

It got updated with a new line. Hence, instead of creating a new chart (or figure) it just added it to the existing one.

If you want to learn more about functional and object oriented way of using Matplotlib we recommend this tutorial.

Step 4: How to make a new figure

What to do?

Well, you need to use the object oriented interface of Matplotlib.

You can do that as follows.

fig1, ax1 = plt.subplots()
data['My col'].plot(ax=ax1)

Which will produce what you are looking for. A new figure.

The new figure

Step 5: Make multiple plots in one figure

This is getting fun.

How can you create multiple plots in one figure?

On creating you actually do that.

fig2, ax2 = plt.subplots(2, 2)

data['Open'].plot(ax=ax2[0, 0])
data['High'].plot(ax=ax2[0, 1])
data['Low'].plot(ax=ax2[1, 0])
data['Close'].plot(ax=ax2[1, 1])
plt.tight_layout()

Notice that subplots(2, 2) creates a 2 times 2 array of axis you can use to create a plot.

This should result in this chart.

Result

Step 6: Make a histogram

This can be done as follows.

fig3, ax3 = plt.subplots()

data.loc[:'2020-01-31', 'Volume'].plot.bar(ax=ax3)

Notice that we only take the first month of the Volume data here (data.loc[:’2020-01-31′, ‘Volume’]).

This should result in this figure.

Step 7: Save the figures

This is straight forward.

fig1.savefig("figure-1.png")
fig2.savefig("figure-2.png")
fig3.savefig("figure-3.png")

And the above figures should be available in the same location you are running your Jupyter notebook.

Next step

If you want to learn more about functional and object oriented way of using Matplotlib we recommend this tutorial.

Create Financial Analysis and With Multiple Sheets and Charts in Excel from Python

What will we cover in this tutorial?

We will get financial data from FPM cloud (Free stock API) for the last years and generate a 3 sheet Excel sheet with charts similar to this. All done from Python.

Step 1: Collect the data

In this tutorial we are only going to use the example data on Apple, that is free available without registering on FMP Cloud. If you want to do it on another stock yo will need to register on their site.

What we need is the income statement and cash flow statement. They are available as JSON on their page (income statement and cash flow statement).

As our main library we will use Pandas.

The data can be collected with the following code (NOTICE: The code syntax highlighter changes & to &amp; in the URLs => remove the amp;).

import pandas as pd

income_statement = pd.read_json("https://fmpcloud.io/api/v3/income-statement/AAPL?limit=120&apikey=demo")
income_statement.set_index('date', inplace=True)
cash_flow = pd.read_json("https://fmpcloud.io/api/v3/cash-flow-statement/AAPL?limit=120&apikey=demo")
cash_flow.set_index('date', inplace=True)

Notice that we set the index to be the date column. This makes the further work easier.

Step 2: Prepare the data

The next step we need to do is to make the necessary calculations and prepare the data.

We are only interested Revenue, Earnings Per Share (EPS) and Free Cash Flow (FCF). So let’s take that data and keep it in a DataFrame (The Pandas main data structure).

data = income_statement[['revenue', 'eps']].copy()
data['fcf'] = cash_flow['freeCashFlow']

Now the data should look something like this.

	        revenue	        eps	fcf
date			
2020-09-26	274515000000	3.3600	7.336500e+10
2019-09-28	260174000000	2.9925	5.889600e+10
2018-09-29	265595000000	3.0025	6.412100e+10
2017-09-30	229234000000	2.3175	5.080300e+10
2016-09-24	215639000000	2.0875	5.349700e+10

Step 3: Calculate Percentage change

We are actually not interested in the actual values. The only thing we care about is the percentage growth from year to year.

As the data is in reverse order, and we need to calculate from the beginning to the end. We first need to order the dates from old to newest.

Then it is straight forward to calculate using pct_change() method.

data = data.sort_index()

data['rev %-chg'] = data['revenue'].pct_change()
data['eps %-chg'] = data['eps'].pct_change()
data['fcf %-chg'] = data['fcf'].pct_change()

Now the data should look similar to this (or the tail of it).

	        revenue	        eps	fcf	        rev %-chg	      eps %-chg	fcf %-chg
date						
2016-09-24	215639000000	2.0875	5.349700e+10	-0.077342	-0.100216	-0.233326
2017-09-30	229234000000	2.3175	5.080300e+10	0.063045	0.110180	-0.050358
2018-09-29	265595000000	3.0025	6.412100e+10	0.158620	0.295577	0.262150
2019-09-28	260174000000	2.9925	5.889600e+10	-0.020411	-0.003331	-0.081487
2020-09-26	274515000000	3.3600	7.336500e+10	0.055121	0.122807	0.245670

Step 5: Preparing data used in each Excel sheet

The next step is to prepare the data for each sheet in the Excel document.

We only need the last 10 years of data and will use loc to get that. Then we create a DataFrame for each sheet.

data_set = data.loc['2010-01-01':]
data_set.index = data_set.index.year

rev = data_set[['revenue', 'rev %-chg']].copy()
eps = data_set[['eps', 'eps %-chg']].copy()
fcf = data_set[['fcf', 'fcf %-chg']].copy()

Notice that we set the index only to be the year.

Step 6: Creating the Excel sheet

To create the Excel sheet we will use Pandas with the XlsxWriter.

This can be done as follows.

def create_sheet(writer, df, sheet_name):
    df.to_excel(writer, sheet_name=sheet_name)

writer = pd.ExcelWriter("financials.xlsx", engine='xlsxwriter')
workbook = writer.book

create_sheet(writer, rev, 'Revenue')
create_sheet(writer, eps, 'EPS')
create_sheet(writer, fcf, 'FCF')

workbook.close()

This will create a 3-sheet Excel document. One sheet for each DataFrame (rev, eps, fcf).

The function create_sheet(…) is practical, as each DataFrame is similar and we need to create the same sheet for each DataFrame.

The result will be in the Excel document financial.xlsx.

Step 7: Creating the charts

Now it pays to have the function create_sheet(…).

The full Excel document can be created by the following code.

def create_sheet(writer, df, sheet_name):
    df.to_excel(writer, sheet_name=sheet_name)
    worksheet = writer.sheets[sheet_name]
    workbook = writer.book
      
    chart = workbook.add_chart({'type': 'line'})
    
    chart.add_series({
        'name': sheet_name,
        'categories': [sheet_name, 1, 0, 11, 0],
        'values': [sheet_name, 1, 1, 11, 1],
    })
    
    column_chart = workbook.add_chart({'type': 'column'})
    
    column_chart.add_series({
        'name': sheet_name,
        'categories': [sheet_name, 1, 0, 11, 0],
        'values': [sheet_name, 1, 2, 11, 2],
        'y2_axis': True,
    })
    
    chart.combine(column_chart)
    chart.set_title({'name': sheet_name})
    chart.set_x_axis({'name': 'Date'})
    chart.set_y_axis({'name': 'Value'})
    column_chart.set_y2_axis({'name': '%-chg'})
    
    worksheet.insert_chart("E1", chart)

writer = pd.ExcelWriter("financials.xlsx", engine='xlsxwriter')
workbook = writer.book

create_sheet(writer, rev, 'Revenue')
create_sheet(writer, eps, 'EPS')
create_sheet(writer, fcf, 'FCF')

workbook.close()writer = pd.ExcelWriter("financials.xlsx", engine='xlsxwriter')
workbook = writer.book

create_sheet(writer, rev, 'Revenue')
create_sheet(writer, eps, 'EPS')
create_sheet(writer, fcf, 'FCF')

workbook.close()

The details of creating the charts in Excel can be found on XlsxWriter. Basically, it is sending informations to the XlsxWriter engine through dictionaries. You need to send the values that you would set in Excel, if you were working inside there.

Again, the result will be in the Excel document financial.xlsx.

How To use Matplotlib Object Oriented with NumPy and Pandas

What will we cover in this tutorial?

If you like data visualization with NumPy and Pandas, then you must have encountered Matplotlib.

And if you also, like to program in an object oriented fashion, then most tutorial will make you feel wondering if no one loves the art of beautiful code?

Let me elaborate. The integration and interaction with Matplotlib is done in a functional way with a lot of side effects. Not nice.

Not sure what I talk about? We will cover that too.

Step 1: How NumPy is demonstrated to make plots with Matplotlib and what is wrong with it

Let’s make a simple example.

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)
y = x ** 2
plt.plot(x, y)
plt.xlabel("X Label")
plt.ylabel("Y Label")
plt.title("Title")
plt.show()

This will result in the following chart.

That is nice and easy! So what is wrong with it?

Side effects!

What is a side effect in programming?

…that is to say has an observable effect besides returning a value (the main effect) to the invoker of the operation.

https://en.wikipedia.org/wiki/Side_effect_(computer_science)

What does that mean?

Well, let’s examine the above example.

We call plt.plt(x, y) and what happens? Actually we don’t know. We do not get anything in return.

Continue to call plt.xlabel(…), plt.ylabel(…), and plt.title(…). Then we call plt.show() to see the result. Hence, we change the state of the plt library we imported. See, we did not create an object. We call the library directly.

This is difficult as a programmer to understand without having deep knowledge of the library used.

So how to do it in more understandable way?

Step 2: How to create a chart with Matplotlib with NumPy in an object oriented way and why it is better

Let’s look at this code and examine it.

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)
y = x ** 2

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("X Label")
ax.set_ylabel("Y Label")
ax.set_title("Title")
fig.show()
plt.waitforbuttonpress()

Here we do it differently but get the same result. It is more understandable that when we call a method on object ax, that the state of ax is changing and not something in the library hidden in some side effect.

You can also show the the figure fig by calling show() and not the library. This requires that we add waitforbuttonpress() on plt, otherwise it will destroy the window immediately.

Note, that you do not have these challenges in JuPyter notebook – the plots are shown without the call to show.

You could keep the plt.show() instead of fig.show() and plt.waitforbuttonpress(). But the above code is more intuitive and easier to understand.

How to create a chart with Matplotlib of a Pandas DataFrame in an object oriented way

This is straight forward as Matplotlib is well integrated with Pandas.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = np.linspace(0, 5, 11)
y = x ** 2

df = pd.DataFrame(data=y, index=x)

fig, ax = plt.subplots()
ax.plot(df)
ax.set_xlabel("X Label")
ax.set_ylabel("Y Label")
ax.set_title("Title")
fig.show()
plt.waitforbuttonpress()

Notice, that the DataFrame is created from the NumPy arrays. Hence, here we do not gain anything from using it. This is just to exemplify how easy it is to use s in an object oriented way with Pandas.

Final thoughts

I have found that programmer either hate or love Matplotlib. I do not always know why, but I have discovered that this non-object oriented way of using Matplotlib is annoying some programmers.

This is a good reason to hate it, but I would say that there are no good alternative to Matplotlib – or at least, they are build upon Matplotlib.

I like the power and ease using Matplotlib. I do like that the option of using it object oriented, which makes the code more intuitive and easier to understand for other programmers.

How to Create Excel Report Automation with Python

What will we cover in this tutorial?

How to create excel report automation with Python. This will be done on the data similar to this.

The data we work with – 1000 lines like the above – get Excel sheet form link below.

And we will create Excel reports from Python like these ones.

The first sheet we will create in Excel from Python.
The second Excel sheet we create from Python.

Get direct Download link to JuPyter Notebook and Excel sheet used in YouTube video and follow along.

Step 1: Load the data in Pandas DataFrame

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Pandas has great integration with Excel – both reading and writing Excel files. We will use that in this tutorial.

To read the data simply use the read_excel(…).

import pandas as pd

data = pd.read_excel("SalesData.xlsx", index_col="Date")
print(data.head())

Which will result in output similar to this.

	Sales rep	Item	Price	Quantity	Sale
Date					
2020-05-31	Mia	Markers	4	1	4
2020-02-01	Mia	Desk chair	199	2	398
2020-09-21	Oliver	Table	1099	2	2198
2020-07-15	Charlotte	Desk pad	9	2	18
2020-05-27	Emma	Book	12	1	12

Step 2: Group the data by Sales rep

To identify how much each Sales rep has sold, we will group it and calculate the sum.

repr = data.groupby("Sales rep")
repr_sales = repr['Sale'].sum()

Now that is too simple, right?

The data in rep_sales looks like this.

Sales rep
Charlotte     74599
Emma          65867
Ethan         40970
Liam          66989
Mia           88199
Noah          78575
Oliver        89355
Sophia       103480
William       80400
Name: Sale, dtype: int64

Step 3: Group the data by month, change to month names and sum it

This is a bit more challenging, as we need to group the dates in months and rename the index to month names.

Except if you know how to do it, as you will in a moment.

months = data.groupby(pd.Grouper(freq="M"))
months_sales = months['Sale'].sum()
months_sales.index = months_sales.index.month_name()

You are awesome. See, you use groupby on a Grouper with frequency month (“M”). To use the month name on index and not the last day of month, you reassign the index to use index.month_name().

This creates the following output where you can see the month names are used.

Date
January      69990
February     51847
March        67500
April        58401
May          40319
June         59397
July         64251
August       51571
September    55666
October      50093
November     57458
December     61941
Name: Sale, dtype: int64

Step 4: Create Excel sheet with the data from Step 2 and 3

This is done by creating an Excel writer.

writer = pd.ExcelWriter("SalesReport.xlsx")
repr_sales.to_excel(writer, sheet_name="Sale per rep")
months_sales.to_excel(writer, sheet_name="Sale per month")
writer.close()

This will create an Excel sheet of name SalesReport.xlsx, with two sheets Sale per rep and Sale per month, respectively.

But let’s try to add the two charts.

Step 5: Adding the charts

This is a bit more involved. As you see, the above steps have been quite straight forward and not required any high level Python programming skills (or Excel, except you need to know how to open an Excel sheet).

writer = pd.ExcelWriter("SalesReport.xlsx")
repr_sales.to_excel(writer, sheet_name="Sale per rep")
months_sales.to_excel(writer, sheet_name="Sale per month")

chart = writer.book.add_chart({'type': 'column'})
chart.add_series({
    'values': '=\'Sale per rep\'!$B$2:$B$10',
    'categories': '=\'Sale per rep\'!$A$2:$A$10',
    'name': "Sale"
})
writer.sheets['Sale per rep'].insert_chart("D2", chart)

chart = writer.book.add_chart({'type': 'column'})
chart.add_series({
    'values': '=\'Sale per month\'!$B$2:$B$13',
    'categories': '=\'Sale per month\'!$A$2:$A$13',
    'name': "Sale"
})
writer.sheets['Sale per month'].insert_chart("D2", chart)

writer.close()

This will create the sheets we have been waiting for. The charts are dynamic, meaning, that if you change the values the charts will change accordingly.

First sheet from our results
Second sheet from out results

Final remarks

Remember to see the YouTube video, which explains the material in more depth.

If you like the above content, please share it on a social media or point a friend in this direction.

How To Color Cells Dynamically in Excel Based of Formula done from Python

What will we cover in this tutorial?

We will learn how you can dynamically format the cells in Excel to have a specific color based on the value of a function – all done from Python.

As an example we will color the cells of two columns depending on which is larger.

If the value in MA50 is larger than value in MA200 it will color green and red otherwise.

In this concrete example we will collect historical stock prices and calculate the Moving Averages of 50 and 200 days on the closing price. In a crossover way to utilize moving averages, you would send a buy-signal when the 50-day moving average crosses above the 200-day moving average and a sell-signal if the other way around.

Step 1: Collect the data from Yahoo! Finance using Pandas-datareader

A great library to use to collect financial data is the Pandas-datareader.

We will collect data from Yahoo! Finance. As an example we will use General Motors historical stock data.

This can be done as follows.

import pandas_datareader as pdr
import datetime as dt


start = dt.datetime(2019, 1, 1)
ticker = pdr.get_data_yahoo("GM", start)

This will result in a Pandas DataFrame with data similar to this.

                 High        Low       Open      Close      Volume  Adj Close
Date                                                                         
2019-01-02  33.810001  32.410000  32.849998  33.639999   7662300.0  31.893360
2019-01-03  33.680000  32.200001  33.340000  32.250000  11618300.0  30.575533
2019-01-04  33.570000  32.430000  32.799999  33.330002  10784000.0  31.599455
2019-01-07  34.970001  33.480000  33.709999  34.360001  10722300.0  32.575977
2019-01-08  35.680000  34.490002  35.000000  34.810001  10532900.0  33.002617

Step 2: Calculate the Moving Averages

The moving average can be calculated by applying the rolling() and mean() methods on the DataFrame.

The rolling() creates a window size which is the number of observations used for calculating the statistic.

To calculate the Moving Average of window size 50 will use the last 50 data points to calculate the mean. To be consistent with Yahoo! Finance way of calculate the value, they use the Close price.

import pandas_datareader as pdr
import datetime as dt
import xlsxwriter
import pandas as pd


start = dt.datetime(2019, 1, 1)
ticker = pdr.get_data_yahoo("GM", start)

ticker['MA50'] = ticker['Close'].rolling(50).mean()
ticker['MA200'] = ticker['Close'].rolling(200).mean()

Which will give a tail similar to this output.

                 High        Low       Open  ...  Adj Close     MA50     MA200
Date                                         ...                              
2020-12-14  42.540001  41.419998  42.490002  ...  41.619999  38.5632  29.03155
2020-12-15  42.160000  41.310001  41.990002  ...  41.660000  38.7772  29.08725
2020-12-16  41.869999  40.810001  41.790001  ...  41.419998  38.9976  29.13670
2020-12-17  42.029999  41.430000  41.709999  ...  42.029999  39.2058  29.19635
2020-12-18  42.042801  41.139999  42.020000  ...  41.389999  39.3894  29.25985

Where the Moving Averages are added to the DataFrame.

Step 3: Exporting to Excel using XlsxWriter

We will use the powerful XlsxWriter to create our Excel sheet.

You can write the DataFrame directly by using to_excel(…), but we want to do more than that. Hence, we use the XlsxWriter directly.

import pandas_datareader as pdr
import datetime as dt
import pandas as pd


start = dt.datetime(2019, 1, 1)
ticker = pdr.get_data_yahoo("GM", start)

ticker['MA50'] = ticker['Close'].rolling(50).mean()
ticker['MA200'] = ticker['Close'].rolling(200).mean()

# We only want the index as a date and not datetime object
ticker.index = ticker.index.date

# We skip the first 200 entries, as it they do not have the MA200
ticker = ticker.iloc[200:]
# We reverse the DataFrame to have newest first
ticker = ticker.iloc[::-1]

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('moving_average.xlsx', engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
ticker.to_excel(writer, sheet_name='Moving Averages')

# Remember to close the writer to write the Excel sheet
writer.close()

This will create an Excel sheet similar to this one.

Step 4: Adding formatting to the cells

This is a bit more complex as we need to do the magic here.

We first need to create a reference to the sheet (worksheet) we work on to access the functionality.

The first thing we do is to resize column A, such that it opens in the correct size to see the date. This is done by using set_column(…).

Then we create a format, which can be applied on Excel cells. This is how we color them. Hence, we create one for green cells and later one for red cells.

We insert them by using conditional_format(…), which a range of cells. This can be applied in different ways, we use the A1 notation.

The options parameter in conditional_format(…) must be a dictionary containing the parameters that describe the type and style of the conditional format. To see details of the format please refer to the manual.

import pandas_datareader as pdr
import datetime as dt
import pandas as pd


start = dt.datetime(2019, 1, 1)
ticker = pdr.get_data_yahoo("GM", start)

ticker['MA50'] = ticker['Close'].rolling(50).mean()
ticker['MA200'] = ticker['Close'].rolling(200).mean()

# We only want the index as a date and not datetime object
ticker.index = ticker.index.date

# We skip the first 200 entries, as it they do not have the MA200
ticker = ticker.iloc[200:]
# We reverse the DataFrame to have newest first
ticker = ticker.iloc[::-1]

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('moving_average.xlsx', engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
ticker.to_excel(writer, sheet_name='Moving Averages')

# Get the xlsxwriter objects from the dataframe writer object.
workbook = writer.book
worksheet = writer.sheets['Moving Averages']
# Resize the column A
worksheet.set_column("A:A", 12)

# Calculate the last row number (we insert first DataFrame row in row 2)
last_row = len(ticker) + 1

# Create a format for a green cell
cell_format_green = workbook.add_format()
cell_format_green.set_bg_color('green')

# Create a conditional formatted of type formula
worksheet.conditional_format('H2:I{}'.format(last_row), {'type':     'formula',
                                                         'criteria': '=$H2>=$I2',
                                                         'format':   cell_format_green})

# Create a format for a red cell
cell_format_red = workbook.add_format()
cell_format_red.set_bg_color('red')

# Create a conditional formatted of type formula
worksheet.conditional_format('H2:I{}'.format(last_row), {'type':     'formula',
                                                         'criteria': '=$H2<$I2',
                                                         'format':   cell_format_red})

# Remember to close the writer
writer.close()

This will result in the expected Excel sheet.

Step 5: Validating it works as expected

Now for the final test.

The colors should be updated dynamically. Let’s try to change the value and see what happens.

It does. If we change the value of H2 to 10, it turns the cells to red.

How to Create Awesome Mosaic Picture in Excel with Python

What will we do in this tutorial?

Create a mosaic in Excel using Python.

See the tutorial on YouTube and hear how you can use it as prank!

Step 1: How to create a mosaic from a photo

If you want a deeper description of how to create a mosaic you should read the following tutorial, which shows the code on how to do it.

Step 2: Install the necessary libraries

You need to install OpenCV. If you use PyCharm you can follow this tutorial.

Otherwise you can install the libraries as follows.

pip install opencv-python
pip install numpy
pip install xlsxwriter

Please ask if you have troubles with it.

Step 3: The code used to create the Mosaic in Excel

This will just provide the code for you to enjoy. The process code is used from the tutorial linked above, where it is described.

import cv2
import numpy as np
import xlsxwriter


def create_mosaic_in_excel(photo, box_height, box_width, col_width=2, row_height=15):
    # Get the height and width of the photo
    height, width, _ = photo.shape

    # Create Excel workbook and worksheet
    workbook = xlsxwriter.Workbook('mosaic.xlsx')
    worksheet = workbook.add_worksheet("Urgent")

    # Resize columns and rows
    worksheet.set_column(0, width//box_width - 1, col_width)
    for i in range(height//box_height):
        worksheet.set_row(i, row_height)

    # Create mosaic
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            # Create region of interest (ROI)
            roi = photo[i:i + box_height, j:j + box_width]
            # Use numpy to calculate mean in ROI of color channels
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])

            # Convert mean to int
            b_mean_int = b_mean.astype(int).item()
            g_mean_int = g_mean.astype(int).item()
            r_mean_int = r_mean.astype(int).item()

            # Create color code
            color = '#{:02x}{:02x}{:02x}'.format(r_mean_int, g_mean_int, b_mean_int)

            # Add color code to cell
            cell_format = workbook.add_format()
            cell_format.set_bg_color(color)
            worksheet.write(i//box_height, j//box_width, "", cell_format)

    # Close and write the Excel sheet
    workbook.close()


def main():
    photo = cv2.imread("rune.png")

    number_cols = 50
    number_rows = 45

    # Get height and width of photo
    height, width, _ = photo.shape
    box_width = width // number_cols
    box_height = height // number_rows

    # To make sure that it we can slice the photo in box-sizes
    width = (width // box_width) * box_width
    height = (height // box_height) * box_height
    photo = cv2.resize(photo, (width, height))

    # Create the Excel mosaic
    create_mosaic_in_excel(photo.copy(), box_height, box_width, col_width=2, row_height=15)


main()

Step 4: What to modify

The above tutorial assumes a photo of me in rune.png. I used the one taken from this page. You should obviously change it to something else.

You can change how many columns and rows in the Excel sheet the mosaic should be. This is done by changing the values of number_cols and number_rows.

Then you can change the values of col_width=2 and row_height=15.

In the YouTube video I use this free picture from Pexels (download) and modify number_cols = 100 and number_rows = 90, and col_width=1 and row_height=6.