## What will we cover in this tutorial?

In this tutorial we will cover how to calculate the Simple Moving Average (MA) and the Exponential Moving Average (EMA) of a time series using the Pandas library in Python.

## Step 1: Read some Financial Historic Time Series Stock Prices

We will use Pandas Datareader to read some historic stock prices. See this tutorial for details.

```
import pandas_datareader as pdr
import datetime as dt

ticker = "AAPL"
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)

data = pdr.get_data_yahoo(ticker, start, end)

```

Resulting in this.

```
                 High        Low       Open      Close       Volume  Adj Close
Date
2019-01-02  39.712502  38.557499  38.722500  39.480000  148158800.0  38.505024
2019-01-03  36.430000  35.500000  35.994999  35.547501  365248800.0  34.669640
2019-01-04  37.137501  35.950001  36.132500  37.064999  234428400.0  36.149662
2019-01-07  37.207500  36.474998  37.174999  36.982498  219111200.0  36.069202
2019-01-08  37.955002  37.130001  37.389999  37.687500  164101200.0  36.756794
```

## Step 2: Calculate the Simple Moving Average with Python and Pandas

Calculating the Simple Moving Average (MA) of the data can be done using the rolling and mean methods.

```
data['MA10'] = data['Close'].rolling(10).mean()
```

Here we calculate the Simple Moving Average over 10 days. You can change the window to fit your needs.

## Step 3: Calculate the Exponential Moving Average with Python and Pandas

It is a bit more involved to calculate the Exponential Moving Average.

```
data['EMA10'] = data['Close'].ewm(span=10, adjust=False).mean()
```

Here you need to set the span and set adjust to False. This is needed to get the same numbers as on Yahoo! Finance.
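To see what span and adjust=False mean, here is a small sketch (with made-up prices) of the recursion that ewm applies: alpha = 2 / (span + 1), and each EMA value is alpha times the current price plus (1 - alpha) times the previous EMA value.

```python
import pandas as pd

# Reproduce ewm(span=10, adjust=False) by hand on a few made-up prices:
# alpha = 2 / (span + 1); ema[t] = alpha * price[t] + (1 - alpha) * ema[t - 1]
prices = pd.Series([10.0, 11.0, 12.0, 11.5, 13.0])
span = 10
alpha = 2 / (span + 1)

manual = [prices.iloc[0]]  # the first EMA value is just the first price
for price in prices.iloc[1:]:
    manual.append(alpha * price + (1 - alpha) * manual[-1])

ema = prices.ewm(span=span, adjust=False).mean()
print((ema - pd.Series(manual)).abs().max())  # effectively zero
```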

## Next steps?

This is part of the FREE online course on my page. No signup required and 2 hours of free video content with code and Jupyter Notebooks available on GitHub.

## What will we cover in this tutorial?

In this tutorial we will cover the following.

• How to use Pandas Datareader to read historical stock prices from Yahoo! Finance.
• How to read weekly and monthly data.
• How to read multiple tickers at once.

## Step 1: What is Pandas Datareader?

Pandas-Datareader provides up-to-date remote data access for pandas.

This leads to the next question. What is pandas?

Pandas is a data analysis and manipulation tool containing a great data structure for the purpose.

In short, pandas can be thought of as a data structure in Python that is similar to working with data in a spreadsheet.
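As a tiny illustration with made-up values, a DataFrame behaves much like a small spreadsheet: labeled columns and labeled rows.

```python
import pandas as pd

# A minimal spreadsheet-like table: two labeled columns, two rows
df = pd.DataFrame({
    "Ticker": ["AAPL", "IBM"],
    "Close": [120.5, 118.2],
})
print(df.shape)  # (rows, columns)
```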

Pandas-datareader reads data from various sources and puts it into pandas data structures.

Pandas-datareader has a call to return historic stock price data from Yahoo! Finance.

To use Pandas-datareader you need to import the library.

Let’s break the following example down.

```
import pandas_datareader as pdr
import datetime as dt

ticker = "AAPL"
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)

data = pdr.get_data_yahoo(ticker, start, end)

print(data)
```

Where we first import two libraries.

• pandas_datareader: the Pandas Datareader. If you do not have it installed in your Jupyter Notebook, you can install it by entering !pip install pandas_datareader in a cell and executing it.
• datetime: a standard library module that represents dates and times. We only use the date aspects.

Then the following lines.

• ticker = "AAPL": the ticker we want data for. You can use any ticker you want. In this course we use the ticker for Apple (AAPL).
• start = dt.datetime(2019, 1, 1): the first day we want historic stock price data for.
• end = dt.datetime(2020, 12, 31): the last day.
• data = pdr.get_data_yahoo(ticker, start, end): this is the magic that uses Pandas Datareader (pdr) to get data from the Yahoo! Finance API. It returns a DataFrame as we know it from previous lessons.

The output of the code is as follows.

```
                  High         Low  ...       Volume   Adj Close
Date                                ...
2019-01-02   39.712502   38.557499  ...  148158800.0   38.505024
2019-01-03   36.430000   35.500000  ...  365248800.0   34.669640
2019-01-04   37.137501   35.950001  ...  234428400.0   36.149662
2019-01-07   37.207500   36.474998  ...  219111200.0   36.069202
2019-01-08   37.955002   37.130001  ...  164101200.0   36.756794
...                ...         ...  ...          ...         ...
2020-12-24  133.460007  131.100006  ...   54930100.0  131.773087
2020-12-28  137.339996  133.509995  ...  124486200.0  136.486053
2020-12-29  138.789993  134.339996  ...  121047300.0  134.668762
2020-12-30  135.990005  133.399994  ...   96452100.0  133.520477
2020-12-31  134.740005  131.720001  ...   99116600.0  132.492020

[505 rows x 6 columns]
```

## Step 2: A few parameters to set

You can get multiple tickers at once by passing a list of them.

```
import pandas_datareader as pdr
import datetime as dt

ticker = ["AAPL", "IBM", "TSLA"]
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)

data = pdr.get_data_yahoo(ticker, start, end)

print(data)
```

You can get weekly or monthly data by using the interval argument as follows.

```
import pandas_datareader as pdr
import datetime as dt

ticker = ["AAPL", "IBM", "TSLA"]
start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)

data = pdr.get_data_yahoo(ticker, start, end, interval='w')

print(data)
```

Set interval='m' to get monthly data instead of weekly data with 'w'.
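As a side note, if you already have daily data you can derive a weekly series locally with pandas resample instead of requesting it again; a minimal sketch with synthetic data:

```python
import numpy as np
import pandas as pd

# Ten business days of synthetic closes, resampled to the last close of each week
dates = pd.date_range("2021-01-01", periods=10, freq="B")
daily = pd.DataFrame({"Close": np.arange(10.0)}, index=dates)

weekly = daily.resample("W").last()
print(weekly)
```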


## What will we cover in this tutorial?

We will learn how to visualize time series data in a DataFrame with Matplotlib.

This tutorial will show you:

• How to use Matplotlib with DataFrames.
• How to use Matplotlib with subplots (the object-oriented way).
• How to make multiple plots in one figure.
• How to create bar charts.

## Want to access the code directly in Jupyter Notebook?

You can get the Jupyter Notebooks from GitHub here, where there are also direct links to Colab for an interactive experience.

## Step 1: Read time series data into a DataFrame

A DataFrame is a two-dimensional tabular data structure. It is the primary data structure of Pandas and has labeled axes (rows and columns).

To get access to a DataFrame data structure, you need to import the Pandas library.

```
import pandas as pd
```

Then we need some time series data. You can download your own CSV file from financial pages like Yahoo! Finance.

For this tutorial we will use a dataset available on GitHub.

```
remote_file = "https://raw.githubusercontent.com/LearnPythonWithRune/FinancialDataAnalysisWithPython/main/AAPL.csv"
data = pd.read_csv(remote_file, index_col=0, parse_dates=True)
```

The pd.read_csv(…) does all the magic. We set the index_col=0, which sets the first column of the CSV data file to be the index. This is the dates.

Then we set parse_dates=True to ensure that dates are parsed as dates and not as strings. This is necessary to take advantage of the time series nature of the data and index with time intervals.
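A self-contained sketch of what those two arguments do, using a tiny inline CSV instead of the downloaded file:

```python
import io
import pandas as pd

# A two-row CSV standing in for the real data file
csv_data = io.StringIO("Date,Close\n2020-01-02,86.05\n2020-01-03,88.60\n")
frame = pd.read_csv(csv_data, index_col=0, parse_dates=True)

print(type(frame.index))  # a DatetimeIndex, so date-based lookups work
print(frame.loc["2020-01-03", "Close"])
```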

## Step 2: Import Matplotlib in Jupyter Notebook

When you import Matplotlib in Jupyter Notebook, you need to set a rendering mode.

```
import matplotlib.pyplot as plt
%matplotlib notebook
```

We will use the notebook mode, which is interactive. This enables you to zoom in on an interval, move around, and save the figure.

It is common to use inline mode for rendering in Jupyter Notebook. The inline mode creates a static image, which is not interactive.

## Step 3: Use Matplotlib the Object-Oriented way

Matplotlib can be used in a functional way and an object-oriented way. Most use it in the functional way, which often creates confusion, as it is not always intuitive how it works.

The object-oriented way leads to less confusion at the cost of one extra line of code and passing one argument. Hence, the price is low for the gain.

```
fig, ax = plt.subplots()
data['Close'].plot(ax=ax)
ax.set_ylabel("Price")
ax.set_title("AAPL")
```

The first line returns a figure and axis (fig and ax). The figure is where we put the axis, and the axis is the chart.

The actual plot is made by calling plot on the DataFrame. More precisely, we access the column Close in this case, which is the Series of the historic Close prices.

Confused? Don’t worry about the details.

Notice that we pass ax=ax to plot. This ensures that we render the chart on the returned axis ax.

Finally, we add a y-label and a title to our axis.

## Step 4: Creating multiple charts in one Matplotlib figure

How can we create multiple charts (or axes) in one Matplotlib figure?

Luckily, this is quite easy.

```
fig, ax = plt.subplots(2, 2)
data['Open'].plot(ax=ax[0, 0], title="Open")
data['High'].plot(ax=ax[0, 1], title="High")
data['Low'].plot(ax=ax[1, 0], title="Low")
data['Close'].plot(ax=ax[1, 1], title="Close")
plt.tight_layout()
```

Here we see a few differences. First, notice plt.subplots(2, 2), which returns a figure fig and a 2-by-2 array of axes. Hence, ax is a two-dimensional array of axes.

We can access the first axis with ax[0, 0] and pass it as an argument to plot.

This continues for all 4 plots we make, as you see.

Finally, we use plt.tight_layout(), which ensures that the layout of the axes does not overlap. You can try without it to see the difference.

## Step 5: Create a bar chart with Matplotlib

Finally, we will make a bar chart with Matplotlib.

Actually, we will render a horizontal bar chart.

```
fig, ax = plt.subplots()
data['Volume'].loc['2020-07-01':'2020-08-15'].plot.barh(ax=ax)
```

We do it for the volume and only on a limited interval of time. This shows you how to take advantage of the time series aspect of the DataFrame.


## What will we cover?

In this tutorial we will cover the following.

• What is a backtesting strategy?
• How to measure the performance of a backtesting strategy?
• How to implement a backtesting strategy with Pandas?

## What is a backtesting strategy?

In trading, backtesting seeks to estimate the performance of a strategy or model if it had been employed during a past period.

The way to analyze the performance of a strategy is to look at its return, volatility, and maximum drawdown. Other metrics can also be used, but for this tutorial we will use these.
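Return and volatility are computed later in this tutorial; maximum drawdown is not, but as a sketch (with made-up prices), it can be computed from a price series like this:

```python
import pandas as pd

# Max drawdown: the largest peak-to-trough decline of a price series
prices = pd.Series([100.0, 120.0, 90.0, 110.0, 80.0])

running_max = prices.cummax()          # highest price seen so far
drawdown = prices / running_max - 1.0  # decline relative to that peak
max_drawdown = drawdown.min()

print(max_drawdown)  # 80 after a peak of 120: a one-third decline
```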

## Step 1: Read data from Yahoo! Finance API with Pandas Datareader

Let's get started by importing a few libraries and retrieving some data from the Yahoo! Finance API with Pandas Datareader.

```
import pandas as pd
import pandas_datareader as pdr
import datetime as dt
import numpy as np

start = dt.datetime(2010, 1, 1)

data = pdr.get_data_yahoo("AAPL", start)
```

Which will read data for the Apple ticker (AAPL) since 2010.

Below is shown the head of data.

```
                High       Low      Open     Close       Volume  Adj Close
Date
2010-01-04  7.660714  7.585000  7.622500  7.643214  493729600.0   6.583586
2010-01-05  7.699643  7.616071  7.664286  7.656429  601904800.0   6.594968
2010-01-06  7.686786  7.526786  7.656429  7.534643  552160000.0   6.490066
2010-01-07  7.571429  7.466071  7.562500  7.520714  477131200.0   6.478067
2010-01-08  7.571429  7.466429  7.510714  7.570714  447610800.0   6.521136
```

## Step 2: Calculate signals for a simple strategy

The simple strategy we will use is based on moving averages of periods 5 and 20.

When the 5-day moving average of the Adj Close price is above the 20-day moving average, we go long (buy and hold); otherwise we go short (sell).

This can be calculated as follows.

```
data['Signal'] = data['Adj Close'].rolling(5).mean() - data['Adj Close'].rolling(20).mean()
data['Position'] = (data['Signal'].apply(np.sign) + 1)/2
```

This results in a Signal line, which is the difference of the two moving averages. When the signal line is positive, our position is 1 (buy and hold); otherwise it is 0 (sell).

```
                  High         Low        Open  ...   Adj Close    Signal  Position
Date                                            ...
2021-02-26  124.849998  121.199997  122.589996  ...  121.260002 -7.610835       0.0
2021-03-01  127.930000  122.790001  123.750000  ...  127.790001 -7.054179       0.0
2021-03-02  128.720001  125.010002  128.410004  ...  125.120003 -6.761187       0.0
2021-03-03  125.709999  121.839996  124.809998  ...  122.059998 -6.782757       0.0
2021-03-04  123.599998  118.620003  121.750000  ...  120.129997 -6.274249       0.0
```

The reason why we want long to be 1 and short to be 0 is computational, as will become clear soon.
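A small sketch with made-up signal values shows the mapping: np.sign gives -1, 0, or 1, so (sign + 1) / 2 gives 0, 0.5, or 1 (a signal of exactly zero maps to a half position).

```python
import numpy as np
import pandas as pd

# Map the sign of the signal (-1, 0, 1) onto a position (0, 0.5, 1)
signal = pd.Series([-7.6, -0.2, 0.0, 0.4, 5.1])
position = (signal.apply(np.sign) + 1) / 2

print(position.tolist())  # [0.0, 0.0, 0.5, 1.0, 1.0]
```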

## Step 3: Remove unnecessary data columns and rows

To have a cleaner dataset we will clean it up.

```
data.drop(['High', 'Low', 'Open', 'Volume', 'Close'], axis=1, inplace=True)
data.dropna(inplace=True)
```

Where drop removes the columns we do not need and dropna removes the rows with NaN values.

The inplace=True simply applies the changes directly to the DataFrame.

```
            Adj Close    Signal  Position
Date
2010-02-01   5.990476 -0.217986       0.0
2010-02-02   6.025239 -0.252087       0.0
2010-02-03   6.128909 -0.282004       0.0
2010-02-04   5.908031 -0.297447       0.0
2010-02-05   6.012933 -0.253271       0.0
```

## Step 4: Calculate the return of the strategy

To calculate the return we will use log returns, which, as we will see, is an advantage.

Then we use the Position, but we shift it by 1, as we assume we first act on a position the day after the signal.

```
data['Log return'] = np.log(data['Adj Close']/data['Adj Close'].shift())
data['Return'] = data['Position'].shift(1)*data['Log return']
```

This results in the following.

```
             Adj Close    Signal  Position  Log return  Return
Date
2021-02-26  121.260002 -7.610835       0.0    0.002229     0.0
2021-03-01  127.790001 -7.054179       0.0    0.052451     0.0
2021-03-02  125.120003 -6.761187       0.0   -0.021115    -0.0
2021-03-03  122.059998 -6.782757       0.0   -0.024761    -0.0
2021-03-04  120.129997 -6.274249       0.0   -0.015938    -0.0
```

Remember that we can add up log returns to calculate the final return. For details I refer to this.
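A toy example of that additivity: summing the log returns and exponentiating recovers the total price ratio.

```python
import numpy as np
import pandas as pd

# Log returns add up: exp(sum of log returns) equals last price / first price
prices = pd.Series([100.0, 110.0, 99.0, 118.8])
log_ret = np.log(prices / prices.shift())

total = np.exp(log_ret.sum())
print(total)  # 118.8 / 100 = 1.188
```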

Hence, we get that the return can be calculated as follows.

```
data[['Log return', 'Return']].cumsum().apply(np.exp)
```

Resulting in the following.

```
            Log return   Return
Date
2021-02-26   20.242133  7.29214
2021-03-01   21.332196  7.29214
2021-03-02   20.886489  7.29214
2021-03-03   20.375677  7.29214
2021-03-04   20.053499  7.29214
```

With a bit of calculation.

```
np.exp(data[['Log return', 'Return']].mean()*252)
```

We get.

```
Log return    1.310917
Return        1.196485
dtype: float64
```

This tells us that our strategy gives an annualized return of 19.6485%, while a buy-and-hold strategy would give 31.0917%.

The natural question is: What did we gain with our strategy?

## Step 5: Evaluating our strategy

Let's compare the volatility of the buy-and-hold strategy with that of ours. The volatility of a stock can be calculated in many ways. Here we will use the standard deviation. For other measures refer to Investopedia.

```
data[['Log return', 'Return']].std()*252**.5
```

Which gives the annualized standard deviation.

```
Log return    0.283467
Return        0.188044
dtype: float64
```

Hence, the gain from our strategy is lower volatility.
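For intuition, annualizing by multiplying the daily standard deviation by the square root of 252 (the conventional number of trading days in a year) looks like this on synthetic returns:

```python
import numpy as np
import pandas as pd

# 10,000 synthetic daily returns with roughly 1% daily volatility
rng = np.random.default_rng(0)
daily_returns = pd.Series(rng.normal(0.0, 0.01, 10_000))

annualized = daily_returns.std() * 252 ** 0.5
print(annualized)  # close to 0.01 * sqrt(252), i.e. about 0.16
```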

## What will we cover in this tutorial?

In this tutorial we will cover how to calculate and visualize the Average True Range (ATR).

## Step 1: Read historic stock prices from Yahoo! Finance API

To read data from Yahoo! Finance API we use Pandas-Datareader, which has a direct method.

This requires that we give a start date for how far back we want to retrieve data.

```
import pandas_datareader as pdr
import datetime as dt

start = dt.datetime(2020, 1, 1)
data = pdr.get_data_yahoo("NFLX", start)

print(data.tail())
```

This will result in output similar to the following.

```
	High	Low	Open	Close	Volume	Adj Close
Date
2021-02-12	561.250000	550.849976	556.940002	556.520020	2195900	556.520020
2021-02-16	563.630005	552.729980	557.289978	557.280029	2622400	557.280029
2021-02-17	555.250000	543.030029	550.989990	551.340027	2069600	551.340027
2021-02-18	550.000000	538.229980	549.000000	548.219971	2456200	548.219971
2021-02-19	548.989990	538.809998	548.000000	540.219971	2838600	540.219971
```

## Calculate the Average True Range (ATR)

The Average True Range (ATR) is, as investopedia.org defines it, a 14-day average of the true range, where the true range is the greatest of: the current high minus the current low; the absolute value of the current high minus the previous close; and the absolute value of the current low minus the previous close.

This can be calculated as follows.

```
import pandas as pd
import pandas_datareader as pdr
import numpy as np
import datetime as dt

start = dt.datetime(2020, 1, 1)
data = pdr.get_data_yahoo("NFLX", start)

high_low = data['High'] - data['Low']
high_close = np.abs(data['High'] - data['Close'].shift())
low_close = np.abs(data['Low'] - data['Close'].shift())

ranges = pd.concat([high_low, high_close, low_close], axis=1)
true_range = np.max(ranges, axis=1)

atr = true_range.rolling(14).sum()/14
```

Where we use the standard 14-day period.
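A toy example (with made-up prices) of the true range part of the calculation, taking the greatest of the three ranges row by row:

```python
import pandas as pd

# True range: greatest of (high - low), |high - prev close|, |low - prev close|
data = pd.DataFrame({
    "High":  [10.0, 12.0, 11.0],
    "Low":   [9.0, 10.5, 9.5],
    "Close": [9.5, 11.0, 10.0],
})

high_low = data["High"] - data["Low"]
high_close = (data["High"] - data["Close"].shift()).abs()
low_close = (data["Low"] - data["Close"].shift()).abs()

true_range = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)
print(true_range.tolist())  # [1.0, 2.5, 1.5]
```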

## Visualize the ATR and the stock price

We will use Matplotlib to visualize it as it integrates well with DataFrames from Pandas.

```
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
atr.plot(ax=ax)
data['Close'].plot(ax=ax, secondary_y=True, alpha=0.3)
plt.show()
```

This will result in a chart similar to this one.

## What will we cover in this tutorial?

We will retrieve the historic stock prices and calculate the moving average. Then we will export the data to Excel and insert a chart, but all done from Python.

See the in-depth explanation in the YouTube video. It also gives advice on how to interpret the Simple Moving Averages (SMA).

## Step 1: Read historic stock prices

We will use Pandas-datareader to get the historic prices of NFLX (the ticker for Netflix).

```
import pandas_datareader as pdr
import datetime as dt

ticker = "NFLX"
start = dt.datetime(2019, 1, 1)

data = pdr.get_data_yahoo(ticker, start)
```

And you will get the historic data for Netflix from January 1st, 2019.

```
	High	Low	Open	Close	Volume	Adj Close
Date
2019-01-02	269.750000	256.579987	259.279999	267.660004	11679500	267.660004
2019-01-03	275.790009	264.429993	270.200012	271.200012	14969600	271.200012
2019-01-04	297.799988	278.540009	281.880005	297.570007	19330100	297.570007
2019-01-07	316.799988	301.649994	302.100006	315.339996	18620100	315.339996
2019-01-08	320.589996	308.010010	319.980011	320.269989	15359200	320.269989
```

## Step 2: Understand Moving Average

We will calculate the Simple Moving Average as defined on Investopedia.

The Simple Moving Average (from now on just referred to as Moving Average or MA) is defined by a period of days.

That is, the MA of a period of 10 (MA10) takes the average of the last 10 close prices. This is done in a rolling way; hence, we get an MA10 for every trading day in our historic data, except the first 9 days in our dataset.

We can similarly calculate an MA50 and an MA200, the Moving Averages of the last 50 and 200 days, respectively.
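A small sketch of the rolling behavior: with a window of 10, the first 9 results are NaN because a full window is not yet available.

```python
import pandas as pd

# Close prices 1, 2, ..., 20 make the rolling means easy to verify
close = pd.Series(range(1, 21), dtype=float)
ma10 = close.rolling(10).mean()

print(ma10.isna().sum())  # 9 leading NaNs
print(ma10.iloc[9])       # mean of 1..10 = 5.5
print(ma10.iloc[19])      # mean of 11..20 = 15.5
```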

## Step 3: Calculating the Moving Averages

We can do that by using rolling and mean.

And it is magic.

```
data['MA10'] = data['Close'].rolling(10).mean()
data['MA50'] = data['Close'].rolling(50).mean()
data['MA200'] = data['Close'].rolling(200).mean()

print(data.tail())
```

That was easy, right?

```
	High	Low	Open	Close	Volume	Adj Close	MA10	MA50	MA200
Date
2021-01-12	501.089996	485.670013	500.000000	494.250000	5990400	494.250000	515.297998	502.918599	477.08175
2021-01-13	512.349976	493.010010	495.500000	507.790009	5032100	507.790009	512.989999	503.559600	477.76590
2021-01-14	514.500000	499.579987	507.350006	500.859985	4177400	500.859985	510.616995	503.894399	478.39270
2021-01-15	506.320007	495.100006	500.000000	497.980011	5890200	497.980011	506.341998	504.109600	479.06220
2021-01-19	509.250000	493.540009	501.000000	501.769989	11996900	501.769989	504.232999	504.205999	479.72065
```

## Step 4: Visualize it with Matplotlib

We can visualize the data with Matplotlib.

```
import matplotlib.pyplot as plt

data[['Close', 'MA10', 'MA50']].loc['2020-01-01':].plot()
plt.show()
```

Resulting in the following plot.

Where you can see how the MA10 and MA50 move relative to the price.

## Step 5: Export to Excel

Now we will export the data to Excel.

For this we need to import Pandas and use the XlsxWriter engine.

The code can be found here.

```
import pandas as pd

data = data.loc['2020-01-01':]
data = data.iloc[::-1]
writer = pd.ExcelWriter("technical.xlsx",
                        engine='xlsxwriter',
                        date_format='yyyy-mm-dd',
                        datetime_format='yyyy-mm-dd')

sheet_name = 'Moving Average'
data[['Close', 'MA10', 'MA50']].to_excel(writer, sheet_name=sheet_name)

worksheet = writer.sheets[sheet_name]
workbook = writer.book

# Create a format for a green cell
green_cell = workbook.add_format({
    'bg_color': '#C6EFCE',
    'font_color': '#006100'
})

# Create a format for a red cell
red_cell = workbook.add_format({
    'bg_color': '#FFC7CE',
    'font_color': '#9C0006'
})

# Set column width of Date
worksheet.set_column(0, 0, 15)

for col in range(1, 4):
    # Create a conditional format of type formula
    worksheet.conditional_format(1, col, len(data), col, {
        'type': 'formula',
        'criteria': '=C2>=D2',
        'format': green_cell
    })

    # Create a conditional format of type formula
    worksheet.conditional_format(1, col, len(data), col, {
        'type': 'formula',
        'criteria': '=C2<D2',
        'format': red_cell
    })

# Create a new chart object
chart1 = workbook.add_chart({'type': 'line'})

# Add a series to the chart
chart1.add_series({
    'name': "MA10",
    'categories': [sheet_name, 1, 0, len(data), 0],
    'values': [sheet_name, 1, 2, len(data), 2],
})

# Create a new chart object
chart2 = workbook.add_chart({'type': 'line'})

# Add a series to the chart
chart2.add_series({
    'name': 'MA50',
    'categories': [sheet_name, 1, 0, len(data), 0],
    'values': [sheet_name, 1, 3, len(data), 3],
})

# Combine and insert title, axis names
chart1.combine(chart2)
chart1.set_title({'name': sheet_name + " " + ticker})
chart1.set_x_axis({'name': 'Date'})
chart1.set_y_axis({'name': 'Price'})

# Insert the chart into the worksheet
worksheet.insert_chart('F2', chart1)

writer.close()
```

Where the output will be something similar to this.

## What will we cover in this tutorial?

In this tutorial we will show how to visualize time series with Matplotlib. We will do that using Jupyter notebook and you can download the resources (the notebook and data used) from here.

## Step 1: What is a time series?

The easiest way to understand a time series is to look at one. If you downloaded the resources and started the Jupyter notebook, execute the following lines.

```
import pandas as pd

# the file name below is hypothetical; use the data file from the downloaded resources
data = pd.read_csv("stock_prices.csv", index_col=0, parse_dates=True)
print(data.head())
```

This will produce the following output.

```
	High	Low	Open	Close	Volume	Adj Close
Date
2020-01-02	86.139999	84.342003	84.900002	86.052002	47660500.0	86.052002
2020-01-03	90.800003	87.384003	88.099998	88.601997	88892500.0	88.601997
2020-01-06	90.311996	88.000000	88.094002	90.307999	50665000.0	90.307999
2020-01-07	94.325996	90.671997	92.279999	93.811996	89410500.0	93.811996
2020-01-08	99.697998	93.646004	94.739998	98.428001	155721500.0	98.428001
```

You notice that the far-left column is called Date and that it is the index. This index has a time value, in this case a date.

Time series data is data “stamped” by a time. In this case, it is time indexed by dates.

The data you see is historic stock prices.

## Step 2: How to visualize data with Matplotlib

The above data is kept in a DataFrame (a Pandas data object), which makes it straightforward to visualize.

```
import matplotlib.pyplot as plt
%matplotlib notebook

data.plot()
```

Which will result in a chart similar to this one.

This is not impressive. It seems like something is wrong.

Actually, nothing is wrong. It just does what you asked for: it plots all 6 columns together in one chart. Because the Volume is such a large number, all the other columns collapse onto the same flat-looking line.

## Step 3: Matplotlib has a functional and object oriented interface

This is often a bit confusing at first.

But Matplotlib has a functional and an object-oriented interface. We used the functional one.

If you try to execute the following in your Jupyter notebook.

```
data['My col'] = data['Volume']*0.5
data['My col'].plot()
```

It would seem like nothing happened.

But then investigate your previous plot.

It got updated with a new line. Hence, instead of creating a new chart (or figure), it just added the line to the existing one.

If you want to learn more about functional and object oriented way of using Matplotlib we recommend this tutorial.

## Step 4: How to make a new figure

What to do?

Well, you need to use the object oriented interface of Matplotlib.

You can do that as follows.

```
fig1, ax1 = plt.subplots()
data['My col'].plot(ax=ax1)
```

Which will produce what you are looking for. A new figure.

## Step 5: Make multiple plots in one figure

This is getting fun.

How can you create multiple plots in one figure?

You actually do that when creating the subplots.

```
fig2, ax2 = plt.subplots(2, 2)

data['Open'].plot(ax=ax2[0, 0])
data['High'].plot(ax=ax2[0, 1])
data['Low'].plot(ax=ax2[1, 0])
data['Close'].plot(ax=ax2[1, 1])
plt.tight_layout()
```

Notice that subplots(2, 2) creates a 2-by-2 array of axes you can use to create plots.

This should result in this chart.

## Step 6: Make a bar chart

This can be done as follows.

```
fig3, ax3 = plt.subplots()

data.loc[:'2020-01-31', 'Volume'].plot.bar(ax=ax3)
```

Notice that we only take the first month of the Volume data here (data.loc[:'2020-01-31', 'Volume']).

This should result in this figure.

## Step 7: Save the figures

This is straightforward.

```
fig1.savefig("figure-1.png")
fig2.savefig("figure-2.png")
fig3.savefig("figure-3.png")
```

And the above figures should be available in the same location you are running your Jupyter notebook.

## Next step

If you want to learn more about functional and object oriented way of using Matplotlib we recommend this tutorial.

## What will we cover in this tutorial?

We will get financial data from FMP Cloud (a free stock API) for the last years and generate a three-sheet Excel workbook with charts, similar to this. All done from Python.

## Step 1: Collect the data

In this tutorial we are only going to use the example data on Apple, which is freely available without registering on FMP Cloud. If you want to do it for another stock you will need to register on their site.

What we need is the income statement and cash flow statement. They are available as JSON on their page (income statement and cash flow statement).

As our main library we will use Pandas.

The data can be collected with the following code. The URLs below follow FMP Cloud's JSON endpoint pattern with the demo key and are assumed; copy the exact income statement and cash flow statement links from their page.

```
import pandas as pd

# URLs assumed: use the JSON links from FMP Cloud (the demo key works for AAPL)
income_statement = pd.read_json("https://fmpcloud.io/api/v3/income-statement/AAPL?apikey=demo")
cash_flow = pd.read_json("https://fmpcloud.io/api/v3/cash-flow-statement/AAPL?apikey=demo")

income_statement.set_index('date', inplace=True)
cash_flow.set_index('date', inplace=True)
```

Notice that we set the index to be the date column. This makes the further work easier.

## Step 2: Prepare the data

The next step we need to do is to make the necessary calculations and prepare the data.

We are only interested in Revenue, Earnings Per Share (EPS), and Free Cash Flow (FCF). So let's take that data and keep it in a DataFrame (the main Pandas data structure).

```
data = income_statement[['revenue', 'eps']].copy()
data['fcf'] = cash_flow['freeCashFlow']
```

Now the data should look something like this.

```
	        revenue	        eps	fcf
date
2020-09-26	274515000000	3.3600	7.336500e+10
2019-09-28	260174000000	2.9925	5.889600e+10
2018-09-29	265595000000	3.0025	6.412100e+10
2017-09-30	229234000000	2.3175	5.080300e+10
2016-09-24	215639000000	2.0875	5.349700e+10

```

## Step 3: Calculate Percentage change

We are actually not interested in the absolute values. The only thing we care about is the percentage growth from year to year.

As the data is in reverse order and we need to calculate from the beginning to the end, we first need to order the dates from oldest to newest.

Then it is straightforward to calculate using the pct_change() method.

```
data = data.sort_index()

data['rev %-chg'] = data['revenue'].pct_change()
data['eps %-chg'] = data['eps'].pct_change()
data['fcf %-chg'] = data['fcf'].pct_change()
```

Now the data should look similar to this (or the tail of it).

```
	        revenue	        eps	fcf	        rev %-chg	      eps %-chg	fcf %-chg
date
2016-09-24	215639000000	2.0875	5.349700e+10	-0.077342	-0.100216	-0.233326
2017-09-30	229234000000	2.3175	5.080300e+10	0.063045	0.110180	-0.050358
2018-09-29	265595000000	3.0025	6.412100e+10	0.158620	0.295577	0.262150
2019-09-28	260174000000	2.9925	5.889600e+10	-0.020411	-0.003331	-0.081487
2020-09-26	274515000000	3.3600	7.336500e+10	0.055121	0.122807	0.245670
```
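On a toy series, pct_change computes the change relative to the previous row, which is why the chronological ordering above matters:

```python
import pandas as pd

# Each value is compared with the row before it; the first row has no predecessor
revenue = pd.Series([100.0, 110.0, 121.0])
chg = revenue.pct_change()
print(chg)  # NaN, then 10% growth in each of the two following years
```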

## Step 4: Preparing data used in each Excel sheet

The next step is to prepare the data for each sheet in the Excel document.

We only need the last 10 years of data and will use loc to get that. Then we create a DataFrame for each sheet.

```
data_set = data.loc['2010-01-01':].copy()
data_set.index = data_set.index.year

rev = data_set[['revenue', 'rev %-chg']].copy()
eps = data_set[['eps', 'eps %-chg']].copy()
fcf = data_set[['fcf', 'fcf %-chg']].copy()
```

Notice that we set the index to be only the year.
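A small sketch of replacing a DatetimeIndex with just its year component:

```python
import pandas as pd

# Two fiscal year-end dates reduced to plain years
idx = pd.to_datetime(["2019-09-28", "2020-09-26"])
df = pd.DataFrame({"revenue": [260174000000, 274515000000]}, index=idx)
df.index = df.index.year

print(df.index.tolist())  # [2019, 2020]
```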

## Step 5: Creating the Excel sheet

To create the Excel sheet we will use Pandas with the XlsxWriter engine.

This can be done as follows.

```
def create_sheet(writer, df, sheet_name):
    df.to_excel(writer, sheet_name=sheet_name)

writer = pd.ExcelWriter("financials.xlsx", engine='xlsxwriter')

create_sheet(writer, rev, 'Revenue')
create_sheet(writer, eps, 'EPS')
create_sheet(writer, fcf, 'FCF')

writer.close()
```

This will create a 3-sheet Excel document. One sheet for each DataFrame (rev, eps, fcf).

The function create_sheet(…) is practical, as each DataFrame is similar and we need to create the same sheet for each DataFrame.

The result will be in the Excel document financials.xlsx.

## Step 6: Creating the charts

Now it pays to have the function create_sheet(…).

The full Excel document can be created by the following code.

```
def create_sheet(writer, df, sheet_name):
    df.to_excel(writer, sheet_name=sheet_name)
    worksheet = writer.sheets[sheet_name]
    workbook = writer.book

    # Line chart for the absolute values (column 1)
    chart = workbook.add_chart({'type': 'line'})
    chart.add_series({
        'name': sheet_name,
        'categories': [sheet_name, 1, 0, 11, 0],
        'values': [sheet_name, 1, 1, 11, 1],
    })

    # Column chart for the %-change (column 2) on a secondary axis
    column_chart = workbook.add_chart({'type': 'column'})
    column_chart.add_series({
        'name': sheet_name,
        'categories': [sheet_name, 1, 0, 11, 0],
        'values': [sheet_name, 1, 2, 11, 2],
        'y2_axis': True,
    })

    chart.combine(column_chart)
    chart.set_title({'name': sheet_name})
    chart.set_x_axis({'name': 'Date'})
    chart.set_y_axis({'name': 'Value'})
    column_chart.set_y2_axis({'name': '%-chg'})

    worksheet.insert_chart("E1", chart)

writer = pd.ExcelWriter("financials.xlsx", engine='xlsxwriter')

create_sheet(writer, rev, 'Revenue')
create_sheet(writer, eps, 'EPS')
create_sheet(writer, fcf, 'FCF')

writer.close()
```

The details of creating the charts in Excel can be found in the XlsxWriter documentation. Basically, you send information to the XlsxWriter engine through dictionaries, with the values you would otherwise set inside Excel.

Again, the result will be in the Excel document financials.xlsx.

## What will we cover in this tutorial?

If you like data visualization with NumPy and Pandas, then you have probably encountered Matplotlib.

And if you also like to program in an object-oriented fashion, then most tutorials will leave you wondering whether no one loves the art of beautiful code.

Let me elaborate. The integration and interaction with Matplotlib is usually done in a functional way with a lot of side effects. Not nice.

Not sure what I'm talking about? We will cover that too.

## Step 1: How plots are typically made with NumPy and Matplotlib, and what is wrong with it

Let’s make a simple example.

```
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)
y = x ** 2
plt.plot(x, y)
plt.xlabel("X Label")
plt.ylabel("Y Label")
plt.title("Title")
plt.show()
```

This will result in the following chart.

That is nice and easy! So what is wrong with it?

Side effects!

What is a side effect in programming?

…that is to say has an observable effect besides returning a value (the main effect) to the invoker of the operation.

https://en.wikipedia.org/wiki/Side_effect_(computer_science)

What does that mean?

Well, let’s examine the above example.

We call plt.plot(x, y) and what happens? Actually, we don't know. We don't use anything it returns.

We continue by calling plt.xlabel(…), plt.ylabel(…), and plt.title(…), and finally plt.show() to see the result. Each of these calls changes the state of the plt module we imported. Notice that we never created an object: we call the library directly.

This is difficult for a programmer to understand without deep knowledge of the library being used.
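
To make the side effect concrete, here is a small sketch (my own, not from the original; it uses the non-interactive Agg backend so it runs without opening a window). plt.plot does return a value – a list of Line2D artists – but its main effect is mutating the library's hidden "current axes":

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt

# The return value exists (a list of Line2D artists) ...
lines = plt.plot([0, 1, 2], [0, 1, 4])
print(type(lines[0]).__name__)  # Line2D

# ... but the main effect is hidden: plt.plot mutated the module's
# "current axes", an object we never created ourselves.
ax = plt.gca()
print(len(ax.lines))  # 1
```

That hidden "current axes" object is exactly the state the functional calls keep mutating behind your back.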

So how do we do it in a more understandable way?

## Step 2: How to create a chart with Matplotlib with NumPy in an object oriented way and why it is better

Let’s look at this code and examine it.

```import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)
y = x ** 2

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("X Label")
ax.set_ylabel("Y Label")
ax.set_title("Title")
fig.show()
plt.waitforbuttonpress()
```

Here we do it differently but get the same result. When we call a method on the object ax, it is clear that the state of ax is changing, rather than something hidden inside the library changing as a side effect.

You can also show the figure fig by calling show() on it rather than on the library. This requires that we add plt.waitforbuttonpress(), otherwise the window is destroyed immediately.

Note that you do not have these challenges in a Jupyter notebook – the plots are shown without the call to show().

You could keep the plt.show() instead of fig.show() and plt.waitforbuttonpress(). But the above code is more intuitive and easier to understand.
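
If the script only needs to produce an image file rather than an interactive window, the object oriented style gives a third option: save the figure directly and skip show()/waitforbuttonpress() altogether. This sketch is my own addition; the Agg backend and the file name chart.png are choices, not part of the original tutorial.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: no window, no waitforbuttonpress
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)
y = x ** 2

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("X Label")
ax.set_ylabel("Y Label")
ax.set_title("Title")
fig.savefig("chart.png")  # we call the figure object, not the library
```

Again the calls go to fig and ax, so a reader can follow exactly which object changes.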

## How to create a chart with Matplotlib of a Pandas DataFrame in an object oriented way

This is straightforward, as Matplotlib is well integrated with Pandas.

```import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = np.linspace(0, 5, 11)
y = x ** 2

df = pd.DataFrame(data=y, index=x)

fig, ax = plt.subplots()
ax.plot(df)
ax.set_xlabel("X Label")
ax.set_ylabel("Y Label")
ax.set_title("Title")
fig.show()
plt.waitforbuttonpress()
```

Notice that the DataFrame is created from the NumPy arrays, so here we do not gain anything from using it. This is just to exemplify how easy it is to use Matplotlib in an object oriented way with Pandas.
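
As a small variation (my own sketch, not from the original tutorial), Pandas can also draw directly onto an explicit Axes object through the ax parameter of DataFrame.plot, which keeps the same object oriented style:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = np.linspace(0, 5, 11)
df = pd.DataFrame(data=x ** 2, index=x, columns=["y"])

fig, ax = plt.subplots()
df.plot(ax=ax, legend=False)  # Pandas draws onto our Axes object
ax.set_title("Title")
print(len(ax.lines))  # 1
```

The chart ends up on the ax we created, so all further styling stays on that object.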

## Final thoughts

I have found that programmers either hate or love Matplotlib. I do not always know why, but I have discovered that this non-object oriented way of using Matplotlib annoys some programmers.

This is a good reason to hate it, but I would say that there are no good alternatives to Matplotlib – or at least, they are built upon Matplotlib.

I like the power and ease of using Matplotlib, and I do like the option of using it in an object oriented way, which makes the code more intuitive and easier for other programmers to understand.

## What will we cover in this tutorial?

How to automate Excel reports with Python. This will be done on data similar to this.

And we will create Excel reports from Python like these ones.

## Step 1: Load the data in Pandas DataFrame

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Pandas has great integration with Excel – both reading and writing Excel files. We will use that in this tutorial.

```import pandas as pd

# Read the sales data; the file name and layout here are assumptions:
# an Excel file with a Date column used as the index.
data = pd.read_excel("SalesData.xlsx", index_col="Date", parse_dates=True)
print(data.head())
```

Which will result in output similar to this.

```	Sales rep	Item	Price	Quantity	Sale
Date
2020-05-31	Mia	Markers	4	1	4
2020-02-01	Mia	Desk chair	199	2	398
2020-09-21	Oliver	Table	1099	2	2198
2020-07-15	Charlotte	Desk pad	9	2	18
2020-05-27	Emma	Book	12	1	12
```

## Step 2: Group the data by Sales rep

To identify how much each Sales rep has sold, we will group it and calculate the sum.

```rep = data.groupby("Sales rep")  # "rep" rather than "repr", which shadows a builtin
repr_sales = rep['Sale'].sum()
```

Now that is too simple, right?

The data in repr_sales looks like this.

```Sales rep
Charlotte     74599
Emma          65867
Ethan         40970
Liam          66989
Mia           88199
Noah          78575
Oliver        89355
Sophia       103480
William       80400
Name: Sale, dtype: int64
```

## Step 3: Group the data by month, change to month names and sum it

This is a bit more challenging, as we need to group the dates in months and rename the index to month names.

That is, except if you know how to do it – which you will in a moment.

```months = data.groupby(pd.Grouper(freq="M"))
months_sales = months['Sale'].sum()
months_sales.index = months_sales.index.month_name()
```

You are awesome. See, you use groupby with a Grouper of monthly frequency ("M"). To get month names in the index instead of the last day of each month, you reassign the index using index.month_name().

This creates the following output where you can see the month names are used.

```Date
January      69990
February     51847
March        67500
April        58401
May          40319
June         59397
July         64251
August       51571
September    55666
October      50093
November     57458
December     61941
Name: Sale, dtype: int64
```
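
To see the pattern in isolation, here is a minimal, self-contained sketch with made-up numbers (the dates and values are illustrative only):

```python
import pandas as pd

# Three made-up daily sales spread over two months.
idx = pd.to_datetime(["2020-01-05", "2020-01-20", "2020-02-10"])
sales = pd.Series([10, 5, 7], index=idx, name="Sale")

# Group by month-end frequency "M" and sum, then swap the month-end
# dates for month names (newer pandas versions prefer freq="ME").
monthly = sales.groupby(pd.Grouper(freq="M")).sum()
monthly.index = monthly.index.month_name()
print(monthly.to_dict())  # {'January': 15, 'February': 7}
```

The Grouper buckets the DatetimeIndex by month, and month_name() turns the resulting month-end timestamps into readable labels.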

## Step 4: Create Excel sheet with the data from Step 2 and 3

This is done by creating an Excel writer.

```writer = pd.ExcelWriter("SalesReport.xlsx")
repr_sales.to_excel(writer, sheet_name="Sale per rep")
months_sales.to_excel(writer, sheet_name="Sale per month")
writer.close()
```

This will create an Excel workbook named SalesReport.xlsx with two sheets, Sale per rep and Sale per month.

But let’s try to add the two charts.

## Step 5: Adding the charts

This is a bit more involved. As you can see, the steps so far have been quite straightforward and have not required any high level Python programming skills (or Excel skills, beyond knowing how to open an Excel sheet).

```writer = pd.ExcelWriter("SalesReport.xlsx", engine='xlsxwriter')
repr_sales.to_excel(writer, sheet_name="Sale per rep")
months_sales.to_excel(writer, sheet_name="Sale per month")

workbook = writer.book

# The chart type 'column' is an assumption here - use the type you prefer.
chart = workbook.add_chart({'type': 'column'})
chart.add_series({
    'values': "='Sale per rep'!$B$2:$B$10",
    'categories': "='Sale per rep'!$A$2:$A$10",
    'name': "Sale"
})
writer.sheets['Sale per rep'].insert_chart("D2", chart)

chart = workbook.add_chart({'type': 'column'})
chart.add_series({
    'values': "='Sale per month'!$B$2:$B$13",
    'categories': "='Sale per month'!$A$2:$A$13",
    'name': "Sale"
})
writer.sheets['Sale per month'].insert_chart("D2", chart)

writer.close()
```

This will create the sheets we have been waiting for. The charts are dynamic, meaning that if you change the values, the charts will change accordingly.

## Final remarks

Remember to see the YouTube video, which explains the material in more depth.

If you like the above content, please share it on social media or point a friend in this direction.