Excel Automation with Simple Moving Average from Python

What will we cover in this tutorial?

We will retrieve historic stock prices and calculate the moving average. Then we will export the data to Excel and insert a chart – all done from Python.

See the in-depth explanation in the YouTube video. It also gives advice on how to interpret the Simple Moving Averages (SMA).

Step 1: Read historic stock prices

We will use the Pandas-datareader to get the historic prices of NFLX (the ticker for Netflix).

import pandas_datareader as pdr
import datetime as dt

ticker = "NFLX"
start = dt.datetime(2019, 1, 1)

data = pdr.get_data_yahoo(ticker, start)
print(data.head())

And you will get the historic data for Netflix from January 1st, 2019.

	High	Low	Open	Close	Volume	Adj Close
Date						
2019-01-02	269.750000	256.579987	259.279999	267.660004	11679500	267.660004
2019-01-03	275.790009	264.429993	270.200012	271.200012	14969600	271.200012
2019-01-04	297.799988	278.540009	281.880005	297.570007	19330100	297.570007
2019-01-07	316.799988	301.649994	302.100006	315.339996	18620100	315.339996
2019-01-08	320.589996	308.010010	319.980011	320.269989	15359200	320.269989

Step 2: Understand Moving Average

We will calculate the Simple Moving Average as defined on Investopedia.

Simple Moving Average

The Simple Moving Average (from now on just referred to as the Moving Average, or MA) is defined by a period of days.

That is, the MA of a period of 10 (MA10) will take the average value of the last 10 close prices. This is done in a rolling way, hence, we will get a MA10 for every trading day in our historic data, except the first 9 days in our dataset.

We can similarly calculate a MA50 and MA200, which is a Moving Average of the last 50 and 200 days, respectively.
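To make the rolling definition concrete, here is a minimal sketch on a tiny made-up price series (the numbers are illustrative). It shows that a 3-period moving average has no value for the first 2 entries, just as the MA10 has none for the first 9 days:

```python
import pandas as pd

# A tiny made-up series of close prices.
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# MA3: the mean of the last 3 close prices, computed in a rolling way.
ma3 = close.rolling(3).mean()

# The first 2 entries are NaN, since a 3-period average needs 3 observations.
print(ma3.tolist())  # [nan, nan, 11.0, 12.0, 13.0]
```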

Step 3: Calculating the Moving Averages

We can do that by combining the rolling and mean methods.

data['MA10'] = data['Close'].rolling(10).mean()
data['MA50'] = data['Close'].rolling(50).mean()
data['MA200'] = data['Close'].rolling(200).mean()

print(data.tail())

That was easy, right?

	High	Low	Open	Close	Volume	Adj Close	MA10	MA50	MA200
Date									
2021-01-12	501.089996	485.670013	500.000000	494.250000	5990400	494.250000	515.297998	502.918599	477.08175
2021-01-13	512.349976	493.010010	495.500000	507.790009	5032100	507.790009	512.989999	503.559600	477.76590
2021-01-14	514.500000	499.579987	507.350006	500.859985	4177400	500.859985	510.616995	503.894399	478.39270
2021-01-15	506.320007	495.100006	500.000000	497.980011	5890200	497.980011	506.341998	504.109600	479.06220
2021-01-19	509.250000	493.540009	501.000000	501.769989	11996900	501.769989	504.232999	504.205999	479.72065

Step 4: Visualize it with Matplotlib

We can visualize the data with Matplotlib.

import matplotlib.pyplot as plt

data[['Close', 'MA10', 'MA50']].loc['2020-01-01':].plot()
plt.show()

Resulting in the following plot.

The output

Where you can see how the MA10 and MA50 move according to the price.

Step 5: Export to Excel

Now we will export the data to Excel.

For this we need to import Pandas and use the XlsxWriter engine.

import pandas as pd

data = data.loc['2020-01-01':]
data = data.iloc[::-1]
writer = pd.ExcelWriter("technical.xlsx", 
                        engine='xlsxwriter', 
                        date_format = 'yyyy-mm-dd', 
                        datetime_format='yyyy-mm-dd')

sheet_name = 'Moving Average'
data[['Close', 'MA10', 'MA50']].to_excel(writer, sheet_name=sheet_name)


worksheet = writer.sheets[sheet_name]
workbook = writer.book

# Create a format for a green cell
green_cell = workbook.add_format({
    'bg_color': '#C6EFCE',
    'font_color': '#006100'
})

# Create a format for a red cell
red_cell = workbook.add_format({
    'bg_color': '#FFC7CE',                            
    'font_color': '#9C0006'
})


# Set column width of Date
worksheet.set_column(0, 0, 15)


for col in range(1, 4):
    # Create a conditional formatted of type formula
    worksheet.conditional_format(1, col, len(data), col, {
        'type': 'formula',                                    
        'criteria': '=$C2>=$D2',
        'format': green_cell
    })

    # Create a conditional formatted of type formula
    worksheet.conditional_format(1, col, len(data), col, {
        'type': 'formula',                                    
        'criteria': '=$C2<$D2',
        'format': red_cell
    })

# Create a new chart object.
chart1 = workbook.add_chart({'type': 'line'})

# Add a series to the chart.
chart1.add_series({
        'name': "MA10",
        'categories': [sheet_name, 1, 0, len(data), 0],
        'values': [sheet_name, 1, 2, len(data), 2],
})

# Create a new chart object.
chart2 = workbook.add_chart({'type': 'line'})

# Add a series to the chart.
chart2.add_series({
        'name': 'MA50',
        'categories': [sheet_name, 1, 0, len(data), 0],
        'values': [sheet_name, 1, 3, len(data), 3],
})

# Combine and insert title, axis names
chart1.combine(chart2)
chart1.set_title({'name': sheet_name + " " + ticker})
chart1.set_x_axis({'name': 'Date'})
chart1.set_y_axis({'name': 'Price'})

# Insert the chart into the worksheet.
worksheet.insert_chart('F2', chart1)

writer.close()

Where the output will be something similar to this.

Generated Excel sheet

Create Financial Analysis and With Multiple Sheets and Charts in Excel from Python

What will we cover in this tutorial?

We will get financial data from FMP Cloud (a free stock API) for the last years and generate a three-sheet Excel document with charts similar to this. All done from Python.

Step 1: Collect the data

In this tutorial we are only going to use the example data on Apple, which is freely available without registering on FMP Cloud. If you want to do it for another stock, you will need to register on their site.

What we need is the income statement and cash flow statement. They are available as JSON on their page (income statement and cash flow statement).

As our main library we will use Pandas.

The data can be collected with the following code.

import pandas as pd

income_statement = pd.read_json("https://fmpcloud.io/api/v3/income-statement/AAPL?limit=120&apikey=demo")
income_statement.set_index('date', inplace=True)
cash_flow = pd.read_json("https://fmpcloud.io/api/v3/cash-flow-statement/AAPL?limit=120&apikey=demo")
cash_flow.set_index('date', inplace=True)

Notice that we set the index to be the date column. This makes the further work easier.

Step 2: Prepare the data

Next, we make the necessary calculations and prepare the data.

We are only interested in Revenue, Earnings Per Share (EPS) and Free Cash Flow (FCF). So let's take that data and keep it in a DataFrame (the main Pandas data structure).

data = income_statement[['revenue', 'eps']].copy()
data['fcf'] = cash_flow['freeCashFlow']

Now the data should look something like this.

	        revenue	        eps	fcf
date			
2020-09-26	274515000000	3.3600	7.336500e+10
2019-09-28	260174000000	2.9925	5.889600e+10
2018-09-29	265595000000	3.0025	6.412100e+10
2017-09-30	229234000000	2.3175	5.080300e+10
2016-09-24	215639000000	2.0875	5.349700e+10

Step 3: Calculate Percentage change

We are not actually interested in the absolute values. The only thing we care about is the percentage growth from year to year.

As the data is in reverse order and we need to calculate from oldest to newest, we first need to sort the dates in ascending order.

Then it is straightforward to calculate the growth using the pct_change() method.
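As a quick illustration of pct_change() on a tiny made-up series (the values are illustrative):

```python
import pandas as pd

revenue = pd.Series([100.0, 110.0, 99.0])
growth = revenue.pct_change()

# The first entry is NaN (nothing to compare against),
# then (110-100)/100 = 0.10 and (99-110)/110 = -0.10.
print(growth.round(2).tolist())  # [nan, 0.1, -0.1]
```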

data = data.sort_index()

data['rev %-chg'] = data['revenue'].pct_change()
data['eps %-chg'] = data['eps'].pct_change()
data['fcf %-chg'] = data['fcf'].pct_change()

Now the data should look similar to this (or the tail of it).

	        revenue	        eps	fcf	        rev %-chg	      eps %-chg	fcf %-chg
date						
2016-09-24	215639000000	2.0875	5.349700e+10	-0.077342	-0.100216	-0.233326
2017-09-30	229234000000	2.3175	5.080300e+10	0.063045	0.110180	-0.050358
2018-09-29	265595000000	3.0025	6.412100e+10	0.158620	0.295577	0.262150
2019-09-28	260174000000	2.9925	5.889600e+10	-0.020411	-0.003331	-0.081487
2020-09-26	274515000000	3.3600	7.336500e+10	0.055121	0.122807	0.245670

Step 4: Preparing data used in each Excel sheet

The next step is to prepare the data for each sheet in the Excel document.

We only need the last 10 years of data and will use loc to get that. Then we create a DataFrame for each sheet.

data_set = data.loc['2010-01-01':]
data_set.index = data_set.index.year

rev = data_set[['revenue', 'rev %-chg']].copy()
eps = data_set[['eps', 'eps %-chg']].copy()
fcf = data_set[['fcf', 'fcf %-chg']].copy()

Notice that we set the index only to be the year.
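The slicing and year conversion can be sketched on a tiny made-up DataFrame (the dates and values are illustrative):

```python
import pandas as pd

idx = pd.to_datetime(["2019-09-28", "2020-09-26"])
df = pd.DataFrame({"revenue": [260174000000, 274515000000]}, index=idx)

# Partial-string slicing on a DatetimeIndex: keep rows from 2020-01-01 onwards.
recent = df.loc["2020-01-01":].copy()

# Replace the full dates with just the year.
recent.index = recent.index.year
print(recent.index.tolist())  # [2020]
```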

Step 5: Creating the Excel sheet

To create the Excel sheet we will use Pandas with the XlsxWriter.

This can be done as follows.

def create_sheet(writer, df, sheet_name):
    df.to_excel(writer, sheet_name=sheet_name)

writer = pd.ExcelWriter("financials.xlsx", engine='xlsxwriter')
workbook = writer.book

create_sheet(writer, rev, 'Revenue')
create_sheet(writer, eps, 'EPS')
create_sheet(writer, fcf, 'FCF')

writer.close()

This will create a 3-sheet Excel document. One sheet for each DataFrame (rev, eps, fcf).

The function create_sheet(…) is practical, as each DataFrame is similar and we need to create the same sheet for each DataFrame.

The result will be in the Excel document financials.xlsx.

Step 6: Creating the charts

Now it pays to have the function create_sheet(…).

The full Excel document can be created by the following code.

def create_sheet(writer, df, sheet_name):
    df.to_excel(writer, sheet_name=sheet_name)
    worksheet = writer.sheets[sheet_name]
    workbook = writer.book
      
    chart = workbook.add_chart({'type': 'line'})
    
    chart.add_series({
        'name': sheet_name,
        'categories': [sheet_name, 1, 0, 11, 0],
        'values': [sheet_name, 1, 1, 11, 1],
    })
    
    column_chart = workbook.add_chart({'type': 'column'})
    
    column_chart.add_series({
        'name': sheet_name,
        'categories': [sheet_name, 1, 0, 11, 0],
        'values': [sheet_name, 1, 2, 11, 2],
        'y2_axis': True,
    })
    
    chart.combine(column_chart)
    chart.set_title({'name': sheet_name})
    chart.set_x_axis({'name': 'Date'})
    chart.set_y_axis({'name': 'Value'})
    column_chart.set_y2_axis({'name': '%-chg'})
    
    worksheet.insert_chart("E1", chart)

writer = pd.ExcelWriter("financials.xlsx", engine='xlsxwriter')
workbook = writer.book

create_sheet(writer, rev, 'Revenue')
create_sheet(writer, eps, 'EPS')
create_sheet(writer, fcf, 'FCF')

writer.close()

The details of creating the charts in Excel can be found in the XlsxWriter documentation. Basically, you send information to the XlsxWriter engine through dictionaries, passing the same values you would set if you were working directly in Excel.

Again, the result will be in the Excel document financials.xlsx.

How to Create Excel Report Automation with Python

What will we cover in this tutorial?

How to create Excel report automation with Python. This will be done on data similar to this.

The data we work with – 1,000 rows like the above – can be downloaded as an Excel sheet from the link below.

And we will create Excel reports from Python like these ones.

The first sheet we will create in Excel from Python.
The second Excel sheet we create from Python.

Get the direct download link to the Jupyter Notebook and Excel sheet used in the YouTube video and follow along.

Step 1: Load the data in Pandas DataFrame

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Pandas has great integration with Excel – both reading and writing Excel files. We will use that in this tutorial.
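As a minimal sketch of that integration (the file name roundtrip.xlsx is arbitrary, and writing/reading .xlsx assumes an engine such as openpyxl or XlsxWriter is installed):

```python
import pandas as pd

df = pd.DataFrame({"Item": ["Markers", "Book"], "Sale": [4, 12]})

# Write the DataFrame to an Excel file and read it back.
df.to_excel("roundtrip.xlsx", index=False)
back = pd.read_excel("roundtrip.xlsx")

print(back["Sale"].sum())  # 16
```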

To read the data, simply use read_excel(…).

import pandas as pd

data = pd.read_excel("SalesData.xlsx", index_col="Date")
print(data.head())

Which will result in output similar to this.

	Sales rep	Item	Price	Quantity	Sale
Date					
2020-05-31	Mia	Markers	4	1	4
2020-02-01	Mia	Desk chair	199	2	398
2020-09-21	Oliver	Table	1099	2	2198
2020-07-15	Charlotte	Desk pad	9	2	18
2020-05-27	Emma	Book	12	1	12

Step 2: Group the data by Sales rep

To identify how much each Sales rep has sold, we will group it and calculate the sum.

repr = data.groupby("Sales rep")
repr_sales = repr['Sale'].sum()

Now that is too simple, right?

The data in repr_sales looks like this.

Sales rep
Charlotte     74599
Emma          65867
Ethan         40970
Liam          66989
Mia           88199
Noah          78575
Oliver        89355
Sophia       103480
William       80400
Name: Sale, dtype: int64

Step 3: Group the data by month, change to month names and sum it

This is a bit more challenging, as we need to group the dates by month and rename the index to month names.

Unless, of course, you know how to do it – as you will in a moment.

months = data.groupby(pd.Grouper(freq="M"))
months_sales = months['Sale'].sum()
months_sales.index = months_sales.index.month_name()

You are awesome. See, you use groupby on a Grouper with monthly frequency (“M”). To show month names in the index instead of the last day of each month, you reassign the index using index.month_name().

This creates the following output where you can see the month names are used.

Date
January      69990
February     51847
March        67500
April        58401
May          40319
June         59397
July         64251
August       51571
September    55666
October      50093
November     57458
December     61941
Name: Sale, dtype: int64
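The same grouping pattern can be sketched on a tiny made-up dataset (note that in recent pandas versions the monthly alias "M" is being replaced by "ME", though "M" still works in the versions this tutorial targets):

```python
import pandas as pd

df = pd.DataFrame(
    {"Sale": [10, 20, 30]},
    index=pd.to_datetime(["2020-01-05", "2020-01-20", "2020-02-03"]),
)

# Group by calendar month and sum the sales.
monthly = df.groupby(pd.Grouper(freq="M"))["Sale"].sum()

# Replace the end-of-month dates with month names.
monthly.index = monthly.index.month_name()
print(monthly.to_dict())  # {'January': 30, 'February': 30}
```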

Step 4: Create Excel sheet with the data from Steps 2 and 3

This is done by creating an Excel writer.

writer = pd.ExcelWriter("SalesReport.xlsx")
repr_sales.to_excel(writer, sheet_name="Sale per rep")
months_sales.to_excel(writer, sheet_name="Sale per month")
writer.close()

This will create an Excel file named SalesReport.xlsx with two sheets, Sale per rep and Sale per month.

But let’s try to add the two charts.

Step 5: Adding the charts

This is a bit more involved. As you can see, the steps above have been quite straightforward and have not required any advanced Python programming skills (or Excel skills, beyond knowing how to open an Excel sheet).

writer = pd.ExcelWriter("SalesReport.xlsx")
repr_sales.to_excel(writer, sheet_name="Sale per rep")
months_sales.to_excel(writer, sheet_name="Sale per month")

chart = writer.book.add_chart({'type': 'column'})
chart.add_series({
    'values': '=\'Sale per rep\'!$B$2:$B$10',
    'categories': '=\'Sale per rep\'!$A$2:$A$10',
    'name': "Sale"
})
writer.sheets['Sale per rep'].insert_chart("D2", chart)

chart = writer.book.add_chart({'type': 'column'})
chart.add_series({
    'values': '=\'Sale per month\'!$B$2:$B$13',
    'categories': '=\'Sale per month\'!$A$2:$A$13',
    'name': "Sale"
})
writer.sheets['Sale per month'].insert_chart("D2", chart)

writer.close()

This will create the sheets we have been waiting for. The charts are dynamic, meaning, that if you change the values the charts will change accordingly.

First sheet from our results
Second sheet from our results

Final remarks

Remember to see the YouTube video, which explains the material in more depth.

If you like the above content, please share it on a social media or point a friend in this direction.

How To Color Cells Dynamically in Excel Based on a Formula from Python

What will we cover in this tutorial?

We will learn how you can dynamically format cells in Excel to have a specific color based on the value of a formula – all done from Python.

As an example we will color the cells of two columns depending on which is larger.

If the value in MA50 is larger than the value in MA200, the cells will be colored green; otherwise red.

In this concrete example we will collect historical stock prices and calculate the Moving Averages of 50 and 200 days on the closing price. In a crossover strategy using moving averages, a buy signal is given when the 50-day moving average crosses above the 200-day moving average, and a sell signal when it crosses below.

Step 1: Collect the data from Yahoo! Finance using Pandas-datareader

A great library to use to collect financial data is the Pandas-datareader.

We will collect data from Yahoo! Finance. As an example we will use General Motors historical stock data.

This can be done as follows.

import pandas_datareader as pdr
import datetime as dt


start = dt.datetime(2019, 1, 1)
ticker = pdr.get_data_yahoo("GM", start)

This will result in a Pandas DataFrame with data similar to this.

                 High        Low       Open      Close      Volume  Adj Close
Date                                                                         
2019-01-02  33.810001  32.410000  32.849998  33.639999   7662300.0  31.893360
2019-01-03  33.680000  32.200001  33.340000  32.250000  11618300.0  30.575533
2019-01-04  33.570000  32.430000  32.799999  33.330002  10784000.0  31.599455
2019-01-07  34.970001  33.480000  33.709999  34.360001  10722300.0  32.575977
2019-01-08  35.680000  34.490002  35.000000  34.810001  10532900.0  33.002617

Step 2: Calculate the Moving Averages

The moving average can be calculated by applying the rolling() and mean() methods on the DataFrame.

The rolling() creates a window size which is the number of observations used for calculating the statistic.

A Moving Average with window size 50 uses the last 50 data points to calculate the mean. To be consistent with the way Yahoo! Finance calculates the value, we use the Close price.

import pandas_datareader as pdr
import datetime as dt
import pandas as pd


start = dt.datetime(2019, 1, 1)
ticker = pdr.get_data_yahoo("GM", start)

ticker['MA50'] = ticker['Close'].rolling(50).mean()
ticker['MA200'] = ticker['Close'].rolling(200).mean()

Which will give a tail similar to this output.

                 High        Low       Open  ...  Adj Close     MA50     MA200
Date                                         ...                              
2020-12-14  42.540001  41.419998  42.490002  ...  41.619999  38.5632  29.03155
2020-12-15  42.160000  41.310001  41.990002  ...  41.660000  38.7772  29.08725
2020-12-16  41.869999  40.810001  41.790001  ...  41.419998  38.9976  29.13670
2020-12-17  42.029999  41.430000  41.709999  ...  42.029999  39.2058  29.19635
2020-12-18  42.042801  41.139999  42.020000  ...  41.389999  39.3894  29.25985

Where the Moving Averages are added to the DataFrame.

Step 3: Exporting to Excel using XlsxWriter

We will use the powerful XlsxWriter to create our Excel sheet.

You can write the DataFrame directly by using to_excel(…), but we want to do more than that. Hence, we use the XlsxWriter directly.

import pandas_datareader as pdr
import datetime as dt
import pandas as pd


start = dt.datetime(2019, 1, 1)
ticker = pdr.get_data_yahoo("GM", start)

ticker['MA50'] = ticker['Close'].rolling(50).mean()
ticker['MA200'] = ticker['Close'].rolling(200).mean()

# We only want the index as a date and not datetime object
ticker.index = ticker.index.date

# We skip the first 200 entries, as they do not have the MA200
ticker = ticker.iloc[200:]
# We reverse the DataFrame to have newest first
ticker = ticker.iloc[::-1]

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('moving_average.xlsx', engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
ticker.to_excel(writer, sheet_name='Moving Averages')

# Remember to close the writer to write the Excel sheet
writer.close()

This will create an Excel sheet similar to this one.

Step 4: Adding formatting to the cells

This is a bit more complex as we need to do the magic here.

We first need to create a reference to the sheet (worksheet) we work on to access the functionality.

The first thing we do is to resize column A, such that it opens in the correct size to see the date. This is done by using set_column(…).

Then we create a format, which can be applied on Excel cells. This is how we color them. Hence, we create one for green cells and later one for red cells.

We insert them by using conditional_format(…), which takes a range of cells. This can be applied in different ways; we use the A1 notation.

The options parameter in conditional_format(…) must be a dictionary containing the parameters that describe the type and style of the conditional format. To see details of the format please refer to the manual.
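As a minimal standalone sketch of a formula-based conditional format (the file name demo.xlsx and the values are arbitrary):

```python
import xlsxwriter

workbook = xlsxwriter.Workbook("demo.xlsx")
worksheet = workbook.add_worksheet()

# Two columns of numbers to compare.
worksheet.write_column("A1", [3, 1, 4])
worksheet.write_column("B1", [2, 2, 2])

# A green background format.
green = workbook.add_format({"bg_color": "#C6EFCE"})

# Color A1:A3 green whenever the value in column A is at least the value in B.
# The row reference is relative, so it shifts down the range; the columns are fixed.
worksheet.conditional_format("A1:A3", {
    "type": "formula",
    "criteria": "=$A1>=$B1",
    "format": green,
})

workbook.close()
```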

import pandas_datareader as pdr
import datetime as dt
import pandas as pd


start = dt.datetime(2019, 1, 1)
ticker = pdr.get_data_yahoo("GM", start)

ticker['MA50'] = ticker['Close'].rolling(50).mean()
ticker['MA200'] = ticker['Close'].rolling(200).mean()

# We only want the index as a date and not datetime object
ticker.index = ticker.index.date

# We skip the first 200 entries, as they do not have the MA200
ticker = ticker.iloc[200:]
# We reverse the DataFrame to have newest first
ticker = ticker.iloc[::-1]

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('moving_average.xlsx', engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
ticker.to_excel(writer, sheet_name='Moving Averages')

# Get the xlsxwriter objects from the dataframe writer object.
workbook = writer.book
worksheet = writer.sheets['Moving Averages']
# Resize the column A
worksheet.set_column("A:A", 12)

# Calculate the last row number (we insert first DataFrame row in row 2)
last_row = len(ticker) + 1

# Create a format for a green cell
cell_format_green = workbook.add_format()
cell_format_green.set_bg_color('green')

# Create a conditional formatted of type formula
worksheet.conditional_format('H2:I{}'.format(last_row), {'type':     'formula',
                                                         'criteria': '=$H2>=$I2',
                                                         'format':   cell_format_green})

# Create a format for a red cell
cell_format_red = workbook.add_format()
cell_format_red.set_bg_color('red')

# Create a conditional formatted of type formula
worksheet.conditional_format('H2:I{}'.format(last_row), {'type':     'formula',
                                                         'criteria': '=$H2<$I2',
                                                         'format':   cell_format_red})

# Remember to close the writer
writer.close()

This will result in the expected Excel sheet.

Step 5: Validating it works as expected

Now for the final test.

The colors should be updated dynamically. Let’s try to change the value and see what happens.

It does. If we change the value of H2 to 10, the cells turn red.

How To Extract Numbers From Strings in HTML Table and Export to Excel from Python

What will we cover in this tutorial?

How to import an HTML table into Excel.

But that is easy? You can do that directly from Excel.

Yes, but what if the entries contain numbers and text together? Then the import will turn them into strings, making it difficult to extract the numbers.

Luckily, we will cover how to do that easily with Python.

Step 1: Get the dataset

Find your favorite HTML table online. For the purpose of this tutorial I will use the List of metro systems table from Wikipedia.

View of HTML table of interest

Say, what if we wanted to sum how many stations are in this table (please notice that the table contains more rows than shown in the above picture).

If you import that directly into Excel with the import functionality, you will realize that the Stations column is interpreted as strings. The problem is that an entry looks like 19[13], while we are only interested in the number 19.

There is no built-in functionality in Excel to extract that directly.

But let’s try to import this into Python. We will use Pandas to do that. If you are new to Pandas, please see this tutorial.

import pandas as pd


url = "https://en.wikipedia.org/wiki/List_of_metro_systems"
tables = pd.read_html(url)

print(tables[0].head())

Which will result in the following output.

           City    Country  ...          System length Annual ridership(millions)
0       Algiers    Algeria  ...  18.5 km (11.5 mi)[14]           45.3 (2019)[R 1]
1  Buenos Aires  Argentina  ...  56.7 km (35.2 mi)[16]          337.7 (2018)[R 2]
2       Yerevan    Armenia  ...   13.4 km (8.3 mi)[17]           20.2 (2019)[R 3]
3        Sydney  Australia  ...  36 km (22 mi)[19][20]  14.2 (2019) [R 4][R Nb 1]
4        Vienna    Austria  ...  83.3 km (51.8 mi)[21]          459.8 (2019)[R 6]

Where we have the same problem. If we inspect the type of the columns we get the following.

City                          object
Country                       object
Name                          object
Yearopened                    object
Year of lastexpansion         object
Stations                      object
System length                 object
Annual ridership(millions)    object
dtype: object

Where actually all columns are of type object, which here is equivalent to a string.

Step 2: Extract the numbers from Stations and System length column

Each of the tables in tables is a DataFrame, which is Pandas' main data structure.

As the strings we want to convert to integers contain more information than just the numbers, we cannot simply use to_numeric().

We want to convert something of the form 19[13] to 19.
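The extraction itself is plain Python string handling – split on the footnote bracket and keep the first part:

```python
raw = "19[13]"

# Everything before '[' is the number; int() converts it.
stations = int(raw.split("[")[0])
print(stations)  # 19
```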

To do that easily, we will use the apply(…) method on the DataFrame.

The apply method takes a function as an argument and (with axis=1) applies it to each row.

We will use a lambda function as argument. If you are not familiar with lambda functions, please read this tutorial.

import pandas as pd


url = "https://en.wikipedia.org/wiki/List_of_metro_systems"
tables = pd.read_html(url)
table = tables[0]

table['Stations'] = table.apply(lambda row: int(row['Stations'].split('[')[0]), axis=1)
table['System length'] = table.apply(lambda row: float(row['System length'].split()[0]), axis=1)

print(table[['Stations', 'System length']].head())

Which will result in the following output.

   Stations  System length
0        19           18.5
1        90           56.7
2        10           13.4
3        13           36.0
4        98           83.3

This is what we want.

Step 3: Export to Excel

Wow. This needs an entire step?

Well, of course it does.

Here we need to unleash the power of Pandas and use the to_excel(…) method.

import pandas as pd


url = "https://en.wikipedia.org/wiki/List_of_metro_systems"
tables = pd.read_html(url)
table = tables[0]

table['Stations'] = table.apply(lambda row: int(row['Stations'].split('[')[0]), axis=1)
table['System length'] = table.apply(lambda row: float(row['System length'].split()[0]), axis=1)

table.to_excel('output.xlsx')

This will result in an Excel file looking similar to this, where the Stations and System length columns are numeric and not string.

Excel file now with Stations and System length as numbers and not strings

What’s next?

Want to learn more about Python and Excel?

Check out my online guide.

Create Excel Sheet with Stock Prices and Moving Average with Chart all from Python

What will we cover in this tutorial?

In this tutorial we will read historic stock prices, calculate the moving average, export the data to an Excel sheet and insert a chart with prices and moving average. All done from Python using Pandas and Pandas-datareader.

Step 1: Get the historic stock prices

A great place to get historic stock prices is using Pandas-datareader. It has a great interface to various places to get the data. In this tutorial we will use Yahoo API through the Pandas-datareader.

It doesn’t require any registration to use the API. It works straight out of the box.

To get stock prices in time series you need to find the ticker of your favorite stock. In this tutorial we will use Apple, which has ticker AAPL.

import pandas_datareader as pdr
import datetime as dt

start = dt.datetime(2020, 1, 1)
ticker = pdr.get_data_yahoo("AAPL", start)

print(ticker.head())

You also need to set a start date for how far back you want historic stock prices. By default, you will get data up to the most current date, with one row for each day the market was open. You can use more arguments to get weekly or monthly prices instead, and you can set an end date if you like.

The above code should give output similar to the following.

                 High        Low       Open      Close       Volume  Adj Close
Date                                                                          
2020-01-02  75.150002  73.797501  74.059998  75.087502  135480400.0  73.840042
2020-01-03  75.144997  74.125000  74.287498  74.357498  146322800.0  73.122154
2020-01-06  74.989998  73.187500  73.447502  74.949997  118387200.0  73.704819
2020-01-07  75.224998  74.370003  74.959999  74.597504  108872000.0  73.358185
2020-01-08  76.110001  74.290001  74.290001  75.797501  132079200.0  74.538239

Step 2: Calculate the Moving Average

To calculate the moving average (also called the simple moving average), we can use the rolling method on a DataFrame.

The rolling method takes one argument, which is the window size. This indicates how big a window we want to apply a function on. In this case we want to apply the mean function on a window of size 50.

import pandas_datareader as pdr
import pandas as pd
import datetime as dt

start = dt.datetime(2020, 1, 1)
ticker = pdr.get_data_yahoo("AAPL", start)

ticker['MA'] = ticker['Close'].rolling(50).mean()

print(ticker[50:].head())

This calculates the simple moving average with window size 50. It gives the same result as the default 50-day moving average in Yahoo! Finance.

The output will be as follows.

                 High        Low       Open  ...       Volume  Adj Close        MA
Date                                         ...                                  
2020-03-16  64.769997  60.000000  60.487499  ...  322423600.0  59.687832  76.16100
2020-03-17  64.402496  59.599998  61.877499  ...  324056000.0  62.312309  75.93815
2020-03-18  62.500000  59.279999  59.942501  ...  300233600.0  60.786911  75.67250
2020-03-19  63.209999  60.652500  61.847500  ...  271857200.0  60.321156  75.40445
2020-03-20  62.957500  57.000000  61.794998  ...  401693200.0  56.491634  75.03470

Notice we skipped the first 50 rows. This is because the MA (moving average) column has no values until 50 observations are available (the first valid value is at row index 49).

Step 3: Export data to Excel and create a chart with close prices and moving average

Now this is where it all gets a bit more complicated. It takes some reading in the manual to figure all this out.

The code is commented to explain what happens.

import pandas_datareader as pdr
import pandas as pd
import datetime as dt

# Read the stock prices from Yahoo! Finance
start = dt.datetime(2020, 1, 1)
ticker = pdr.get_data_yahoo("AAPL", start)

# Calculate the moving average with window size 50
ticker['MA'] = ticker['Close'].rolling(50).mean()

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('apple.xlsx', engine='xlsxwriter')

# Name the sheet
sheet_name = "Apple"

# We convert the index from datetime to date
# This makes the data in Excel only have the date and
# not the date with time: 00:00:00:0000
ticker.index = ticker.index.date

# Skip the first 50 rows (to be pedantic, only the first 49 lack a MA value)
ticker = ticker[50:]

# Convert the dataframe to an XlsxWriter Excel object.
ticker.to_excel(writer, sheet_name=sheet_name)

# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets[sheet_name]

# Widen the first column to display the dates.
worksheet.set_column('A:A', 12)

# Get the number of rows and column index
max_row = len(ticker)
col_ma = ticker.columns.get_loc('MA') + 1
col_close = ticker.columns.get_loc('Close') + 1

# Create a chart object of type line
chart = workbook.add_chart({'type': 'line'})

# Insert the first dataset into chart
chart.add_series({
    'name': "MA",
    'categories': [sheet_name, 1, 0, max_row, 0],
    'values': [sheet_name, 1, col_ma, max_row, col_ma],
})
# Insert the second dataset in the same chart
chart.add_series({
    'name': "Close",
    'values': [sheet_name, 1, col_close, max_row, col_close],
})

# Configure axis
chart.set_x_axis({
    'name': 'Date',
    'date_axis': True,
})
chart.set_y_axis({
    'name': '$',
    'major_gridlines': {'visible': False}
})

# Insert the chart into the worksheet.
worksheet.insert_chart('I2', chart)

# Close the Pandas Excel writer and output the Excel file.
# (In pandas 1.5 and later, use writer.close() instead of writer.save().)
writer.save()

The above code will create an Excel sheet looking like this.

Result
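As a reference for the add_series calls above: the list form of categories and values is [sheet_name, first_row, first_col, last_row, last_col], with zero-based row and column indices. Row 0 holds the headers, so the data starts at row 1. A minimal, self-contained sketch (the file name demo_chart.xlsx is just for illustration):

```python
import xlsxwriter

workbook = xlsxwriter.Workbook('demo_chart.xlsx')
worksheet = workbook.add_worksheet('Sheet1')

# Write a header row and three rows of data (rows 1-3 hold the data).
worksheet.write_row(0, 0, ['x', 'y'])
for row, (x, y) in enumerate([(1, 10), (2, 20), (3, 15)], start=1):
    worksheet.write_row(row, 0, [x, y])

chart = workbook.add_chart({'type': 'line'})
# List form: [sheet_name, first_row, first_col, last_row, last_col]
chart.add_series({
    'categories': ['Sheet1', 1, 0, 3, 0],
    'values': ['Sheet1', 1, 1, 3, 1],
})
worksheet.insert_chart('D2', chart)
workbook.close()
```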

How to Concatenate Multiple CSV Files and Export to Excel from Python

What will we cover in this tutorial?

In this tutorial we will show how you can concatenate CSV files of the same format and export them to an Excel sheet, all done in Python using Pandas.

This is something you want in your Python toolbox as it will make it easy for you to get quick wins helping people with manual work.

Step 0: Preparation

We need multiple CSV files in this tutorial. If you have CSV files with the same format, then you can skip to the next step.

First, let’s create some random data by using the power of NumPy. This can be done by using the uniform method in the NumPy random package.

import numpy as np
import pandas as pd


for file_name in ['data_0.csv', 'data_1.csv', 'data_2.csv']:
    df = pd.DataFrame(np.random.uniform(0, 100, size=(100, 4)), columns=list('ABCD'))
    df.to_csv(file_name)

We generate the random data by calling random.uniform, which takes the lower and upper bounds of the uniform interval to draw numbers from; the size parameter sets the dimensions of the NumPy array. Here it generates 100 rows in 4 columns.

The DataFrame constructor converts the NumPy array to a DataFrame and sets the column names to A, B, C, and D, respectively. This is done by converting the string 'ABCD' to a list, resulting in ['A', 'B', 'C', 'D'].
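A quick sanity check of the two building blocks (a minimal sketch):

```python
import numpy as np
import pandas as pd

# list() splits a string into its characters.
print(list('ABCD'))   # prints ['A', 'B', 'C', 'D']

# uniform(low, high, size) draws from [low, high) in the given shape.
arr = np.random.uniform(0, 100, size=(100, 4))
df = pd.DataFrame(arr, columns=list('ABCD'))
print(df.shape)       # prints (100, 4)
```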

The content is stored in a CSV file by calling to_csv on the DataFrame object.

The files will look similar to this (only the beginning of one here).

,A,B,C,D
0,59.74500080886668,59.63123820144265,47.53898343676622,50.31013442167491
1,31.00881083223196,30.91970178258706,93.2538048567809,41.95421609366061
2,32.03557607643186,96.34180274801857,70.8926221704894,90.79417905937711
3,22.44980451436659,61.25077305144971,24.013823071196082,31.401361361506062
4,71.03827853534764,20.527320250264925,29.379395377514662,94.89076442880472
5,76.1462091142596,20.509207361552917,66.57572207423806,69.38906836001392

Step 1: Read a CSV file with Pandas

Now it is time for the fun stuff. Read one csv file into a Pandas DataFrame.

This can be done by calling the read_csv method in the Pandas library.

import pandas as pd


df = pd.read_csv("data_0.csv")
print(df)

This gives the following output.

    Unnamed: 0          A          B          C          D
0            0  59.745001  59.631238  47.538983  50.310134
1            1  31.008811  30.919702  93.253805  41.954216
2            2  32.035576  96.341803  70.892622  90.794179
3            3  22.449805  61.250773  24.013823  31.401361
4            4  71.038279  20.527320  29.379395  94.890764

Only the beginning of the output is shown here.

Now we are ready to read all the content files and concatenate them.

Step 2: Concatenate Multiple DataFrames into One DataFrame

First we need to read all the csv files. This can be done in many ways. Here we will iterate over them and append them to a list.

After that we will concatenate them all.

This can be done by using the concat method in the Pandas library. The concat function takes an iterable as the first argument, and a list is exactly that.
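To see what concat does with the indices before we apply it to the CSV files, here is a tiny sketch with two hand-made DataFrames:

```python
import pandas as pd

a = pd.DataFrame({'A': [1, 2]})
b = pd.DataFrame({'A': [3, 4]})

combined = pd.concat([a, b])
print(list(combined.index))   # prints [0, 1, 0, 1] - the indices repeat
```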

import pandas as pd


df_list = []
for file_name in ['data_0.csv', 'data_1.csv', 'data_2.csv']:
    df = pd.read_csv(file_name)
    df_list.append(df)

df_concat = pd.concat(df_list)
print(df_concat)

Which results in a single DataFrame with 300 rows.

    Unnamed: 0          A          B          C          D
0            0  59.745001  59.631238  47.538983  50.310134
1            1  31.008811  30.919702  93.253805  41.954216
..         ...        ...        ...        ...        ...
98          98  65.831197  47.070415  65.205620  64.701177
99          99  27.259875  97.514085   8.444346  65.991040

[300 rows x 5 columns]

Notice that each DataFrame keeps its original index, so the index repeats (0 to 99, three times). We can handle that by adding ignore_index=True in the concat call.

import pandas as pd


df_list = []
for file_name in ['data_0.csv', 'data_1.csv', 'data_2.csv']:
    df = pd.read_csv(file_name)
    df_list.append(df)

df_concat = pd.concat(df_list, ignore_index=True)
print(df_concat)

Which gives the following output.

     Unnamed: 0          A          B          C          D
0             0  59.745001  59.631238  47.538983  50.310134
1             1  31.008811  30.919702  93.253805  41.954216
..          ...        ...        ...        ...        ...
298          98  65.831197  47.070415  65.205620  64.701177
299          99  27.259875  97.514085   8.444346  65.991040

[300 rows x 5 columns]

Now the index runs continuously, ending at 299.
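As a side note, the Unnamed: 0 column comes from the index that to_csv writes by default. If you do not want it, read the files back with index_col=0 (or write them with index=False in the first place). A minimal sketch, using a throwaway file name demo.csv:

```python
import pandas as pd

pd.DataFrame({'A': [1.0, 2.0]}).to_csv('demo.csv')

# index_col=0 uses the first CSV column as the index instead of
# creating an extra "Unnamed: 0" column.
df = pd.read_csv('demo.csv', index_col=0)
print(list(df.columns))   # prints ['A']
```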

Step 3: Write the output to an Excel file

This is simply done by using the to_excel method on the DataFrame object. It takes the file name as argument, and the file should have the xlsx extension.

Notice that some installations of Pandas do not include the XlsxWriter module, so you may need to install it. You will know when you run the code below, as it will report that the module is missing. You can then install it with pip.

pip install xlsxwriter

Or, if you are in PyCharm, you can write import xlsxwriter on a line, hover over the red underline under xlsxwriter, and select install. It should then install the package for you.

Now to the full code.

import pandas as pd


df_list = []
for file_name in ['data_0.csv', 'data_1.csv', 'data_2.csv']:
    df = pd.read_csv(file_name)
    df_list.append(df)

df_concat = pd.concat(df_list, ignore_index=True)
df_concat.to_excel("output.xlsx")

This results in an Excel file similar to this.

Resulting output.

Quick Tutorial on Pandas to Excel in 3 Steps – Master the Basics

What will we cover in this tutorial?

We will cover the basic process of saving data from a Pandas DataFrame to an Excel sheet. This includes

  • The simple export of a DataFrame to Excel from Python.
  • How to set the Sheet name in Excel from Python.
  • To set which columns to export from the DataFrame to Excel.

Step 1: The simple export of a full DataFrame

Often you just need to figure out how to save the full DataFrame to an Excel sheet. This is often needed, as you have collected the data in Python, but want to explore it or work more on it in Excel.

If you are new to Pandas, we recommend reading this tutorial.

Pandas is a great library for working with data similar to an Excel sheet. The advantage of Pandas is that you can work with the data in Python and automate many things, such as collecting data from APIs on the internet or automating tasks you would otherwise do manually in Excel.

Another beauty of Pandas is that it has great built-in methods that do all the work for you.

Here we will demonstrate how a call to to_excel(…) can be done on a DataFrame.

import pandas as pd
import numpy as np


# Generate some random increasing data
df = pd.DataFrame(
    {'A': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)],
     'B': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)]}
)


df.to_excel("output.xlsx")

The code above creates a random DataFrame using the uniform function from NumPy's random module.

Then it calls to_excel on the DataFrame with the output Excel file name. This will create an Excel sheet, which could look similar to this.

Step 2: How to set the sheet name in Excel from Python

Sometimes you want your data more organized. Here we are going to show how to set the sheet name where you export your data.

The default sheet name is Sheet1, hence in the previous step you would notice that the created sheet has that name.

Sometimes you want a different name. Luckily, this is simple.

import pandas as pd
import numpy as np


# Generate some random increasing data
df = pd.DataFrame(
    {'A': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)],
     'B': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)]}
)


df.to_excel("output.xlsx", sheet_name="My Sheet")

Notice that we only set the sheet_name parameter in the call. That is all it takes.

How did I know? I read the documentation.

This creates an Excel file with the desired sheet name.

Step 3: Only export the specific columns

Often it happens that you are only interested in exporting some of the columns in your DataFrame. Luckily, this is also simple to do from the standard export function on the DataFrame.

import pandas as pd
import numpy as np


# Generate some random increasing data
df = pd.DataFrame(
    {'A': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)],
     'B': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)],
     'C': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)],
     'D': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)],
     'E': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)],
     'F': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)]}
)


df.to_excel("output.xlsx", sheet_name="My Sheet", columns=['B', 'E', 'F'])

In the above example we create a DataFrame with a few columns. We only export columns B, E, and F.

This is set directly in the to_excel method via the columns parameter, which accepts a list of the column names to export to the Excel sheet.
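Two other to_excel parameters worth knowing are index=False, which leaves the DataFrame index out of the sheet, and startrow/startcol, which shift where the table is placed. A small sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Leave out the index column and start the table two rows down.
df.to_excel("output.xlsx", sheet_name="My Sheet", index=False, startrow=2)
```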

Next step

This is only to get you started and see how simple it is to work with data from Python using Pandas. This enables you to make great things with only a few lines of code.