From HTML Tables to Excel with Pandas: Free Cash Flow and Revenue of Microsoft

What will we cover in this tutorial?

Yes, you can do it manually. Copy from an HTML table and paste into an Excel spread sheet. Or you can dive into how to pull data directly from the internet into Excel. Sometimes it is not convenient, as some data needs to be transformed and you need to do it often.

In this tutorial we will show how this can be easily automated with Python using Pandas.

That is we go from data that needs to be transformed, like, $102,000 into 102000. Also, how to join (or merge) different datasources before we create a Excel spread sheet.

Step 1: The first data source: Revenue of Microsoft

There are many sources where you can get this data, but Macrotrends has it nicely in a table and for more than 10 years old data.

First thing first, let’s try to take a look at the data. You can use Pandas read_html to get the data from the tables given a URL.

import pandas as pd


url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
tables = pd.read_html(url)

revenue = tables[0]
print(revenue)

Where we know it is in the first table on the page. A first few lines of the output is given here.

    Microsoft Annual Revenue(Millions of US $) Microsoft Annual Revenue(Millions of US $).1
0                                         2020                                     $143,015
1                                         2019                                     $125,843
2                                         2018                                     $110,360
3                                         2017                                      $96,571
4                                         2016                                      $91,154

First thing to manage are the column names and setting the year to the index.

import pandas as pd


url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
tables = pd.read_html(url)

revenue = tables[0]
revenue.columns = ['Year', 'Revenue']
revenue = revenue.set_index('Year')
print(revenue)

A first few lines.

      Revenue
Year          
2020  $143,015
2019  $125,843
2018  $110,360
2017   $96,571
2016   $91,154

That helped. But then we need to convert the Revenue column to integers. This is a bit tricky and can be done in various ways. We first need to remove the $-sign, then the comma-sign, before we convert it.

revenue['Revenue'] = pd.to_numeric(revenue['Revenue'].str[1:].str.replace(',',''), errors='coerce')

And that covers it.

Step 2: Getting another data source: Free Cash Flow for Microsoft

We want to combine this data with the Free Cash Flow (FCF) of Microsoft.

The data can be gathered the same way and column and index can be set similar.

import pandas as pd


url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/free-cash-flow"
tables = pd.read_html(url)
fcf = tables[0]
fcf.columns = ['Year', 'FCF']
fcf = fcf.set_index('Year')
print(fcf)

The first few lines are.

     FCF
Year
2020 45234.0
2019 38260.0
2018 32252.0
2017 31378.0
2016 24982.0

All ready to be joined with the other data.

import pandas as pd


url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
tables = pd.read_html(url)

revenue = tables[0]
revenue.columns = ['Year', 'Revenue']
revenue = revenue.set_index('Year')
revenue['Revenue'] = pd.to_numeric(revenue['Revenue'].str[1:].str.replace(',',''), errors='coerce')

# print(revenue)

url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/free-cash-flow"
tables = pd.read_html(url)
fcf = tables[0]
fcf.columns = ['Year', 'FCF']
fcf = fcf.set_index('Year')

data = revenue.join(fcf)

# Let's reorder it
data = data.iloc[::-1].copy()

Where we also reorder it, to have it from the early ears in the top. Notice the copy(), which is not strictly necessary, but makes a hard-copy of the data and not just a view.

      Revenue      FCF
Year                  
2005    39788  15793.0
2006    44282  12826.0
2007    51122  15532.0
2008    60420  18430.0
2009    58437  15918.0

Wow. Ready to export.

Step 3: Exporting it to Excel

This is too easy to have an entire section for it.

data.to_excel('Output.xlsx')

Isn’t it beautiful. Of course you need to execute this after all the lines above.

In total.

import pandas as pd


url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
tables = pd.read_html(url)

revenue = tables[0]
revenue.columns = ['Year', 'Revenue']
revenue = revenue.set_index('Year')
revenue['Revenue'] = pd.to_numeric(revenue['Revenue'].str[1:].str.replace(',',''), errors='coerce')

# print(revenue)

url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/free-cash-flow"
tables = pd.read_html(url)
fcf = tables[0]
fcf.columns = ['Year', 'FCF']
fcf = fcf.set_index('Year')

data = revenue.join(fcf)

# Let's reorder it
data = data.iloc[::-1].copy()

# Export to Excel
data.to_excel('Output.xlsx')

Which will result in an Excel spread sheet called Output.xlsx.

The Excel spread sheet. I added the graph.

There are many things you might find easier in Excel, like playing around with different types of visualization. On the other hand, there might be many aspects you find easier in Python. I know, I do. Almost all of them. Not kidding. Still, Excel is a powerful tool which is utilized by many specialists. Still it seems like the skills of Python are in request in connection with Excel.

Leave a Reply