Learn how you can become a Python programmer in just 12 weeks.

    We respect your privacy. Unsubscribe at anytime.

    From HTML Tables to Excel with Pandas: Free Cash Flow and Revenue of Microsoft

    What will we cover in this tutorial?

    Yes, you can do it manually. Copy from an HTML table and paste into an Excel spread sheet. Or you can dive into how to pull data directly from the internet into Excel. Sometimes it is not convenient, as some data needs to be transformed and you need to do it often.

    In this tutorial we will show how this can be easily automated with Python using Pandas.

    That is we go from data that needs to be transformed, like, $102,000 into 102000. Also, how to join (or merge) different datasources before we create a Excel spread sheet.

    Step 1: The first data source: Revenue of Microsoft

    There are many sources where you can get this data, but Macrotrends has it nicely in a table and for more than 10 years old data.

    First thing first, let’s try to take a look at the data. You can use Pandas read_html to get the data from the tables given a URL.

    import pandas as pd
    
    url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
    tables = pd.read_html(url)
    revenue = tables[0]
    print(revenue)
    

    Where we know it is in the first table on the page. A first few lines of the output is given here.

        Microsoft Annual Revenue(Millions of US $) Microsoft Annual Revenue(Millions of US $).1
    0                                         2020                                     $143,015
    1                                         2019                                     $125,843
    2                                         2018                                     $110,360
    3                                         2017                                      $96,571
    4                                         2016                                      $91,154
    

    First thing to manage are the column names and setting the year to the index.

    import pandas as pd
    
    url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
    tables = pd.read_html(url)
    revenue = tables[0]
    revenue.columns = ['Year', 'Revenue']
    revenue = revenue.set_index('Year')
    print(revenue)
    

    A first few lines.

          Revenue
    Year          
    2020  $143,015
    2019  $125,843
    2018  $110,360
    2017   $96,571
    2016   $91,154
    

    That helped. But then we need to convert the Revenue column to integers. This is a bit tricky and can be done in various ways. We first need to remove the $-sign, then the comma-sign, before we convert it.

    revenue['Revenue'] = pd.to_numeric(revenue['Revenue'].str[1:].str.replace(',',''), errors='coerce')
    

    And that covers it.

    Step 2: Getting another data source: Free Cash Flow for Microsoft

    We want to combine this data with the Free Cash Flow (FCF) of Microsoft.

    The data can be gathered the same way and column and index can be set similar.

    import pandas as pd
    
    url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/free-cash-flow"
    tables = pd.read_html(url)
    fcf = tables[0]
    fcf.columns = ['Year', 'FCF']
    fcf = fcf.set_index('Year')
    print(fcf)
    

    The first few lines are.

         FCF
    Year
    2020 45234.0
    2019 38260.0
    2018 32252.0
    2017 31378.0
    2016 24982.0
    

    All ready to be joined with the other data.

    import pandas as pd
    
    url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
    tables = pd.read_html(url)
    revenue = tables[0]
    revenue.columns = ['Year', 'Revenue']
    revenue = revenue.set_index('Year')
    revenue['Revenue'] = pd.to_numeric(revenue['Revenue'].str[1:].str.replace(',',''), errors='coerce')
    # print(revenue)
    url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/free-cash-flow"
    tables = pd.read_html(url)
    fcf = tables[0]
    fcf.columns = ['Year', 'FCF']
    fcf = fcf.set_index('Year')
    data = revenue.join(fcf)
    # Let's reorder it
    data = data.iloc[::-1].copy()
    

    Where we also reorder it, to have it from the early ears in the top. Notice the copy(), which is not strictly necessary, but makes a hard-copy of the data and not just a view.

          Revenue      FCF
    Year                  
    2005    39788  15793.0
    2006    44282  12826.0
    2007    51122  15532.0
    2008    60420  18430.0
    2009    58437  15918.0
    

    Wow. Ready to export.

    Step 3: Exporting it to Excel

    This is too easy to have an entire section for it.

    data.to_excel('Output.xlsx')
    

    Isn’t it beautiful. Of course you need to execute this after all the lines above.

    In total.

    import pandas as pd
    
    url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
    tables = pd.read_html(url)
    revenue = tables[0]
    revenue.columns = ['Year', 'Revenue']
    revenue = revenue.set_index('Year')
    revenue['Revenue'] = pd.to_numeric(revenue['Revenue'].str[1:].str.replace(',',''), errors='coerce')
    # print(revenue)
    url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/free-cash-flow"
    tables = pd.read_html(url)
    fcf = tables[0]
    fcf.columns = ['Year', 'FCF']
    fcf = fcf.set_index('Year')
    data = revenue.join(fcf)
    # Let's reorder it
    data = data.iloc[::-1].copy()
    # Export to Excel
    data.to_excel('Output.xlsx')
    

    Which will result in an Excel spread sheet called Output.xlsx.

    The Excel spread sheet. I added the graph.

    There are many things you might find easier in Excel, like playing around with different types of visualization. On the other hand, there might be many aspects you find easier in Python. I know, I do. Almost all of them. Not kidding. Still, Excel is a powerful tool which is utilized by many specialists. Still it seems like the skills of Python are in request in connection with Excel.

    Python Circle

    Do you know what the 5 key success factors every programmer must have?

    How is it possible that some people become programmer so fast?

    While others struggle for years and still fail.

    Not only do they learn python 10 times faster they solve complex problems with ease.

    What separates them from the rest?

    I identified these 5 success factors that every programmer must have to succeed:

    1. Collaboration: sharing your work with others and receiving help with any questions or challenges you may have.
    2. Networking: the ability to connect with the right people and leverage their knowledge, experience, and resources.
    3. Support: receive feedback on your work and ask questions without feeling intimidated or judged.
    4. Accountability: stay motivated and accountable to your learning goals by surrounding yourself with others who are also committed to learning Python.
    5. Feedback from the instructor: receiving feedback and support from an instructor with years of experience in the field.

    I know how important these success factors are for growth and progress in mastering Python.

    That is why I want to make them available to anyone struggling to learn or who just wants to improve faster.

    With the Python Circle community, you can take advantage of 5 key success factors every programmer must have.

    Python Circle
    Python Circle

    Be part of something bigger and join the Python Circle community.

    Leave a Comment