Yes, you can do it manually. Copy from an HTML table and paste into an Excel spread sheet. Or you can dive into how to pull data directly from the internet into Excel. Sometimes it is not convenient, as some data needs to be transformed and you need to do it often.
In this tutorial we will show how this can be easily automated with Python using Pandas.
That is we go from data that needs to be transformed, like, $102,000 into 102000. Also, how to join (or merge) different datasources before we create a Excel spread sheet.
There are many sources where you can get this data, but Macrotrends has it nicely in a table and for more than 10 years old data.
First thing first, let’s try to take a look at the data. You can use Pandas read_html to get the data from the tables given a URL.
import pandas as pd
url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
tables = pd.read_html(url)
revenue = tables[0]
print(revenue)
Where we know it is in the first table on the page. A first few lines of the output is given here.
Microsoft Annual Revenue(Millions of US $) Microsoft Annual Revenue(Millions of US $).1
0 2020 $143,015
1 2019 $125,843
2 2018 $110,360
3 2017 $96,571
4 2016 $91,154
First thing to manage are the column names and setting the year to the index.
import pandas as pd
url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
tables = pd.read_html(url)
revenue = tables[0]
revenue.columns = ['Year', 'Revenue']
revenue = revenue.set_index('Year')
print(revenue)
A first few lines.
Revenue
Year
2020 $143,015
2019 $125,843
2018 $110,360
2017 $96,571
2016 $91,154
That helped. But then we need to convert the Revenue column to integers. This is a bit tricky and can be done in various ways. We first need to remove the $-sign, then the comma-sign, before we convert it.
revenue['Revenue'] = pd.to_numeric(revenue['Revenue'].str[1:].str.replace(',',''), errors='coerce')
And that covers it.
We want to combine this data with the Free Cash Flow (FCF) of Microsoft.
The data can be gathered the same way and column and index can be set similar.
import pandas as pd
url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/free-cash-flow"
tables = pd.read_html(url)
fcf = tables[0]
fcf.columns = ['Year', 'FCF']
fcf = fcf.set_index('Year')
print(fcf)
The first few lines are.
FCF
Year
2020 45234.0
2019 38260.0
2018 32252.0
2017 31378.0
2016 24982.0
All ready to be joined with the other data.
import pandas as pd
url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
tables = pd.read_html(url)
revenue = tables[0]
revenue.columns = ['Year', 'Revenue']
revenue = revenue.set_index('Year')
revenue['Revenue'] = pd.to_numeric(revenue['Revenue'].str[1:].str.replace(',',''), errors='coerce')
# print(revenue)
url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/free-cash-flow"
tables = pd.read_html(url)
fcf = tables[0]
fcf.columns = ['Year', 'FCF']
fcf = fcf.set_index('Year')
data = revenue.join(fcf)
# Let's reorder it
data = data.iloc[::-1].copy()
Where we also reorder it, to have it from the early ears in the top. Notice the copy(), which is not strictly necessary, but makes a hard-copy of the data and not just a view.
Revenue FCF
Year
2005 39788 15793.0
2006 44282 12826.0
2007 51122 15532.0
2008 60420 18430.0
2009 58437 15918.0
Wow. Ready to export.
This is too easy to have an entire section for it.
data.to_excel('Output.xlsx')
Isn’t it beautiful. Of course you need to execute this after all the lines above.
In total.
import pandas as pd
url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
tables = pd.read_html(url)
revenue = tables[0]
revenue.columns = ['Year', 'Revenue']
revenue = revenue.set_index('Year')
revenue['Revenue'] = pd.to_numeric(revenue['Revenue'].str[1:].str.replace(',',''), errors='coerce')
# print(revenue)
url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/free-cash-flow"
tables = pd.read_html(url)
fcf = tables[0]
fcf.columns = ['Year', 'FCF']
fcf = fcf.set_index('Year')
data = revenue.join(fcf)
# Let's reorder it
data = data.iloc[::-1].copy()
# Export to Excel
data.to_excel('Output.xlsx')
Which will result in an Excel spread sheet called Output.xlsx.
There are many things you might find easier in Excel, like playing around with different types of visualization. On the other hand, there might be many aspects you find easier in Python. I know, I do. Almost all of them. Not kidding. Still, Excel is a powerful tool which is utilized by many specialists. Still it seems like the skills of Python are in request in connection with Excel.
Project Description The Fibonacci sequence is as follows. 0 1 1 2 3 5 8…
How ELIZA works? It looks for simple patterns and substitutes to give the illusion of…
Project Description The program you write can do 4 things. It can show the content…
Project Description You will start to sell items from your awesome store. You count items…
Project Description Create a converter from Fahrenheit to celsius using the formula °𝐶=(°𝐹−32)×5/9 Project Prompt…
Project Description Leet (or "1337"), also known as eleet or leetspeak, is a system of…