What will we cover in this tutorial?
How to import a HTML table to Excel.
But that is easy? You can do that directly from Excel.
Yes, but what if entries contains numbers and string together, then the import will convert it to a string and makes it difficult to get the number extracted from the string.
Luckily, we will cover how to do that easy with Python.
Step 1: Get the dataset
Find your favorite HTML table online. For the purpose of this tutorial I will use this one from Wikipedia with List of Metro Systems.

Say, what if we wanted to sum how many stations are in this table (please notice that the table contains more rows than shown in the above picture).
If you import that directly into Excel, with the import functionality you will realize that the column of stations will be interpreted as strings. The problem is, that it will look like 19[13], while we are only interested in the number 19.
There is no build in functionality to do that directly in Excel.
But let’s try to import this into Python. We will use Pandas to do that. If you are new to Pandas, please see this tutorial.
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_metro_systems"
tables = pd.read_html(url)
print(tables[0].head())
Which will result in the following output.
/Users/admin/PycharmProjects/LearningSpace/venv/bin/python /Users/admin/PycharmProjects/LearningSpace/test.py
City Country ... System length Annual ridership(millions)
0 Algiers Algeria ... 18.5 km (11.5 mi)[14] 45.3 (2019)[R 1]
1 Buenos Aires Argentina ... 56.7 km (35.2 mi)[16] 337.7 (2018)[R 2]
2 Yerevan Armenia ... 13.4 km (8.3 mi)[17] 20.2 (2019)[R 3]
3 Sydney Australia ... 36 km (22 mi)[19][20] 14.2 (2019) [R 4][R Nb 1]
4 Vienna Austria ... 83.3 km (51.8 mi)[21] 459.8 (2019)[R 6]
Where we have the same problem. If we inspect the type of the columns we get the following.
City object
Country object
Name object
Yearopened object
Year of lastexpansion object
Stations object
System length object
Annual ridership(millions) object
dtype: object
Where actually all columns are of type object, which here is equivalent to a string.
Step 2: Extract the numbers from Stations and System length column
The DataStructure of the tables in tables is a DataFrame, which is Pandas main data structure.
As the strings we want to convert from string to integers are containing more information than just the numbers, we cannot use the DataFrame method to_numeric().
We want to convert something of the form 19[13] to 19.
To do that easily, we will use the apply(…) method on the DataFrame.
The apply-method takes a function as argument and applies it on each row.
We will use a lambda function as argument. If you are not familiar with lambda functions, please read this tutorial.
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_metro_systems"
tables = pd.read_html(url)
table = tables[0]
table['Stations'] = table.apply(lambda row: int(row['Stations'].split('[')[0]), axis=1)
table['System length'] = table.apply(lambda row: float(row['System length'].split()[0]), axis=1)
print(table[['Stations', 'System length']].head())
Which will result in the following output.
Stations System length
0 19 18.5
1 90 56.7
2 10 13.4
3 13 36.0
4 98 83.3
This is what we want.
Step 3: Export to Excel
Wow. This needs an entire step?
Well, of course it does.
Here we need to unleash the power of Pandas and use the to_excel(…) method.
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_metro_systems"
tables = pd.read_html(url)
table = tables[0]
table['Stations'] = table.apply(lambda row: int(row['Stations'].split('[')[0]), axis=1)
table['System length'] = table.apply(lambda row: float(row['System length'].split()[0]), axis=1)
table.to_excel('output.xlsx')
This will result in an Excel file looking similar to this, where the Stations and System length columns are numeric and not string.

Learn Python

Learn Python A BEGINNERS GUIDE TO PYTHON
- 70 pages to get you started on your journey to master Python.
- How to install your setup with Anaconda.
- Written description and introduction to all concepts.
- Jupyter Notebooks prepared for 17 projects.
Python 101: A CRASH COURSE
- How to get started with this 8 hours Python 101: A CRASH COURSE.
- Best practices for learning Python.
- How to download the material to follow along and create projects.
- A chapter for each lesson with a description, code snippets for easy reference, and links to a lesson video.
Expert Data Science Blueprint

Expert Data Science Blueprint
- Master the Data Science Workflow for actionable data insights.
- How to download the material to follow along and create projects.
- A chapter to each lesson with a Description, Learning Objective, and link to the lesson video.
Machine Learning

Machine Learning – The Simple Path to Mastery
- How to get started with Machine Learning.
- How to download the material to follow along and make the projects.
- One chapter for each lesson with a Description, Learning Objectives, and link to the lesson video.