Pandas: How to Sum Groups from HTML Tables

What will we cover in this tutorial?

  • How to collect data from a HTML table into a Pandas DataFrame.
  • The cleaning process and how to convert the data into the correct type.
  • Also, dealing with some data points that are not in correct representation.
  • Finally, how to sum up by countries.

Step 1: Collect the data from the table

Pandas is an amazing library with a lot of useful data analysis functionality right out of the box. First step in any data analysis is to collect the data. In this tutorial we will collect the data from wikipedia’s page on List of metro systems.

If you are new to the pandas library we recommend you read the this tutorial.

The objective will be to find the sums of Stations, Systems length, and Annual ridership per each country.

From wikipedia.org

At first glance this looks simple, but looking further down we see that some countries have various rows.

From wikipedia.org

Also, some rows do not have all the values needed.

First challenge first. Read the data from the table into a DataFrame, which is the main data structure of the pandas library. The read_html call from a pandas will return a list of DataFrames.

If you use read_html for the first time, we recommend you read this tutorial.

import pandas as pd

url = 'https://en.wikipedia.org/wiki/List_of_metro_systems'
tables = pd.read_html(url)
table = tables[0]
print(table)

Which results in the following output (or the top of it).

                 City               Country                                Name        Yearopened Year of lastexpansion             Stations                       System length             Annual ridership(millions)
0             Algiers               Algeria                       Algiers Metro          2011[13]              2018[14]               19[14]               18.5 km (11.5 mi)[15]                       45.3 (2019)[R 1]
1        Buenos Aires             Argentina            Buenos Aires Underground        1926[Nb 1]              2019[16]               90[17]               56.7 km (35.2 mi)[17]                      337.7 (2018)[R 2]
2             Yerevan               Armenia                       Yerevan Metro          1981[18]              1996[19]               10[18]                13.4 km (8.3 mi)[18]                       18.7 (2018)[R 3]
3              Sydney             Australia                        Sydney Metro          2019[20]                     –               13[20]               36 km (22 mi)[20][21]              14.2 (2019) [R 4][R Nb 1]
4              Vienna               Austria                       Vienna U-Bahn    1976[22][Nb 2]              2017[23]               98[24]               83.3 km (51.8 mi)[22]                      463.1 (2018)[R 6]
5                Baku            Azerbaijan                          Baku Metro          1967[25]              2016[25]               25[25]               36.6 km (22.7 mi)[25]                      231.0 (2018)[R 3]

We have now have the data in a DataFrame.

Step 2: Clean and convert the data

At first glance, we see that we do not need the rows City, Name, Yearopened, Year of last expansion. To make it easier to work with the data, let’s remove them and inspect the data again.

import pandas as pd

url = 'https://en.wikipedia.org/wiki/List_of_metro_systems'
tables = pd.read_html(url)
table = tables[0]
table = table.drop(['City', 'Name', 'Yearopened', 'Year of lastexpansion'], axis=1)
print(table)

Which result in the following output.

                  Country             Stations                       System length             Annual ridership(millions)
0                 Algeria               19[14]               18.5 km (11.5 mi)[15]                       45.3 (2019)[R 1]
1               Argentina               90[17]               56.7 km (35.2 mi)[17]                      337.7 (2018)[R 2]
2                 Armenia               10[18]                13.4 km (8.3 mi)[18]                       18.7 (2018)[R 3]
3               Australia               13[20]               36 km (22 mi)[20][21]              14.2 (2019) [R 4][R Nb 1]
4                 Austria               98[24]               83.3 km (51.8 mi)[22]                      463.1 (2018)[R 6]
5              Azerbaijan               25[25]               36.6 km (22.7 mi)[25]                      231.0 (2018)[R 3]
6                 Belarus               29[27]               37.3 km (23.2 mi)[27]                      283.4 (2018)[R 3]
7                 Belgium         59[28][Nb 5]               39.9 km (24.8 mi)[29]                      165.3 (2019)[R 7]

This makes it easier to see the next steps.

Let’s take them one by one. Stations need to remove the data after ‘[‘-symbol and convert the number to an integer. This can be done by using a lambda function to a row.

table['Stations'] = table.apply(lambda row: int(row['Stations'].split('[')[0]), axis=1)

If you are new to lambda functions we recommend you read this tutorial.

The next thing we need to do is to convert the System length to floats. The length will be in km (I live in Denmark, where we use km and not mi). This can also be done by using a lambda function

table['System length'] = table.apply(lambda row: float(row['System length'].split()[0]), axis=1)

Finally, and a bit more tricky, we need to convert the column of Annual ridership. The challenge is that lines have n/a which are converted to np.nan, but there are also some lines where the input is not easy to convert, as the images show.

From wikipedia.org
From wikipedia.org

These lines are can be dealt with by using a helper function.

def to_float(obj):
    try:
        return float(obj)
    except:
        return np.nan

index = 'Annual ridership(millions)'
table[index] = table.apply(lambda row: to_float(row[index].split()[0]) if row[index] is not np.nan else np.nan, axis=1)

Adding this all together we get the following code.

import pandas as pd
import numpy as np

def to_float(obj):
    try:
        return float(obj)
    except:
        return np.nan

url = 'https://en.wikipedia.org/wiki/List_of_metro_systems'
tables = pd.read_html(url)
table = tables[0]
table = table.drop(['City', 'Name', 'Yearopened', 'Year of lastexpansion'], axis=1)

table['Stations'] = table.apply(lambda row: int(row['Stations'].split('[')[0]), axis=1)
table['System length'] = table.apply(lambda row: float(row['System length'].split()[0]), axis=1)
index = 'Annual ridership(millions)'
table[index] = table.apply(lambda row: to_float(row[index].split()[0]) if row[index] is not np.nan else np.nan, axis=1)

print(table)

Which results in the following output (or the first few lines).

                  Country  Stations  System length  Annual ridership(millions)
0                 Algeria        19          18.50                       45.30
1               Argentina        90          56.70                      337.70
2                 Armenia        10          13.40                       18.70
3               Australia        13          36.00                       14.20
4                 Austria        98          83.30                      463.10
5              Azerbaijan        25          36.60                      231.00
6                 Belarus        29          37.30                      283.40
7                 Belgium        59          39.90                      165.30
8                  Brazil        19          28.10                       58.40
9                  Brazil        25          42.40                       42.80
10                 Brazil        22          43.80                       51.70

Step 3: Sum rows by country

Say, now we want to get the country with the most metro stations. This can be achieved by using the groupby and sum function from the pandas DataFrame data structure.

import pandas as pd
import numpy as np

def to_float(obj):
    try:
        return float(obj)
    except:
        return np.nan

url = 'https://en.wikipedia.org/wiki/List_of_metro_systems'
tables = pd.read_html(url)
table = tables[0]
table = table.drop(['City', 'Name', 'Yearopened', 'Year of lastexpansion'], axis=1)

table['Stations'] = table.apply(lambda row: int(row['Stations'].split('[')[0]), axis=1)
table['System length'] = table.apply(lambda row: float(row['System length'].split()[0]), axis=1)
index = 'Annual ridership(millions)'
table[index] = table.apply(lambda row: to_float(row[index].split()[0]) if row[index] is not np.nan else np.nan, axis=1)

# Sum up
table_sum = table.groupby(['Country']).sum()

print(table_sum.sort_values(['Stations'], ascending=False))

Where the result will be China.

                      Stations  System length  Annual ridership(millions)
Country                                                                  
China                     3738        6312.16                    25519.23
United States             1005        1325.90                     2771.50
South Korea                714         839.90                     4054.90
Japan[Nb 34]               669         791.20                     6489.60
India                      499         675.97                     1377.00
France                     483         350.90                     2113.50
Spain                      438         474.40                     1197.90

If we want to sort by km of System length, you will only need to change the last line to the following.

print(table_sum.sort_values(['System length'], ascending=False))

Resulting in the following.

                      Stations  System length  Annual ridership(millions)
Country                                                                  
China                     3738        6312.16                    25519.23
United States             1005        1325.90                     2771.50
South Korea                714         839.90                     4054.90
Japan[Nb 34]               669         791.20                     6489.60
India                      499         675.97                     1377.00
Russia                     368         611.50                     3507.60
United Kingdom             390         523.90                     1555.30

Finally, if you want it by Annual ridership, you will need to change the last line to.

print(table_sum.sort_values([index], ascending=False))

Remember, we assigned that to index. You should get the following output.

                      Stations  System length  Annual ridership(millions)
Country                                                                  
China                     3738        6312.16                    25519.23
Japan[Nb 34]               669         791.20                     6489.60
South Korea                714         839.90                     4054.90
Russia                     368         611.50                     3507.60
United States             1005        1325.90                     2771.50
France                     483         350.90                     2113.50
Brazil                     243         345.40                     2106.20

Master Data Correlation with Pandas DataFrame in 3 Easy Steps

What will we cover in this tutorial?

  • How to get data using Pandas DataFrames.
  • Clean the data and merge it together.
  • Finally, how to see if there is any correlation between data columns.

Step 1: Get the data you want to correlate

As an example, let’s assume you get the idea that there might be a correlation between GDP per capita, Social Progress Index (SPI), and Human Development Index (HDI), but is not sure whether SPI or HDI is closets correlated to GDP per capita.

Luckily, you have pandas to the rescue.

As the data is in three pages, you need to collect it by separately and merge it later. First, let us collect the data and inspect it.

The GDP per capita is located in the table on wikipedia presented in the picture below.

From wikipedia.org

Which is actually three tables. We will use the World Bank table in our example. It can be collected by using a call to pandas read_html. If you are new to read_html we recommend you read this tutorial.

import pandas as pd

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

table = tables[3]
print(table)

Which will give an output similar to this.

    Rank                       Country/Territory     US$
0      1                           Monaco (2018)  185741
1      2                    Liechtenstein (2017)  173356
2      3                              Luxembourg  114705
3      —                                   Macau   84096
4      4                             Switzerland   81994
5      5                                 Ireland   78661
6      6                                  Norway   75420
7      7                                 Iceland   66945

The next table we need to get is the Social Progress Index (SPI) and looks like the picture shows below.

From wikipedia.org

This can be collected to a DataFrame with the following code.

import pandas as pd

url = 'https://en.wikipedia.org/wiki/Social_Progress_Index'
tables = pd.read_html(url)
print(tables[1])

Which will print the following to the screen (or the top of it).

                          Country 2019[9]               
                          Country    Rank  Score Score.1
0                          Norway       1  90.95     NaN
1                         Denmark       2  90.09     NaN
2                     Switzerland       3  89.89     NaN
3                         Finland       4  89.56     NaN
4                          Sweden       5  89.45     NaN

Finally we need to read the Human Development Index (HDI), which can be seen on wikipedia as the following picture shows.

From wikipedia.org

And can be collected with the following code.

import pandas as pd

url = 'https://en.wikipedia.org/wiki/List_of_countries_by_Human_Development_Index'
tables = pd.read_html(url)

print(tables[1])

Resulting in the following output.

                           Rank                                             Country or Territory                         HDI                                                    Unnamed: 5_level_0
    2018 data (2019 report)​[5] Change from previous year​[5]               Country or Territory 2018 data (2019 report)​[5] Average annual HDI growth (2010–2018)​[6]          Unnamed: 5_level_1
    Very high human development   Very high human development        Very high human development Very high human development               Very high human development Very high human development
0                             1                           NaN                             Norway                       0.954                                     0.16%                         NaN
1                             2                           NaN                        Switzerland                       0.946                                     0.18%                         NaN
2                             3                           NaN                            Ireland                       0.942                                     0.71%                         NaN
3                             4                           NaN                            Germany                       0.939                                     0.25%                         NaN
4                             4                           (2)                          Hong Kong                       0.939                                     0.51%                         NaN
5                             6                           (1)                          Australia                       0.938                                     0.17%                         NaN

A bit more messy data table.

Now we have gathered all the data we need to clean it up and merge it together.

Step 2: Clean and merge the data into one DataFrame

If we first inspect the data from the GDP per capita.

    Rank                       Country/Territory     US$
0      1                           Monaco (2018)  185741
1      2                    Liechtenstein (2017)  173356
2      3                              Luxembourg  114705
3      —                                   Macau   84096
4      4                             Switzerland   81994
5      5                                 Ireland   78661
6      6                                  Norway   75420
7      7                                 Iceland   66945

Notice that Country/Territory can have a year in parentheses, which will make it hard to merge. We need to clean that up. Also, we do not need the Rank column.

If we inspect the data of HDI.

                          Country 2019[9]               
                          Country    Rank  Score Score.1
0                          Norway       1  90.95     NaN
1                         Denmark       2  90.09     NaN
2                     Switzerland       3  89.89     NaN
3                         Finland       4  89.56     NaN
4                          Sweden       5  89.45     NaN

Here we notice that the first row is an additional description row, which we can remove. Further, we do not need the Rank and Score.1 columns.

Let’s try to merge it together. Notice that we use a lambda function to clean up the Country/Territory names. If you are new to lambda functions, we recommend you read this tutorial.

import pandas as pd

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in table 3
table = tables[3]

# We need to clean the years in parenthesis from the country/territory field
table['Country'] = table.apply(lambda row: row['Country/Territory'].split(' (')[0], axis=1)
# We do not need the Rank and Country/Territory for more data
table = table.drop(['Rank', 'Country/Territory'], axis=1)

url = 'https://en.wikipedia.org/wiki/Social_Progress_Index'
tables = pd.read_html(url)

merge_table = tables[1]
# The first level of the table can be dropped
merge_table.columns = merge_table.columns.droplevel(0)
# We do not need the Rank and Score.1 columns
merge_table = merge_table.drop(['Rank', 'Score.1'], axis=1)
# Need to rename the second column to SPI = Social Progress Index
merge_table.columns = ['Country', 'SPI']

# Ready to merge the tables
table = table.merge(merge_table, how="left", left_on=['Country'], right_on=['Country'])

print(table)

Which will result in an output like this.

        US$                            Country    SPI
0    185741                             Monaco    NaN
1    173356                      Liechtenstein    NaN
2    114705                         Luxembourg  87.66
3     84096                              Macau    NaN
4     81994                        Switzerland  89.89
5     78661                            Ireland  87.97
6     75420                             Norway  90.95

First validate that Monaco, Liechtenstein, and Macau do not have any SPI value. That seems to be correct.

Then we can proceed to the next table of HDI. Let us first inspect the data.

                           Rank                                             Country or Territory                         HDI                                                    Unnamed: 5_level_0
    2018 data (2019 report)​[5] Change from previous year​[5]               Country or Territory 2018 data (2019 report)​[5] Average annual HDI growth (2010–2018)​[6]          Unnamed: 5_level_1
    Very high human development   Very high human development        Very high human development Very high human development               Very high human development Very high human development
0                             1                           NaN                             Norway                       0.954                                     0.16%                         NaN
1                             2                           NaN                        Switzerland                       0.946                                     0.18%                         NaN
2                             3                           NaN                            Ireland                       0.942                                     0.71%                         NaN
3                             4                           NaN                            Germany                       0.939                                     0.25%                         NaN
4                             4                           (2)                          Hong Kong                       0.939                                     0.51%                         NaN
5                             6                           (1)                          Australia                       0.938                                     0.17%                         NaN

It has a quite messy top level column naming in 3 layers. Dropping them will make some identical. To deal with that, we can rename them and delete those we do not need.

import pandas as pd

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in table 3
table = tables[3]

# We need to clean the years in parenthesis from the country/territory field
table['Country'] = table.apply(lambda row: row['Country/Territory'].split(' (')[0], axis=1)
# We do not need the Rank and Country/Territory for more data
table = table.drop(['Rank', 'Country/Territory'], axis=1)

url = 'https://en.wikipedia.org/wiki/Social_Progress_Index'
tables = pd.read_html(url)

merge_table = tables[1]
# The first level of the table can be dropped
merge_table.columns = merge_table.columns.droplevel(0)
# We do not need the Rank and Score.1 columns
merge_table = merge_table.drop(['Rank', 'Score.1'], axis=1)
# Need to rename the second column to SPI = Social Progress Index
merge_table.columns = ['Country', 'SPI']

# Ready to merge the tables
table = table.merge(merge_table, how="left", left_on=['Country'], right_on=['Country'])

url = 'https://en.wikipedia.org/wiki/List_of_countries_by_Human_Development_Index'
tables = pd.read_html(url)


merge_table = tables[1]
# Delete the additional column levels
merge_table.columns = merge_table.columns.droplevel(1)
merge_table.columns = merge_table.columns.droplevel(1)
# Rename the columns
merge_table.columns = ['Rank1', 'Rank2', 'Country', 'HDI', 'HDI-1', 'None']
# Delete the columns we do not need
merge_table = merge_table.drop(['Rank1', 'Rank2', 'HDI-1', 'None'], axis=1)
# Merge the tables
table = table.merge(merge_table, how="left", left_on=['Country'], right_on=['Country'])

print(table)

Which will result in the following output (or the top of it).

        US$                            Country    SPI    HDI
0    185741                             Monaco    NaN    NaN
1    173356                      Liechtenstein    NaN  0.917
2    114705                         Luxembourg  87.66  0.909
3     84096                              Macau    NaN    NaN
4     81994                        Switzerland  89.89  0.946
5     78661                            Ireland  87.97  0.942
6     75420                             Norway  90.95  0.954

Notice, that here Liechtenstein has HDI data, while Monaco and Macau do not have any data. While it is not visible, the HDI column is not made of float. It can be seen with a call to table.dtypes, which will output the following.

US$          int64
Country     object
SPI        float64
HDI         object
dtype: object

Which states that HDI is object, which in this case is a string. That means we need to convert it to float to make our final correlation computations. This can be done by using a lambda function.

table['HDI'] = table.apply(lambda row: float(row['HDI']) if row['HDI'] is not np.nan else np.nan, axis=1) # HDI = Human Development Index

This actually makes the data ready to see if there is any correlations between GDP per capita and SPI and/or HDI.

Step 3: Calculate the correlations

This is where the DataFrames from pandas come strong. It can do the entire work for you with one call to corr().

The full code is given below.

import pandas as pd
import numpy as np


pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 10)
pd.set_option('display.width', 1000)


# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in table 3
table = tables[3]

# We need to clean the years in parenthesis from the country/territory field
table['Country'] = table.apply(lambda row: row['Country/Territory'].split(' (')[0], axis=1)
# We do not need the Rank and Country/Territory for more data
table = table.drop(['Rank', 'Country/Territory'], axis=1)

url = 'https://en.wikipedia.org/wiki/Social_Progress_Index'
tables = pd.read_html(url)

merge_table = tables[1]
# The first level of the table can be dropped
merge_table.columns = merge_table.columns.droplevel(0)
# We do not need the Rank and Score.1 columns
merge_table = merge_table.drop(['Rank', 'Score.1'], axis=1)
# Need to rename the second column to SPI = Social Progress Index
merge_table.columns = ['Country', 'SPI']

# Ready to merge the tables
table = table.merge(merge_table, how="left", left_on=['Country'], right_on=['Country'])

url = 'https://en.wikipedia.org/wiki/List_of_countries_by_Human_Development_Index'
tables = pd.read_html(url)


merge_table = tables[1]
# Delete the additional column levels
merge_table.columns = merge_table.columns.droplevel(1)
merge_table.columns = merge_table.columns.droplevel(1)
# Rename the columns
merge_table.columns = ['Rank1', 'Rank2', 'Country', 'HDI', 'HDI-1', 'None']
# Delete the columns we do not need
merge_table = merge_table.drop(['Rank1', 'Rank2', 'HDI-1', 'None'], axis=1)
# Merge the tables
table = table.merge(merge_table, how="left", left_on=['Country'], right_on=['Country'])

# Convert to floats
table['HDI'] = table.apply(lambda row: float(row['HDI']) if row['HDI'] is not np.nan else np.nan, axis=1) # HDI = Human Development Index

# Calculate the correlation
table_corr = table.corr()

# Print the correlation to GDP per capita (stored in US$).
print(table_corr['US$'].sort_values(ascending=False))

Which will result in the following output.

US$    1.000000
SPI    0.713946
HDI    0.663183
Name: US$, dtype: float64

Hence, it seems that there is the biggest correlation between GDP per capita and SPI.

Notice, that the calculations ignores all Not a Number (np.nan).

How to Create Choropleth Maps with Evenly Distributed Colors in 3 Easy Steps

What will we cover in the tutorial

  • You create your default map using Choropleth from folium (or any other) and the color distribution is poor. Most countries are distributed between two colors, which makes the map less informative.
Example of poor color distribution.
  • Understand the issue
  • How to solve it
  • …and putting it all together.

Step 1: Understand the issue

In this example we have used the divorce rates in reported in various countries on wikipedia’s page of Divorce Demography

At first inspection on the wikipedia page you get an idea of what the problem is.

From wikipedia.org

The divorce rate in Denmark is too high (just kidding, I am from Denmark and not proud that Denmark is ranking number 6 based on percentage of marriages that end in divorce).

The issue is, that the distribution is not even. See the highest is Tunisia with 97.14 percent, then Portugal is second with 70.97 percent. This has to be compared to the color coding. The last two colors are by default distributed to 67 to 82 and 82 to 97, which only contain 1 country each.

To inspect this further, we need to retrieve the data and inspect it.

To retrieve the data we can use pandas – read this tutorial for details or see the code below.

import pandas as pd

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/Divorce_demography'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the second table
table = tables[0]
# We need to remove level 0 of columns as they are disturbing the data
table.columns = table.columns.droplevel(0)

def is_float(str):
    try:
        float(str)
        return True
    except:
        return False

# We need to convert the data to floats
index = 'Divorce_float'
table[index] = table.apply(lambda row: float(row['Percent']) if is_float(row['Percent']) else np.nan, axis=1)

print(pd.cut(table[index], 6).value_counts(sort=False))

If we inspect the output we see that our suggestion was right.

(6.93, 22.04]     25
(22.04, 37.06]    26
(37.06, 52.08]    22
(52.08, 67.1]      9
(67.1, 82.12]      1
(82.12, 97.14]     1
Name: Divorce_float, dtype: int64
The last two color codes are only used by one country each, while the first 3 are used by 20+ countries.

Step 2: Distribute the countries into evenly distributed bins

This requires to understand the difference between cut and qcut of pandas library.

  • cut By default will return the same size bins.
  • qcut Will by default try to return buckets with the same number of items in.

See this example to understand it better.

import pandas as pd

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/Divorce_demography'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the second table
table = tables[0]
# We need to remove level 0 of columns as they are disturbing the data
table.columns = table.columns.droplevel(0)

def is_float(str):
    try:
        float(str)
        return True
    except:
        return False

# We need to convert the data to floats
index = 'Divorce_float'
table[index] = table.apply(lambda row: float(row['Percent']) if is_float(row['Percent']) else np.nan, axis=1)

print(pd.qcut(table[index], 6).value_counts(sort=False))

Where the only difference is that we changed cut to qcut on the last line. This will result in the following output.

(7.018999999999999, 17.303]    14
(17.303, 23.957]               14
(23.957, 31.965]               14
(31.965, 40.0]                 15
(40.0, 47.078]                 13
(47.078, 97.14]                14
Name: Divorce_float, dtype: int64

Where we see that each bucket now contains approximately the same number of countries.

Hence, we need to use that for our purpose of color distribution our map.

Step 3: Putting it all together on the map

If you are new to folium and how make awesome leaflet maps easy, I can recommend to read this tutorial, or inspect the code below.

import pandas as pd
import folium
import geopandas
import numpy as np

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/Divorce_demography'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the second table
table = tables[0]
# We need to remove level 0 of columns as they are disturbing the data
table.columns = table.columns.droplevel(0)

# We should clean the data
table['Country'] = table.apply(lambda row: row['Country/region'].split(' (')[0] if type(row['Country/region']) == str else row['Country/region'], axis=1)

# Read the geopandas dataset
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
# Replace United States of America to United States to fit the naming in the table
world = world.replace('United States of America', 'United States')

# Merge the two DataFrames together
table = world.merge(table, how="left", left_on=['name'], right_on=['Country'])

def is_float(str):
    try:
        float(str)
        return True
    except:
        return False

# We need to convert the data to floats
index = 'Divorce_float'
table[index] = table.apply(lambda row: float(row['Percent']) if is_float(row['Percent']) else np.nan, axis=1)

# Clean data: remove rows with no data
table = table.dropna(subset=[index])

# We have 10 colors available resulting into 9 cuts.
bins_data = pd.qcut(table[index], 9).value_counts(sort=False)
print(bins_data)

bins = [0]
for i in range(9):
    bins.append(int(round(bins_data.index.values[i].right)))
bins[9] = 100

# Create a map
my_map = folium.Map()

# Add the data
folium.Choropleth(
    geo_data=table,
    name='choropleth',
    data=table,
    columns=['Country', index],
    key_on='feature.properties.name',
    fill_color='OrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name=index,
    threshold_scale=bins
).add_to(my_map)
my_map.save('divorse_rates.html')

Where we combine the two DataFrames and take advantage of that we have 10 colors available.

It should result in a map like this one.

Final output

Plot World Data to Map Using Python in 3 Easy Steps

What will we cover in this tutorial

  • As example we will use the html table from a wikipedia page. In this case the one listing countries by meat consumption.
  • We will see how to read the table data into a Pandas DataFrame with a single call.
  • Then how to merge it with a DataFrame containing data to color countries.
  • Finally, how to add the colors to leaflet map using a Python library.

Step 1: Read the data to a Pandas DataFrame

We need to inspect the page we are going to parse from. In this case it is the world meat consumption from wikipedia.

From wikipedia.

What we want to do is to gather the data from the table and plot it to a world map using colors to indicate the meat consumption.

End result

The easiest way to work with data is by using pandas DataFrames. The Pandas library has a read_html function, which returns all tables from a webpage.

This can be achieved by the following code. If you use read_html for the first time, you will need to instal lxml, see this tutorial for details.

import pandas as pd

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_meat_consumption'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the first table - this changes from time to time - wikipedia is updated all the time.
table = tables[0]

print(table.head())

Resulting in the following output.

               Country  Kg/person (2002)[9][note 1] Kg/person (2009)[10]
0              Albania                         38.2                  NaN
1              Algeria                         18.3                 19.5
2       American Samoa                         24.9                 26.8
3               Angola                         19.0                 22.4
4  Antigua and Barbuda                         56.0                 84.3

Step 2: Merging the data to world map

The next step thing we want to do is to map it to a world map that we can color.

This can be done by using geopandas.

import pandas as pd
import geopandas


# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_meat_consumption'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the first table - this changes from time to time - wikipedia is updated all the time.
table = tables[0]

print(table.head())

# Read the geopandas dataset
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

print(world.head())

Which results in the following output.

               Country  Kg/person (2002)[9][note 1] Kg/person (2009)[10]
0              Albania                         38.2                  NaN
1              Algeria                         18.3                 19.5
2       American Samoa                         24.9                 26.8
3               Angola                         19.0                 22.4
4  Antigua and Barbuda                         56.0                 84.3
     pop_est      continent                      name iso_a3  gdp_md_est                                           geometry
0     920938        Oceania                      Fiji    FJI      8374.0  MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1   53950935         Africa                  Tanzania    TZA    150600.0  POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2     603253         Africa                 W. Sahara    ESH       906.5  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3   35623680  North America                    Canada    CAN   1674000.0  MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4  326625791  North America  United States of America    USA  18560000.0  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...

Where we can see the column Country of the table DataFrame should be merged with the column name in the world DataFrame.

Let’s do the merge on that.

import pandas as pd
import geopandas


# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_meat_consumption'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the first table - this changes from time to time - wikipedia is updated all the time.
table = tables[0]

# Read the geopandas dataset
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

# Merge the two DataFrames together
table = world.merge(table, how="left", left_on=['name'], right_on=['Country'])

print(table.head())

Which results in the following output.

     pop_est      continent  ... kg/person (2009)[10] kg/person (2017)[11]
0     920938        Oceania  ...                 38.8                  NaN
1   53950935         Africa  ...                  9.6                 6.82
2     603253         Africa  ...                  NaN                  NaN
3   35623680  North America  ...                 94.3                69.99
4  326625791  North America  ...                120.2                98.60

[5 rows x 10 columns]

Where we also notice that some rows do not have any data from table, resulting in values NaN. To get a clearer view we will remove those rows.

import pandas as pd
import geopandas


# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_meat_consumption'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the first table - this changes from time to time - wikipedia is updated all the time.
table = tables[0]

# Read the geopandas dataset
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

# Merge the two DataFrames together
table = world.merge(table, how="left", left_on=['name'], right_on=['Country'])

# Clean data: remove rows with no data
table = table.dropna(subset=['kg/person (2002)[9][note 1]'])

The rows can be removed by using dropna.

Step 3: Add the data by colors on an interactive world map

Finally, you can use folium to create a leaflet map.

import pandas as pd
import folium
import geopandas


# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_meat_consumption'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the first table - this changes from time to time - wikipedia is updated all the time.
table = tables[0]

# Read the geopandas dataset
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

# Merge the two DataFrames together
table = world.merge(table, how="left", left_on=['name'], right_on=['Country'])

# Clean data: remove rows with no data
table = table.dropna(subset=['kg/person (2002)[9][note 1]'])

# Create a map
my_map = folium.Map()

# Add the data
folium.Choropleth(
    geo_data=table,
    name='choropleth',
    data=table,
    columns=['Country', 'kg/person (2002)[9][note 1]'],
    key_on='feature.properties.name',
    fill_color='OrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Meat consumption in kg/person'
).add_to(my_map)
my_map.save('meat.html')

Resulting a html webpage like this one.

From HTML Table Through Pandas to Leaflet Map in 5 Steps

What will we cover

  • You want to map data from an HTML table to an interactive map.
  • The data is not clean and it is difficult to map data to countries, as they are often called different.

Step 1: Using Pandas to read the data

We will look at a table of data from wikipedia.org. In this example we will look at the data from average human heights by country.

From wikipedia.org

Inspecting the first few columns you see a few issues already. There is some data missing and some countries are represented more than once.

To simplify our exercise we will only look at Average male height.

Let’s use pandas to read the content and inspect it. If you are new to pandas I can recommend the this post.

To read the content you can use the read_html(url) call from the pandas library. You need to instal lxml as well, see this post of details.

import pandas as pd

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/Average_human_height_by_country'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the first table
table = tables[0]

print(table[:20])

Which will result in the following output.

            Country/Region        Average male height  ...       Year    Source
0                  Albania   174.0 cm (5 ft 8 1⁄2 in)  ...  2008–2009  [11][12]
1                Argentina                        NaN  ...  2004–2005      [13]
2                Argentina  174.46 cm (5 ft 8 1⁄2 in)  ...  1998–2001      [14]
3                  Armenia                        NaN  ...       2005      [15]
4                Australia       175.6 cm (5 ft 9 in)  ...  2011–2012      [16]
5                  Austria    179 cm (5 ft 10 1⁄2 in)  ...       2006      [17]
6               Azerbaijan   171.8 cm (5 ft 7 1⁄2 in)  ...       2005      [18]
7                  Bahrain       165.1 cm (5 ft 5 in)  ...       2002      [19]
8                  Bahrain   171.0 cm (5 ft 7 1⁄2 in)  ...       2009  [20][21]
9               Bangladesh                        NaN  ...       2007      [15]
10          Country/Region        Average male height  ...       Year    Source
11                 Belgium  178.6 cm (5 ft 10 1⁄2 in)  ...       2001      [22]
12                   Benin                        NaN  ...       2006      [15]
13                 Bolivia                        NaN  ...       2003      [15]
14                 Bolivia       160.0 cm (5 ft 3 in)  ...       1970      [23]
15  Bosnia and Herzegovina       183.9 cm (6 ft 0 in)  ...       2014      [24]
16                  Brazil       170.7 cm (5 ft 7 in)  ...       2009  [25][26]
17          Brazil – Urban   173.5 cm (5 ft 8 1⁄2 in)  ...       2009      [25]
18          Brazil – Rural   170.9 cm (5 ft 7 1⁄2 in)  ...       2009      [25]
19                Bulgaria       175.2 cm (5 ft 9 in)  ...       2010      [27]

Where you by inspection of line 10 see a line of input that needs to be cleaned.

Step 2: Some basic cleaning of the data

By inspection of the data you see that every 10 lines (or something) an line repeats the column names.

From wikipedia.org

While this is practical if you inspect the data as a user, this seems to be annoying for us when we want to use the raw data.

Luckily this is easy to clean up using pandas.

import pandas as pd

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/Average_human_height_by_country'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the first table
table = tables[0]

# To avoid writing it all the time
AVG_MH = 'Average male height'
# Remove duplicate rows with 'Average male height'
table = table.loc[table[AVG_MH] != AVG_MH].copy()

print(table[:20])

Where you can see the data is has cleaned up these columns.

            Country/Region        Average male height  ...       Year    Source
0                  Albania   174.0 cm (5 ft 8 1⁄2 in)  ...  2008–2009  [11][12]
1                Argentina                        NaN  ...  2004–2005      [13]
2                Argentina  174.46 cm (5 ft 8 1⁄2 in)  ...  1998–2001      [14]
3                  Armenia                        NaN  ...       2005      [15]
4                Australia       175.6 cm (5 ft 9 in)  ...  2011–2012      [16]
5                  Austria    179 cm (5 ft 10 1⁄2 in)  ...       2006      [17]
6               Azerbaijan   171.8 cm (5 ft 7 1⁄2 in)  ...       2005      [18]
7                  Bahrain       165.1 cm (5 ft 5 in)  ...       2002      [19]
8                  Bahrain   171.0 cm (5 ft 7 1⁄2 in)  ...       2009  [20][21]
9               Bangladesh                        NaN  ...       2007      [15]
11                 Belgium  178.6 cm (5 ft 10 1⁄2 in)  ...       2001      [22]
12                   Benin                        NaN  ...       2006      [15]
13                 Bolivia                        NaN  ...       2003      [15]
14                 Bolivia       160.0 cm (5 ft 3 in)  ...       1970      [23]
15  Bosnia and Herzegovina       183.9 cm (6 ft 0 in)  ...       2014      [24]
16                  Brazil       170.7 cm (5 ft 7 in)  ...       2009  [25][26]
17          Brazil – Urban   173.5 cm (5 ft 8 1⁄2 in)  ...       2009      [25]
18          Brazil – Rural   170.9 cm (5 ft 7 1⁄2 in)  ...       2009      [25]
19                Bulgaria       175.2 cm (5 ft 9 in)  ...       2010      [27]
20            Burkina Faso                        NaN  ...       2003      [15]

Step 3: Convert data to floats

Inspecting the data that we need (Average male height) it is represented as a string with both the cm and ft/in figure. As I live in Denmark and we use the metric system and have never really understood any benefit of the US customary units (feel free to enlighten me).

Hence, we want to convert the strings in the column Average male height to a float representing the height in cm.

Notice, that some are NaN, while the rest are having the first number as the length in cm.

We can exploit that and convert it with a lambda function. If you are new to lambda functions you can see this tutorial.

import pandas as pd
import numpy as np

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/Average_human_height_by_country'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the first table
table = tables[0]

# To avoid writing it all the time
AVG_MH = 'Average male height'
AMH_F = 'Aveage male height (float)'

# Remove duplicate rows with 'Average male height'
table = table.loc[table[AVG_MH] != AVG_MH].copy()

# Clean up data to have height in cm
table[AMH_F] = table.apply(lambda row: float(row[AVG_MH].split(' ')[0]) if row[AVG_MH] is not np.nan else np.nan,
                           axis=1)
print(table[:20])

Resulting in the following.

            Country/Region  ... Aveage male height (float)
0                  Albania  ...                     174.00
1                Argentina  ...                        NaN
2                Argentina  ...                     174.46
3                  Armenia  ...                        NaN
4                Australia  ...                     175.60
5                  Austria  ...                     179.00
6               Azerbaijan  ...                     171.80
7                  Bahrain  ...                     165.10
8                  Bahrain  ...                     171.00
9               Bangladesh  ...                        NaN
11                 Belgium  ...                     178.60
12                   Benin  ...                        NaN
13                 Bolivia  ...                        NaN
14                 Bolivia  ...                     160.00
15  Bosnia and Herzegovina  ...                     183.90
16                  Brazil  ...                     170.70
17          Brazil – Urban  ...                     173.50
18          Brazil – Rural  ...                     170.90
19                Bulgaria  ...                     175.20
20            Burkina Faso  ...                        NaN

Notice that np.nan is also a float and hence, the full column Average male height (float) are floats.

Step 4: Merge two sets of data with different representations of countries

To make the map in the end we will use the geopandas library, which has a nice low resolution dataset used to color countries. While the data by geopandas is represented as a DataFrame it is difficult to merge it as the DataFrame we have created from the htm_read call to pandas has varying names.

Example can be United States in the one we created and United States of America in the geopandas. Hence, we need some means to map them to the same representation.

For this purpose we can use the library pycountry.

Hence, applying that to both DataFrames we can merge them.

import pandas as pd
import numpy as np
import geopandas
import pycountry


# Helper function to map country names to alpha_3 representation - though some are not known by library
def lookup_country_code(country):
    try:
        return pycountry.countries.lookup(country).alpha_3
    except LookupError:
        return country


# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/Average_human_height_by_country'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the first table
table = tables[0]

# To avoid writing it all the time
AVG_MH = 'Average male height'
CR = 'Country/Region'
COUNTRY = 'Country'
AMH_F = 'Aveage male height (float)'
A3 = 'alpha3'

# Remove duplicate rows with 'Average male height'
table = table.loc[table[AVG_MH] != AVG_MH].copy()

# Clean up data to have height in cm
table[AMH_F] = table.apply(lambda row: float(row[AVG_MH].split(' ')[0]) if row[AVG_MH] is not np.nan else np.nan,
                           axis=1)

# Clean up the names if used a dash before
table[COUNTRY] = table.apply(
    lambda row: row[CR].split(' – ')[0] if ' – ' in row[CR] else row[CR],
    axis=1)
# Map the country name to the alpha3 representation
table[A3] = table.apply(lambda row: lookup_country_code(row[COUNTRY]), axis=1)

# Read the geopandas dataset
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
# Do the same mapping to alpha3
world[A3] = world.apply(lambda row: lookup_country_code(row['name']), axis=1)

# Merge the data
table = world.merge(table, how="left", left_on=[A3], right_on=[A3])

# Remove countries with no data
table = table.dropna(subset=[AMH_F])

# These lines are just used to get the full data
pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 50)
pd.set_option('display.width', 1000)
print(table)

Which will result in the following.

        pop_est      continent                      name iso_a3  gdp_md_est                                           geometry       alpha3                                 Country/Region         Average male height      Average female height Stature ratio(male to female)                      Sample population / age range Share ofpop. over 18covered[9][10]    Methodology       Year           Source  Aveage male height (float)               Country
3      35623680  North America                    Canada    CAN   1674000.0  MULTIPOLYGON (((-122.84000 49.00000, -122.9742...          CAN                                         Canada        175.1 cm (5 ft 9 in)       162.3 cm (5 ft 4 in)                          1.08                                              18–79                                 94.7%       Measured  2007–2009             [29]                      175.10                Canada
4     326625791  North America  United States of America    USA  18560000.0  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...          USA                                  United States        175.3 cm (5 ft 9 in)   161.5 cm (5 ft 3 1⁄2 in)                          1.09  All Americans, 20+ (N= m:5,232 f:5,547, Median...                                   69%       Measured  2011–2014            [132]                      175.30         United States
5     326625791  North America  United States of America    USA  18560000.0  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...          USA              United States – African Americans        175.5 cm (5 ft 9 in)       162.6 cm (5 ft 4 in)                          1.08  African Americans, 20–39 (N= m:532 f:612, Medi...                             3.4%[133]       Measured  2015-2016            [134]                      175.50         United States
6     326625791  North America  United States of America    USA  18560000.0  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...          USA  United States – Hispanic and Latino Americans    169.5 cm (5 ft 6 1⁄2 in)   156.7 cm (5 ft 1 1⁄2 in)                          1.08  Hispanic/Latin-Americans, 20–39 (N= m:745 f:91...                             4.4%[133]       Measured  2015–2016            [134]                      169.50         United States
7     326625791  North America  United States of America    USA  18560000.0  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...          USA              United States – Mexican Americans    168.8 cm (5 ft 6 1⁄2 in)   156.1 cm (5 ft 1 1⁄2 in)                          1.09  Mexican Americans, 20–39 (N= m:429 f:511, Medi...                             2.8%[133]       Measured  2015–2016            [134]                      168.80         United States
8     326625791  North America  United States of America    USA  18560000.0  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...          USA                United States – Asian Americans        169.7 cm (5 ft 7 in)   156.2 cm (5 ft 1 1⁄2 in)                          1.09  Non-Hispanic Asians, 20–39 (N= m:323 f:326, Me...                             1.3%[133]       Measured  2015–2016            [134]                      169.70         United States
9     326625791  North America  United States of America    USA  18560000.0  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...          USA            United States – Non-Hispanic whites    177.0 cm (5 ft 9 1⁄2 in)   163.3 cm (5 ft 4 1⁄2 in)                          1.08  Non-Hispanic White Americans, 20–39 (N= m:892 ...                            17.1%[133]       Measured  2015–2016            [134]                      177.00         United States
13    260580739           Asia                 Indonesia    IDN   3028000.0  MULTIPOLYGON (((141.00021 -2.60015, 141.01706 ...          IDN                                      Indonesia          158 cm (5 ft 2 in)        147 cm (4 ft 10 in)                          1.07  50+ (N= m:2,041 f:2,396, Median= m:158 cm (5 f...                                 22.5%  Self-reported       1997             [59]                      158.00             Indonesia
15     44293293  South America                 Argentina    ARG    879400.0  MULTIPOLYGON (((-68.63401 -52.63637, -68.25000...          ARG                                      Argentina   174.46 cm (5 ft 8 1⁄2 in)  161.01 cm (5 ft 3 1⁄2 in)                          1.08  Healthy, 18 (N= m:90 f:97, SD= m:7.43 cm (3 in...                                  2.9%       Measured  1998–2001             [14]                      174.46             Argentina
16     17789267  South America                     Chile    CHL    436100.0  MULTIPOLYGON (((-68.63401 -52.63637, -68.63335...          CHL                                          Chile        169.6 cm (5 ft 7 in)   156.1 cm (5 ft 1 1⁄2 in)                          1.09                                                15+                                107.2%       Measured  2009–2010             [30]                      169.60                 Chile
19     47615739         Africa                     Kenya    KEN    152700.0  POLYGON ((39.20222 -4.67677, 37.76690 -3.67712...          KEN                                          Kenya        169.6 cm (5 ft 7 in)                        NaN                           NaN        25–49 (N= f:1,600, SD= f:6.3 cm (2 1⁄2 in))                                 53.7%        Summary       2016             [69]                      169.60                 Kenya
20     47615739         Africa                     Kenya    KEN    152700.0  POLYGON ((39.20222 -4.67677, 37.76690 -3.67712...          KEN                                          Kenya        169.6 cm (5 ft 7 in)   158.2 cm (5 ft 2 1⁄2 in)                           NaN            25–49 (N= f:4,856, SD= f:7.3 cm (3 in))                                 52.5%         Survey       2016         [15][69]                      169.60                 Kenya
25    142257519         Europe                    Russia    RUS   3745000.0  MULTIPOLYGON (((178.72530 71.09880, 180.00000 ...       Russia                                         Russia    171.1 cm (5 ft 7 1⁄2 in)   158.2 cm (5 ft 2 1⁄2 in)                          1.08                         44-69 (N= m: 3892 f: 4643)                                 38.5%       Measured       2007             [93]                      171.10                Russia
26    142257519         Europe                    Russia    RUS   3745000.0  MULTIPOLYGON (((178.72530 71.09880, 180.00000 ...       Russia                                         Russia       177.2 cm (5 ft 10 in)   164.1 cm (5 ft 4 1⁄2 in)                          1.08                                                 24                                  1.9%       Measured       2004         [21][98]                      177.20                Russia
29      5320045         Europe                    Norway    -99    364700.0  MULTIPOLYGON (((15.14282 79.67431, 15.52255 80...          NOR                                         Norway   179.7 cm (5 ft 10 1⁄2 in)       167.1 cm (5 ft 6 in)                          1.09           Conscripts, 18–44 (N= m:30,884 f:28,796)                                 35.3%       Measured       2012             [88]                      179.70                Norway
30      5320045         Europe                    Norway    -99    364700.0  MULTIPOLYGON (((15.14282 79.67431, 15.52255 80...          NOR                                         Norway   179.7 cm (5 ft 10 1⁄2 in)     167 cm (5 ft 5 1⁄2 in)                          1.08                           20–85 (N= m:1534 f:1743)                                 93.6%  Self-reported  2008–2009      [9][26][89]                      179.70                Norway
34     54841552         Africa              South Africa    ZAF    739100.0  POLYGON ((16.34498 -28.57671, 16.82402 -28.082...          ZAF                                   South Africa          168 cm (5 ft 6 in)     159 cm (5 ft 2 1⁄2 in)                          1.06                                19 (N= m:121 f:118)                                  3.6%       Measured       2003            [110]                      168.00          South Africa
36    124574795  North America                    Mexico    MEX   2307000.0  POLYGON ((-117.12776 32.53534, -115.99135 32.6...          MEX                                         Mexico      172 cm (5 ft 7 1⁄2 in)     159 cm (5 ft 2 1⁄2 in)                          1.08                                              20–65                                 62.0%       Measured       2014             [83]                      172.00                Mexico
37      3360148  South America                   Uruguay    URY     73250.0  POLYGON ((-57.62513 -30.21629, -56.97603 -30.1...          URY                                        Uruguay          170 cm (5 ft 7 in)         158 cm (5 ft 2 in)                          1.08                        Adults (N= m:2,249 f:2,114)                                   NaN       Measured       1990            [135]                      170.00               Uruguay
38    207353391  South America                    Brazil    BRA   3081000.0  POLYGON ((-53.37366 -33.76838, -53.65054 -33.2...          BRA                                         Brazil        170.7 cm (5 ft 7 in)   158.8 cm (5 ft 2 1⁄2 in)                          1.07                         18+ (N= m:62,037 f:65,696)                                100.0%       Measured       2009         [25][26]                      170.70                Brazil
39    207353391  South America                    Brazil    BRA   3081000.0  POLYGON ((-53.37366 -33.76838, -53.65054 -33.2...          BRA                                 Brazil – Urban    173.5 cm (5 ft 8 1⁄2 in)   161.6 cm (5 ft 3 1⁄2 in)                          1.07                         20–24 (N= m:6,360 f:6,305)                                 10.9%       Measured       2009             [25]                      173.50                Brazil
40    207353391  South America                    Brazil    BRA   3081000.0  POLYGON ((-53.37366 -33.76838, -53.65054 -33.2...          BRA                                 Brazil – Rural    170.9 cm (5 ft 7 1⁄2 in)   158.9 cm (5 ft 2 1⁄2 in)                          1.07                         20–24 (N= m:1,939 f:1,633)                                  2.1%       Measured       2009             [25]                      170.90                Brazil
42     11138234  South America                   Bolivia    BOL     78350.0  POLYGON ((-69.52968 -10.95173, -68.78616 -11.0...          BOL                                        Bolivia        160.0 cm (5 ft 3 in)       142.2 cm (4 ft 8 in)                          1.13                                      Aymara, 20–29                                   NaN       Measured       1970             [23]                      160.00               Bolivia
43     31036656  South America                      Peru    PER    410400.0  POLYGON ((-69.89364 -4.29819, -70.79477 -4.251...          PER                                           Peru      164 cm (5 ft 4 1⁄2 in)    151 cm (4 ft 11 1⁄2 in)                          1.09                                                20+                             0.011509%       Measured       2005             [90]                      164.00                  Peru
44     47698524  South America                  Colombia    COL    688000.0  POLYGON ((-66.87633 1.25336, -67.06505 1.13011...          COL                                       Colombia        170.6 cm (5 ft 7 in)   158.7 cm (5 ft 2 1⁄2 in)                          1.07                 18–22 (N= m:1,528,875 f:1,468,110)                                 14.1%       Measured       2002             [33]                      170.60              Colombia
56     67106161         Europe                    France    -99   2699000.0  MULTIPOLYGON (((-51.65780 4.15623, -52.24934 3...          FRA                                         France        175.6 cm (5 ft 9 in)       162.5 cm (5 ft 4 in)                          1.08                              18–70 (N= m/f:11,562)                                 85.9%       Measured  2003–2004         [45][46]                      175.60                France
57     67106161         Europe                    France    -99   2699000.0  MULTIPOLYGON (((-51.65780 4.15623, -52.24934 3...          FRA                                         France    174.1 cm (5 ft 8 1⁄2 in)   161.9 cm (5 ft 3 1⁄2 in)                          1.08                                                20+                                 96.6%       Measured       2001              [7]                      174.10                France
58     16290913  South America                   Ecuador    ECU    182400.0  POLYGON ((-75.37322 -0.15203, -75.23372 -0.911...          ECU                                        Ecuador        167.1 cm (5 ft 6 in)     154.2 cm (5 ft 1⁄2 in)                          1.08                                                NaN                                   NaN       Measured       2014             [40]                      167.10               Ecuador
60      2990561  North America                   Jamaica    JAM     25390.0  POLYGON ((-77.56960 18.49053, -76.89662 18.400...          JAM                                        Jamaica    171.8 cm (5 ft 7 1⁄2 in)   160.8 cm (5 ft 3 1⁄2 in)                          1.07                                              25–74                                 71.4%       Measured  1994–1996             [66]                      171.80               Jamaica
61     11147407  North America                      Cuba    CUB    132900.0  POLYGON ((-82.26815 23.18861, -81.40446 23.117...          CUB                                   Cuba – Urban          168 cm (5 ft 6 in)     156 cm (5 ft 1 1⁄2 in)                          1.08                                                15+                                 79.2%       Measured       1999             [35]                      168.00                  Cuba
66     17885245         Africa                      Mali    MLI     38090.0  POLYGON ((-11.51394 12.44299, -11.46790 12.754...          MLI                           Mali – Southern Mali    171.3 cm (5 ft 7 1⁄2 in)       160.4 cm (5 ft 3 in)                          1.07  Rural adults (N= m:121 f:320, SD= m:6.6 cm (2 ...                                   NaN       Measured       1992             [81]                      171.30                  Mali
70    190632261         Africa                   Nigeria    NGA   1089000.0  POLYGON ((2.69170 6.25882, 2.74906 7.87073, 2....          NGA                                        Nigeria    163.8 cm (5 ft 4 1⁄2 in)       157.8 cm (5 ft 2 in)                          1.04                                              18–74                                 98.6%       Measured  1994–1996             [66]                      163.80               Nigeria
71    190632261         Africa                   Nigeria    NGA   1089000.0  POLYGON ((2.69170 6.25882, 2.74906 7.87073, 2....          NGA                                        Nigeria        167.2 cm (5 ft 6 in)       160.3 cm (5 ft 3 in)                          1.04  20–29 (N= m:139 f:76, SD= m:6.5 cm (2 1⁄2 in) ...                                 33.2%       Measured       2011             [87]                      167.20               Nigeria
72     24994885         Africa                  Cameroon    CMR     77240.0  POLYGON ((14.49579 12.85940, 14.89336 12.21905...          CMR                               Cameroon – Urban        170.6 cm (5 ft 7 in)   161.3 cm (5 ft 3 1⁄2 in)                          1.06                           15+ (N= m:3,746 f:5,078)                                 53.6%       Measured       2003             [28]                      170.60              Cameroon
75     27499924         Africa                     Ghana    GHA    120800.0  POLYGON ((0.02380 11.01868, -0.04978 10.70692,...          GHA                                          Ghana    169.5 cm (5 ft 6 1⁄2 in)   158.5 cm (5 ft 2 1⁄2 in)                          1.07                                              25–29                                 14.7%       Measured  1987–1989             [49]                      169.50                 Ghana
87     19196246         Africa                    Malawi    MWI     21200.0  POLYGON ((32.75938 -9.23060, 33.73972 -9.41715...          MWI                                 Malawi – Urban      166 cm (5 ft 5 1⁄2 in)         155 cm (5 ft 1 in)                          1.07  16–60 (N= m:583 f:315, SD= m:6.0 cm (2 1⁄2 in)...                                101.1%       Measured       2000             [78]                      166.00                Malawi
92      8299706           Asia                    Israel    ISR    297000.0  POLYGON ((35.71992 32.70919, 35.54567 32.39399...          ISR                                         Israel      177 cm (5 ft 9 1⁄2 in)     166 cm (5 ft 5 1⁄2 in)                          1.07                                              18–21                                  9.7%       Measured       2010             [64]                      177.00                Israel
96      2051363         Africa                    Gambia    GMB      3387.0  POLYGON ((-16.71373 13.59496, -15.62460 13.623...          GMB                                 Gambia – Rural        168.0 cm (5 ft 6 in)       157.8 cm (5 ft 2 in)                          1.06  21–49 (N= m:9,559 f:13,160, SD= m:6.7 cm (2 1⁄...                                   NaN       Measured  1950–1974             [47]                      168.00                Gambia
100     6072475           Asia      United Arab Emirates    ARE    667200.0  POLYGON ((51.57952 24.24550, 51.75744 24.29407...          ARE                           United Arab Emirates    173.4 cm (5 ft 8 1⁄2 in)   156.4 cm (5 ft 1 1⁄2 in)                          1.11                                                NaN                                   NaN            NaN        NaN            [128]                      173.40  United Arab Emirates
101     2314307           Asia                     Qatar    QAT    334500.0  POLYGON ((50.81011 24.75474, 50.74391 25.48242...          QAT                                          Qatar        170.8 cm (5 ft 7 in)   161.1 cm (5 ft 3 1⁄2 in)                          1.06                                                 18                                  1.9%       Measured       2005         [21][96]                      170.80                 Qatar
103    39192111           Asia                      Iraq    IRQ    596700.0  POLYGON ((39.19547 32.16101, 38.79234 33.37869...          IRQ                                 Iraq – Baghdad        165.4 cm (5 ft 5 in)   155.8 cm (5 ft 1 1⁄2 in)                          1.06  18–44 (N= m:700 f:800, SD= m:5.6 cm (2 in) f:1...                                 76.3%       Measured  1999–2000             [61]                      165.40                  Iraq
107    68414135           Asia                  Thailand    THA   1161000.0  POLYGON ((105.21878 14.27321, 104.28142 14.416...          THA                                       Thailand        170.3 cm (5 ft 7 in)     159 cm (5 ft 2 1⁄2 in)                          1.07  STOU students, 15–19 (N= m:839 f:1,636, SD= m:...                             0.2%[122]  Self-reported       2005            [123]                      170.30              Thailand
110    96160163           Asia                   Vietnam    VNM    594900.0  POLYGON ((104.33433 10.48654, 105.19991 10.889...          VNM                                        Vietnam        162.1 cm (5 ft 4 in)       152.2 cm (5 ft 0 in)                          1.07      25–29 (SD= m:5.39 cm (2 in) f:5.39 cm (2 in))                                 15.9%       Measured  1992–1993             [49]                      162.10               Vietnam
111    96160163           Asia                   Vietnam    VNM    594900.0  POLYGON ((104.33433 10.48654, 105.19991 10.889...          VNM                                        Vietnam        165.7 cm (5 ft 5 in)       155.2 cm (5 ft 1 in)                          1.07  Students, 20–25 (N= m:1,000 f:1,000, SD= m:6.5...                             2.0%[136]       Measured  2006–2007            [137]                      165.70               Vietnam
112    25248140           Asia               North Korea    PRK     40000.0  MULTIPOLYGON (((130.78000 42.22001, 130.78000 ...  North Korea                                    North Korea        165.6 cm (5 ft 5 in)       154.9 cm (5 ft 1 in)                          1.07                    Defectors, 20–39 (N= m/f:1,075)                                 46.4%       Measured       2005             [70]                      165.60           North Korea
113    51181299           Asia               South Korea    KOR   1929000.0  POLYGON ((126.17476 37.74969, 126.23734 37.840...  South Korea                                    South Korea        170.7 cm (5 ft 7 in)       157.4 cm (5 ft 2 in)                          1.08  20+ (N= m:2,750 f:2,445, Median= m:170.7 cm (5...                                 96.5%       Measured       2010             [71]                      170.70           South Korea
114    51181299           Asia               South Korea    KOR   1929000.0  POLYGON ((126.17476 37.74969, 126.23734 37.840...  South Korea                                    South Korea    173.5 cm (5 ft 8 1⁄2 in)                        NaN                           NaN                   Conscripts, 18–19 (N= m:323,800)                                  3.8%       Measured       2017             [72]                      173.50           South Korea
116     3068243           Asia                  Mongolia    MNG     37000.0  POLYGON ((87.75126 49.29720, 88.80557 49.47052...          MNG                                       Mongolia    168.4 cm (5 ft 6 1⁄2 in)       157.7 cm (5 ft 2 in)                          1.07                             25–34 (N= m:158 f:181)                                 27.6%       Measured       2006             [84]                      168.40              Mongolia
117  1281935911           Asia                     India    IND   8721000.0  POLYGON ((97.32711 28.26158, 97.40256 27.88254...          IND                                  India – Urban    174.3 cm (5 ft 8 1⁄2 in)   158.5 cm (5 ft 2 1⁄2 in)                          1.10  Private school students, 18 (N= m:34,411 f:30,...                                   NaN       Measured       2011             [55]                      174.30                 India
118  1281935911           Asia                     India    IND   8721000.0  POLYGON ((97.32711 28.26158, 97.40256 27.88254...          IND                                  India – Rural    161.5 cm (5 ft 3 1⁄2 in)       152.5 cm (5 ft 0 in)                          1.06       17 (SD= m:7.0 cm (3 in) f:6.3 cm (2 1⁄2 in))                                   NaN       Measured       2002             [56]                      161.50                 India
119  1281935911           Asia                     India    IND   8721000.0  POLYGON ((97.32711 28.26158, 97.40256 27.88254...          IND                                          India        164.7 cm (5 ft 5 in)       152.6 cm (5 ft 0 in)                          1.08                      20–49 (N= m:69,245 f:118,796)                                 44.3%       Measured  2005-2006             [57]                      164.70                 India
120  1281935911           Asia                     India    IND   8721000.0  POLYGON ((97.32711 28.26158, 97.40256 27.88254...          IND                        India – Patiala, Punjab       177.3 cm (5 ft 10 in)                        NaN                           NaN  Students, Punjabi, 18-25 (N: 149, SD = 7.88 cm...                                 22.4%       Measured       2013             [58]                      177.30                 India
123    29384297           Asia                     Nepal    NPL     71520.0  POLYGON ((88.12044 27.87654, 88.04313 27.44582...          NPL                                          Nepal        163.0 cm (5 ft 4 in)  150.8 cm (4 ft 11 1⁄2 in)                           NaN            25–49 (N= f:6,280, SD= f:5.5 cm (2 in))                                 52.9%  Self-reported       2006             [15]                      163.00                 Nepal
129    82021564           Asia                      Iran    IRN   1459000.0  POLYGON ((48.56797 29.92678, 48.01457 30.45246...         Iran                                           Iran        170.3 cm (5 ft 7 in)       157.2 cm (5 ft 2 in)                          1.08  21+ (N= m/f:89,532, SD= m:8.05 cm (3 in) f:7.2...                                 88.1%       Measured       2005             [60]                      170.30                  Iran
132     9960487         Europe                    Sweden    SWE    498100.0  POLYGON ((11.02737 58.85615, 11.46827 59.43239...          SWE                                         Sweden   181.5 cm (5 ft 11 1⁄2 in)   166.8 cm (5 ft 5 1⁄2 in)                          1.09                                              20–29                                 15.6%       Measured       2008            [116]                      181.50                Sweden
133     9960487         Europe                    Sweden    SWE    498100.0  POLYGON ((11.02737 58.85615, 11.46827 59.43239...          SWE                                         Sweden       177.9 cm (5 ft 10 in)       164.6 cm (5 ft 5 in)                          1.08                                              20–74                                 86.3%  Self-reported  1987–1994            [117]                      177.90                Sweden
136    38476269         Europe                    Poland    POL   1052000.0  POLYGON ((23.48413 53.91250, 23.52754 53.47012...          POL                                         Poland        172.2 cm (5 ft 8 in)       159.4 cm (5 ft 3 in)                          1.07                          44-69 (N= m:4336 f: 4559)                                 39.4%       Measured       2007             [93]                      172.20                Poland
137    38476269         Europe                    Poland    POL   1052000.0  POLYGON ((23.48413 53.91250, 23.52754 53.47012...          POL                                         Poland   178.7 cm (5 ft 10 1⁄2 in)       165.1 cm (5 ft 5 in)                          1.08                              18 (N= m:846 f:1,126)                                  1.6%       Measured       2010             [94]                      178.70                Poland
138     8754413         Europe                   Austria    AUT    416600.0  POLYGON ((16.97967 48.12350, 16.90375 47.71487...          AUT                                        Austria     179 cm (5 ft 10 1⁄2 in)     166 cm (5 ft 5 1⁄2 in)                          1.08                                              20–49                                 54.3%       Measured       2006             [17]                      179.00               Austria
139     9850845         Europe                   Hungary    HUN    267600.0  POLYGON ((22.08561 48.42226, 22.64082 48.15024...          HUN                                        Hungary      176 cm (5 ft 9 1⁄2 in)     164 cm (5 ft 4 1⁄2 in)                          1.07                                             Adults                                   NaN       Measured      2000s             [53]                      176.00               Hungary
140     9850845         Europe                   Hungary    HUN    267600.0  POLYGON ((22.08561 48.42226, 22.64082 48.15024...          HUN                                        Hungary       177.3 cm (5 ft 10 in)                        NaN                           NaN          18 (N= m:1,080, SD= m:5.99 cm (2 1⁄2 in))                                  1.7%       Measured       2005             [54]                      177.30               Hungary
142    21529967         Europe                   Romania    ROU    441000.0  POLYGON ((28.23355 45.48828, 28.67978 45.30403...          ROU                                        Romania      172 cm (5 ft 7 1⁄2 in)         157 cm (5 ft 2 in)                          1.10                                                NaN                                   NaN       Measured       2007             [97]                      172.00               Romania
143     2823859         Europe                 Lithuania    LTU     85620.0  POLYGON ((26.49433 55.61511, 26.58828 55.16718...          LTU                              Lithuania – Urban       178.4 cm (5 ft 10 in)                        NaN                           NaN  Conscripts, 19–25 (N= m:91 SD= m:6.7 cm (2 1⁄2...                                  9.9%       Measured   2005[75]             [76]                      178.40             Lithuania
144     2823859         Europe                 Lithuania    LTU     85620.0  POLYGON ((26.49433 55.61511, 26.58828 55.16718...          LTU                              Lithuania – Rural    176.2 cm (5 ft 9 1⁄2 in)                        NaN                           NaN  Conscripts, 19–25 (N= m:106 SD= m:5.9 cm (2 1⁄...                                  4.9%       Measured   2005[75]             [76]                      176.20             Lithuania
145     2823859         Europe                 Lithuania    LTU     85620.0  POLYGON ((26.49433 55.61511, 26.58828 55.16718...          LTU                                      Lithuania   181.3 cm (5 ft 11 1⁄2 in)       167.5 cm (5 ft 6 in)                          1.08                                                 18                                  2.1%       Measured       2001             [77]                      181.30             Lithuania
147     1251581         Europe                   Estonia    EST     38700.0  POLYGON ((27.98113 59.47537, 27.98112 59.47537...          EST                                        Estonia   179.1 cm (5 ft 10 1⁄2 in)                        NaN                           NaN                                                 17                                  2.3%       Measured       2003             [42]                      179.10               Estonia
148    80594017         Europe                   Germany    DEU   3979000.0  POLYGON ((14.11969 53.75703, 14.35332 53.24817...          DEU                                        Germany        175.4 cm (5 ft 9 in)       162.8 cm (5 ft 4 in)                          1.08                              18–79 (N= m/f:19,768)                                 94.3%       Measured       2007              [6]                      175.40               Germany
149    80594017         Europe                   Germany    DEU   3979000.0  POLYGON ((14.11969 53.75703, 14.35332 53.24817...          DEU                                        Germany         178 cm (5 ft 10 in)         165 cm (5 ft 5 in)                          1.08                         18+ (N= m:25,112 f:25,560)                                100.0%  Self-reported       2009             [48]                      178.00               Germany
150     7101510         Europe                  Bulgaria    BGR    143100.0  POLYGON ((22.65715 44.23492, 22.94483 43.82379...          BGR                                       Bulgaria        175.2 cm (5 ft 9 in)   163.2 cm (5 ft 4 1⁄2 in)                          1.07                                                NaN                                   NaN            NaN       2010             [27]                      175.20              Bulgaria
151    10768477         Europe                    Greece    GRC    290500.0  MULTIPOLYGON (((26.29000 35.29999, 26.16500 35...          GRC                                         Greece      177 cm (5 ft 9 1⁄2 in)         165 cm (5 ft 5 in)                          1.07                                              18–49                                 56.3%       Measured       2003             [17]                      177.00                Greece
152    80845215           Asia                    Turkey    TUR   1670000.0  MULTIPOLYGON (((44.77268 37.17044, 44.29345 37...          TUR                                         Turkey    173.6 cm (5 ft 8 1⁄2 in)   161.9 cm (5 ft 3 1⁄2 in)                          1.07                             20-22 (N= m:322 f:247)                                  8.3%       Measured       2007    [11][21][125]                      173.60                Turkey
153    80845215           Asia                    Turkey    TUR   1670000.0  MULTIPOLYGON (((44.77268 37.17044, 44.29345 37...          TUR                                Turkey – Ankara    174.1 cm (5 ft 8 1⁄2 in)   158.9 cm (5 ft 2 1⁄2 in)                          1.10  18–59 (N= m:703 f:512, Median= m:169.7 cm (5 f...                             5.1%[126]       Measured  2004–2006            [127]                      174.10                Turkey
155     3047987         Europe                   Albania    ALB     33900.0  POLYGON ((21.02004 40.84273, 20.99999 40.58000...          ALB                                        Albania    174.0 cm (5 ft 8 1⁄2 in)   161.8 cm (5 ft 3 1⁄2 in)                          1.08                           20–29 (N= m:649 f:1,806)                                 23.5%       Measured  2008–2009         [11][12]                      174.00               Albania
156     4292095         Europe                   Croatia    HRV     94240.0  POLYGON ((16.56481 46.50375, 16.88252 46.38063...          HRV                                        Croatia       180.4 cm (5 ft 11 in)  166.49 cm (5 ft 5 1⁄2 in)                          1.09  18 (N= m:358 f:360, SD= m:6.8 cm (2 1⁄2 in) f:...                                  1.6%       Measured  2006–2008             [34]                      180.40               Croatia
157     8236303         Europe               Switzerland    CHE    496300.0  POLYGON ((9.59423 47.52506, 9.63293 47.34760, ...          CHE                                    Switzerland       178.2 cm (5 ft 10 in)                        NaN                           NaN  Conscripts, 19 (N= m:12,447, Median= m:178.0 c...                                  1.5%       Measured       2009            [118]                      178.20           Switzerland
158     8236303         Europe               Switzerland    CHE    496300.0  POLYGON ((9.59423 47.52506, 9.63293 47.34760, ...          CHE                                    Switzerland        175.4 cm (5 ft 9 in)     164 cm (5 ft 4 1⁄2 in)                          1.07                                              20–74                                 88.8%  Self-reported  1987–1994            [117]                      175.40           Switzerland
160    11491346         Europe                   Belgium    BEL    508600.0  POLYGON ((6.15666 50.80372, 6.04307 50.12805, ...          BEL                                        Belgium   178.6 cm (5 ft 10 1⁄2 in)       168.1 cm (5 ft 6 in)                          1.06  21 (N= m:20–49 f:20–49, SD= m:6.6 cm (2 1⁄2 in...                                  1.7%  Self-reported       2001             [22]                      178.60               Belgium
161    17084719         Europe               Netherlands    NLD    870800.0  POLYGON ((6.90514 53.48216, 7.09205 53.14404, ...          NLD                                    Netherlands       180.8 cm (5 ft 11 in)       167.5 cm (5 ft 6 in)                          1.08                                                20+                                 96.8%  Self-reported       2013      [9][26][86]                      180.80           Netherlands
162    10839514         Europe                  Portugal    PRT    297100.0  POLYGON ((-9.03482 41.88057, -8.67195 42.13469...          PRT                                       Portugal    173.9 cm (5 ft 8 1⁄2 in)                        NaN                           NaN                                      18 (N= m:696)                                  1.5%       Measured       2008         [11][95]                      173.90              Portugal
163    10839514         Europe                  Portugal    PRT    297100.0  POLYGON ((-9.03482 41.88057, -8.67195 42.13469...          PRT                                       Portugal      171 cm (5 ft 7 1⁄2 in)     161 cm (5 ft 3 1⁄2 in)                          1.06                                              20–50                                 56.7%  Self-reported       2001             [17]                      171.00              Portugal
164    10839514         Europe                  Portugal    PRT    297100.0  POLYGON ((-9.03482 41.88057, -8.67195 42.13469...          PRT                                       Portugal    173.7 cm (5 ft 8 1⁄2 in)   163.7 cm (5 ft 4 1⁄2 in)                          1.06  21 (N= m:87 f:106, SD= m:8.2 cm (3 in) f:5.3 c...                                  1.9%  Self-reported       2001             [22]                      173.70              Portugal
165    48958159         Europe                     Spain    ESP   1690000.0  POLYGON ((-7.45373 37.09779, -7.53711 37.42890...          ESP                                          Spain        173.1 cm (5 ft 8 in)                        NaN                           NaN                       18–70 (N= m:1,298 [s][112] )                                 88.2%       Measured  2013–2014       [113][114]                      173.10                 Spain
167    48958159         Europe                     Spain    ESP   1690000.0  POLYGON ((-7.45373 37.09779, -7.53711 37.42890...          ESP                                          Spain      174 cm (5 ft 8 1⁄2 in)         163 cm (5 ft 4 in)                          1.07                                              20–49                                 57.0%  Self-reported       2007             [17]                      174.00                 Spain
168     5011102         Europe                   Ireland    IRL    322000.0  POLYGON ((-6.19788 53.86757, -6.03299 53.15316...          IRL                                        Ireland      177 cm (5 ft 9 1⁄2 in)         163 cm (5 ft 4 in)                          1.09                                              20–49                                 61.8%       Measured       2007             [17]                      177.00               Ireland
169     5011102         Europe                   Ireland    IRL    322000.0  POLYGON ((-6.19788 53.86757, -6.03299 53.15316...          IRL                                        Ireland     179 cm (5 ft 10 1⁄2 in)         165 cm (5 ft 5 in)                          1.08                                                 18                                     -       Measured       2014         [62][63]                      179.00               Ireland
172     4510327        Oceania               New Zealand    NZL    174800.0  MULTIPOLYGON (((176.88582 -40.06598, 176.50802...          NZL                                    New Zealand      177 cm (5 ft 9 1⁄2 in)     164 cm (5 ft 4 1⁄2 in)                          1.08                                              20–49                                 56.9%       Measured       2007             [17]                      177.00           New Zealand
173    23232413        Oceania                 Australia    AUS   1189000.0  MULTIPOLYGON (((147.68926 -40.80826, 148.28907...          AUS                                      Australia        175.6 cm (5 ft 9 in)   161.8 cm (5 ft 3 1⁄2 in)                          1.09                                                18+                                100.0%       Measured  2011–2012             [16]                      175.60             Australia
174    22409381           Asia                 Sri Lanka    LKA    236700.0  POLYGON ((81.78796 7.52306, 81.63732 6.48178, ...          LKA                                      Sri Lanka    163.6 cm (5 ft 4 1⁄2 in)  151.4 cm (4 ft 11 1⁄2 in)                          1.08  18+ (N= m:1,768 f:2,709, SD= m:6.9 cm (2 1⁄2 i...                                100.0%       Measured  2005–2006            [111]                      163.60             Sri Lanka
175  1379302771           Asia                     China    CHN  21140000.0  MULTIPOLYGON (((109.47521 18.19770, 108.65521 ...          CHN                                          China    169.5 cm (5 ft 6 1⁄2 in)       158.0 cm (5 ft 2 in)                          1.07                                  18-69 (N=172,422)                                 76.8%       Measured       2014             [31]                      169.50                 China
176  1379302771           Asia                     China    CHN  21140000.0  MULTIPOLYGON (((109.47521 18.19770, 108.65521 ...          CHN                        China – Beijing – Urban        175.2 cm (5 ft 9 in)       162.6 cm (5 ft 4 in)                          1.08                         Urban, 18 (N= m:448 f:405)                                  0.5%       Measured       2011             [32]                      175.20                 China
177    23508428           Asia                    Taiwan    TWN   1127000.0  POLYGON ((121.77782 24.39427, 121.17563 22.790...          TWN                                         Taiwan    171.4 cm (5 ft 7 1⁄2 in)       159.9 cm (5 ft 3 in)                          1.07                                17 (N= m:200 f:200)                                  1.7%       Measured       2011  [119][120][121]                      171.40                Taiwan
178    62137802         Europe                     Italy    ITA   2221000.0  MULTIPOLYGON (((10.44270 46.89355, 11.04856 46...          ITA                                          Italy    176.5 cm (5 ft 9 1⁄2 in)       162.5 cm (5 ft 4 in)                          1.09                                                 18                                  1.4%       Measured  1999–2004     [11][21][65]                      176.50                 Italy
179    62137802         Europe                     Italy    ITA   2221000.0  MULTIPOLYGON (((10.44270 46.89355, 11.04856 46...          ITA                                          Italy       177.2 cm (5 ft 10 in)       167.8 cm (5 ft 6 in)                          1.06  21 (N= m:106 f:92, SD= m:6.0 cm (2 1⁄2 in) f:6...                                  1.4%  Self-reported       2001             [22]                      177.20                 Italy
180     5605948         Europe                   Denmark    DNK    264800.0  MULTIPOLYGON (((9.92191 54.98310, 9.28205 54.8...          DNK                                        Denmark       180.4 cm (5 ft 11 in)       167.2 cm (5 ft 6 in)                           NaN                    Conscripts, 18–20 (N= m:38,025)                                  5.3%       Measured       2012             [37]                      180.40               Denmark
181    64769452         Europe            United Kingdom    GBR   2788000.0  MULTIPOLYGON (((-6.19788 53.86757, -6.95373 54...          GBR                       United Kingdom – England        175.3 cm (5 ft 9 in)   161.9 cm (5 ft 3 1⁄2 in)                          1.08                           16+ (N= m:3,154 f:3,956)                           103.2%[129]       Measured       2012              [5]                      175.30        United Kingdom
182    64769452         Europe            United Kingdom    GBR   2788000.0  MULTIPOLYGON (((-6.19788 53.86757, -6.95373 54...          GBR                      United Kingdom – Scotland        175.0 cm (5 ft 9 in)   161.3 cm (5 ft 3 1⁄2 in)                          1.08  16+ (N= m:2,512 f:3,180, Median= m:174.8 cm (5...                           103.0%[129]       Measured       2008            [130]                      175.00        United Kingdom
183    64769452         Europe            United Kingdom    GBR   2788000.0  MULTIPOLYGON (((-6.19788 53.86757, -6.95373 54...          GBR                         United Kingdom – Wales    177.0 cm (5 ft 9 1⁄2 in)       162.0 cm (5 ft 4 in)                          1.09                                                16+                           103.2%[129]  Self-reported       2009            [131]                      177.00        United Kingdom
184      339747         Europe                   Iceland    ISL     16150.0  POLYGON ((-14.50870 66.45589, -14.73964 65.808...          ISL                                        Iceland     181 cm (5 ft 11 1⁄2 in)         168 cm (5 ft 6 in)                          1.08                                              20–49                                 43.6%  Self-reported       2007             [17]                      181.00               Iceland
185     9961396           Asia                Azerbaijan    AZE    167900.0  MULTIPOLYGON (((46.40495 41.86068, 46.68607 41...          AZE                                     Azerbaijan    171.8 cm (5 ft 7 1⁄2 in)       165.4 cm (5 ft 5 in)                          1.04                                                16+                                106.5%       Measured       2005             [18]                      171.80            Azerbaijan
187   104256076           Asia               Philippines    PHL    801900.0  MULTIPOLYGON (((120.83390 12.70450, 120.32344 ...          PHL                                    Philippines    163.5 cm (5 ft 4 1⁄2 in)       151.8 cm (5 ft 0 in)                          1.08                                              20–39                             31.5%[91]       Measured       2003             [92]                      163.50           Philippines
188    31381992           Asia                  Malaysia    MYS    863000.0  MULTIPOLYGON (((100.08576 6.46449, 100.25960 6...          MYS                                       Malaysia    166.3 cm (5 ft 5 1⁄2 in)       154.7 cm (5 ft 1 in)                          1.07  Malay, 20–24 (N= m:749 f:893, Median= m:166 cm...                              9.7%[79]       Measured       1996             [80]                      166.30              Malaysia
189    31381992           Asia                  Malaysia    MYS    863000.0  MULTIPOLYGON (((100.08576 6.46449, 100.25960 6...          MYS                                       Malaysia    168.5 cm (5 ft 6 1⁄2 in)       158.1 cm (5 ft 2 in)                          1.07  Chinese, 20–24 (N= m:407 f:453, Median= m:169 ...                              4.1%[79]       Measured       1996             [80]                      168.50              Malaysia
190    31381992           Asia                  Malaysia    MYS    863000.0  MULTIPOLYGON (((100.08576 6.46449, 100.25960 6...          MYS                                       Malaysia    169.1 cm (5 ft 6 1⁄2 in)       155.4 cm (5 ft 1 in)                          1.09  Indian, 20–24 (N= m:113 f:140, Median= m:168 c...                              1.2%[79]       Measured       1996             [80]                      169.10              Malaysia
191    31381992           Asia                  Malaysia    MYS    863000.0  MULTIPOLYGON (((100.08576 6.46449, 100.25960 6...          MYS                                       Malaysia    163.3 cm (5 ft 4 1⁄2 in)       151.9 cm (5 ft 0 in)                          1.08  Other indigenous, 20–24 (N= m:257 f:380, Media...                              0.4%[79]       Measured       1996             [80]                      163.30              Malaysia
193     1972126         Europe                  Slovenia    SVN     68350.0  POLYGON ((13.80648 46.50931, 14.63247 46.43182...          SVN                           Slovenia – Ljubljana       180.3 cm (5 ft 11 in)       167.4 cm (5 ft 6 in)                          1.08                                                 19                             0.2%[108]       Measured       2011            [109]                      180.30              Slovenia
194     5491218         Europe                   Finland    FIN    224137.0  POLYGON ((28.59193 69.06478, 28.44594 68.36461...          FIN                                        Finland   178.9 cm (5 ft 10 1⁄2 in)       165.3 cm (5 ft 5 in)                          1.08                               25–34 (N= m/f:2,305)                                 19.0%       Measured       1994             [43]                      178.90               Finland
195     5491218         Europe                   Finland    FIN    224137.0  POLYGON ((28.59193 69.06478, 28.44594 68.36461...          FIN                                        Finland       180.7 cm (5 ft 11 in)       167.2 cm (5 ft 6 in)                          1.08                                −25 (N= m/f:26,636)                                  9.2%       Measured  2010–2011         [43][44]                      180.70               Finland
196     5445829         Europe                  Slovakia    SVK    168800.0  POLYGON ((22.55814 49.08574, 22.28084 48.82539...          SVK                                       Slovakia   179.4 cm (5 ft 10 1⁄2 in)       165.6 cm (5 ft 5 in)                          1.08                                                 18                                  2.0%       Measured       2004            [107]                      179.40              Slovakia
197    10674723         Europe                   Czechia    CZE    350900.0  POLYGON ((15.01700 51.10667, 15.49097 50.78473...          CZE                                 Czech Republic       180.3 cm (5 ft 11 in)      167.22 cm (5 ft 6 in)                          1.08                                                 17                                  1.6%       Measured       2001             [36]                      180.30        Czech Republic
199   126451398           Asia                     Japan    JPN   4932000.0  MULTIPOLYGON (((141.88460 39.18086, 140.95949 ...          JPN                                          Japan      172 cm (5 ft 7 1⁄2 in)         158 cm (5 ft 2 in)                          1.08                                              20–49                                 47.2%       Measured       2005             [17]                      172.00                 Japan
200   126451398           Asia                     Japan    JPN   4932000.0  MULTIPOLYGON (((141.88460 39.18086, 140.95949 ...          JPN                                          Japan    172.0 cm (5 ft 7 1⁄2 in)  158.70 cm (5 ft 2 1⁄2 in)                          1.08  20–24 (N= m:1,708 f:1,559, SD= m:5.42 cm (2 in...                                  7.2%       Measured       2004             [67]                      172.00                 Japan
201   126451398           Asia                     Japan    JPN   4932000.0  MULTIPOLYGON (((141.88460 39.18086, 140.95949 ...          JPN                                          Japan        170.7 cm (5 ft 7 in)       158.0 cm (5 ft 2 in)                          1.08                                                 17                                  1.2%       Measured       2013             [68]                      170.70                 Japan
204    28571770           Asia              Saudi Arabia    SAU   1731000.0  POLYGON ((34.95604 29.35655, 36.06894 29.19749...          SAU                                   Saudi Arabia    168.9 cm (5 ft 6 1⁄2 in)   156.3 cm (5 ft 1 1⁄2 in)                          1.08                                                 18                                  3.0%       Measured       2010        [21][100]                      168.90          Saudi Arabia
205    28571770           Asia              Saudi Arabia    SAU   1731000.0  POLYGON ((34.95604 29.35655, 36.06894 29.19749...          SAU                                   Saudi Arabia      174 cm (5 ft 8 1⁄2 in)                        NaN                           NaN                                                NaN                                   NaN            NaN       2017            [101]                      174.00          Saudi Arabia
210    97041072         Africa                     Egypt    EGY   1105000.0  POLYGON ((36.86623 22.00000, 32.90000 22.00000...          EGY                                          Egypt        170.3 cm (5 ft 7 in)   158.9 cm (5 ft 2 1⁄2 in)                          1.07                           20–24 (N= m:845 f:1,059)                                 16.6%       Measured       2008             [41]                      170.30                 Egypt
220     7111024         Europe                    Serbia    SRB    101800.0  POLYGON ((18.82982 45.90887, 18.82984 45.90888...          SRB                                         Serbia   182.0 cm (5 ft 11 1⁄2 in)   166.8 cm (5 ft 5 1⁄2 in)                          1.09  Students at UNS,18–30 (N= m:318 f:76, SD= m:6....                             0.7%[102]       Measured       2012            [103]                      182.00                Serbia
221      642550         Europe                Montenegro    MNE     10610.0  POLYGON ((20.07070 42.58863, 19.80161 42.50009...          MNE                                     Montenegro        183.4 cm (6 ft 0 in)   169.4 cm (5 ft 6 1⁄2 in)                          1.09  17-20 (N= m:981 f:1107, SD= m:6.89 cm (2 1⁄2 i...                                  5.2%       Measured       2017             [85]                      183.40            Montenegro
222     1895250         Europe                    Kosovo    -99     18490.0  POLYGON ((20.59025 41.85541, 20.52295 42.21787...       Kosovo                             Kosovo – Prishtina  179.52 cm (5 ft 10 1⁄2 in)      165.72 cm (5 ft 5 in)                           NaN  Conscripts, 18-20 (N= m:830 f:793, SD= m:7.02 ...                                 63.0%       Measured       2017             [74]                      179.52                Kosovo

Also, notice that we also removed the rows, which has no Average male height.

Step 5: Create the map with our data

Now we have done all the hard work.

It is time to use folium to do the last piece of work.

Let’s put it all together.

import pandas as pd
import numpy as np
import folium
import geopandas
import pycountry


# Helper function to map country names to alpha_3 representation - though some are not known by library
def lookup_country_code(country):
    try:
        return pycountry.countries.lookup(country).alpha_3
    except LookupError:
        return country


# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/Average_human_height_by_country'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the first table
table = tables[0]

# To avoid writing it all the time
AVG_MH = 'Average male height'
CR = 'Country/Region'
COUNTRY = 'Country'
AMH_F = 'Aveage male height (float)'
A3 = 'alpha3'

# Remove duplicate rows with 'Average male height'
table = table.loc[table[AVG_MH] != AVG_MH].copy()

# Clean up data to have height in cm
table[AMH_F] = table.apply(lambda row: float(row[AVG_MH].split(' ')[0]) if row[AVG_MH] is not np.nan else np.nan,
                           axis=1)

# Clean up the names if used a dash before
table[COUNTRY] = table.apply(
    lambda row: row[CR].split(' – ')[0] if ' – ' in row[CR] else row[CR],
    axis=1)
# Map the country name to the alpha3 representation
table[A3] = table.apply(lambda row: lookup_country_code(row[COUNTRY]), axis=1)

# Read the geopandas dataset
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
# Do the same mapping to alpha3
world[A3] = world.apply(lambda row: lookup_country_code(row['name']), axis=1)

# Merge the data
table = world.merge(table, how="left", left_on=[A3], right_on=[A3])

# Remove countries with no data
table = table.dropna(subset=[AMH_F])

# Creating a map
my_map = folium.Map()

# Adding the data from our table
folium.Choropleth(
    geo_data=table,
    name='choropleth',
    data=table,
    columns=[A3, AMH_F],
    key_on='feature.properties.alpha3',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Male height'
).add_to(my_map)
# Save the map to an html file
my_map.save('height_map.html')

Which should result in a map like this you can use in your browser. Zoom in and out.

The result.

This is nice. Good job.

Read HTML Tables with Pandas – Count and Sum Values in Groups

What will this tutorial cover

  • Pandas has a built-in library function to read html tables directly. The library call depends on other libraries, which needs to be installed first.
  • Often data needs to be cleaned and transformed into the right format to proceed with processing.
  • Finally, we will group the data and count occurrences and sum up a value attribute.

We will cover all that with real life example by using an example of a table from a wikipedia page.

Step 1: What you need to get started

First of, you need to install the pandas library, which can be done by using pip.

pip install pandas

Or see here.

Using the read_html call (with will read the URL argument, parse the data from all tables and return them) from the library need installation of further supporting libraries. If you run this example.

import pandas as pd


url = 'https://en.wikipedia.org/wiki/List_of_elevator_accidents'
tables = pd.read_html(url)

print(tables)

You will get the following error.

Traceback (most recent call last):
  File "/Users/admin/PycharmProjects/MyLearningSpace/Pandas_read_html.py", line 5, in <module>
    table = pd.read_html(url)
  File "/Users/admin/PycharmProjects/MyLearningSpace/venv/lib/python3.8/site-packages/pandas/io/html.py", line 1085, in read_html
    return _parse(
  File "/Users/admin/PycharmProjects/MyLearningSpace/venv/lib/python3.8/site-packages/pandas/io/html.py", line 891, in _parse
    parser = _parser_dispatch(flav)
  File "/Users/admin/PycharmProjects/MyLearningSpace/venv/lib/python3.8/site-packages/pandas/io/html.py", line 848, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it

Which states you also need to install the lxml library to use the call. You can install that by.

pip install lxml.

The lxml library is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.

Then you should get the expected result.

The read_html call returns a list of DataFrames.

Hence you can see how many by the following code.

import pandas as pd


url = 'https://en.wikipedia.org/wiki/List_of_elevator_accidents'
tables = pd.read_html(url)

print(len(tables))

Which tells you it contains 9 (at the time of writing – changes happens to wikipedia pages).

In this tutorial we are interested in the main table on the page, which you can find in the first entry (at the time of writing).

Hence, you can access the DataFrame through this code.

import pandas as pd


url = 'https://en.wikipedia.org/wiki/List_of_elevator_accidents'
tables = pd.read_html(url)

table = tables[0]

Step 2: Clean data to prepare it for sum and count

The next thing you want to do is to use the values in the table. First thing you notice is, that you both need to clean and cast values.

Let’s say you want to organize the data into countries. This can be done by help of Lambda function. Also see this tutorial if you are new to Lambda functions.

import pandas as pd


url = 'https://en.wikipedia.org/wiki/List_of_elevator_accidents'
tables = pd.read_html(url)

table = tables[0]

table['Country'] = table.apply(lambda row: row['Location'].split(', ')[-1], axis=1)

Where the magic last line will add a new column to the data with the country in it.

               Deaths Injuries  ...                       Location       Country
0  200-400 (estimate)        1  ...  New York City, New York, U.S.          U.S.
1                 104        0  ...           Orkney, South Africa  South Africa
2                  52        0  ...           Welkom, South Africa  South Africa
3                  19        0  ...                   Wuhan, China         China
4                  19        1  ...                 Leigh, England       England

As you see, it uses the split on Location and only adds the last value, the country itself.

Next up, you see that if we want to sum up the deaths by country we need to make a decisions, as some are in intervals. For this purpose, we are conservative and take the lower bound of it.

import pandas as pd


url = 'https://en.wikipedia.org/wiki/List_of_elevator_accidents'
tables = pd.read_html(url)

table = tables[0]

table['Country'] = table.apply(lambda row: row['Location'].split(', ')[-1], axis=1)
table['Deaths (clean)'] = table.apply(lambda row: int(row['Deaths']) if row['Deaths'].isdigit() else int(row['Deaths'].split('-')[0]), axis=1)
print(table.head())

Which results in the following.

               Deaths Injuries  ...       Country Deaths (clean)
0  200-400 (estimate)        1  ...          U.S.           200
1                 104        0  ...  South Africa           104
2                  52        0  ...  South Africa            52
3                  19        0  ...         China            19
4                  19        1  ...       England            19

Also notice, that the Deaths (clean) column is converted to an int by the lambda function.

Step 3: Count and sum by country

Now we need to group it by Country first to get the sum of deaths and then count of how many accidents were there.

import pandas as pd


url = 'https://en.wikipedia.org/wiki/List_of_elevator_accidents'
tables = pd.read_html(url)

table = tables[0]

table['Country'] = table.apply(lambda row: row['Location'].split(', ')[-1], axis=1)
table['Death (clean)'] = table.apply(lambda row: int(row['Deaths']) if row['Deaths'].isdigit() else int(row['Deaths'].split('-')[0]), axis=1)

country = table.groupby('Country').count()[['Location']].sort_values(by='Location', ascending=False)
country = country.rename(columns={'Location': 'Events'})
country['Total Deaths'] = table.groupby('Country')['Death (clean)'].sum()

Which will result in the following.

                         Events  Total Deaths
Country                                      
U.S.                         13           239
China                         5            55
England                       3            41
Spain                         2             9
South Korea                   2             8
South Africa                  2           156
Brazil                        2            13
Australia                     1             4
Netherlands                   1             2
Turkey                        1            10
Pakistan                      1             6
Iran                          1             6
Mogok Township. Myanmar       1             6
India                         1             6
Hong Kong                     1            12
Germany                       1            10
Canada                        1             4
Cambodia                      1             4
Ukraine                       1            11

That is it.