How to Create Choropleth Maps with Evenly Distributed Colors in 3 Easy Steps

What we will cover in this tutorial

  • The problem: you create a default map using Choropleth from folium (or any other library), and the color distribution is poor. Most countries fall into just two colors, which makes the map less informative.
Example of poor color distribution.
  • Understand the issue
  • How to solve it
  • …and putting it all together.

Step 1: Understand the issue

In this example, we use the divorce rates reported for various countries on Wikipedia’s Divorce demography page.

A first look at the Wikipedia page already hints at the problem.

From wikipedia.org

The divorce rate in Denmark is too high (just kidding; I am from Denmark and not proud that Denmark ranks number 6 by the percentage of marriages that end in divorce).

The issue is that the distribution is uneven. The highest rate is Tunisia's at 97.14 percent, followed by Portugal at 70.97 percent. Compare this with the default color coding: the last two colors cover the ranges 67 to 82 and 82 to 97, and each of them contains only one country.
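Where do these ranges come from? By default the colors are assigned to bins of equal width: the span from the lowest to the highest value is split into equal parts. The minimal numpy sketch below reproduces that computation. It assumes six bins, a highest value of 97.14 (Tunisia) and a lowest value of about 7.02; the lowest value is inferred from the pandas outputs in this tutorial, not read off the Wikipedia page.

```python
import numpy as np

# Assumed data range: about 7.02 (lowest rate, inferred) to 97.14 (Tunisia).
lowest, highest = 7.02, 97.14
n_bins = 6  # six equal-width bins, one per color

# Equal-width binning: split [lowest, highest] into n_bins intervals of equal size.
edges = np.linspace(lowest, highest, n_bins + 1)
width = (highest - lowest) / n_bins

print(np.round(edges, 2))
print(round(width, 2))
```

With edges at roughly 7.02, 22.04, 37.06, 52.08, 67.10, 82.12 and 97.14, Portugal's 70.97 lands alone in the 67.10 to 82.12 bin and Tunisia's 97.14 alone in the top bin, which is exactly the uneven split described above.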

To investigate further, we need to retrieve the data.

To retrieve the data, we can use pandas. Read this tutorial for details, or see the code below.

import numpy as np
import pandas as pd

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/Divorce_demography'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the first table (index 0)
table = tables[0]
# The columns have two header levels; drop the top level so we can refer to columns like 'Percent'
table.columns = table.columns.droplevel(0)

def is_float(value):
    """Return True if value can be converted to a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

# Convert the 'Percent' column to floats; entries that are not numbers become NaN
index = 'Divorce_float'
table[index] = table['Percent'].apply(lambda value: float(value) if is_float(value) else np.nan)

print(pd.cut(table[index], 6).value_counts(sort=False))

Inspecting the output confirms our suspicion.

(6.93, 22.04]     25
(22.04, 37.06]    26
(37.06, 52.08]    22
(52.08, 67.1]      9
(67.1, 82.12]      1
(82.12, 97.14]     1
Name: Divorce_float, dtype: int64
The last two bins contain only one country each, while each of the first three contains more than 20 countries.
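As an aside, the row-by-row is_float conversion in the code above can also be written with pandas.to_numeric and errors="coerce", which turns every value that cannot be parsed as a number into NaN. This is a swapped-in alternative, not the approach used in this tutorial, and the sample values below are made up rather than taken from the Wikipedia table.

```python
import pandas as pd

# Made-up sample of raw 'Percent' cells; the real table's contents may differ.
raw = pd.Series(["36.4", "N/A", "97.14", None])

# errors="coerce" replaces anything unparseable (here "N/A" and None) with NaN,
# the same outcome the is_float/np.nan combination produces row by row.
divorce_float = pd.to_numeric(raw, errors="coerce")
print(divorce_float)
```

The vectorized call avoids a Python-level function call per row, which is mostly a readability gain for a table of this size.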

Step 2: Distribute the countries evenly across the bins

This requires understanding the difference between the pandas functions cut and qcut.

  • cut: by default returns bins of equal width, i.e. each bin covers the same range of values.
  • qcut: returns quantile-based bins, so each bin contains roughly the same number of items.
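To make the difference concrete, here is a small sketch on made-up data with one large outlier, loosely mimicking the divorce-rate table with Tunisia at the top:

```python
import pandas as pd

# Made-up, deliberately skewed data: nine small values and one outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# cut: two bins of equal width over the range 1..100.
equal_width = pd.cut(values, 2).value_counts(sort=False)
# qcut: two bins split at the median, so each bin holds half of the items.
equal_count = pd.qcut(values, 2).value_counts(sort=False)

print(equal_width)
print(equal_count)
```

With cut, nine values share the first bin and the outlier sits alone in the second; with qcut, each bin gets five values. The same effect is what changes the color distribution on the map.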

See this example to understand it better.

import numpy as np
import pandas as pd

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/Divorce_demography'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the first table (index 0)
table = tables[0]
# The columns have two header levels; drop the top level so we can refer to columns like 'Percent'
table.columns = table.columns.droplevel(0)

def is_float(value):
    """Return True if value can be converted to a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

# Convert the 'Percent' column to floats; entries that are not numbers become NaN
index = 'Divorce_float'
table[index] = table['Percent'].apply(lambda value: float(value) if is_float(value) else np.nan)

print(pd.qcut(table[index], 6).value_counts(sort=False))

The only difference is that we changed cut to qcut on the last line, which gives the following output.

(7.018999999999999, 17.303]    14
(17.303, 23.957]               14
(23.957, 31.965]               14
(31.965, 40.0]                 15
(40.0, 47.078]                 13
(47.078, 97.14]                14
Name: Divorce_float, dtype: int64

Each bucket now contains approximately the same number of countries.

This is the behavior we want when distributing the colors on our map.
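Instead of reading the interval edges off the value_counts index, as the code in Step 3 does, qcut can return them directly when called with retbins=True. A short sketch on a small made-up dataset:

```python
import pandas as pd

# Made-up data with one large outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# retbins=True makes qcut also return the bin edges it computed (the quantiles).
binned, edges = pd.qcut(values, 2, retbins=True)

print(edges)  # candidate thresholds for the color scale
print(binned.value_counts(sort=False))
```

The returned edges are the raw quantiles (here the minimum, the median 5.5 and the maximum), so they can be rounded or padded with fixed end points before being used as color thresholds.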

Step 3: Putting it all together on the map

If you are new to folium and how it makes it easy to create awesome Leaflet maps, I recommend reading this tutorial, or you can inspect the code below.

import pandas as pd
import folium
import geopandas
import numpy as np

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/Divorce_demography'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)

# The data is in the first table (index 0)
table = tables[0]
# The columns have two header levels; drop the top level so we can refer to columns like 'Percent'
table.columns = table.columns.droplevel(0)

# Clean the country names, e.g. 'Name (extra info)' -> 'Name'
table['Country'] = table['Country/region'].apply(
    lambda name: name.split(' (')[0] if isinstance(name, str) else name)

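# Read the Natural Earth low-resolution world map bundled with geopandas.
# Note: geopandas.datasets was removed in geopandas 1.0, so this line needs an older
# geopandas; otherwise load a Natural Earth countries file with geopandas.read_file().
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))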
# Replace United States of America to United States to fit the naming in the table
world = world.replace('United States of America', 'United States')

# Merge the two DataFrames together
table = world.merge(table, how="left", left_on=['name'], right_on=['Country'])

def is_float(value):
    """Return True if value can be converted to a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

# Convert the 'Percent' column to floats; entries that are not numbers become NaN
index = 'Divorce_float'
table[index] = table['Percent'].apply(lambda value: float(value) if is_float(value) else np.nan)

# Clean data: remove rows with no data
table = table.dropna(subset=[index])

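# Split the data into 9 quantile-based bins (10 bin edges)
bins_data = pd.qcut(table[index], 9).value_counts(sort=False)
print(bins_data)

# Build the bin edges: start at 0, use the rounded right edge of the
# first 8 bins, and end at 100 so every percentage is covered
bins = [0]
for i in range(8):
    bins.append(int(round(bins_data.index[i].right)))
bins.append(100)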

# Create a map
my_map = folium.Map()

# Add the data
folium.Choropleth(
    geo_data=table,
    name='choropleth',
    data=table,
    columns=['Country', index],
    key_on='feature.properties.name',
    fill_color='OrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name=index,
    bins=bins  # newer folium versions use 'bins' instead of the older 'threshold_scale'
).add_to(my_map)
my_map.save('divorce_rates.html')

Here we merge the two DataFrames and use the 9 quantile-based bins, i.e. 10 bin edges, as the color thresholds.
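The bin-edge loop in the code above can be wrapped in a small function and tried without downloading anything. make_bin_edges is a hypothetical helper name, the sample values are made up, and the fixed end points 0 and 100 assume the data are percentages, as in the divorce table.

```python
import pandas as pd


def make_bin_edges(values, n_bins, lower=0, upper=100):
    """Hypothetical helper: quantile-based bin edges padded with fixed end points.

    Mirrors the loop in the map code: keep the rounded right edge of every
    quantile bin except the last, then add `lower` in front and `upper` at the end.
    """
    intervals = pd.qcut(pd.Series(values), n_bins).cat.categories
    inner = [int(round(interval.right)) for interval in intervals[:-1]]
    return [lower] + inner + [upper]


# Made-up percentages split into 3 quantile bins.
print(make_bin_edges([3, 7, 12, 18, 25, 33, 41, 52, 66, 90], 3))
```

One caveat under the same assumptions: because the edges are rounded to whole numbers, two neighboring quantiles that lie close together could round to the same integer and produce a repeated edge. Rounding to one decimal place instead makes that less likely.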

It should result in a map like this one.

Final output
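The map code only fixes one naming mismatch (United States). A quick way to spot others is to merge with indicator=True and look for rows that exist on only one side. The sketch below uses two tiny made-up DataFrames as stand-ins for the Natural Earth names and the cleaned Wikipedia table; with the real data, any names it reports would need a replace() like the one used for the United States.

```python
import pandas as pd

# Made-up stand-ins: 'name' mimics the Natural Earth column, 'Country' the cleaned Wikipedia column.
world_names = pd.DataFrame({"name": ["Denmark", "Portugal", "United States of America"]})
wiki = pd.DataFrame({"Country": ["Denmark", "Portugal", "United States", "Tunisia"]})

# An outer merge keeps rows from both sides; indicator=True adds a '_merge' column
# whose values are 'both', 'left_only' or 'right_only'.
check = world_names.merge(wiki, how="outer", left_on="name", right_on="Country", indicator=True)

# Table rows with no matching map shape (their data would never be drawn).
unmatched_wiki = check.loc[check["_merge"] == "right_only", "Country"].tolist()
# Map shapes with no matching table row (these countries get no data).
unmatched_world = check.loc[check["_merge"] == "left_only", "name"].tolist()

print("No map shape for:", unmatched_wiki)
print("No data for:", unmatched_world)
```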