
    How to Create Choropleth Maps with Evenly Distributed Colors in 3 Easy Steps

    What we will cover in this tutorial

    When you create a default map using Choropleth from folium (or any other library), the color distribution can be poor: most countries end up in just a couple of colors, which makes the map less informative.

    Example of poor color distribution.

    • Understand the issue
    • How to solve it
    • Putting it all together

    Step 1: Understand the issue

    In this example we use the divorce rates reported for various countries on Wikipedia’s Divorce demography page.

    A first look at the Wikipedia page already gives an idea of what the problem is.

    From wikipedia.org

    The divorce rate in Denmark is too high (just kidding: I am from Denmark and not proud that Denmark ranks number 6 by percentage of marriages that end in divorce).

    The issue is that the distribution is uneven. The highest rate is Tunisia with 97.14 percent, followed by Portugal with 70.97 percent. Compare this with the color coding: by default, the last two colors cover the ranges 67 to 82 and 82 to 97, and each of these ranges contains only one country.

    To investigate further, we need to retrieve the data.

    We can retrieve the data with pandas. Read this tutorial for details, or see the code below.

    import numpy as np
    import pandas as pd
    # The URL we will read our data from
    url = 'https://en.wikipedia.org/wiki/Divorce_demography'
    # read_html returns a list of all tables on the page
    tables = pd.read_html(url)
    # The data is in the first table on the page
    table = tables[0]
    # The table has two-level column headers; drop the top level so we can use names like 'Percent'
    table.columns = table.columns.droplevel(0)
    def is_float(value):
        """Return True if value can be converted to a float."""
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False
    # Convert the data to floats; entries that are not numbers become NaN
    index = 'Divorce_float'
    table[index] = table['Percent'].apply(lambda value: float(value) if is_float(value) else np.nan)
    # Split the value range into 6 equal-width bins and count the countries in each
    print(pd.cut(table[index], 6).value_counts(sort=False))
    

    The output confirms our suspicion:

    (6.93, 22.04]     25
    (22.04, 37.06]    26
    (37.06, 52.08]    22
    (52.08, 67.1]      9
    (67.1, 82.12]      1
    (82.12, 97.14]     1
    Name: Divorce_float, dtype: int64
    
    The last two color codes are used by only one country each, while each of the first three is used by more than 20 countries.
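    Why do the bins end up like this? When given a number of bins, cut splits the range between the smallest and largest value into intervals of equal width, no matter how many countries fall into each one. Below is a minimal sketch of that arithmetic in plain Python. It assumes a minimum of about 7.02 percent (consistent with the intervals above) and the maximum of 97.14 percent (Tunisia). The function name equal_width_edges is hypothetical; it only mimics what pandas does.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 equally spaced bin edges between lo and hi."""
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    # Mimic pandas: lower the first edge by 0.1% of the range so the minimum
    # itself falls inside the first interval, which is open on the left.
    edges[0] -= 0.001 * (hi - lo)
    return edges


# Assumed minimum (about 7.02) and the reported maximum (97.14) divorce rates
edges = equal_width_edges(7.02, 97.14, 6)
print([round(e, 2) for e in edges])
# [6.93, 22.04, 37.06, 52.08, 67.1, 82.12, 97.14]
```

    Each bin is about 15 percentage points wide. The two high values (Tunisia and Portugal) stretch the range, so the top two bins are almost empty.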
    

    Step 2: Distribute the countries evenly across the bins

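    This requires understanding the difference between cut and qcut in the pandas library.

    • cut: given a number of bins, returns bins of equal width, i.e. the range of the values is split into intervals of the same size.
    • qcut: given a number of bins, returns quantile-based bins, so each bin contains approximately the same number of items.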
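    Here is a small, self-contained illustration on made-up numbers (not the Wikipedia data). With one large outlier, cut leaves most bins empty, while qcut spreads the values evenly.

```python
import pandas as pd

# Made-up, skewed sample: nine small values and one outlier
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# cut: 5 equal-width bins over the range 1..100
print(pd.cut(data, 5).value_counts(sort=False).tolist())   # [9, 0, 0, 0, 1]

# qcut: 5 quantile-based bins with (roughly) equal counts
print(pd.qcut(data, 5).value_counts(sort=False).tolist())  # [2, 2, 2, 2, 2]
```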

    See this example to understand it better.

    import numpy as np
    import pandas as pd
    # The URL we will read our data from
    url = 'https://en.wikipedia.org/wiki/Divorce_demography'
    # read_html returns a list of all tables on the page
    tables = pd.read_html(url)
    # The data is in the first table on the page
    table = tables[0]
    # The table has two-level column headers; drop the top level so we can use names like 'Percent'
    table.columns = table.columns.droplevel(0)
    def is_float(value):
        """Return True if value can be converted to a float."""
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False
    # Convert the data to floats; entries that are not numbers become NaN
    index = 'Divorce_float'
    table[index] = table['Percent'].apply(lambda value: float(value) if is_float(value) else np.nan)
    # Split the countries into 6 quantile-based bins and count the countries in each
    print(pd.qcut(table[index], 6).value_counts(sort=False))
    

    The only difference is that we changed cut to qcut on the last line. This results in the following output:

    (7.018999999999999, 17.303]    14
    (17.303, 23.957]               14
    (23.957, 31.965]               14
    (31.965, 40.0]                 15
    (40.0, 47.078]                 13
    (47.078, 97.14]                14
    Name: Divorce_float, dtype: int64
    

    Each bucket now contains approximately the same number of countries.

    This is exactly what we need to distribute the colors evenly on our map.
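    The map needs the bin edges themselves rather than the counts. pd.qcut can return the edges directly via retbins=True. The sketch below assumes percentage data and pins the outer edges to 0 and 100. The helper name quantile_bin_edges is hypothetical.

```python
import pandas as pd


def quantile_bin_edges(values, n_bins, lower=0, upper=100):
    """Return n_bins + 1 integer bin edges that put roughly the same
    number of values into each bin.

    The inner edges come from pd.qcut(..., retbins=True) and are rounded
    to whole numbers; the outer edges are pinned to `lower` and `upper`
    (0 and 100 by default, assuming the values are percentages).
    """
    _, edges = pd.qcut(values, n_bins, retbins=True)
    inner = [int(round(edge)) for edge in edges[1:-1]]
    return [lower] + inner + [upper]


# Made-up, skewed sample: nine small values and one outlier
sample = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(quantile_bin_edges(sample, 5))  # [0, 3, 5, 6, 8, 100]
```

    Rounding keeps the legend readable. However, if two quantiles are very close, they can round to the same integer and produce duplicate edges. In that case, use fewer bins or keep a decimal place.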

    Step 3: Putting it all together on the map

    If you are new to folium and how it makes creating awesome Leaflet maps easy, I recommend reading this tutorial. Or you can simply inspect the code below.

    import numpy as np
    import pandas as pd
    import folium
    import geopandas
    # The URL we will read our data from
    url = 'https://en.wikipedia.org/wiki/Divorce_demography'
    # read_html returns a list of all tables on the page
    tables = pd.read_html(url)
    # The data is in the first table on the page
    table = tables[0]
    # Drop the top level of the two-level column headers
    table.columns = table.columns.droplevel(0)
    # Clean the country names: 'Name (note)' -> 'Name'
    table['Country'] = table['Country/region'].apply(
        lambda name: name.split(' (')[0] if isinstance(name, str) else name)
    # Read the country shapes from the Natural Earth low-resolution dataset.
    # Note: geopandas.datasets was removed in geopandas 1.0; on newer versions,
    # download the Natural Earth data yourself and pass its path to read_file.
    world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
    # Rename 'United States of America' to 'United States' to match the table
    world['name'] = world['name'].replace('United States of America', 'United States')
    # Left merge: keep every country shape, attach divorce data where the names match
    table = world.merge(table, how='left', left_on=['name'], right_on=['Country'])
    def is_float(value):
        """Return True if value can be converted to a float."""
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False
    # Convert the data to floats; entries that are not numbers become NaN
    index = 'Divorce_float'
    table[index] = table['Percent'].apply(lambda value: float(value) if is_float(value) else np.nan)
    # Clean data: remove rows with no data
    table = table.dropna(subset=[index])
    # Split the countries into 9 quantile bins: 9 color classes defined by 10 bin edges
    bins_data = pd.qcut(table[index], 9).value_counts(sort=False)
    print(bins_data)
    # Use the rounded right edges of the first 8 bins as inner edges,
    # and pin the outer edges to 0 and 100 percent
    bins = [0] + [int(round(interval.right)) for interval in bins_data.index[:-1]] + [100]
    # Create a map
    my_map = folium.Map()
    # Add the choropleth layer; `bins` sets the color class boundaries
    # (older folium versions called this parameter `threshold_scale`)
    folium.Choropleth(
        geo_data=table,
        name='choropleth',
        data=table,
        columns=['Country', index],
        key_on='feature.properties.name',
        fill_color='OrRd',
        fill_opacity=0.7,
        line_opacity=0.2,
        legend_name=index,
        bins=bins
    ).add_to(my_map)
    my_map.save('divorce_rates.html')
    

    Here we merge the two DataFrames and make full use of the OrRd color scale. Its 9 color classes are defined by the 10 bin edges computed with qcut.

    It should result in a map like this one.

    Final output
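    The merge step deserves a closer look. If a country's name differs between the Wikipedia table and the map data, it gets no divorce rate after the left merge. Because rows without data are dropped, such a country silently disappears from the map. The sketch below shows the name cleanup on made-up names (Alpha, Beta, ...) rather than the real data. It also shows one way to list names that fail to match: pandas' merge with indicator=True. The helper clean_country and the alias mapping are hypothetical.

```python
import pandas as pd


def clean_country(name):
    """Drop a trailing ' (...)' note, e.g. 'Beta (2015)' -> 'Beta'.
    Non-strings such as NaN are returned unchanged."""
    return name.split(' (')[0] if isinstance(name, str) else name


# Hypothetical stand-ins for the Wikipedia table and the map's country names
rates = pd.DataFrame({'Country/region': ['Alpha', 'Beta (2015)', 'Gamma'],
                      'Percent': ['10.5', '20.0', '30.1']})
world = pd.DataFrame({'name': ['Alpha', 'Republic of Beta', 'Delta']})

rates['Country'] = rates['Country/region'].map(clean_country)
# Align naming differences by hand (a hypothetical alias, similar to the
# 'United States of America' -> 'United States' replacement above)
world['name'] = world['name'].replace({'Republic of Beta': 'Beta'})

# An outer merge with indicator=True labels each row 'both', 'left_only'
# (map shape without data) or 'right_only' (data without a map shape)
merged = world.merge(rates, how='outer', left_on='name', right_on='Country',
                     indicator=True)
print(merged[['name', 'Country', '_merge']])
```

    In the real data, any 'left_only' or 'right_only' rows point to names that need an alias like the United States replacement.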
