What will we cover in this tutorial?
- We will use pandas to gather data from the Wikipedia page List of countries by past and projected GDP.
- The first step is to get the data and merge the correct tables together.
- The next step is to use machine learning, in the form of a linear regression model, to estimate the GDP growth of each country.
- The final step is to visualize the growth rates on a leaflet map using folium.
Step 1: Get the data and merge it
The data is available on Wikipedia in List of countries by past and projected GDP. We will focus on data from 1990 to 2019.
At first glance at the page you will notice that the data is not gathered in one table.

The first task will be to merge the three tables with the data from 1990-1999, 2000-2009, and 2010-2019.
The data can be collected with the pandas read_html function. If you are new to this you can read this tutorial.
import pandas as pd
# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_GDP_(nominal)'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)
# Merge the tables into one table
merge_index = 'Country (or dependent territory)'
table = tables[9].merge(tables[12], how="left", left_on=[merge_index], right_on=[merge_index])
table = table.merge(tables[15], how="left", left_on=[merge_index], right_on=[merge_index])
print(table)
The call to read_html returns all the tables on the page as a list. By inspecting the result you will notice that the tables we are interested in are at index 9, 12 and 15, so we merge those three on the country column. The first rows of the output look like this.
Country (or dependent territory) 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
0 Afghanistan NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4367.0 4514.0 5146.0 6167.0 6925.0 8556.0 10297.0 12066.0 15325.0 17890.0 20296.0 20170.0 20352.0 19687.0 19454.0 20235.0 19585.0 19990.0
1 Albania 2221.0 1333.0 843.0 1461.0 2361.0 2882.0 3200.0 2259.0 2560.0 3209.0 3483.0 3928.0 4348.0 5611.0 7185.0 8052.0 8905.0 10675.0 12901.0 12093.0 11938.0 12896.0 12323.0 12784.0 13238.0 11393.0 11865.0 13055.0 15202.0 15960.0
2 Algeria 61892.0 46670.0 49217.0 50963.0 42426.0 42066.0 46941.0 48178.0 48188.0 48845.0 54749.0 54745.0 56761.0 67864.0 85327.0 103198.0 117027.0 134977.0 171001.0 137054.0 161207.0 199394.0 209005.0 209703.0 213518.0 164779.0 159049.0 167555.0 180441.0 183687.0
3 Angola 11236.0 10891.0 8398.0 6095.0 4438.0 5539.0 6535.0 7675.0 6506.0 6153.0 9130.0 8936.0 12497.0 14189.0 19641.0 28234.0 41789.0 60449.0 84178.0 75492.0 82471.0 104116.0 115342.0 124912.0 126777.0 102962.0 95337.0 122124.0 107316.0 92191.0
4 Antigua and Barbuda 459.0 482.0 499.0 535.0 589.0 577.0 634.0 681.0 728.0 766.0 825.0 796.0 810.0 850.0 912.0 1013.0 1147.0 1299.0 1358.0 1216.0 1146.0 1140.0 1214.0 1194.0 1273.0 1353.0 1460.0 1516.0 1626.0 1717.0
5 Argentina 153205.0 205515.0 247987.0 256365.0 279150.0 280080.0 295120.0 317549.0 324242.0 307673.0 308491.0 291738.0 108731.0 138151.0 164922.0 199273.0 232892.0 287920.0 363545.0 334633.0 424728.0 527644.0 579666.0 611471.0 563614.0 631621.0 554107.0 642928.0 518092.0 477743.0
6 Armenia NaN NaN 108.0 835.0 648.0 1287.0 1597.0 1639.0 1892.0 1845.0 1912.0 2118.0 2376.0 2807.0 3577.0 4900.0 6384.0 9206.0 11662.0 8648.0 9260.0 10142.0 10619.0 11121.0 11610.0 10529.0 10572.0 11537.0 12411.0 13105.0
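The indices 9, 12 and 15 depend on the layout of the page at the time of reading, so they may shift if the page is edited. As a minimal sketch, assuming the tables of interest can be recognized by their column names, the lookup could be done by column instead of by position. The helper find_table and the toy tables below are hypothetical, with made-up values standing in for the result of read_html.

```python
import pandas as pd

key = 'Country (or dependent territory)'

# Toy stand-ins for the list returned by pd.read_html (made-up values)
tables = [
    pd.DataFrame({'Rank': [1, 2]}),  # an unrelated table
    pd.DataFrame({key: ['A', 'B'], '1990': [1.0, 2.0], '1991': [1.5, 2.5]}),
    pd.DataFrame({key: ['A', 'B'], '2000': [3.0, 4.0], '2001': [3.5, 4.5]}),
]


def find_table(tables, key, first_year):
    """Return the first table that has both the key column and the given year column."""
    for t in tables:
        cols = [str(c) for c in t.columns]
        if key in cols and str(first_year) in cols:
            return t
    raise ValueError(f'No table with columns {key!r} and {first_year!r}')


# Merge the decade tables on the country column, as in the tutorial code
merged = find_table(tables, key, 1990).merge(
    find_table(tables, key, 2000), how='left', on=key)
print(merged)
```

Since the column name is the same in both tables, on=key is equivalent to passing it as both left_on and right_on.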
Step 2: Use linear regression to estimate the growth over the last 30 years
In this section we will use the LinearRegression model from the scikit-learn library, which is a simple prediction tool.
If you are new to Machine Learning we recommend you read this tutorial on Linear regression.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
import numpy as np
# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_GDP_(nominal)'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)
# Merge the tables into one table
merge_index = 'Country (or dependent territory)'
table = tables[9].merge(tables[12], how="left", left_on=[merge_index], right_on=[merge_index])
table = table.merge(tables[15], how="left", left_on=[merge_index], right_on=[merge_index])
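# Use one country (row 1, Albania) as an example
row = table.iloc[1]
# The year columns are the features
X = table.columns[1:].to_numpy().reshape(-1, 1)
X = X.astype(int)
# GDP relative to the first year: cumulative product of the yearly changes
Y = 1 + row.iloc[1:].astype(float).pct_change()
Y = Y.cumprod().fillna(1.0).to_numpy()
Y = Y.reshape(-1, 1)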
regr = LinearRegression()
regr.fit(X, Y)
Y_pred = regr.predict(X)
plt.scatter(X, Y)
plt.plot(X, Y_pred, color='red')
plt.show()
Which will result in the following plot.

The plot shows that the model fits a line through the 30 years of data to estimate the growth of the country’s GDP.
Notice that we use the cumulative product (cumprod) of pct_change to make the data comparable: every series starts at 1 and shows the GDP relative to the first year. If we used the raw data directly, it would not be possible to compare countries of different sizes.
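To see what the cumprod of pct_change computes, here is a small sketch on a made-up series (the numbers are illustrative, not taken from the table). For a series without gaps, the result is the same as dividing every value by the first one.

```python
import pandas as pd

# Made-up GDP values for four years
gdp = pd.Series([200.0, 220.0, 242.0, 230.0], index=[1990, 1991, 1992, 1993])

# Same steps as in the tutorial code: yearly change, then cumulative product
growth = (1 + gdp.pct_change()).cumprod().fillna(1.0)

# The equivalent direct computation: value relative to the first year
relative = gdp / gdp.iloc[0]

print(growth.round(3).tolist())    # approximately [1.0, 1.1, 1.21, 1.15]
print(relative.round(3).tolist())  # the same values
```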
We will do that for all countries to get a view of the growth. We use the coefficient (the slope) of the fitted line as an indicator of the growth rate.
import pandas as pd
from sklearn.linear_model import LinearRegression
import numpy as np
# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_GDP_(nominal)'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)
# Merge the tables into one table
merge_index = 'Country (or dependent territory)'
table = tables[9].merge(tables[12], how="left", left_on=[merge_index], right_on=[merge_index])
table = table.merge(tables[15], how="left", left_on=[merge_index], right_on=[merge_index])
coef = []
countries = []
for index, row in table.iterrows():
    # The year columns are the features
    X = table.columns[1:].to_numpy().reshape(-1, 1)
    X = X.astype(int)
    # GDP relative to the first year with data
    Y = 1 + row.iloc[1:].astype(float).pct_change()
    Y = Y.cumprod().fillna(1.0).to_numpy()
    Y = Y.reshape(-1, 1)
    regr = LinearRegression()
    regr.fit(X, Y)
    # The slope of the fitted line is our growth indicator
    coef.append(regr.coef_[0][0])
    countries.append(row[merge_index])
data = pd.DataFrame(list(zip(countries, coef)), columns=['Country', 'Coef'])
print(data)
This results in the following output (only the first few lines are shown).
Country Coef
0 Afghanistan 0.161847
1 Albania 0.243493
2 Algeria 0.103907
3 Angola 0.423919
4 Antigua and Barbuda 0.087863
5 Argentina 0.090837
6 Armenia 4.699598
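The coefficient is the slope of the fitted line, measured in multiples of the first-year GDP per year. The sketch below uses numpy's polyfit as a stand-in for LinearRegression (both compute an ordinary least-squares line here) on made-up numbers. It illustrates why the normalization matters: two economies with the same relative growth get the same slope, whatever their size.

```python
import numpy as np

years = np.arange(1990, 1995)
small = np.array([10.0, 11.0, 12.0, 13.0, 14.0])  # made-up GDP values
large = small * 1000  # same relative growth, 1000 times larger economy


def slope(values):
    """Least-squares slope of the values relative to the first year."""
    relative = values / values[0]
    return np.polyfit(years, relative, 1)[0]


# After normalization both economies have (nearly) the same slope
print(slope(small), slope(large))

# Without normalization the slopes differ by the size of the economy
print(np.polyfit(years, small, 1)[0], np.polyfit(years, large, 1)[0])
```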
Step 3: Merge the data to a leaflet map using folium
The last step is to merge the data together with the leaflet map using the folium library. If you are new to folium we recommend you read this tutorial.
import pandas as pd
import folium
import geopandas
from sklearn.linear_model import LinearRegression
import numpy as np
# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_GDP_(nominal)'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)
# Merge the tables into one table
merge_index = 'Country (or dependent territory)'
table = tables[9].merge(tables[12], how="left", left_on=[merge_index], right_on=[merge_index])
table = table.merge(tables[15], how="left", left_on=[merge_index], right_on=[merge_index])
coef = []
countries = []
for index, row in table.iterrows():
    # The year columns are the features
    X = table.columns[1:].to_numpy().reshape(-1, 1)
    X = X.astype(int)
    # GDP relative to the first year with data
    Y = 1 + row.iloc[1:].astype(float).pct_change()
    Y = Y.cumprod().fillna(1.0).to_numpy()
    Y = Y.reshape(-1, 1)
    regr = LinearRegression()
    regr.fit(X, Y)
    # The slope of the fitted line is our growth indicator
    coef.append(regr.coef_[0][0])
    countries.append(row[merge_index])
data = pd.DataFrame(list(zip(countries, coef)), columns=['Country', 'Coef'])
# Read the geopandas dataset (geopandas.datasets is only available in GeoPandas versions before 1.0)
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
# Rename 'United States of America' to 'United States' to match the naming in the table
world = world.replace('United States of America', 'United States')
# Merge the two DataFrames together
table = world.merge(data, how="left", left_on=['name'], right_on=['Country'])
# Clean the data: remove rows with no coefficient
table = table.dropna(subset=['Coef'])
# The ten threshold values used on the map give nine bins, so split the coefficients into 9 quantile groups
table['Cat'] = pd.qcut(table['Coef'], 9, labels=[0, 1, 2, 3, 4, 5, 6, 7, 8])
print(table)
# Create a map
my_map = folium.Map()
# Add the data
folium.Choropleth(
    geo_data=table,
    name='choropleth',
    data=table,
    columns=['Country', 'Cat'],
    key_on='feature.properties.name',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Growth of GDP since 1990',
    threshold_scale=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
).add_to(my_map)
my_map.save('gdp_growth.html')
There is a twist in the way this is done. Instead of mapping the coefficients to colors on a linear scale, we chose to put the countries into categories. The reason is that otherwise most countries would end up in the same small segment of the scale.
Here we have used qcut to split the countries into nine equal-sized groups.
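The difference between a linear scale and quantile groups can be seen in isolation on a few made-up coefficients with one large outlier (the values below are illustrative only). This sketch compares pandas cut, which makes bins of equal width, with qcut, which makes bins of equal size.

```python
import pandas as pd

# Made-up coefficients, skewed by one large value
coef = pd.Series([0.05, 0.08, 0.09, 0.10, 0.16, 0.24, 0.42, 0.90, 4.70])

# Equal-width bins: the outlier stretches the range, so most values share one bin
equal_width = pd.cut(coef, 3, labels=[0, 1, 2])
# Quantile bins: each bin gets the same number of values
quantiles = pd.qcut(coef, 3, labels=[0, 1, 2])

print(equal_width.value_counts().sort_index().tolist())  # [8, 0, 1]
print(quantiles.value_counts().sort_index().tolist())    # [3, 3, 3]
```

With equal-width bins almost every country would get the same color, while the quantile groups spread the countries evenly over the available colors.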
Running the code above should produce an interactive HTML page, gdp_growth.html, looking something like this.
