What will we cover in this tutorial?
- We will gather data from the Wikipedia page List of countries by past and projected GDP using pandas.
- The first step is to get the data and merge the correct tables together.
- The next step is to fit a Linear Regression model (Machine Learning) to estimate the growth of each country's GDP.
- The final step is to visualize the growth rates on a leaflet map using folium.
Step 1: Get the data and merge it
The data is available on Wikipedia on the page List of countries by past and projected GDP. We will focus on the years 1990 to 2019.
At first glance at the page you will notice that the data is not gathered in one table.

The first task is to merge the three tables covering 1990-1999, 2000-2009, and 2010-2019.
The data can be collected with the pandas read_html function. If you are new to this, you can read this tutorial.
import pandas as pd
# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_GDP_(nominal)'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)
# Merge the tables into one table
merge_index = 'Country (or dependent territory)'
table = tables[9].merge(tables[12], how="left", left_on=[merge_index], right_on=[merge_index])
table = table.merge(tables[15], how="left", left_on=[merge_index], right_on=[merge_index])
print(table)
The call to read_html returns all the tables on the page as a list. By inspecting the result you will notice that we are interested in tables 9, 12, and 15, which we merge into one. Note that these indices may shift if the Wikipedia page is edited. The output of the above will be:
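If you are unsure which indices to use, a quick way is to inspect what read_html returns. The snippet below uses a small inline HTML string with made-up data to stand in for the Wikipedia page; on the real page you would loop over the returned list in exactly the same way:

```python
import pandas as pd
from io import StringIO

# A minimal, made-up HTML page with two tables, standing in for the
# Wikipedia page; read_html parses every <table> element it finds
html = """
<table><tr><th>Country</th><th>1990</th></tr>
       <tr><td>Albania</td><td>2221</td></tr></table>
<table><tr><th>Country</th><th>2000</th></tr>
       <tr><td>Albania</td><td>3483</td></tr></table>
"""
tables = pd.read_html(StringIO(html))

# Print how many tables were found and peek at each one
# to identify the ones covering the decades you need
print(len(tables))
for i, t in enumerate(tables):
    print(i, t.columns.tolist())
```

On the real URL the same loop prints each table's index and columns, which is how you find tables 9, 12, and 15.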
Country (or dependent territory) 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
0 Afghanistan NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4367.0 4514.0 5146.0 6167.0 6925.0 8556.0 10297.0 12066.0 15325.0 17890.0 20296.0 20170.0 20352.0 19687.0 19454.0 20235.0 19585.0 19990.0
1 Albania 2221.0 1333.0 843.0 1461.0 2361.0 2882.0 3200.0 2259.0 2560.0 3209.0 3483.0 3928.0 4348.0 5611.0 7185.0 8052.0 8905.0 10675.0 12901.0 12093.0 11938.0 12896.0 12323.0 12784.0 13238.0 11393.0 11865.0 13055.0 15202.0 15960.0
2 Algeria 61892.0 46670.0 49217.0 50963.0 42426.0 42066.0 46941.0 48178.0 48188.0 48845.0 54749.0 54745.0 56761.0 67864.0 85327.0 103198.0 117027.0 134977.0 171001.0 137054.0 161207.0 199394.0 209005.0 209703.0 213518.0 164779.0 159049.0 167555.0 180441.0 183687.0
3 Angola 11236.0 10891.0 8398.0 6095.0 4438.0 5539.0 6535.0 7675.0 6506.0 6153.0 9130.0 8936.0 12497.0 14189.0 19641.0 28234.0 41789.0 60449.0 84178.0 75492.0 82471.0 104116.0 115342.0 124912.0 126777.0 102962.0 95337.0 122124.0 107316.0 92191.0
4 Antigua and Barbuda 459.0 482.0 499.0 535.0 589.0 577.0 634.0 681.0 728.0 766.0 825.0 796.0 810.0 850.0 912.0 1013.0 1147.0 1299.0 1358.0 1216.0 1146.0 1140.0 1214.0 1194.0 1273.0 1353.0 1460.0 1516.0 1626.0 1717.0
5 Argentina 153205.0 205515.0 247987.0 256365.0 279150.0 280080.0 295120.0 317549.0 324242.0 307673.0 308491.0 291738.0 108731.0 138151.0 164922.0 199273.0 232892.0 287920.0 363545.0 334633.0 424728.0 527644.0 579666.0 611471.0 563614.0 631621.0 554107.0 642928.0 518092.0 477743.0
6 Armenia NaN NaN 108.0 835.0 648.0 1287.0 1597.0 1639.0 1892.0 1845.0 1912.0 2118.0 2376.0 2807.0 3577.0 4900.0 6384.0 9206.0 11662.0 8648.0 9260.0 10142.0 10619.0 11121.0 11610.0 10529.0 10572.0 11537.0 12411.0 13105.0
Step 2: Use linear regression to estimate the growth over the last 30 years
In this section we will use Linear regression from the scikit-learn library, which is a simple prediction tool.
If you are new to Machine Learning we recommend you read this tutorial on Linear regression.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
import numpy as np
# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_GDP_(nominal)'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)
# Merge the tables into one table
merge_index = 'Country (or dependent territory)'
table = tables[9].merge(tables[12], how="left", left_on=[merge_index], right_on=[merge_index])
table = table.merge(tables[15], how="left", left_on=[merge_index], right_on=[merge_index])
# Select a single country to demonstrate (row 1 is Albania)
row = table.iloc[1]
X = table.columns[1:].to_numpy().reshape(-1, 1)
X = X.astype(int)
Y = 1 + row.iloc[1:].pct_change()
Y = Y.cumprod().fillna(1.0).to_numpy()
Y = Y.reshape(-1, 1)
regr = LinearRegression()
regr.fit(X, Y)
Y_pred = regr.predict(X)
plt.scatter(X, Y)
plt.plot(X, Y_pred, color='red')
plt.show()
Which will result in the following plot.

The model fits a straight line through the 30 years of data to estimate the growth of the country's GDP.
Notice that we use the cumulative product (cumprod) of pct_change to make the series comparable across countries. If we used the raw GDP figures directly, it would not be possible to compare economies of different sizes.
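To see why this normalization matters, consider a made-up GDP series that grows 10% a year. pct_change extracts the yearly growth rates, and cumprod rebuilds them into an index that always starts at 1.0, regardless of the size of the economy:

```python
import pandas as pd

# Hypothetical GDP figures (10% growth per year), for illustration only
gdp = pd.Series([100.0, 110.0, 121.0], index=[1990, 1991, 1992])

# Year-over-year growth factors; the first year has no predecessor (NaN)
growth = 1 + gdp.pct_change()

# The cumulative product rebuilds the series relative to the first year,
# so a small and a large economy with the same growth look identical
relative = growth.cumprod().fillna(1.0)
print(relative)
```

A country starting at 100 and one starting at 100,000 both map to the series 1.0, 1.1, 1.21 if they grow at the same rate, which is what makes the fitted slopes comparable.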
We will do that for all countries to get a view of the growth. We use the coefficient (slope) of the fitted line as the growth rate.
import pandas as pd
from sklearn.linear_model import LinearRegression
import numpy as np
# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_GDP_(nominal)'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)
# Merge the tables into one table
merge_index = 'Country (or dependent territory)'
table = tables[9].merge(tables[12], how="left", left_on=[merge_index], right_on=[merge_index])
table = table.merge(tables[15], how="left", left_on=[merge_index], right_on=[merge_index])
coef = []
countries = []
for index, row in table.iterrows():
    X = table.columns[1:].to_numpy().reshape(-1, 1)
    X = X.astype(int)
    Y = 1 + row.iloc[1:].pct_change()
    Y = Y.cumprod().fillna(1.0).to_numpy()
    Y = Y.reshape(-1, 1)
    regr = LinearRegression()
    regr.fit(X, Y)
    coef.append(regr.coef_[0][0])
    countries.append(row[merge_index])
data = pd.DataFrame(list(zip(countries, coef)), columns=['Country', 'Coef'])
print(data)
Which results in the following output (only the first few lines are shown).
Country Coef
0 Afghanistan 0.161847
1 Albania 0.243493
2 Algeria 0.103907
3 Angola 0.423919
4 Antigua and Barbuda 0.087863
5 Argentina 0.090837
6 Armenia 4.699598
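Armenia's coefficient stands out, likely because each series is normalized to its first available year, so a country starting from a very small base gets a large relative growth. Sorting the resulting DataFrame makes such outliers easy to spot; the sketch below uses a small hypothetical subset of the coefficients above:

```python
import pandas as pd

# A few of the coefficients from the output above
data = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Algeria', 'Armenia'],
    'Coef': [0.161847, 0.243493, 0.103907, 4.699598],
})

# Sort descending to see the steepest fitted growth lines first
print(data.sort_values('Coef', ascending=False))
```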
Step 3: Visualize the data on a leaflet map using folium
The last step is to combine the data with a leaflet map using the folium library. If you are new to folium, we recommend you read this tutorial.
import pandas as pd
import folium
import geopandas
from sklearn.linear_model import LinearRegression
import numpy as np
# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_GDP_(nominal)'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)
# Merge the tables into one table
merge_index = 'Country (or dependent territory)'
table = tables[9].merge(tables[12], how="left", left_on=[merge_index], right_on=[merge_index])
table = table.merge(tables[15], how="left", left_on=[merge_index], right_on=[merge_index])
coef = []
countries = []
for index, row in table.iterrows():
    X = table.columns[1:].to_numpy().reshape(-1, 1)
    X = X.astype(int)
    Y = 1 + row.iloc[1:].pct_change()
    Y = Y.cumprod().fillna(1.0).to_numpy()
    Y = Y.reshape(-1, 1)
    regr = LinearRegression()
    regr.fit(X, Y)
    coef.append(regr.coef_[0][0])
    countries.append(row[merge_index])
data = pd.DataFrame(list(zip(countries, coef)), columns=['Country', 'Coef'])
# Read the geopandas dataset
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
# Replace United States of America to United States to fit the naming in the table
world = world.replace('United States of America', 'United States')
# Merge the two DataFrames together
table = world.merge(data, how="left", left_on=['name'], right_on=['Country'])
# Clean data: remove rows with no data
table = table.dropna(subset=['Coef'])
# We have 10 colors available, resulting in 9 cuts.
table['Cat'] = pd.qcut(table['Coef'], 9, labels=[0, 1, 2, 3, 4, 5, 6, 7, 8])
print(table)
# Create a map
my_map = folium.Map()
# Add the data
folium.Choropleth(
    geo_data=table,
    name='choropleth',
    data=table,
    columns=['Country', 'Cat'],
    key_on='feature.properties.name',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Growth of GDP since 1990',
    threshold_scale=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
).add_to(my_map)
my_map.save('gdp_growth.html')
There is a twist in the way it is done. Instead of mapping the growth coefficients onto the map directly, we place them in categories. The reason is that otherwise most countries would be squeezed into a small segment of the color scale.
Here we have used qcut to split the countries into equal-sized groups.
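As a small sketch of what qcut does (with made-up coefficients): it cuts on quantiles rather than on fixed-width intervals, so each category receives roughly the same number of countries even when the values are heavily skewed:

```python
import pandas as pd

# Made-up growth coefficients for eight countries
coefs = pd.Series([0.09, 0.10, 0.16, 0.24, 0.42, 0.90, 1.50, 4.70])

# Quantile-based cut: four bins of two values each,
# even though the values themselves are heavily skewed
cats = pd.qcut(coefs, 4, labels=[0, 1, 2, 3])
print(cats.value_counts().sort_index())
```

A plain cut with fixed-width bins would put almost every country in the lowest bin here, which is exactly the crowding problem described above.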
This should result in an interactive html page looking something like this.
