Plot World Data to Map Using Python in 3 Easy Steps

What will we cover in this tutorial

  • As example we will use the html table from a wikipedia page. In this case the one listing countries by meat consumption.
  • We will see how to read the table data into a Pandas DataFrame with a single call.
  • Then how to merge it with a DataFrame containing data to color countries.
  • Finally, how to add the colors to leaflet map using a Python library.

Step 1: Read the data to a Pandas DataFrame

We need to inspect the page we are going to parse from. In this case it is the world meat consumption from wikipedia.

From wikipedia.

What we want to do is to gather the data from the table and plot it to a world map using colors to indicate the meat consumption.

End result

The easiest way to work with data is by using pandas DataFrames. The Pandas library has a read_html function, which returns all tables from a webpage.

This can be achieved by the following code. If you use read_html for the first time, you will need to instal lxml, see this tutorial for details.

import pandas as pd
# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_meat_consumption'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)
# The data is in the first table - this changes from time to time - wikipedia is updated all the time.
table = tables[0]
print(table.head())

Resulting in the following output.

               Country  Kg/person (2002)[9][note 1] Kg/person (2009)[10]
0              Albania                         38.2                  NaN
1              Algeria                         18.3                 19.5
2       American Samoa                         24.9                 26.8
3               Angola                         19.0                 22.4
4  Antigua and Barbuda                         56.0                 84.3

Step 2: Merging the data to world map

The next step thing we want to do is to map it to a world map that we can color.

This can be done by using geopandas.

import pandas as pd
import geopandas

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_meat_consumption'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)
# The data is in the first table - this changes from time to time - wikipedia is updated all the time.
table = tables[0]
print(table.head())
# Read the geopandas dataset
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
print(world.head())

Which results in the following output.

               Country  Kg/person (2002)[9][note 1] Kg/person (2009)[10]
0              Albania                         38.2                  NaN
1              Algeria                         18.3                 19.5
2       American Samoa                         24.9                 26.8
3               Angola                         19.0                 22.4
4  Antigua and Barbuda                         56.0                 84.3
     pop_est      continent                      name iso_a3  gdp_md_est                                           geometry
0     920938        Oceania                      Fiji    FJI      8374.0  MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1   53950935         Africa                  Tanzania    TZA    150600.0  POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2     603253         Africa                 W. Sahara    ESH       906.5  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3   35623680  North America                    Canada    CAN   1674000.0  MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4  326625791  North America  United States of America    USA  18560000.0  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...

Where we can see the column Country of the table DataFrame should be merged with the column name in the world DataFrame.

Let’s do the merge on that.

import pandas as pd
import geopandas

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_meat_consumption'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)
# The data is in the first table - this changes from time to time - wikipedia is updated all the time.
table = tables[0]
# Read the geopandas dataset
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
# Merge the two DataFrames together
table = world.merge(table, how="left", left_on=['name'], right_on=['Country'])
print(table.head())

Which results in the following output.

     pop_est      continent  ... kg/person (2009)[10] kg/person (2017)[11]
0     920938        Oceania  ...                 38.8                  NaN
1   53950935         Africa  ...                  9.6                 6.82
2     603253         Africa  ...                  NaN                  NaN
3   35623680  North America  ...                 94.3                69.99
4  326625791  North America  ...                120.2                98.60
[5 rows x 10 columns]

Where we also notice that some rows do not have any data from table, resulting in values NaN. To get a clearer view we will remove those rows.

import pandas as pd
import geopandas

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_meat_consumption'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)
# The data is in the first table - this changes from time to time - wikipedia is updated all the time.
table = tables[0]
# Read the geopandas dataset
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
# Merge the two DataFrames together
table = world.merge(table, how="left", left_on=['name'], right_on=['Country'])
# Clean data: remove rows with no data
table = table.dropna(subset=['kg/person (2002)[9][note 1]'])

The rows can be removed by using dropna.

Step 3: Add the data by colors on an interactive world map

Finally, you can use folium to create a leaflet map.

import pandas as pd
import folium
import geopandas

# The URL we will read our data from
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_meat_consumption'
# read_html returns a list of tables from the URL
tables = pd.read_html(url)
# The data is in the first table - this changes from time to time - wikipedia is updated all the time.
table = tables[0]
# Read the geopandas dataset
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
# Merge the two DataFrames together
table = world.merge(table, how="left", left_on=['name'], right_on=['Country'])
# Clean data: remove rows with no data
table = table.dropna(subset=['kg/person (2002)[9][note 1]'])
# Create a map
my_map = folium.Map()
# Add the data
folium.Choropleth(
    geo_data=table,
    name='choropleth',
    data=table,
    columns=['Country', 'kg/person (2002)[9][note 1]'],
    key_on='feature.properties.name',
    fill_color='OrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Meat consumption in kg/person'
).add_to(my_map)
my_map.save('meat.html')

Resulting a html webpage like this one.

Build a Financial Trading Algorithm in Python in 5 Easy Steps

What will we cover in this tutorial?

Step 1: Get time series data on your favorite stock

To build a financial trading algorithm in Python, it needs to be fed with data. Hence, the first step you need to master is how to collect time series data on your favorite stock. Sounds like it is difficult, right?

Luckily, someone made an awesome library pandas-datareader, which does all the hard word you for you. Let’s take a look on how easy it is.

import datetime as dt
import pandas_datareader as pdr
import matplotlib.pyplot as plt
from dateutil.relativedelta import relativedelta

# A Danish jewellery manufacturer and retailer
stock = 'PNDORA.CO'
end = dt.datetime.now()
start = end - relativedelta(years=10)
pndora = pdr.get_data_yahoo(stock, start, end)
pndora['Close'].plot()
plt.show()

Which results in a time series of the closing price as shown here.

Pandora A/S time series stock price

The stock is probably quite unknown, considering it is a Danish company. But to prove a point that you can get data, including for Pandora.

In the code you see that you send a start and end date to the call fetching the data. Here we have taken the last 10 years. The object returned integrates well with the matplotlib library to make the plot.

To understand the data better, we need to explore further.

Step 2: Understand the data available

Let us explore the data object returned by the call (pndora).

To get an overview you should run the following code using the iloc-call to a the Dataframe object (pndora returned by pandas_datareader).

pndora.iloc[-1]

This will show what the last item of the object looks like.

High            365.000000
Low             355.600006
Open            360.000000
Close           356.500000
Volume       446004.000000
Adj Close       356.500000
Name: 2020-06-26 00:00:00, dtype: float64

Where you have the following items.

  • High: The highest price traded during that day.
  • Low: The lowest price traded during that day.
  • Open: The opening price that day.
  • Close: The closing price that day, that is the price of the last trade that day.
  • Volume: The number of shares that exchange hands for the stock that day.
  • Adj Close: It accurately reflect the stock’s value after accounting for any corporate actions. It is considered to be the true price of that stock and is often used when examining historical returns.

That means, it would be natural to use Adj Close in our calculations. Hence, for each day we have the above information available.

Step 3: Learning how to enrich the data (Pandas)

Pandas? Yes, you read correct. But not a Panda like this the picture.

Pandas the Python library.
Pandas the Python library.

There is an awesome library Pandas in Python to make data analysis easy.

Let us explore some useful things we can do.

import datetime as dt
import pandas_datareader as pdr
import matplotlib.pyplot as plt
from dateutil.relativedelta import relativedelta

# A Danish jewellery manufacturer and retailer
stock = 'PNDORA.CO'
end = dt.datetime.now()
start = end - relativedelta(years=10)
pndora = pdr.get_data_yahoo(stock, start, end)
pndora['Short'] = pndora['Adj Close'].rolling(window=20).mean()
pndora['Long'] = pndora['Adj Close'].rolling(window=100).mean()
pndora[['Adj Close', 'Short', 'Long']].plot()
plt.show()

Which will result in the following graph.

Pandora A/S stock prices with Short mean average and Long mean average.
Pandora A/S stock prices with Short mean average and Long mean average.

If you inspect the code above, you see, that you easily added to two new columns (Short and Long) and computed them to be the mean value of the previous 20 and 100 days, respectively.

Further, you can add the daily percentage change for the various entries.

import datetime as dt
import pandas_datareader as pdr
import matplotlib.pyplot as plt
from dateutil.relativedelta import relativedelta

# A Danish jewellery manufacturer and retailer
stock = 'PNDORA.CO'
end = dt.datetime.now()
start = end - relativedelta(years=10)
pndora = pdr.get_data_yahoo(stock, start, end)
pndora['Short'] = pndora['Adj Close'].rolling(window=20).mean()
pndora['Long'] = pndora['Adj Close'].rolling(window=100).mean()
pndora['Pct Change'] = pndora['Adj Close'].pct_change()
pndora['Pct Short'] = pndora['Short'].pct_change()
pndora['Pct Long'] = pndora['Long'].pct_change()
pndora[['Pct Change', 'Pct Short', 'Pct Long']].loc['2020'].plot()
plt.show()

Which will result in this graph.

Pandora A/S stock's daily percentage change for Adj Close, short and long mean.
Pandora A/S stock’s daily percentage change for Adj Close, short and long mean.

Again you can see how easy it is to add new columns in the Dataframe object provided by Pandas library.

Step 4: Building your strategy to buy and sell stocks

For the example we will keep it simple and only focus on one stock. The strategy we will use is called the dual moving average crossover.

Simply explained, you want to buy stocks when the short mean average is higher than the long mean average value.

In the figure above, it is translates to.

  • Buy when the yellow crosses above the green line.
  • Sell when the yellow crosses below the green line.

To implement the simplest version of that it would be as follows.

import datetime as dt
import pandas_datareader as pdr
import matplotlib.pyplot as plt
import numpy as np
from dateutil.relativedelta import relativedelta

# A Danish jewellery manufacturer and retailer
stock = 'PNDORA.CO'
end = dt.datetime.now()
start = end - relativedelta(years=10)
pndora = pdr.get_data_yahoo(stock, start, end)
short_window = 20
long_window = 100
pndora['Short'] = pndora['Adj Close'].rolling(window=short_window).mean()
pndora['Long'] = pndora['Adj Close'].rolling(window=long_window).mean()
# Let us create some signals
pndora['signal'] = 0.0
pndora['signal'][short_window:] = np.where(pndora['Short'][short_window:] > pndora['Long'][short_window:], 1.0, 0.0)
pndora['positions'] = pndora['signal'].diff()

To visually see where to buy and sell you can use the following code afterwards on pndora.

fig = plt.figure()
ax1 = fig.add_subplot(111, ylabel='Price')
pndora[['Adj Close', 'Short', 'Long']].plot(ax=ax1)
ax1.plot(pndora.loc[pndora.positions == 1.0].index,
         pndora.Short[pndora.positions == 1.0],
         '^', markersize=10, color='m')
ax1.plot(pndora.loc[pndora.positions == -1.0].index,
         pndora.Short[pndora.positions == -1.0],
         'v', markersize=10, color='k')
plt.show()

Which would result in the following graph.

Finally, you need to see how your algorithm performs.

Step 5: Testing you trading algorithm

There are many ways to test an algorithm. Here we go all in each cycle. We buy as much as we can and sell them all when we sell.

import datetime as dt
import pandas_datareader as pdr
import matplotlib.pyplot as plt
import numpy as np
from dateutil.relativedelta import relativedelta

# A Danish jewellery manufacturer and retailer
stock = 'PNDORA.CO'
end = dt.datetime.now()
start = end - relativedelta(years=10)
pndora = pdr.get_data_yahoo(stock, start, end)
short_window = 20
long_window = 100
pndora['Short'] = pndora['Adj Close'].rolling(window=short_window).mean()
pndora['Long'] = pndora['Adj Close'].rolling(window=long_window).mean()
# Let us create some signals
pndora['signal'] = 0.0
pndora['signal'][short_window:] = np.where(pndora['Short'][short_window:] > pndora['Long'][short_window:], 1.0, 0.0)
pndora['positions'] = pndora['signal'].diff()
cash = 1000000
stocks = 0
for index, row in pndora.iterrows():
    if row['positions'] == 1.0:
        stocks = int(cash//row['Adj Close'])
        cash -= stocks*row['Adj Close']
    elif row['positions'] == -1.0:
        cash += stocks*row['Adj Close']
        stocks = 0
print("Total:", cash + stocks*pndora.iloc[-1]['Adj Close'])

Which results in.

Total: 2034433.8826065063

That is a double in 10 years, which is less than 8% per year. Not so good.

As this is one specific stock, it is not fair to judge the algorithm being poor, it can be the stock which was not performing good, or the variables can be further adjusted.

If compared with the scenario where you bought the stocks at day one and sold them on the last day, your earnings would be 1,876,232. Hence, the algorithm beats that.

Conclusion and further work

This is a simple financial trading algorithm in Python and there are variables that can be adjusted. The algorithm was performing better than the naive strategy to buy on day one and sell on the last day.

It could be interesting to add more data into the decision in the algorithm, which might be some future work to do. Also, can it be combined with some Machine Learning?