Pandas + GeoPandas + OpenCV: Create a Video of COVID-19 World Map

What will we cover?

How to create a video like the one below using Pandas + GeoPandas + OpenCV in Python.

  1. How to collect the newest COVID-19 data in Python using Pandas.
  2. Prepare the data and calculate the values needed to create a Choropleth map.
  3. Get the Choropleth map from GeoPandas and prepare it to be combined with the data.
  4. Create a map image for each date, which becomes one frame of the video.
  5. Combine it all into a video using OpenCV.

Step 1: Get the daily reported COVID-19 data worldwide

This data is published by the European Centre for Disease Prevention and Control (ECDC) on its open data portal.

All we need to do is download the CSV file, which contains the full daily history for all reporting countries and territories.

This can be done as follows.

import pandas as pd


# Just to get more rows, columns and display width
pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 300)
pd.set_option('display.width', 1000)

# Get the updated data
table = pd.read_csv("https://opendata.ecdc.europa.eu/covid19/casedistribution/csv")

print(table)

This will give us an idea of how the data is structured.

          dateRep  day  month  year  cases  deaths countriesAndTerritories geoId countryterritoryCode  popData2019 continentExp  Cumulative_number_for_14_days_of_COVID-19_cases_per_100000
0      01/10/2020    1     10  2020     14       0             Afghanistan    AF                  AFG   38041757.0         Asia                                           1.040961         
1      30/09/2020   30      9  2020     15       2             Afghanistan    AF                  AFG   38041757.0         Asia                                           1.048847         
2      29/09/2020   29      9  2020     12       3             Afghanistan    AF                  AFG   38041757.0         Asia                                           1.114565         
3      28/09/2020   28      9  2020      0       0             Afghanistan    AF                  AFG   38041757.0         Asia                                           1.343261         
4      27/09/2020   27      9  2020     35       0             Afghanistan    AF                  AFG   38041757.0         Asia                                           1.540413         
...           ...  ...    ...   ...    ...     ...                     ...   ...                  ...          ...          ...                                                ...         
46221  25/03/2020   25      3  2020      0       0                Zimbabwe    ZW                  ZWE   14645473.0       Africa                                                NaN         
46222  24/03/2020   24      3  2020      0       1                Zimbabwe    ZW                  ZWE   14645473.0       Africa                                                NaN         
46223  23/03/2020   23      3  2020      0       0                Zimbabwe    ZW                  ZWE   14645473.0       Africa                                                NaN         
46224  22/03/2020   22      3  2020      1       0                Zimbabwe    ZW                  ZWE   14645473.0       Africa                                                NaN         
46225  21/03/2020   21      3  2020      1       0                Zimbabwe    ZW                  ZWE   14645473.0       Africa                                                NaN         

[46226 rows x 12 columns]

First, we convert the dateRep column to a datetime object (this is not visible in the output above, but the dates are stored as strings). Then we use it as the index for easier access later.

import pandas as pd


# Just to get more rows, columns and display width
pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 300)
pd.set_option('display.width', 1000)

# Get the updated data
table = pd.read_csv("https://opendata.ecdc.europa.eu/covid19/casedistribution/csv")

# Convert dateRep to date object
table['date'] = pd.to_datetime(table['dateRep'], format='%d/%m/%Y')
# Use date for index
table = table.set_index('date')
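To see why the explicit format matters, here is a small sketch that parses the first two dateRep values from the output above. It reads them from an in-memory CSV with io.StringIO instead of the ECDC URL, an assumption made only so it runs offline. The dates are day-first, so without format='%d/%m/%Y' a string like 01/10/2020 could be read as January 10 instead of October 1.

```python
import io

import pandas as pd

# Two rows copied from the ECDC output above (only a few columns kept);
# read from memory instead of the ECDC URL so the sketch runs offline
csv_text = """dateRep,cases,countryterritoryCode
01/10/2020,14,AFG
30/09/2020,15,AFG
"""
table = pd.read_csv(io.StringIO(csv_text))

# read_csv leaves dateRep as strings, not datetimes
print(table['dateRep'].dtype)

# Parse the day-first dates explicitly, as in the code above
table['date'] = pd.to_datetime(table['dateRep'], format='%d/%m/%Y')
table = table.set_index('date')
print(table.index[0].month, table.index[0].day)
```

After parsing, the first row is October 1, 2020, matching the day-first layout of the file.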

Step 2: Prepare the data and compute the values needed for the plot

What makes sense to plot?

Good question. In a Choropleth map, each region is colored according to a value. Here, the higher a country's value, the darker the red it is colored with.

If we plotted the raw number of new COVID-19 cases, countries with large populations would dominate the map. Hence, we use the number of new cases per 100,000 people instead.
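A quick illustration with made-up numbers (two hypothetical countries, not ECDC data) shows why the normalization matters: the country with ten times more cases can still have the lower rate.

```python
def cases_per_100k(cases, population):
    """Scale a case count to cases per 100,000 inhabitants."""
    return cases * 100_000 / population


# Hypothetical countries: a large one and a small one
large = cases_per_100k(cases=1_000, population=10_000_000)  # 10.0
small = cases_per_100k(cases=100, population=100_000)       # 100.0
print(large, small)
```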

However, the number of new cases per 100,000 people can be volatile and change drastically from day to day. To smooth that out, a 7-day rolling sum can be used: for each day, you take the sum of the cases over the last 7 days, and repeat this for every day in the data.

To make it even less volatile, we then take the average of the 7-day rolling sum (per 100,000 people) over the last 14 days.
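Written out, with c_t the new cases reported on day t and P the population (popData2019), the measure used for coloring the map is M_t:

```latex
S_t = \sum_{j=0}^{6} c_{t-j}, \qquad
R_t = \frac{100\,000 \cdot S_t}{P}, \qquad
M_t = \frac{1}{14} \sum_{k=0}^{13} R_{t-k}
```

Because M_t averages 14 values of R, each of which needs 7 days of cases, it is only defined once 7 + 14 - 1 = 20 days of data are available.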

And no, this is not something I invented. It is the measure the authorities in my home country use to decide which countries are open for travel.

This can be calculated from the data above as follows.

def get_stat(country_code, table):
    data = table.loc[table['countryterritoryCode'] == country_code]
    data = data.reindex(index=data.index[::-1])
    data['7 days sum'] = data['cases'].rolling(7).sum()
    data['7ds/100000'] = data['7 days sum'] * 100000 / data['popData2019']
    data['14 mean'] = data['7ds/100000'].rolling(14).mean()
    return data

The function above takes the table from Step 1 and extracts the rows for a single country based on its country code. Then it reverses the rows so the dates are in chronological order, since the ECDC data lists the newest date first (as seen in the output in Step 1).

After that, it computes the 7-day rolling sum of new cases and converts it to cases per 100,000 people using the country's population (popData2019). Finally, it computes the 14-day rolling average (mean) of that value.
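To check what get_stat returns, here is a sketch that runs the same function on a small synthetic table. The country code "XYZ" and its numbers are made up: 30 days with 10 new cases per day and 1,000,000 inhabitants, listed newest date first like the ECDC file.

```python
import pandas as pd


def get_stat(country_code, table):
    # Same function as above
    data = table.loc[table['countryterritoryCode'] == country_code]
    data = data.reindex(index=data.index[::-1])
    data['7 days sum'] = data['cases'].rolling(7).sum()
    data['7ds/100000'] = data['7 days sum'] * 100000 / data['popData2019']
    data['14 mean'] = data['7ds/100000'].rolling(14).mean()
    return data


# Hypothetical country "XYZ", newest date first like the ECDC data
dates = pd.date_range('2020-09-01', periods=30)[::-1]
table = pd.DataFrame(
    {'cases': 10, 'countryterritoryCode': 'XYZ', 'popData2019': 1_000_000.0},
    index=pd.Index(dates, name='date'),
)

stats = get_stat('XYZ', table)
print(stats.index.is_monotonic_increasing)  # dates are now oldest first
print(stats['14 mean'].isna().sum())       # warm-up rows without a value
print(stats['14 mean'].iloc[-1])           # 70 cases per week / 10 -> 7.0
```

The first 19 rows of '14 mean' are NaN: the 7-day sum needs 7 days, and the 14-day mean of it needs 13 more. Note that reversing with [::-1] relies on the ECDC file being sorted newest first; data.sort_index() would be an order-independent alternative, but it is not what the original code uses.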

Step 3: Get the Choropleth map data and prepare it

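GeoPandas is an amazing library for creating Choropleth maps, but it needs some care when you combine it with other data.

Here we want to join the map with the ECDC data using the three-letter ISO country codes (the iso_a3 column in the map data, countryterritoryCode in the ECDC data). If you inspect the map data, some countries, such as Norway and France, have the placeholder -99 instead of a code, so we fix those by hand.

Other than that, the code is straightforward.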

import pandas as pd
import geopandas


# Just to get more rows, columns and display width
pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 300)
pd.set_option('display.width', 1000)

# Get the updated data
table = pd.read_csv("https://opendata.ecdc.europa.eu/covid19/casedistribution/csv")

# Convert dateRep to date object
table['date'] = pd.to_datetime(table['dateRep'], format='%d/%m/%Y')
# Use date for index
table = table.set_index('date')


def get_stat(country_code, table):
    data = table.loc[table['countryterritoryCode'] == country_code]
    data = data.reindex(index=data.index[::-1])
    data['7 days sum'] = data['cases'].rolling(7).sum()
    data['7ds/100000'] = data['7 days sum'] * 100000 / data['popData2019']
    data['14 mean'] = data['7ds/100000'].rolling(14).mean()
    return data


# Read the data to make a choropleth map
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
world = world[(world.pop_est > 0) & (world.name != "Antarctica")]

# Store data per country to make it easier
data_by_country = {}

for index, row in world.iterrows():
    # The world data is not fully updated with ISO_A3 names
    if row['iso_a3'] == '-99':
        country = row['name']
        if country == "Norway":
            world.at[index, 'iso_a3'] = 'NOR'
            row['iso_a3'] = "NOR"
        elif country == "France":
            world.at[index, 'iso_a3'] = 'FRA'
            row['iso_a3'] = "FRA"
        elif country == 'Kosovo':
            world.at[index, 'iso_a3'] = 'XKX'
            row['iso_a3'] = "XKX"
        elif country == "Somaliland":
            world.at[index, 'iso_a3'] = '---'
            row['iso_a3'] = "---"
        elif country == "N. Cyprus":
            world.at[index, 'iso_a3'] = '---'
            row['iso_a3'] = "---"

    # Add the data for the country
    data_by_country[row['iso_a3']] = get_stat(row['iso_a3'], table)

This creates a dictionary (data_by_country) with the needed data for each country, keyed by its ISO code. We store the data per country because not all countries have the same number of data points.
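The if/elif chain that patches the ISO codes can also be written as a lookup table. This is a sketch, assuming a plain pandas DataFrame as a stand-in for the GeoPandas world table (a GeoDataFrame subclasses DataFrame, so .loc and .map work the same way). The name iso_fixes is hypothetical; the replacements are the same as in the code above.

```python
import pandas as pd

# Hand-made stand-in for the GeoPandas "world" table (not the real data);
# only the two columns used by the patching code are included
world = pd.DataFrame({
    'name': ['Norway', 'France', 'Kosovo', 'Somaliland', 'N. Cyprus', 'Denmark'],
    'iso_a3': ['-99', '-99', '-99', '-99', '-99', 'DNK'],
})

# Same replacements as the if/elif chain above
iso_fixes = {
    'Norway': 'NOR',
    'France': 'FRA',
    'Kosovo': 'XKX',
    'Somaliland': '---',
    'N. Cyprus': '---',
}

# Only touch rows that still carry the '-99' placeholder;
# names not in iso_fixes keep '-99', as in the original loop
missing = world['iso_a3'] == '-99'
world.loc[missing, 'iso_a3'] = (
    world.loc[missing, 'name'].map(iso_fixes).fillna('-99')
)
print(world)
```

With the codes patched up front, the loop that fills data_by_country no longer needs to modify row inside the loop.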

Step 4: Create a Choropleth map for each date and save it as an image

This can be done with matplotlib, which GeoPandas uses for plotting.

The idea is to go through all dates and, for each country, check whether it has data for that date. If it does, we use that value; otherwise the country keeps the default value of 0.
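The per-country lookup described above can be isolated in a small helper. This is a sketch with a hypothetical function value_on and made-up country data, not part of the original script. Note that a country can have a row for a date and still return NaN during the warm-up period of the rolling windows.

```python
import pandas as pd


def value_on(data, day, column='14 mean', default=0.0):
    """Return data[column] on `day`, or `default` if there is no row for that day."""
    if day in data.index:
        return data.at[day, column]
    return default


# Hypothetical per-country data with different date ranges
data_by_country = {
    'AAA': pd.DataFrame({'14 mean': [1.5, 2.5]},
                        index=pd.to_datetime(['2020-03-01', '2020-03-02'])),
    'BBB': pd.DataFrame({'14 mean': [4.0]},
                        index=pd.to_datetime(['2020-03-02'])),
    '---': pd.DataFrame({'14 mean': pd.Series([], dtype=float)},
                        index=pd.DatetimeIndex([])),
}

day = pd.Timestamp('2020-03-01')
numbers = {code: value_on(df, day) for code, df in data_by_country.items()}
print(numbers)
```

Only 'AAA' has a row for March 1, so 'BBB' and the empty '---' entry fall back to 0.0.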

import pandas as pd
import geopandas
import matplotlib.pyplot as plt


# Just to get more rows, columns and display width
pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 300)
pd.set_option('display.width', 1000)

# Get the updated data
table = pd.read_csv("https://opendata.ecdc.europa.eu/covid19/casedistribution/csv")

# Convert dateRep to date object
table['date'] = pd.to_datetime(table['dateRep'], format='%d/%m/%Y')
# Use date for index
table = table.set_index('date')


def get_stat(country_code, table):
    data = table.loc[table['countryterritoryCode'] == country_code]
    data = data.reindex(index=data.index[::-1])
    data['7 days sum'] = data['cases'].rolling(7).sum()
    data['7ds/100000'] = data['7 days sum'] * 100000 / data['popData2019']
    data['14 mean'] = data['7ds/100000'].rolling(14).mean()
    return data


# Read the data to make a choropleth map
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
world = world[(world.pop_est > 0) & (world.name != "Antarctica")]

# Store data per country to make it easier
data_by_country = {}

for index, row in world.iterrows():
    # The world data is not fully updated with ISO_A3 names
    if row['iso_a3'] == '-99':
        country = row['name']
        if country == "Norway":
            world.at[index, 'iso_a3'] = 'NOR'
            row['iso_a3'] = "NOR"
        elif country == "France":
            world.at[index, 'iso_a3'] = 'FRA'
            row['iso_a3'] = "FRA"
        elif country == 'Kosovo':
            world.at[index, 'iso_a3'] = 'XKX'
            row['iso_a3'] = "XKX"
        elif country == "Somaliland":
            world.at[index, 'iso_a3'] = '---'
            row['iso_a3'] = "---"
        elif country == "N. Cyprus":
            world.at[index, 'iso_a3'] = '---'
            row['iso_a3'] = "---"

    # Add the data for the country
    data_by_country[row['iso_a3']] = get_stat(row['iso_a3'], table)

# Create an image per date
for day in pd.date_range('12-31-2019', '10-01-2020'):
    print(day)
    world['number'] = 0.0
    for index, row in world.iterrows():
        if day in data_by_country[row['iso_a3']].index:
            world.at[index, 'number'] = data_by_country[row['iso_a3']].loc[day]['14 mean']

    world.plot(column='number', legend=True, cmap='OrRd', figsize=(15, 5))
    plt.title(day.strftime("%Y-%m-%d"))
    plt.savefig(f'image-{day.strftime("%Y-%m-%d")}.png')
    plt.close()

This creates one image per day, which we will combine into a video in the next step.
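Without vmin and vmax, GeoPandas scales the colormap to the values of each individual plot call, so the same shade of red can mean different numbers in different frames, and the legend changes from day to day. A sketch of one option, assuming you want a fixed scale for the whole video: compute the largest '14 mean' value across all countries once, then pass vmin and vmax to world.plot. The data below is hypothetical; in the script it would come from get_stat.

```python
import pandas as pd

# Hypothetical per-country results; in the real script these come from get_stat()
data_by_country = {
    'AAA': pd.DataFrame({'14 mean': [float('nan'), 12.0, 30.5]}),
    'BBB': pd.DataFrame({'14 mean': [5.0, 80.25]}),
    '---': pd.DataFrame({'14 mean': pd.Series([], dtype=float)}),  # no ECDC data
}

# Highest value over all countries and all dates (NaN values are skipped)
vmax = pd.concat([df['14 mean'] for df in data_by_country.values()]).max()
print(vmax)

# In the plotting loop one could then use a fixed scale, for example:
# world.plot(column='number', legend=True, cmap='OrRd', figsize=(15, 5),
#            vmin=0, vmax=vmax)
```

A single extreme value can compress the rest of the scale, so you may prefer a lower cap than the absolute maximum; that choice is up to you.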

Step 5: Create a video from images with OpenCV

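Using OpenCV to create a video from a sequence of images is quite easy. The only thing you need to ensure is that the images are read in the correct order. Since the file names contain the date in YYYY-MM-DD format, sorting them alphabetically also sorts them chronologically.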
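A quick check of why a plain alphabetical sort gives the right order for these file names. The day-first names are a hypothetical counter-example using the same layout as the dateRep column.

```python
# File names in the format produced by the plotting loop (image-YYYY-MM-DD.png)
iso_names = ['image-2020-10-01.png', 'image-2020-02-15.png', 'image-2019-12-31.png']
print(sorted(iso_names))

# The same dates written day-first would NOT sort chronologically
dayfirst_names = ['image-01-10-2020.png', 'image-15-02-2020.png', 'image-31-12-2019.png']
print(sorted(dayfirst_names))
```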

import cv2
import glob

# The date in the file names is YYYY-MM-DD, so sorting gives chronological order
filenames = sorted(glob.glob('image-*.png'))

# All images have the same size; read it from the first one as (width, height)
height, width, layers = cv2.imread(filenames[0]).shape
size = (width, height)

# DIVX codec, 15 frames per second
out = cv2.VideoWriter('covid.avi', cv2.VideoWriter_fourcc(*'DIVX'), 15, size)

# Write the frames one by one instead of keeping all images in memory
for filename in filenames:
    print(filename)
    out.write(cv2.imread(filename))
out.release()

Here we use OpenCV's VideoWriter to write the frames to covid.avi with the DIVX codec at 15 frames per second.

This results in this video.
