Birthday Paradox and Hash Function Collisions by Example

What will we cover in this tutorial?

We will look at how the Birthday Paradox is used when estimating how collision resistant a hash function is. This tutorial will show that a good estimate is that an n-bit hash function will have a collision by chance after about 2^(n/2) random hash values.

Step 1: Understand a hash function

A hash function is a one-way function with a fixed output size. That is, the output always has the same size, and it is difficult to find two distinct input chunks that give the same output.

…a hash function is any function that can be used to map data of arbitrary size to fixed-size values.

https://en.wikipedia.org/wiki/Hash_function

Probably the best known example of a hash function is MD5. It was designed to be used as a cryptographic hash function, but has been found to have many vulnerabilities.

Does this mean you should not use the MD5 hash function?

That depends. If you use it in a cryptographic setup, the answer is: do not use it.

On the other hand, hash functions are often used to calculate identifiers. For that purpose, whether you should use it depends on your requirements.

This is where the Birthday Paradox comes in.

Step 2: How are hash functions and the Birthday Paradox related?

Good question. First recall what the Birthday Paradox states.

…in a random group of 23 people, there is about a 50 percent chance that two people have the same birthday

https://www.learnpythonwithrune.org/birthday-paradox-by-example-it-is-not-a-paradox/

How can that be related to hash functions? There is something about collisions, right?

Given 23 people, we have 50% chance of collision (two people with the same birthday).

Hence, if our hash function maps data to a day in the calendar year, that is, hash(data) -> [0, 364], then given 23 hash values we have about a 50% chance of a collision.

But you also know that a real hash function maps to more than 365 distinct values. Actually, MD5 maps to 2^128 distinct values.
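
To get a feel for what the square-root (birthday bound) estimate from the introduction means for such sizes, here is a small side calculation (not part of the original tutorial code):

# Birthday-bound estimate: about 2**(n/2) random hashes are needed
# for a roughly 50% chance of a collision with an n-bit hash function
for bits in (8, 24, 128):
    print(bits, "bit output -> about", 2 ** (bits // 2), "random hashes")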

An example would be appreciated now. Let us make a simplified hash function, call it MD5′ (md5-prime), which maps like the MD5, but only uses the first byte of the result.

That is, we have MD5′(data) -> [0, 255].

Surely, by the pigeonhole principle we run out of possible values after 256 distinct inputs to MD5′, so the 257th distinct input is guaranteed to collide.

import hashlib
import os


lookup_table = {}
collision_count = 0
for _ in range(256):
    random_binary = os.urandom(16)
    result = hashlib.md5(random_binary).digest()
    result = result[:1]
    if result in lookup_table:
        print("Collision")
        print(random_binary, result)
        print(lookup_table[result], result)
        collision_count += 1
    else:
        lookup_table[result] = random_binary

print("Number of collisions:", collision_count)

The lookup_table is used to store the already seen hash values. We iterate 256 times (the number of possible values of our MD5′ hash function). Each time we take some random data, hash it with MD5, and keep only the first byte (8 bits). If the result already exists in lookup_table we have a collision; otherwise we add it to our lookup_table.

For a random run of this I got 87 collisions. Expected? I would say so.

Let us try to use the Birthday Paradox to estimate how many hash values we need to get a collision of our MD5′ hash function.

A rough estimate that is widely used is that the square root of the number of possible outcomes gives a 50% chance of collision (see Wikipedia for the approximation).

That is, for MD5′(data) -> [0, 255] it is sqrt(256) = 16. Let's try that.

import hashlib
import os


collision = 0
for _ in range(1000):
    lookup_table = {}
    for _ in range(16):
        random_binary = os.urandom(16)
        result = hashlib.md5(random_binary).digest()
        result = result[:1]
        if result not in lookup_table:
            lookup_table[result] = random_binary
        else:
            collision += 1
            break

print("Number of collisions:", collision, "out of", 1000)

Which gives something like this.

Number of collisions: 391 out of 1000

That is below the 50% mark, but the square-root estimate is only a rough rule of thumb; the observed rate actually matches the exact birthday calculation for 16 draws from 256 values quite well.
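
To see that, here is a small side calculation (not part of the original experiment) of the exact collision probability when drawing 16 values uniformly from 256 possibilities:

# Exact probability of at least one collision with 16 uniform draws from 256 values
p_no_collision = 1.0
for i in range(16):
    p_no_collision *= (256 - i) / 256

print("P(collision) =", 1 - p_no_collision)  # roughly 0.38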

Step 3: Use a correct data structure to lookup in

Just to clarify: we will not find collisions on the full MD5 hash function, but we will see whether the collision estimate is reasonable.

This requires a lot of calculations, and we want to ensure that we do not introduce a bottleneck by using the wrong data structure.

The Python dict should be a hash table with expected O(1) insert and lookup. Still, the worst case is O(n) for these operations, which would be a big overhead to carry along the way. Hence, we will first test that the dictionary has O(1) insert and lookup time for the use cases we have here.

import time
import matplotlib.pyplot as plt



def dict_size(size):
    # Time how long it takes to insert `size` keys into a dict (with a lookup before each insert)
    start = time.time()
    table = {}
    for i in range(size):
        if i in table:
            print("HIT")
        else:
            table[i] = 0

    return time.time() - start


x = []
y = []
for i in range(0, 2**20, 2**12):
    performance = dict_size(i)
    x.append(i)
    y.append(performance)

plt.scatter(x, y, alpha=0.1)
plt.xlabel("Size")
plt.ylabel("Time (sec)")
plt.show()

Resulting in something like this.

What does that tell us? That building the dict in Python takes approximately linear time in the number of insertions, which corresponds to O(1) insert and lookup per operation. There is some overhead at certain sizes (likely where the hash table resizes), but it is close enough to linear that we do not need to fear an exponential run time.

This step is not strictly necessary, but it is nice to know how the run time grows when we check for collisions. If the time complexity grew faster than linearly, it could suddenly become hard to estimate the run time when we run over a bigger space.

Step 4: Validating if square root of the bit size is a good estimate for collision

We will continue our journey with our modified MD5′ hash function, where the output space will be reduced.

We will then, for various output space sizes, see if the estimate for a 50% collision chance is decent. That is, whether approximately sqrt(space_size) hash values give roughly a 50% chance of a collision.

This can be done by the following code.

import hashlib
import os
import time
import matplotlib.pyplot as plt


def main(bit_range):
    start = time.time()
    collision_count = 0
    # Each hex character of the digest holds 4 bits, hence we keep bit_range//4 characters
    space_size = bit_range//4
    for _ in range(100):
        lookup_table = {}
        # Searching the square root of the space for a collision:
        # sqrt(2**bit_range) = 2**(bit_range//2)
        for _ in range(2**(bit_range//2)):
            random_binary = os.urandom(16)
            result = hashlib.md5(random_binary).hexdigest()
            result = result[:space_size]
            if result in lookup_table:
                collision_count += 1
                break
            else:
                lookup_table[result] = random_binary

    return time.time() - start, collision_count


x = []
y1 = []
y2 = []
for i in range(4, 44, 4):
    performance, count = main(i)
    x.append(i)
    y1.append(performance)
    y2.append(count)

_, ax1 = plt.subplots()
plt.xlabel("Size")
plt.ylabel("Time (sec)")
ax1.scatter(x, y1)
ax2 = ax1.twinx()
ax2.bar(x, y2, align='center', alpha=0.5, color='red')
ax2.set_ylabel("Collision rate (%)", color='red')
ax2.set_ylim([0, 100])

plt.show()

The estimated collision rate is very rough, as it only runs 100 trials for each space size.

The results are shown in the graph below.

Interestingly, it seems to be in the 30-50% range for most cases.

As a note, it might be confusing that the run-time (the dots) does not seem to be linear. That is because each extra bit doubles the space; the x-axis is effectively logarithmic in the size of the space, so the run-time grows exponentially along it.

Step 5: What does that all mean?

This has a big impact when using hash functions to create unique identifiers. If you want a short identifier with the least number of bits, then you need to consider the Birthday Paradox.

Assume you created the following service.

import hashlib
import base64


def get_uid(text):
    result = hashlib.md5(text.encode()).digest()
    result = base64.b64encode(result)
    return result[:4]


uid = get_uid("my text")
print(uid)

If the input text can be considered random, how resistant is the get_uid(…) function against collisions?

Well, it returns 4 base64 characters. That is 6*4 = 24 bits of information (each base64 character contains 6 bits of information). The rough estimate is that if you use it sqrt(2^24) = 2^12 = 4,096 times you will have a high risk of collision (approximately a 50% chance).

Let’s try.

import hashlib
import os
import base64


def get_uid(text):
    result = hashlib.md5(text).digest()
    result = base64.b64encode(result)
    return result[:4]


lookup_table = {}
for _ in range(4096):
    text = os.urandom(16)
    uid = get_uid(text)
    if uid in lookup_table:
        print("Collision detected")
    else:
        lookup_table[uid] = text

It does not give a collision every time, but run it a few times and you will get:

Collision detected

Hence, the estimate seems to be valid. The above code was run 1000 times and gave a collision in 497 of the runs, which is close to 50% of the time.
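
If you want to reproduce that kind of count yourself, a sketch (not the exact script used for the numbers above, and it takes a little while to run) could wrap the experiment in an outer loop:

import hashlib
import os
import base64


def get_uid(data):
    result = hashlib.md5(data).digest()
    return base64.b64encode(result)[:4]


runs_with_collision = 0
for _ in range(1000):
    lookup_table = {}
    for _ in range(4096):
        data = os.urandom(16)
        uid = get_uid(data)
        if uid in lookup_table:
            runs_with_collision += 1
            break
        lookup_table[uid] = data

print("Runs with a collision:", runs_with_collision, "out of 1000")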

Birthday Paradox by Example – it is not a Paradox

The Birthday Paradox is presented as follows.

…in a random group of 23 people, there is about a 50 percent chance that two people have the same birthday

Birthday Paradox

This is also referred to as the Birthday Problem in probability theory.

First question: What is a paradox?

…is a logically self-contradictory statement or a statement that runs contrary to one’s expectation

Wikipedia

What does that mean? A logically self-contradictory statement means that there should be a contradiction somewhere in the Birthday Paradox. This is not the case.

Whether it is a statement that runs contrary to one's expectations is open for discussion. As we will see by example in this post, it is not contrary to the expectations of an informed person.

Step 1: Run some examples

The assumption is that we have 23 random people. This further assumes that the birthday of each of these people is random.

To validate that this is true, let’s try to implement it in Python.

import random

stat = {'Collision': 0, 'No-collision': 0}

for _ in range(10000):
    days = []
    for _ in range(23):
        day = random.randint(0, 364)  # randint is inclusive, so this gives 365 possible days
        days.append(day)

    if len(days) == len(set(days)):
        stat['No-collision'] += 1
    else:
        stat['Collision'] += 1

print("Probability for at least 2 with same birthday in a group of 23")
print("P(A) =", stat['Collision']/(stat['Collision'] + stat['No-collision']))

This will output different results from run to run, but something around 0.507.

Probability for at least 2 with same birthday in a group of 23
P(A) = 0.5026

A few comments on the code. It keeps a record of how often, when choosing 23 random birthdays, we end up with at least two of them on the same day. We run the experiment 10,000 times to get some confidence that the result is not just pure luck.

The check if len(days) == len(set(days)) tests whether all birthdays are distinct. The function set(…) keeps only the unique days in the list. Hence, if two of the days are the same, the set will be shorter than the list; if all days are distinct, the list and the set have the same length.
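
A tiny example of that check:

days = [10, 42, 200, 42]            # 42 appears twice
print(len(days), len(set(days)))    # 4 3
print(len(days) == len(set(days)))  # False -> a collision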

Step 2: The probability theory behind it

This is where it becomes a bit more technical. The simulation above shows that the statement behaves as claimed: if we take a group of 23 random people, then with roughly 50% probability two of them will have the same birthday.

Is this contrary to one’s expectations? Hence, is it a paradox?

Before we answer that, let’s see if we can nail the probability theory behind this.

Do it step by step.

If we have 1 person, what is the probability that two people in this group of 1 share a birthday? Yes, it sounds strange. The probability is obviously 0.

If we have 2 people, what is the probability that the 2 people have the same birthday? The second person then needs to have the same birthday as the first. Hence, the probability becomes 1/365.

How do you write that as an equation?

What we often do in probability theory is calculate the opposite (complementary) probability.

Hence, we calculate the probability of not having two identical birthdays in the group. This is easier to calculate. In the first case, we have all possibilities open.

P(1) = 1

Where P(1) is the probability that, in a group of one person, no two people share a birthday.

P(2) = 1 x (364 / 365)

Because, the first birthday is open for any birthday, then the second, only has 364 left of 365 possible birthdays.

This continues.

P(n) = 1 x (364 / 365) x (363 / 365) x … x ((365 – n + 1) / 365)

Which makes the probability of picking 23 random people without anyone with the same birthday to be.

P(23) = 1 x (364 / 365) x (363 / 365) x … x (343 / 365) = 0.493

Or calculated in Python.

def prop(n):
    if n == 1:
        return 1
    else:
        return (365 - n + 1) / 365 * prop(n - 1)

print("Probability for no-one with the same birthday in a group of 23")
print("P(A') =",  prop(23))

Which results in.

Probability for no-one with the same birthday in a group of 23
P(A') = 0.4927027656760144

This formula can be rewritten (see Wikipedia), but for our purpose the above is fine.

The probability we look for is given by.

P(A) = 1 – P(A’)
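
As a side note, a common approximation of P(A') (the one behind the square-root rule of thumb, and one way the formula can be rewritten as mentioned above) is exp(-n(n-1)/(2*365)). A quick check in Python:

import math

n = 23
p_no_collision = math.exp(-n * (n - 1) / (2 * 365))
print("Approximate P(A') =", p_no_collision)      # about 0.50
print("Approximate P(A)  =", 1 - p_no_collision)  # about 0.50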

Step 3: Make a graph of how likely a collision is based on a group size

This is great news. We can now calculate the theoretical probability of two people having the same birthday in a group of n random people.

This can be achieved by the following code.

from matplotlib import pyplot as plt

def prop(n):
    if n == 1:
        return 1
    else:
        return (365 - n + 1) / 365 * prop(n - 1)

X = []
Y = []
for i in range(1, 90):
    X.append(i)
    Y.append(1 - prop(i))

plt.scatter(X, Y, color='blue')
plt.xlabel("Number of people")
plt.ylabel("Probability of collision")
plt.axis([0, 90, 0, 1])
plt.show()

Which results in the following plot.

Where you can see that at about 23 people, we have a 50% chance of having a pair with the same birthday (called a collision).

Conclusion

Is it a paradox? Well, there is no magic in it. You can see the above are just simple calculations. But is the following contrary to one’s expectation?

6 weeks is 6*7*24*60*60 = 3,628,800 seconds.

And 10! = 10*9*8*7*6*5*4*3*2*1 = 3,628,800.

Well, the first time you calculate it might be. But does that make it a paradox?

No, it is just a surprising fact the first time you see it. Does it mean that seconds are related to factorials? No, of course not. It is just a strange coincidence that they connect.

The same with the Birthday Paradox, it is just surprising the first time you see it.

It seems surprising that you only need 23 people to have a 50% chance of a pair with the same birthday, but it is not a paradox for people who work with numbers.

From Zero to Creating Photo Mosaic using Faces with OpenCV

What will we cover in this tutorial?

  1. Where and how to get images you can use without copyright issues.
  2. How to extract the faces of the images.
  3. Building a Photo Mosaic using the extracted images of faces.

Step 1: Where and how to get images

There exist a lot of datasets of faces, but most have restrictions on them. A great place to find images is Pexels, as the photos are free to use (see license here).

Also, the Python library pexels-api makes it easy to download a lot of images. It can be installed by the following command.

pip install pexels-api

To use the Pexels API you need to register.

  1. Sign up as a user at Pexels.
  2. Accept the email sent to your inbox (the email address you provide).
  3. Request your API key here.

Then you can download images by a search query from this Python program.

from pexels_api import API
import requests
import os.path
from pathlib import Path


path = 'pics'
Path(path).mkdir(parents=True, exist_ok=True)

# To get key: sign up for pexels https://www.pexels.com/join/
# Request key: https://www.pexels.com/api/
# - No need to set URL
# - Accept the email sent to you
# - Refresh API or see key here: https://www.pexels.com/api/new/

PEXELS_API_KEY = '--- INSERT YOUR API KEY HERE ---'

api = API(PEXELS_API_KEY)

query = 'person'

api.search(query)
# Get photo entries
photos = api.get_entries()
print("Search: ", query)
print("Total results: ", api.total_results)
MAX_PICS = 1000
print("Fetching max: ", MAX_PICS)

count = 0
while True:
    photos = api.get_entries()
    print(len(photos))
    if len(photos) == 0:
        break
    for photo in photos:
        # Print photographer
        print('Photographer: ', photo.photographer)
        # Print original size url
        print('Photo original size: ', photo.original)

        file = os.path.join(path, query + '-' + str(count).zfill(5) + '.' + photo.original.split('.')[-1])
        count += 1
        print(file)
        picture_request = requests.get(photo.original)
        if picture_request.status_code == 200:
            with open(file, 'wb') as f:
                f.write(picture_request.content)

        # Stop when we reach the maximum (this would be cleaner as a function with a return)
        if count >= MAX_PICS:
            break

    if count >= MAX_PICS:
        break

    if not api.has_next_page:
        print("Last page: ", api.page)
        break

    # Search next page
    api.search_next_page()

There is an upper limit of 1,000 photos in the above Python program; you can change that if you like. It is set to download the photos returned when you query person. Feel free to change that as well.

It takes some time to download all the images and will take up some space.

Step 2: Extract the faces from the photos

Here OpenCV comes in. They have a trained model using the Haar Cascade Classifier. You need to install the OpenCV library by the following command.

pip install opencv-python

The trained model we use is part of the library, but it is not always easy to load it from its installed location. Therefore we suggest you download it from here (it should be named haarcascade_frontalface_default.xml) and add it to the folder you work from.
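
Depending on your installation of opencv-python, the file may also be bundled with the package and reachable via cv2.data; treat this as an option to verify on your own setup:

import cv2

# In recent opencv-python releases, cv2.data.haarcascades is the folder with the bundled cascades
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)
print("Loaded cascade from:", cascade_path)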

We want to use it to identify faces and extract them and save them in a library for later use.

import cv2
import numpy as np
import glob
import os
from pathlib import Path


def preprocess(box_width=12, box_height=16):
    path = "pics"
    output = "small-faces"
    Path(output).mkdir(parents=True, exist_ok=True)
    files = glob.glob(os.path.join(path, "*"))
    files.sort()

    face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

    images = []
    cnt = 0
    for filename in files:
        print("Processing...", filename)
        frame = cv2.imread(filename)
        frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame_gray = cv2.equalizeHist(frame_gray)
        faces = face_cascade.detectMultiScale(frame_gray, scaleFactor=1.3, minNeighbors=10, minSize=(350, 350), flags=cv2.CASCADE_SCALE_IMAGE)
        for (x, y, w, h) in faces:
            roi = frame[y:y+h, x:x+w]

            img = cv2.resize(roi, (box_width, box_height))
            images.append(img)

            output_file_name = "face-" + str(cnt).zfill(5) + ".jpg"
            output_file_name = os.path.join(output, output_file_name)
            cv2.imwrite(output_file_name, img)
            cnt += 1  # increment the counter so each face gets a unique file name

    return np.stack(images)


preprocess(box_width=12, box_height=16)

It will create a folder called small-faces with small images of the identified faces.

Notice that the Haar Cascade Classifier is not perfect. It will miss a lot of faces and produce false positives. It is a good idea to look manually through all the images and delete all false positives (images that do not contain a face).

Step 3: Building our first mosaic photo

The approach is to divide the photo into equal-sized boxes and, for each box, find the image (one of our faces) that fits best as a replacement.

To improve performance of the process function we use Numba, which is a just-in-time compiler that is designed to optimize NumPy code in for-loops.

import cv2
import numpy as np
import glob
import os
from numba import jit


@jit(nopython=True)
def process(photo, images, box_width=24, box_height=32):
    height, width, _ = photo.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = photo[i:i + box_height, j:j + box_width]
            best_match = np.inf
            best_match_index = 0
            for k in range(1, images.shape[0]):
                total_sum = np.sum(np.where(roi > images[k], roi - images[k], images[k] - roi))
                if total_sum < best_match:
                    best_match = total_sum
                    best_match_index = k
            photo[i:i + box_height, j:j + box_width] = images[best_match_index]
    return photo


def main():
    photo = cv2.imread("rune.jpg")

    box_width = 12
    box_height = 16
    height, width, _ = photo.shape
    # Make sure we can slice the photo into whole box-sized pieces
    width = (width//box_width) * box_width
    height = (height//box_height) * box_height
    photo = cv2.resize(photo, (width, height))

    # Load all the images of the faces
    images = load_images(box_width, box_height)

    # Create the mosaic
    mosaic = process(photo.copy(), images, box_width, box_height)

    cv2.imshow("Original", photo)
    cv2.imshow("Result", mosaic)
    cv2.waitKey(0)


main()
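
Note that load_images is not defined in the listing above. A minimal sketch of it, assuming it should read the face images saved to the small-faces folder in Step 2 and resize them to the box size, could look like this:

import cv2
import numpy as np
import glob
import os


def load_images(box_width, box_height):
    # Load the small face images produced in Step 2 from the "small-faces" folder
    files = sorted(glob.glob(os.path.join("small-faces", "*.jpg")))
    images = []
    for filename in files:
        img = cv2.imread(filename)
        img = cv2.resize(img, (box_width, box_height))
        images.append(img)
    return np.stack(images)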

To test it we have used the photo of Rune.

This reuses the same images. This gives a decent result, but if you want to avoid the extreme patterns of reused images, you can change the code for that.

The above example has 606 small images. If you avoid reuse, it quickly runs out of possible images. Avoiding reuse would require a bigger collection of images, or the result becomes questionable.

No reuse of face images to create the Photo Mosaic

The above photo mosaic is created at a downscaled size, but it still does not give a good result if you do not reuse images. That would require a much larger set of images to work from.

Video Mosaic on Live Webcam Stream with OpenCV and Numba

What will we cover in this tutorial?

We will investigate if we can create a decent video mosaic effect on a live webcam stream using OpenCV, Numba and Python. First we will learn the simple way to create a video mosaic and investigate the performance of that. Then we will extend that to create a better quality video mosaic and try to improve the performance by lowering the quality.

Step 1: How does simple photo mosaic work?

A photographic mosaic is a photo generated by other small images. A black and white example is given here.

The above is not a perfect example of it, as it is generated with speed in mind to keep it running smoothly on a webcam stream. Also, it is done in gray scale to improve performance.

The idea is to recreate the original image (photograph) as a mosaic of many smaller sampled images. In the above, the original frame is 640×480 pixels and the mosaic is constructed of small images of size 16×12 pixels.

The first thing we want to achieve is to create a simple mosaic. A simple mosaic is when the original image is scaled down and each pixel is then exchanged with one small image with the same average color. This is simple and efficient to do.

On a high level this is the process.

  1. Have a collection C of small images used to create the photographic mosaic
  2. Scale down the photo P you want to create a mosaic of.
  3. For each pixel in photo P find the image I from C that has the closest average color to the pixel. Insert image I to represent that pixel.

This explains the simple way of doing it. The next question is: will it be efficient enough to process a live webcam stream?

Step 2: Create a collection of small images

To optimize performance we have chosen to make it in gray scale. The first step is to collect images you want to use. This can be any pictures.

We have used photos from Pexels, which are all free for use without copyright.

What we need is to convert them all to gray scale and resize to fit our purpose.

import cv2
import glob
import os
import numpy as np

output = "small-pics-16x12"
path = "pics"
files = glob.glob(os.path.join(path, "*"))
for file_name in files:
    print(file_name)
    img = cv2.imread(file_name)
    img = cv2.resize(img, (16, 12))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    mean = np.mean(img)
    output_file_name = "image-" + str(mean).replace('.', '-') + ".jpg"
    output_file_name = os.path.join(output, output_file_name)
    print(output_file_name)
    cv2.imwrite(output_file_name, img)

The script assumes that the images we want to convert to gray scale and resize are located in the local folder pics. Further, it assumes that the output images (the processed images) are put in an already existing folder small-pics-16x12.
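
If the output folder does not exist yet, you can create it first; a small addition (not part of the original script) using pathlib:

from pathlib import Path

# Create the output folder used by the script above, if it is missing
Path("small-pics-16x12").mkdir(parents=True, exist_ok=True)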

Step 3: Get a live stream from the webcam

On a high level a live stream from a webcam is given in the following diagram.

This process framework is given in the code below.

import cv2
import numpy as np


def process(frame):
    return frame


def main():
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    while True:
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Update the frame
        updated_frame = process(gray)

        # Show the frame in a window
        cv2.imshow('WebCam', updated_frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()


main()

The above code is just an empty shell where the function call to process is where all the processing will happen. This code just opens a window that shows the gray scale webcam image.

Step 4: The simple video mosaic

We need to introduce two main things to create this simple video mosaic.

  1. Loading all the images we need to use (the 16×12 gray scale images).
  2. Fill out the processing of each frame, which replaces each 16×12 box of the frame with the best matching image.

The first step is preprocessing and should be done before we enter the main loop of the webcam capturing. The second part is done in each iteration inside the process function.

import cv2
import numpy as np
import glob
import os


def preprocess():
    path = "small-pics-16x12"
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        images.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    return np.stack(images)


def process(frame, images, box_height=12, box_width=16):
    height, width = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            mean = np.mean(roi[:, :])
            roi[:, :] = images[int((len(images)-1)*mean/256)]
    return frame


def main(images):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    while True:
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Update the frame
        mosaic_frame = process(gray, images)

        # Show the frame in a window
        cv2.imshow('Mosaic Video', mosaic_frame)
        cv2.imshow('Webcam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()



images = preprocess()
main(images)

The preprocessing function reads all the images, converts them to gray scale (to have only 1 channel per pixel), and returns them as a NumPy array to have optimized code.

The process function breaks the image down into blocks of 16×12 pixels, computes the average gray scale value of each block, and picks the estimated best match. Notice the average (mean) value is a float, hence we can index into more than 256 gray scale images.

In this example we used 1,885 images to process it.

A result can be seen here.

The result is decent but not good.

Step 5: Testing the performance and improve it by using Numba

While the performance is quite good, let us test it.

We do that by using the time library.

First you need to import the time library.

import time

Then measure the actual time the process call uses, with this new code inserted in the main while loop.

        # Update the frame
        start = time.time()
        mosaic_frame = process(gray, images)
        print("Process time", time.time()- start, "seconds")

This will result in the following output.

Process time 0.02651691436767578 seconds
Process time 0.026834964752197266 seconds
Process time 0.025418996810913086 seconds
Process time 0.02562689781188965 seconds
Process time 0.025369882583618164 seconds
Process time 0.025450944900512695 seconds

Or a few lines from it. About 0.025-0.027 seconds.

Let's bring Numba into the equation. Numba is a just-in-time compiler for NumPy code. That means it compiles the Python code to machine code for speed. If you are new to Numba we recommend you read this tutorial.

import cv2
import numpy as np
import glob
import os
import time
from numba import jit


def preprocess():
    path = "small-pics-16x12"
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        images.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    return np.stack(images)


@jit(nopython=True)
def process(frame, images, box_height=12, box_width=16):
    height, width = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            mean = np.mean(roi[:, :])
            roi[:, :] = images[int((len(images)-1)*mean/256)]
    return frame


def main(images):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    while True:
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Update the frame
        start = time.time()
        mosaic_frame = process(gray, images)
        print("Process time", time.time()- start, "seconds")

        # Show the frame in a window
        cv2.imshow('Mosaic Video', mosaic_frame)
        cv2.imshow('Webcam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()



images = preprocess()
main(images)

This gives the following performance.

Process time 0.0014820098876953125 seconds
Process time 0.0013887882232666016 seconds
Process time 0.0015859603881835938 seconds
Process time 0.0016350746154785156 seconds
Process time 0.0018379688262939453 seconds
Process time 0.0016241073608398438 seconds

Which is a factor 15-20 speed improvement.

Good enough for live streaming. But the visual result is still not great.

Step 6: A more advanced video mosaic approach

The more advanced video mosaic consists of comparing each box of pixels with each candidate replacement image pixel by pixel, and picking the closest match.

import cv2
import numpy as np
import glob
import os
import time
from numba import jit


def preprocess():
    path = "small-pics-16x12"
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        images.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    return np.stack(images)


@jit(nopython=True)
def process(frame, images, box_height=12, box_width=16):
    height, width = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            best_match = np.inf
            best_match_index = 0
            for k in range(1, images.shape[0]):
                total_sum = np.sum(np.where(roi > images[k], roi - images[k], images[k] - roi))
                if total_sum < best_match:
                    best_match = total_sum
                    best_match_index = k
            roi[:,:] = images[best_match_index]
    return frame


def main(images):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    while True:
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Update the frame
        start = time.time()
        mosaic_frame = process(gray, images)
        print("Process time", time.time()- start, "seconds")

        # Show the frame in a window
        cv2.imshow('Mosaic Video', mosaic_frame)
        cv2.imshow('Webcam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()


images = preprocess()
main(images)

There is one line to notice specifically.

total_sum = np.sum(np.where(roi > images[k], roi - images[k], images[k] - roi))

This is needed because we work with unsigned 8-bit integers. What it does is calculate the absolute difference between each pixel in the region of interest (roi) and images[k] without wrapping around. This is a very expensive calculation, as we will see.
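
To see why the np.where construction is needed, here is a small illustration (with made-up values) of what plain subtraction does to unsigned 8-bit integers:

import numpy as np

a = np.array([10], dtype=np.uint8)
b = np.array([20], dtype=np.uint8)

print(a - b)                          # [246] -> wraps around instead of -10
print(np.where(a > b, a - b, b - a))  # [10]  -> the absolute difference, no wrap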

Performance shows the following.

Process time 7.030380010604858 seconds
Process time 7.034134149551392 seconds
Process time 7.105709075927734 seconds
Process time 7.138839960098267 seconds

Over 7 seconds for each frame. The result is what can be expected when using this number of images, but the performance is far too slow for a smooth live webcam stream.

The result can be seen here.

Step 7: Compromise options

There are various options to compromise for speed and we will not investigate all. Here are some.

  • Use fewer images in our collection (less than the 1,885 images). Notice that using half the images, say roughly 900, only cuts the matching work roughly in half.
  • Bigger box sizes. Scaling up to use 32×24 images. We still need to do a lot of processing per pixel, so the speedup might be less than expected.
  • Make a simplified (compromise) version of the difference calculation (total_sum). This has great potential, but might have undesired effects.
  • Scale down the pixel estimation for fewer calculations.

We will try the last two.

First, let’s try to exchange the calculation of total_sum, which is our distance function that measures how close our image is. Say, we use this.

                total_sum = np.sum(np.subtract(roi, images[k]))

This can wrap around if we have a calculation like 1 - 2 = 255, which is undesired. On the other hand, it can be expected to happen in roughly 50% of the cases, and it may skew the calculation fairly evenly across all images.

Let’s try.

Process time 1.857623815536499 seconds
Process time 1.7193729877471924 seconds
Process time 1.7445549964904785 seconds
Process time 1.707035779953003 seconds
Process time 1.6778359413146973 seconds

Wow. That is a speedup of a factor 4-6 per frame. The quality is still fine, but you will notice a poorly mapped image from time to time. But the result is close to the advanced video mosaic and far from the first simple video mosaic.

Another change we could make is to estimate each box by only 4 pixels (by scaling both the frame and the collection images down). This should still be better than the simple video mosaic approach. The full code is given below.

import cv2
import numpy as np
import glob
import os
import time
from numba import jit


def preprocess():
    path = "small-pics-16x12"
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        images.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    return np.stack(images)


def preprocess2(images, scale_width=8, scale_height=6):
    scaled = []
    _, height, width = images.shape
    print("Dimensions", width, height)
    width //= scale_width
    height //= scale_height
    print("Scaled Dimensions", width, height)
    for i in range(images.shape[0]):
        scaled.append(cv2.resize(images[i], (width, height)))
    return np.stack(scaled)


@jit(nopython=True)
def process3(frame, frame_scaled, images, scaled, box_height=12, box_width=16, scale_width=8, scale_height=6):
    height, width = frame.shape
    width //= scale_width
    height //= scale_height
    box_width //= scale_width
    box_height //= scale_height
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame_scaled[i:i + box_height, j:j + box_width]
            best_match = np.inf
            best_match_index = 0
            for k in range(1, scaled.shape[0]):
                total_sum = np.sum(roi - scaled[k])
                if total_sum < best_match:
                    best_match = total_sum
                    best_match_index = k
            frame[i*scale_height:(i + box_height)*scale_height, j*scale_width:(j + box_width)*scale_width] = images[best_match_index]
    return frame


def main(images, scaled):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    while True:
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Update the frame
        start = time.time()
        gray_scaled = cv2.resize(gray, (640//8, 480//6))
        mosaic_frame = process3(gray, gray_scaled, images, scaled)
        print("Process time", time.time()- start, "seconds")

        # Show the frame in a window
        cv2.imshow('Mosaic Video', mosaic_frame)
        cv2.imshow('Webcam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()


images = preprocess()
scaled = preprocess2(images)
main(images, scaled)

Where an extra preprocessing step (preprocess2) is added. The process time is now:

Process time 0.5559628009796143 seconds
Process time 0.5979928970336914 seconds
Process time 0.5543379783630371 seconds
Process time 0.5621011257171631 seconds

Which is okay, but still less than 2 frames per second.

The result can be seen here.

It is not all bad. It is still better than the simple video mosaic approach.

The result is not perfect. If you want to use it on a live webcam stream at 25-30 frames per second, you need to find further optimizations, or live with the simple video mosaic approach.

Using Numba for Efficient Frame Modifications in OpenCV

What will we cover in this tutorial?

We will compare the speed for using Numba optimization when making calculations and modifications on frames from a video stream using OpenCV.

In this tutorial we will divide each frame into same size boxes and calculate the average color for each box. Then make a frame which colors each box to that color.

See the effect in the video below. These calculations are expensive in pure Python, hence we will compare the performance with and without Numba.

Step 1: Understand the process requirements

Each video frame from OpenCV is an image represented by a NumPy array. In this example we will use the webcam to capture a video stream and do the calculations and modifications live on the stream. This sets high requirements to the processing time of each frame.

To keep a fluid motion picture we need to show a new frame every 1/25 of a second. That leaves at most 0.04 seconds per frame for capturing, processing, and updating the window with the video stream.

Since capturing and updating the window also take time, it is uncertain exactly how fast the frame processing (calculations and modifications) needs to be, but an upper bound is 0.04 seconds per frame.

Step 2: The calculations and modifications on each frame

Let’s have some fun. The calculations and modification we want to apply to each frame are as follows.

  • Calculations. We divide each frame into small 6×16 pixel areas and calculate the average color for each area. To get the average color we calculate the average of each channel (BGR).
  • Modification. We fill each area entirely with its average color.

This can be done by adding this function to process each frame.

def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame

The frame is divided into areas of the box size (box_height x box_width). For each box (roi: Region of Interest) we calculate the average (mean) value of each of the 3 color channels (b_mean, g_mean, r_mean) and overwrite the area with that average color.

Step 3: Testing performance for this frame process

To get an estimate of the time spent in the function process, the cProfile library is quite good. It gives a profile of the time spent in each function call. This is great, since we get a measure of how much time is spent in the function process.

We can accomplish that by running this code.

import cv2
import numpy as np
import cProfile


def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame


def main(iterations=300):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    for _ in range(iterations):
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))

        frame = process(frame)

        # Show the frame in a window
        cv2.imshow('WebCam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()

cProfile.run("main()")

Where the interesting output line is given here.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      300    7.716    0.026   50.184    0.167 TEST2.py:8(process)

Which says we use 0.026 seconds per call in the process function. This is only good enough if the accumulated overhead from the other functions in the main loop is less than 0.014 seconds.

If we investigate the calls further:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      300    5.132    0.017    5.132    0.017 {method 'read' of 'cv2.VideoCapture' objects}
      300    0.073    0.000    0.073    0.000 {resize}
      300    2.848    0.009    2.848    0.009 {waitKey}
      300    0.120    0.000    0.120    0.000 {flip}
      300    0.724    0.002    0.724    0.002 {imshow}

Which gives an overhead of approximately 0.028 seconds (0.017 + 0.009 + 0.002) from the read, resize, flip, imshow, and waitKey calls in each iteration. This adds up to a total of 0.054 seconds per frame, or a frame rate of about 18.5 frames per second (FPS).

This is too slow to keep it running smoothly.

Please notice that cProfile does add some overhead to measure the time.
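
If you want a lighter measurement without the cProfile overhead, a rough sketch (assuming the process function from the listing above is defined) is to time it on a synthetic frame with time.perf_counter:

import time
import numpy as np

# Synthetic 480x640 BGR frame standing in for a webcam frame
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)

start = time.perf_counter()
process(frame)
print("Process time", time.perf_counter() - start, "seconds")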

Step 4: Introducing the Numba to optimize performance

The Numba library is designed to just-in-time compile code to make NumPy loops faster. Wow. That is just what we need here. Let's jump right into it and see how it does.

import cv2
import numpy as np
from numba import jit
import cProfile


@jit(nopython=True)
def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame


def main(iterations=300):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    for _ in range(iterations):
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))

        frame = process(frame)

        # Show the frame in a window
        cv2.imshow('WebCam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()

main(iterations=1)
cProfile.run("main(iterations=300)")

Notice that we first call the main loop with one iteration. This is done to call the process function once before we measure the performance, as Numba compiles the function in the first call and keeps it compiled.

The result is as follows.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      300    1.187    0.004    1.187    0.004 TEST2.py:7(pixels)

Which estimates 0.004 seconds per call. This results in a total time of 0.032 seconds per iteration (0.028 + 0.004), which is sufficient to keep a performance of more than 24 frames per second (FPS).

Also, this improves the performance of the process function by a factor of about 6.5 (7.716 / 1.187).

Conclusion

We got the desired speedup to have a live stream from the webcam and process it frame by frame by using Numba. The speedup was approximately 6.5 times.

Average vs Weighted Average Effect in Video using OpenCV

What will we cover in this tutorial?

Compare the difference of using weighted average and normal average over the last frames streaming from your webcam using OpenCV in Python.

The effect can be seen in the video below and code used to create that is provided below.

Example output Normal Average vs Weighted Average vs One Frame

The code

The code is straightforward and not optimized. The average is calculated by using a deque from Python's collections library to create a circular buffer.

The two classes of AverageBuffer and WeightedAverageBuffer share the same code for the constructor and apply, but have each their implementation of get_frame which calculates the average and weighted average, respectively.

Please notice, that the code is not written for efficiency and the AverageBuffer has some easy wins in performance if calculated more efficiently.

An important point here is that the frames are saved as float32 in the buffers. This is necessary when we do the actual calculations on the frames later, where we multiply them by a factor, say 4.

Example: the frames are uint8, which are integers 0 to 255. Say we multiply a frame by 4, and a pixel value is 128. This gives 128*4 = 512, which as a uint8 wraps around to 0. Hence, we get an undesirable effect. Therefore we convert the frames to float32 to avoid this.
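
A small illustration of that wrap-around and the float32 fix:

import numpy as np

frame = np.array([128], dtype=np.uint8)
print(frame * 4)                    # [0]    -> 512 wraps around as uint8
print(frame.astype('float32') * 4)  # [512.] -> correct after the conversion

With that in mind, the full code is given below.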

import cv2
import numpy as np
from collections import deque


class AverageBuffer:
    def __init__(self, maxlen):
        self.buffer = deque(maxlen=maxlen)
        self.shape = None

    def apply(self, frame):
        self.shape = frame.shape
        self.buffer.append(frame)

    def get_frame(self):
        mean_frame = np.zeros(self.shape, dtype='float32')
        for item in self.buffer:
            mean_frame += item
        mean_frame /= len(self.buffer)
        return mean_frame.astype('uint8')


class WeightedAverageBuffer(AverageBuffer):
    def get_frame(self):
        mean_frame = np.zeros(self.shape, dtype='float32')
        i = 0
        for item in self.buffer:
            i += 4
            mean_frame += item*i
        mean_frame /= (i*(i + 1))/8.0
        return mean_frame.astype('uint8')

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 320)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)

average_buffer = AverageBuffer(30)
weighted_buffer = WeightedAverageBuffer(30)

while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    frame = cv2.flip(frame, 1)
    frame = cv2.resize(frame, (320, 240))

    frame_f32 = frame.astype('float32')
    average_buffer.apply(frame_f32)
    weighted_buffer.apply(frame_f32)

    cv2.imshow('WebCam', frame)
    cv2.imshow("Average", average_buffer.get_frame())
    cv2.imshow("Weighted average", weighted_buffer.get_frame())

    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Create a Line Drawing from Webcam Stream using OpenCV in Python

What will we cover in this tutorial?

How to convert a webcam stream into a black and white line drawing using OpenCV and Python. Also, how to adjust the parameters while running the live stream.

See result here.

The things you need to use

There are two things you need to use in order to get a good line drawing of your image.

  1. GaussianBlur to smooth out the image, as detecting lines is sensitive to noise.
  2. Canny that detects the lines.

For the Gaussian blur a 5×5 filter is advised. The Canny edge detector then has two threshold parameters. To find the optimal values for your setting, we have inserted two trackbars where you can set them to any value and see the results.

You can read more about Canny Edge Detection here.

If you need to install OpenCV please read this tutorial.

The code is given below.

import cv2
import numpy as np

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)


def nothing(x):
    pass


canny = "Canny"
cv2.namedWindow(canny)
cv2.createTrackbar('Threshold 1', canny, 0, 255, nothing)
cv2.createTrackbar('Threshold 2', canny, 0, 255, nothing)

while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    frame = cv2.flip(frame, 1)

    t1 = cv2.getTrackbarPos('Threshold 1', canny)
    t2 = cv2.getTrackbarPos('Threshold 2', canny)
    gb = cv2.GaussianBlur(frame, (5, 5), 0)
    can = cv2.Canny(gb, t1, t2)

    cv2.imshow(canny, can)

    frame[np.where(can)] = 255
    cv2.imshow('WebCam', frame)
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Twitter-API + Python: Mapping all Your Followers Locations on a Choropleth Map

What will we cover in this tutorial?

How to find all the locations of your followers on Twitter and create a choropleth map (maps where the color of each shape is based on the value of an associated variable) with all countries. This will all be done by using Python.

This is done out of my interest in where the followers of my Twitter account are from. Today my result looks like this.

The Choropleth map of the followers of PythonWithRune on Twitter

Step 1: How to get the followers from your Twitter account

If you are new to the Twitter API you will need to create a developer account to get your secret keys. You can follow this tutorial to create your developer account and get the needed tokens.

When that is done, you can use the tweepy library to connect to the Twitter API. The library function api.followers_ids(api.me().id) will give you a list of all your followers by user-id.

import tweepy

# Used to connect to the Twitter API
def get_twitter_api():
    # You need your own keys/secret/tokens here
    consumer_key = "--- INSERT YOUR KEY HERE ---"
    consumer_secret = "--- INSERT YOUR SECRET HERE ---"
    access_token = "--- INSERT YOUR TOKEN HERE ---"
    access_token_secret = "--- INSERT YOUR TOKEN SECRET HERE ---"

    # authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)

    # authentication of access token and secret
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    return api


# This function is used to process it all
def process():
    # Connecting to the twitter api
    api = get_twitter_api()

    # Get the list of all your followers - it only gives user-id's
    # - we need to gather all user data after
    followers = api.followers_ids(api.me().id)
    print("Followers", len(followers))


if __name__ == "__main__":
    process()

Which will print out the number of followers you have on your account.

Step 2: Get the location of your followers

How do we transform the twitter user-ids to a location?

We need to look them all up. Luckily, not one-by-one. We can do it in chunks of 100 users per call.

The function api.lookup_users(…) can lookup 100 users per call with users-ids or user-names.

import tweepy

# Used to connect to the Twitter API
def get_twitter_api():
    # You need your own keys/secret/tokens here
    consumer_key = "--- INSERT YOUR KEY HERE ---"
    consumer_secret = "--- INSERT YOUR SECRET HERE ---"
    access_token = "--- INSERT YOUR TOKEN HERE ---"
    access_token_secret = "--- INSERT YOUR TOKEN SECRET HERE ---"

    # authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)

    # authentication of access token and secret
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    return api


# This function is used to process it all
def process():
    # Connecting to the twitter api
    api = get_twitter_api()

    # Get the list of all your followers - it only gives user-id's
    # - we need to gather all user data after
    followers = api.followers_ids(api.me().id)
    print("Followers", len(followers))

    # We need to chunk it up in sizes of 100 (max for api.lookup_users)
    followers_chunks = [followers[i:i + 100] for i in range(0, len(followers), 100)]
    # Process each chunk - we can call for 100 users per call
    for follower_chunk in followers_chunks:
        # Get a list of users (with location data)
        users = api.lookup_users(user_ids=follower_chunk)
        # Process each user to get location
        for user in users:
            # Print user location
            print(user.location)


if __name__ == "__main__":
    process()

Before you execute this code, you should know that it will print all the locations your followers have set.

Step 3: Map all user locations to the same format

When users write their locations, they do it in various ways, as these examples show.

India
Kenya
Temecula, CA
Atlanta, GA
Florida, United States
Hyderabad, India
Atlanta, GA
Agadir / Khouribga, Morocco
Miami, FL
Republic of the Philippines
Tampa, FL
Sammamish, WA
Coffee-machine

And as the last example shows, it might not be a real location. Hence, we need to see if we can find the location by asking a service. For this purpose, we will use the GeoPy library, which is a client for several popular geocoding web services.

Hence, for each of the user-specified locations (as in the examples above) we will call GeoPy and use its result as the location. This brings everything into the same format and clarifies whether the location exists at all.

import tweepy
from geopy.exc import GeocoderTimedOut
from geopy.geocoders import Nominatim


# Used to connect to the Twitter API
def get_twitter_api():
    # You need your own keys/secret/tokens here
    consumer_key = "--- INSERT YOUR KEY HERE ---"
    consumer_secret = "--- INSERT YOUR SECRET HERE ---"
    access_token = "--- INSERT YOUR TOKEN HERE ---"
    access_token_secret = "--- INSERT YOUR TOKEN SECRET HERE ---"

    # authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)

    # authentication of access token and secret
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    return api


# Used to map the twitter user location description to a standard format
def lookup_location(location):
    geo_locator = Nominatim(user_agent="LearnPython")
    try:
        location = geo_locator.geocode(location, language='en')
    except GeocoderTimedOut:
        return None
    return location


# This function is used to process it all
def process():
    # Connecting to the twitter api
    api = get_twitter_api()

    # Get the list of all your followers - it only gives user-id's
    # - we need to gather all user data after
    followers = api.followers_ids(api.me().id)
    print("Followers", len(followers))

    # Used to store all the locations from users
    locations = {}

    # We need to chunk it up in sizes of 100 (max for api.lookup_users)
    followers_chunks = [followers[i:i + 100] for i in range(0, len(followers), 100)]
    # Process each chunk - we can call for 100 users per call
    for follower_chunk in followers_chunks:
        # Get a list of users (with location data)
        users = api.lookup_users(user_ids=follower_chunk)
        # Process each user to get location
        for user in users:
            # Call used to transform users description of location to same format
            location = lookup_location(user.location)
            # Add it to our counter
            if location:
                location = location.address
                location = location.split(',')[-1].strip()
            if location in locations:
                locations[location] += 1
            else:
                locations[location] = 1


if __name__ == "__main__":
    process()

As you can see, it counts the occurrences of each location found. The split and strip are used to keep only the country and leave out the rest of the address, if any.
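As a small illustration of that split and strip (the address below is just an assumed example of what the geocoder could return, not actual output):

# Assumed example of a geocoded address - only to illustrate the split/strip step
address = "Atlanta, Fulton County, Georgia, United States"
country = address.split(',')[-1].strip()
print(country)  # United States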

Step 4: Reformat the locations into a Pandas DataFrame

We want to reformat the locations into a DataFrame to be able to join (merge) it with GeoPandas, which contains the choropleth map we want to use.

To convert the locations into a DataFrame we need to restructure them. This also helps us remove duplicates. As an example, United States and United States of America both appear. To handle that, we will map all country names to a 3-letter code using the pycountry library.
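A minimal sketch of how pycountry helps here (assuming pycountry is installed); note how both spellings resolve to the same 3-letter code:

import pycountry

# Both variants of the country name map to the same alpha_3 code
print(pycountry.countries.lookup('United States').alpha_3)             # USA
print(pycountry.countries.lookup('United States of America').alpha_3)  # USA

With that in place, the full code now looks like this.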

import tweepy
import pycountry
import pandas as pd
from geopy.exc import GeocoderTimedOut
from geopy.geocoders import Nominatim


# Used to connect to the Twitter API
def get_twitter_api():
    # You need your own keys/secret/tokens here
    consumer_key = "--- INSERT YOUR KEY HERE ---"
    consumer_secret = "--- INSERT YOUR SECRET HERE ---"
    access_token = "--- INSERT YOUR TOKEN HERE ---"
    access_token_secret = "--- INSERT YOUR TOKEN SECRET HERE ---"

    # authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)

    # authentication of access token and secret
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    return api


# Helper function to map country names to alpha_3 representation
# Some are not supported - and are hard-coded in
# Function used to map country names from GeoPandas and the country names from geo_locator
def lookup_country_code(country):
    try:
        alpha_3 = pycountry.countries.lookup(country).alpha_3
        return alpha_3
    except LookupError:
        if country == 'The Netherlands':
            country = 'NLD'
        elif country == 'Democratic Republic of the Congo':
            country = 'COG'
        return country


# Used to map the twitter user location description to a standard format
def lookup_location(location):
    geo_locator = Nominatim(user_agent="LearnPython")
    try:
        location = geo_locator.geocode(location, language='en')
    except GeocoderTimedOut:
        return None
    return location


# This function is used to process it all
def process():
    # Connecting to the twitter api
    api = get_twitter_api()

    # Get the list of all your followers - it only gives user-id's
    # - we need to gather all user data after
    followers = api.followers_ids(api.me().id)
    print("Followers", len(followers))

    # Used to store all the locations from users
    locations = {}

    # We need to chunk it up in sizes of 100 (max for api.lookup_users)
    followers_chunks = [followers[i:i + 100] for i in range(0, len(followers), 100)]
    # Process each chunk - we can call for 100 users per call
    for follower_chunk in followers_chunks:
        # Get a list of users (with location data)
        users = api.lookup_users(user_ids=follower_chunk)
        # Process each user to get location
        for user in users:
            # Call used to transform users description of location to same format
            location = lookup_location(user.location)
            # Add it to our counter
            if location:
                location = location.address
                location = location.split(',')[-1].strip()
            if location in locations:
                locations[location] += 1
            else:
                locations[location] = 1

    # We reformat the output of the locations
    # Done for two reasons
    # - 1) Some locations have two entries (e.g., United States and United States of America)
    # - 2) To map them into a simple format to join it with GeoPandas
    reformat = {'alpha_3': [], 'followers': []}
    for location in locations:
        print(location, locations[location])
        loc = lookup_country_code(location)
        if loc in reformat['alpha_3']:
            index = reformat['alpha_3'].index(loc)
            reformat['followers'][index] += locations[location]
        else:
            reformat['alpha_3'].append(loc)
            reformat['followers'].append(locations[location])

    # Convert the reformatted data into a DataFrame to join (merge) with GeoPandas
    followers = pd.DataFrame.from_dict(reformat)
    pd.set_option('display.max_columns', 50)
    pd.set_option('display.width', 1000)
    pd.set_option('display.max_rows', 300)
    print(followers.sort_values(by=['followers'], ascending=False))

if __name__ == "__main__":
    process()

That makes it ready to join (merge) with GeoPandas.

Step 5: Merge it with GeoPandas and show the choropleth map

Now for the fun part. We only need to load the geo data from GeoPandas and merge our newly created DataFrame with it. Finally, plot and show it using matplotlib.pyplot.

import tweepy
import pycountry
import pandas as pd
import geopandas
import matplotlib.pyplot as plt
from geopy.exc import GeocoderTimedOut
from geopy.geocoders import Nominatim


# Used to connect to the Twitter API
def get_twitter_api():
    # You need your own keys/secret/tokens here
    consumer_key = "--- INSERT YOUR KEY HERE ---"
    consumer_secret = "--- INSERT YOUR SECRET HERE ---"
    access_token = "--- INSERT YOUR TOKEN HERE ---"
    access_token_secret = "--- INSERT YOUR TOKEN SECRET HERE ---"

    # authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)

    # authentication of access token and secret
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    return api


# Helper function to map country names to alpha_3 representation
# Some are not supported - and are hard-coded in
# Function used to map country names from GeoPandas and the country names from geo_locator
def lookup_country_code(country):
    try:
        alpha_3 = pycountry.countries.lookup(country).alpha_3
        return alpha_3
    except LookupError:
        if country == 'The Netherlands':
            country = 'NLD'
        elif country == 'Democratic Republic of the Congo':
            country = 'COG'
        return country


# Used to map the twitter user location description to a standard format
def lookup_location(location):
    geo_locator = Nominatim(user_agent="LearnPython")
    try:
        location = geo_locator.geocode(location, language='en')
    except GeocoderTimedOut:
        return None
    return location


# This function is used to process it all
def process():
    # Connecting to the twitter api
    api = get_twitter_api()

    # Get the list of all your followers - it only gives user-id's
    # - we need to gather all user data after
    followers = api.followers_ids(api.me().id)
    print("Followers", len(followers))

    # Used to store all the locations from users
    locations = {}

    # We need to chunk it up in sizes of 100 (max for api.lookup_users)
    followers_chunks = [followers[i:i + 100] for i in range(0, len(followers), 100)]
    # Process each chunk - we can call for 100 users per call
    for follower_chunk in followers_chunks:
        # Get a list of users (with location data)
        users = api.lookup_users(user_ids=follower_chunk)
        # Process each user to get location
        for user in users:
            # Call used to transform users description of location to same format
            location = lookup_location(user.location)
            # Add it to our counter
            if location:
                location = location.address
                location = location.split(',')[-1].strip()
            if location in locations:
                locations[location] += 1
            else:
                locations[location] = 1

    # We reformat the output of the locations
    # Done for two reasons
    # - 1) Some locations have two entries (e.g., United States and United States of America)
    # - 2) To map them into a simple format to join it with GeoPandas
    reformat = {'alpha_3': [], 'followers': []}
    for location in locations:
        print(location, locations[location])
        loc = lookup_country_code(location)
        if loc in reformat['alpha_3']:
            index = reformat['alpha_3'].index(loc)
            reformat['followers'][index] += locations[location]
        else:
            reformat['alpha_3'].append(loc)
            reformat['followers'].append(locations[location])

    # Convert the reformatted data into a DataFrame to join (merge) with GeoPandas
    followers = pd.DataFrame.from_dict(reformat)
    pd.set_option('display.max_columns', 50)
    pd.set_option('display.width', 1000)
    pd.set_option('display.max_rows', 300)
    print(followers.sort_values(by=['followers'], ascending=False))

    # Read the GeoPandas
    world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
    # Remove the columns not needed
    world = world.drop(['pop_est', 'continent', 'iso_a3', 'gdp_md_est'], axis=1)
    # Map the same naming convention as followers (the above DataFrame)
    # - this step is needed, because the iso_a3 column was missing a few countries 
    world['iso_a3'] = world.apply(lambda row: lookup_country_code(row['name']), axis=1)
    # Merge the tables (DataFrames)
    table = world.merge(followers, how="left", left_on=['iso_a3'], right_on=['alpha_3'])

    # Plot the data in a graph
    table.plot(column='followers', figsize=(8, 6))
    plt.show()


if __name__ == "__main__":
    process()

Resulting in the following output (for the PythonWithRune Twitter account, not yours).

OpenCV + Python: Move Objects Around in a Live Webcam Stream Using Your Hands

What will we cover in this tutorial?

How do you detect movements in a webcam stream? Also, how do you insert objects in a live webcam stream? Further, how do you change the position of the object based on the movements?

We will learn all that in this tutorial. The end result can be seen in the video below.

The end result of this tutorial

Step 1: Understand the flow of webcam processing

A webcam stream is processed frame-by-frame.

Illustration: Webcam processing flow

As the above illustration shows, when the webcam captures the next frame, the actual processing often happens on a copy of the original frame. When all the updates and calculations are done, they are inserted into the original frame.
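A minimal sketch of that idea (illustration only, not part of the final code in this tutorial) could look like this.

import cv2

# Illustration of the copy-based flow on a single frame
cap = cv2.VideoCapture(0)
_, frame = cap.read()
# Do the processing on a copy, so the original frame is untouched meanwhile
work = frame.copy()
gray = cv2.cvtColor(work, cv2.COLOR_BGR2GRAY)
# ... calculations on gray / work would go here ...
# When the calculations are done, draw the results onto the original frame
cv2.putText(frame, "result", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
cap.release()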

This is interesting. To extract information from the webcam frame we need to work with the frame and find the features we are looking for.

In our example, we need to find movement and based on that see if that movement is touching our object.

A simple flow without any processing would look like this.

import cv2


# Get the webcam (default webcam is 0)
cap = cv2.VideoCapture(0)
# If your webcam does not support 640 x 480, this will find another resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Loop forever (or until break)
while True:
    # Read a frame from the webcam
    _, frame = cap.read()
    # Flip the frame
    frame = cv2.flip(frame, 1)

    # Show the frame in a window
    cv2.imshow('WebCam', frame)

    # Check if q has been pressed to quit
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

The above code will create a direct stream from your webcam to a window.

Step 2: Insert a logo – do it with a class that we will extend later

Here we want to insert a logo at a fixed position in our webcam stream. This can be achieved by the following code. The main difference is the new class Object, which is defined and instantiated.

The object briefly explained

  • The object will represent the logo we want to insert.
  • It will keep the current position (which is static so far)
  • The logo itself.
  • The mask used to insert it later (when insert_object is called).
  • The constructor (__init__(…)) does the things that are only needed once: reading the logo (it assumes you have a file named logo.png in the same folder), resizing it, creating a mask (by gray-scaling and thresholding), and setting the initial position of the logo.

Before the while-loop the object obj is created. All that is needed at this stage is to insert the logo in each frame.

import cv2
import numpy as np


# Object class to insert logo
class Object:
    def __init__(self, start_x=100, start_y=100, size=50):
        self.logo_org = cv2.imread('logo.png')
        self.size = size
        self.logo = cv2.resize(self.logo_org, (size, size))
        img2gray = cv2.cvtColor(self.logo, cv2.COLOR_BGR2GRAY)
        _, logo_mask = cv2.threshold(img2gray, 1, 255, cv2.THRESH_BINARY)
        self.logo_mask = logo_mask
        self.x = start_x
        self.y = start_y

    def insert_object(self, frame):
        roi = frame[self.y:self.y + self.size, self.x:self.x + self.size]
        roi[np.where(self.logo_mask)] = 0
        roi += self.logo


# Get the webcam (default webcam is 0)
cap = cv2.VideoCapture(0)
# If your webcam does not support 640 x 480, this will find another resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# This will create an object
obj = Object()
# Loop forever (or until break)
while True:
    # Read a frame from the webcam
    _, frame = cap.read()
    # Flip the frame
    frame = cv2.flip(frame, 1)

    # Insert the object into the frame
    obj.insert_object(frame)

    # Show the frame in a window
    cv2.imshow('WebCam', frame)

    # Check if q has been pressed to quit
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

This will result in the following output (with me in front of the webcam – if you run it, expect to see yourself in the picture instead of me; just to avoid any uncomfortable surprises when you show up in the window).

The logo at a fixed position.

For more details on how to insert a logo in a live webcam stream, you can read this tutorial.

Step 3: Detect movement in the frame

Detecting movement is not a simple task. Depending on your needs, it can be solved quite simply. In this tutorial we only need to detect simple movement. That is, if you are in the frame and sit still, we do not care to detect it. We only care to detect the actual movement.

We can solve that problem by using the library function createBackgroundSubtractorMOG2(), which can “remove” the background from your frame. It is far from a perfect solution, but it is sufficient for what we want to achieve.

As we only want to see whether there is movement or not, and not how much it differs from the previously detected background, we will use a threshold function to make the image black and white based on that. We set the threshold quite high, as that will also remove noise from the image.

It might happen that in your setting (lighting etc.) you need to adjust that value. See the comments in the code for how to do that.
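As a tiny illustration of what the threshold call does (toy values, not webcam data): every value above 250 becomes 255, the rest become 0.

import cv2
import numpy as np

# Toy gray scale values to show the effect of the threshold
gray = np.array([[0, 120, 251, 255]], dtype='uint8')
_, black_white = cv2.threshold(gray, 250, 255, cv2.THRESH_BINARY)
print(black_white)  # [[  0   0 255 255]]

With that in mind, the full code looks like this.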

import cv2
import numpy as np


# Object class to insert logo
class Object:
    def __init__(self, start_x=100, start_y=100, size=50):
        self.logo_org = cv2.imread('logo.png')
        self.size = size
        self.logo = cv2.resize(self.logo_org, (size, size))
        img2gray = cv2.cvtColor(self.logo, cv2.COLOR_BGR2GRAY)
        _, logo_mask = cv2.threshold(img2gray, 1, 255, cv2.THRESH_BINARY)
        self.logo_mask = logo_mask
        self.x = start_x
        self.y = start_y

    def insert_object(self, frame):
        roi = frame[self.y:self.y + self.size, self.x:self.x + self.size]
        roi[np.where(self.logo_mask)] = 0
        roi += self.logo


# Get the webcam (default webcam is 0)
cap = cv2.VideoCapture(0)
# If your webcam does not support 640 x 480, this will find another resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# To detect movement (to get the background)
background_subtractor = cv2.createBackgroundSubtractorMOG2()

# This will create an object
obj = Object()
# Loop forever (or until break)
while True:
    # Read a frame from the webcam
    _, frame = cap.read()
    # Flip the frame
    frame = cv2.flip(frame, 1)

    # Get the foreground mask (it is gray scale)
    fg_mask = background_subtractor.apply(frame)
    # Convert the gray scale to black and white with a threshold
    # Change the 250 threshold fitting your webcam and needs
    # - Setting it lower will make it more sensitive (also to noise)
    _, fg_mask = cv2.threshold(fg_mask, 250, 255, cv2.THRESH_BINARY)

    # Insert the object into the frame
    obj.insert_object(frame)

    # Show the frame in a window
    cv2.imshow('WebCam', frame)
    # To see the foreground mask
    cv2.imshow('fg_mask', fg_mask)

    # Check if q has been pressed to quit
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

This results in the following output.

Output – again, don’t expect to see me when you run this example on your computer

As you can see, it does a decent job of detecting movement. Sometimes your movements leave a shadow behind in the mask. Hence, it is not perfect.

Step 4: Detecting movement where the object is and move it accordingly

This is the tricky part. But let’s break it down simply.

  • We need to detect if the mask we created in the previous step is overlapping with the object (logo).
  • If so, we want to move the object (logo).

That is what we want to achieve.

How do we do that?

  • Detect if there is an overlap by using the mask we created for the logo and checking whether it overlaps with any points of the movement mask.
  • If so, we move the object by choosing a random movement. Measure how much overlap there is. Then choose another random movement and see if the overlap is less.
  • Continue this a few times and choose the random movement with the least overlap.

By chance, this turns out to move the object away from the overlapping areas. This is the power of introducing some randomness, which simplifies the algorithm a lot.

A more precise approach would be to calculate in which direction the movement mask overlaps the object (logo) the least. This becomes quite complicated and needs a lot of calculations. Hence, we chose this simple approach, which has both a speed element and a direction element and works fairly well.

All we need to do is add an update_position function to our class and call it before we insert the logo.

import cv2
import numpy as np


# Object class to insert logo
class Object:
    def __init__(self, start_x=100, start_y=100, size=50):
        self.logo_org = cv2.imread('logo.png')
        self.size = size
        self.logo = cv2.resize(self.logo_org, (size, size))
        img2gray = cv2.cvtColor(self.logo, cv2.COLOR_BGR2GRAY)
        _, logo_mask = cv2.threshold(img2gray, 1, 255, cv2.THRESH_BINARY)
        self.logo_mask = logo_mask
        self.x = start_x
        self.y = start_y
        self.on_mask = False

    def insert_object(self, frame):
        roi = frame[self.y:self.y + self.size, self.x:self.x + self.size]
        roi[np.where(self.logo_mask)] = 0
        roi += self.logo

    def update_position(self, mask):
        height, width = mask.shape

        # Check if object is overlapping with moving parts
        roi = mask[self.y:self.y + self.size, self.x:self.x + self.size]
        check = np.any(roi[np.where(self.logo_mask)])

        # If object has moving parts, then find new position
        if check:
            # To save the best possible movement
            best_delta_x = 0
            best_delta_y = 0
            best_fit = np.inf
            # Try 8 different positions
            for _ in range(8):
                # Pick a random position
                delta_x = np.random.randint(-15, 15)
                delta_y = np.random.randint(-15, 15)

                # Ensure we are inside the frame, if outside, skip and continue
                if self.y + self.size + delta_y > height or self.y + delta_y < 0 or \
                        self.x + self.size + delta_x > width or self.x + delta_x < 0:
                    continue

                # Calculate how much overlap
                roi = mask[self.y + delta_y:self.y + delta_y + self.size, self.x + delta_x:self.x + delta_x + self.size]
                check = np.count_nonzero(roi[np.where(self.logo_mask)])
                # If perfect fit (no overlap), just return
                if check == 0:
                    self.x += delta_x
                    self.y += delta_y
                    return
                # If a better fit found, save it
                elif check < best_fit:
                    best_fit = check
                    best_delta_x = delta_x
                    best_delta_y = delta_y

            # After for-loop, update to best fit (if any found)
            if best_fit < np.inf:
                self.x += best_delta_x
                self.y += best_delta_y
                return


# Get the webcam (default webcam is 0)
cap = cv2.VideoCapture(0)
# If your webcam does not support 640 x 480, this will find another resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# To detect movement (to get the background)
background_subtractor = cv2.createBackgroundSubtractorMOG2()

# This will create an object
obj = Object()
# Loop forever (or until break)
while True:
    # Read a frame from the webcam
    _, frame = cap.read()
    # Flip the frame
    frame = cv2.flip(frame, 1)
    # Get the foreground mask (it is gray scale)
    fg_mask = background_subtractor.apply(frame)
    # Convert the gray scale to black and white with a threshold
    # Change the 250 threshold fitting your webcam and needs
    # - Setting it lower will make it more sensitive (also to noise)
    _, fg_mask = cv2.threshold(fg_mask, 250, 255, cv2.THRESH_BINARY)

    # Find a new position for object (logo)
    # - fg_mask contains all moving parts
    # - updated position will be the one with least moving parts
    obj.update_position(fg_mask)
    # Insert the object into the frame
    obj.insert_object(frame)

    # Show the frame in a window
    cv2.imshow('WebCam', frame)
    # To see the fg_mask uncomment the line below
    # cv2.imshow('fg_mask', fg_mask)

    # Check if q has been pressed to quit
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Step 5: Test it

Well, this is the fun part. See a live demo in the video below.

The final result

What is the next step?

I would be happy to hear any suggestions from you. I see a lot of potential improvements, but the conceptual idea is explained and shown in this tutorial.

Create Cartoon Characters in Live Webcam Stream with OpenCV and Python

What will we cover in this tutorial?

How to convert the foreground characters of a live webcam feed into cartoons, while keeping the background as it is.

In this tutorial we will show how this can be done using OpenCV and Python in a few lines of code. The result can be seen in the YouTube video below.

Step 1: Find the moving parts

The big challenge is to identify what is the background and what is the foreground.

This can be done in various ways, but we want to keep it quite accurate and not just identify boxes around moving objects. We actually want to have the contours of the objects and fill them out.

While this sounds easy, it is a bit challenging. Still, we will try to do it as simply as possible.

The first step is to keep the last frame and subtract it from the current frame. This will give us all the moving parts. This should be done on a gray scale image.

import cv2
import numpy as np

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Just a dummy frame, will be overwritten
last_foreground = np.zeros((480, 640), dtype='uint8')
while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    # Only needed if your webcam does not support 640x480
    frame = cv2.resize(frame, (640, 480))
    # Flip it to mirror you
    frame = cv2.flip(frame, 1)
    # Convert to gray scale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Keep the foreground
    foreground = gray
    # Take the absolute difference
    abs_diff = cv2.absdiff(foreground, last_foreground)
    # Update the last foreground image
    last_foreground = foreground

    cv2.imshow('WebCam (Mask)', abs_diff)
    cv2.imshow('WebCam (frame)', frame)
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

This results in the following output with a gray scale contour of the moving parts of the image. If you need help installing OpenCV, read this tutorial.

Step 2: Using a threshold

To make the contour more visible you can use a threshold (cv2.threshold(…)).

import cv2
import numpy as np

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Just a dummy frame, will be overwritten
last_foreground = np.zeros((480, 640), dtype='uint8')
while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    # Only needed if your webcam does not support 640x480
    frame = cv2.resize(frame, (640, 480))
    # Flip it to mirror you
    frame = cv2.flip(frame, 1)
    # Convert to gray scale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Keep the foreground
    foreground = gray
    # Take the absolute difference
    abs_diff = cv2.absdiff(foreground, last_foreground)
    # Update the last foreground image
    last_foreground = foreground

    _, mask = cv2.threshold(abs_diff, 20, 255, cv2.THRESH_BINARY)
 
    cv2.imshow('WebCam (Mask)', mask)
    cv2.imshow('WebCam (frame)', frame)
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Resulting in this output.

Using the threshold makes the image black and white. This makes it easier to detect the moving parts.

Step 3: Fill out the enclosed contours

To fill out the enclosed contours you can use morphologyEx. We have also used dilate to make the lines thicker and enclose the parts better.

import cv2
import numpy as np

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Just a dummy frame, will be overwritten
last_foreground = np.zeros((480, 640), dtype='uint8')
while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    # Only needed if your webcam does not support 640x480
    frame = cv2.resize(frame, (640, 480))
    # Flip it to mirror you
    frame = cv2.flip(frame, 1)
    # Convert to gray scale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Keep the foreground
    foreground = gray
    # Take the absolute difference
    abs_diff = cv2.absdiff(foreground, last_foreground)
    # Update the last foreground image
    last_foreground = foreground

    _, mask = cv2.threshold(abs_diff, 20, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=3)
    se = np.ones((85, 85), dtype='uint8')
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, se)

    cv2.imshow('WebCam (Mask)', mask)
    cv2.imshow('WebCam (frame)', frame)
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Resulting in the following output.

Me happy, next to a white shadow ghost of myself

Step 4: Creating cartoon effect and mask it into the foreground

The final step is to create a cartoon version of the frame (cv2.stylization()).

    frame_effect = cv2.stylization(frame, sigma_s=150, sigma_r=0.25)

And mask it out with the foreground mask. This results in the following code.

import cv2
import numpy as np

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Just a dummy frame, will be overwritten
last_foreground = np.zeros((480, 640), dtype='uint8')
while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    # Only needed if your webcam does not support 640x480
    frame = cv2.resize(frame, (640, 480))
    # Flip it to mirror you
    frame = cv2.flip(frame, 1)
    # Convert to gray scale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Keep the foreground
    foreground = gray
    # Take the absolute difference
    abs_diff = cv2.absdiff(foreground, last_foreground)
    # Update the last foreground image
    last_foreground = foreground

    _, mask = cv2.threshold(abs_diff, 20, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=3)
    se = np.ones((85, 85), dtype='uint8')
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, se)

    frame_effect = cv2.stylization(frame, sigma_s=150, sigma_r=0.25)
    idx = (mask > 1)
    frame[idx] = frame_effect[idx]

    # cv2.imshow('WebCam (Mask)', mask)
    cv2.imshow('WebCam (frame)', frame)
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Step 5: Try it in real life

I must say the cartoon effect is heavy (it is slow). But other than that, it works fine.
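One possible speed-up, which is an assumption on my side and not part of this tutorial, is to run the heavy stylization on a downscaled copy of the frame and resize the result back up before masking. Inside the while-loop, it could replace the stylization lines like this (trading some detail for speed).

    # Possible speed-up (assumption, not from the original code):
    # stylize a downscaled copy and resize the result back to frame size
    small = cv2.resize(frame, (320, 240))
    small_effect = cv2.stylization(small, sigma_s=150, sigma_r=0.25)
    frame_effect = cv2.resize(small_effect, (640, 480))
    idx = (mask > 1)
    frame[idx] = frame_effect[idx]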