From Zero to Creating Photo Mosaic using Faces with OpenCV

What will we cover in this tutorial?

  1. Where and how to get images you can use without copyright issues.
  2. How to extract the faces from the images.
  3. Building a Photo Mosaic using the extracted images of faces.

Step 1: Where and how to get images

There exist many datasets of faces, but most come with restrictions. A great place to find images is Pexels, as their photos are free to use (see the license here).

Also, the Python library pexels-api makes it easy to download many images. It can be installed with the following command.

pip install pexels-api

To use the Pexels API you need to register.

  1. Sign up as a user at Pexels.
  2. Confirm the email sent to your inbox (the email address you provide).
  3. Request your API key here.

Then you can download images matching a search query with the following Python program.

from pexels_api import API
import requests
import os.path
from pathlib import Path


path = 'pics'
Path(path).mkdir(parents=True, exist_ok=True)

# To get key: sign up for pexels https://www.pexels.com/join/
# Request key: https://www.pexels.com/api/
# - No need to set URL
# - Confirm the email sent to you
# - Refresh API or see key here: https://www.pexels.com/api/new/

PEXELS_API_KEY = '--- INSERT YOUR API KEY HERE ---'

api = API(PEXELS_API_KEY)

query = 'person'

api.search(query)
print("Search: ", query)
print("Total results: ", api.total_results)
MAX_PICS = 1000
print("Fetching max: ", MAX_PICS)

count = 0
while True:
    photos = api.get_entries()
    print(len(photos))
    if len(photos) == 0:
        break
    for photo in photos:
        # Print photographer
        print('Photographer: ', photo.photographer)
        # Print original size url
        print('Photo original size: ', photo.original)

        file = os.path.join(path, query + '-' + str(count).zfill(5) + '.' + photo.original.split('.')[-1])
        count += 1
        print(file)
        picture_request = requests.get(photo.original)
        if picture_request.status_code == 200:
            with open(file, 'wb') as f:
                f.write(picture_request.content)

        # Wrapping this loop in a function would allow a clean return instead of nested breaks
        if count >= MAX_PICS:
            break

    if count >= MAX_PICS:
        break

    if not api.has_next_page:
        print("Last page: ", api.page)
        break
    # Search next page
    api.search_next_page()

The program above caps downloads at 1,000 photos; you can change that if you like. It downloads the photos returned for the query person, which you can also change.

Downloading all the images takes some time, and they will take up a fair amount of disk space.

Step 2: Extract the faces from the photos

This is where OpenCV comes in. It ships with a pre-trained model using the Haar Cascade Classifier. You need to install the OpenCV library with the following command.

pip install opencv-python

The trained model we use is part of the library, but it is not always easy to locate on disk. Therefore we suggest you download it from here (the file is named haarcascade_frontalface_default.xml) and place it in your working directory.

We will use it to detect faces, extract them, and save them to a folder for later use.

import cv2
import numpy as np
import glob
import os
from pathlib import Path


def preprocess(box_width=12, box_height=16):
    path = "pics"
    output = "small-faces"
    Path(output).mkdir(parents=True, exist_ok=True)
    files = glob.glob(os.path.join(path, "*"))
    files.sort()

    face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

    images = []
    cnt = 0
    for filename in files:
        print("Processing...", filename)
        frame = cv2.imread(filename)
        # Skip files OpenCV cannot read as images
        if frame is None:
            continue
        frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame_gray = cv2.equalizeHist(frame_gray)
        faces = face_cascade.detectMultiScale(frame_gray, scaleFactor=1.3, minNeighbors=10, minSize=(350, 350), flags=cv2.CASCADE_SCALE_IMAGE)
        for (x, y, w, h) in faces:
            roi = frame[y:y+h, x:x+w]

            img = cv2.resize(roi, (box_width, box_height))
            images.append(img)

            output_file_name = "face-" + str(cnt).zfill(5) + ".jpg"
            output_file_name = os.path.join(output, output_file_name)
            cv2.imwrite(output_file_name, img)

    return np.stack(images)


preprocess(box_width=12, box_height=16)

It will create a folder called small-faces with small images of the identified faces.

Note that the Haar Cascade Classifier is not perfect. It will miss many faces and produce false positives. It is a good idea to look manually through all the images and delete the false positives (images that do not contain a face); a small helper for that is sketched below.
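
To speed up that manual cleanup, here is a minimal sketch of a small review tool; it assumes the faces were written as .jpg files to the small-faces folder as in the script above, and the key binding is just a suggestion.

import cv2
import glob
import os

# Show each extracted face; press 'd' to delete it, any other key to keep it
for filename in sorted(glob.glob(os.path.join("small-faces", "*.jpg"))):
    img = cv2.imread(filename)
    # Scale the tiny face up so it is easier to judge
    cv2.imshow("Review: d = delete, other key = keep", cv2.resize(img, (240, 320)))
    if cv2.waitKey(0) == ord('d'):
        os.remove(filename)
cv2.destroyAllWindows()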

Step 3: Building our first mosaic photo

The approach is to divide the photo into equally sized boxes and, for each box, find the image (one of our faces) that fits best as a replacement.

To improve the performance of the process function we use Numba, a just-in-time compiler designed to optimize NumPy code in for-loops.

import cv2
import numpy as np
import glob
import os
from numba import jit


# Helper assumed by main() below: load the face images created in Step 2
# from the small-faces folder and make sure they match the box size.
def load_images(box_width, box_height):
    path = "small-faces"
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        images.append(cv2.resize(img, (box_width, box_height)))
    return np.stack(images)


@jit(nopython=True)
def process(photo, images, box_width=24, box_height=32):
    height, width, _ = photo.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = photo[i:i + box_height, j:j + box_width]
            best_match = np.inf
            best_match_index = 0
            for k in range(1, images.shape[0]):
                total_sum = np.sum(np.where(roi > images[k], roi - images[k], images[k] - roi))
                if total_sum < best_match:
                    best_match = total_sum
                    best_match_index = k
            photo[i:i + box_height, j:j + box_width] = images[best_match_index]
    return photo


def main():
    photo = cv2.imread("rune.jpg")

    box_width = 12
    box_height = 16
    height, width, _ = photo.shape
    # To make sure we can slice the photo into box-sized pieces
    width = (width//box_width) * box_width
    height = (height//box_height) * box_height
    photo = cv2.resize(photo, (width, height))

    # Load all the images of the faces
    images = load_images(box_width, box_height)

    # Create the mosaic
    mosaic = process(photo.copy(), images, box_width, box_height)

    cv2.imshow("Original", photo)
    cv2.imshow("Result", mosaic)
    cv2.waitKey(0)


main()

To test it we have used the photo of Rune.

This approach reuses the same images, which gives a decent result. If you want to avoid the visible patterns of repeated images, you can change the code to prevent reuse; a sketch of that is given below.

The above example has 606 small images. Without reuse it quickly runs out of candidate images; you would need a much larger image base, or the result becomes questionable.

No reuse of face images to create the Photo Mosaic

The above photo mosaic was created at a downscaled size, and it still does not give a good result when images are not reused. Doing this well would require a considerably larger set of images to work from.
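
If you want to experiment with the no-reuse variant yourself, below is a sketch of how process could be modified (an assumption of one way to do it, not the exact code used for the image above). A boolean array marks faces that have already been placed, and they are skipped in later searches; it uses the same imports as the mosaic script above.

@jit(nopython=True)
def process_no_reuse(photo, images, box_width=12, box_height=16):
    # Marks which face images have already been placed
    used = np.zeros(images.shape[0], dtype=np.bool_)
    height, width, _ = photo.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = photo[i:i + box_height, j:j + box_width]
            best_match = np.inf
            best_match_index = -1
            for k in range(images.shape[0]):
                if used[k]:
                    continue
                total_sum = np.sum(np.where(roi > images[k], roi - images[k], images[k] - roi))
                if total_sum < best_match:
                    best_match = total_sum
                    best_match_index = k
            # Stop replacing boxes once the collection is exhausted
            if best_match_index >= 0:
                used[best_match_index] = True
                photo[i:i + box_height, j:j + box_width] = images[best_match_index]
    return photo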

Video Mosaic on Live Webcam Stream with OpenCV and Numba

What will we cover in this tutorial?

We will investigate whether we can create a decent video mosaic effect on a live webcam stream using OpenCV, Numba and Python. First we will build the simple way to create a video mosaic and measure its performance. Then we will extend it to a better-quality video mosaic and try to win back performance by trading away some of that quality.

Step 1: How does simple photo mosaic work?

A photographic mosaic is a photo generated by other small images. A black and white example is given here.

The above is not a perfect example, as it is generated for speed so it runs smoothly on a webcam stream. It is also done in grayscale to improve performance.

The idea is to reproduce the original image (photograph) as a mosaic built from many smaller sampled images. In the example above, the original frame is 640×480 pixels and the mosaic is constructed from small images of size 16×12 pixels, i.e. 40×40 = 1,600 tiles per frame.

The first thing we want to achieve is to create a simple mosaic. A simple mosaic is one where the original image is scaled down and each pixel is then exchanged for one small image with a similar average color. This is simple and efficient to do.

On a high level this is the process.

  1. Have a collection C of small images used to create the photographic mosaic
  2. Scale down the photo P you want to create a mosaic of.
  3. For each pixel in photo P, find the image I from C whose average color is closest to the pixel. Insert image I to represent that pixel.

This is the simple way of doing it; a small sketch of the matching in step 3 follows below. The next question is: will it be efficient enough to process a live webcam stream?
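
To make step 3 concrete, here is a minimal sketch of the matching step (the names are illustrative; the full webcam version follows in Step 4):

import numpy as np


def best_image_index(pixel_value, images):
    # Average gray value of each collection image (in practice, computed once up front)
    means = np.array([img.mean() for img in images])
    # Step 3: pick the image whose average color is closest to the pixel
    return int(np.argmin(np.abs(means - pixel_value)))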

Step 2: Create a collection of small images

To optimize performance we have chosen to work in grayscale. The first step is to collect the images you want to use; these can be any pictures.

We have used photos from Pexels, which are all free to use.

We need to convert them all to grayscale and resize them to fit our purpose.

import cv2
import glob
import os
import numpy as np
from pathlib import Path

output = "small-pics-16x12"
# Create the output folder if it does not exist
Path(output).mkdir(parents=True, exist_ok=True)
path = "pics"
files = glob.glob(os.path.join(path, "*"))
for file_name in files:
    print(file_name)
    img = cv2.imread(file_name)
    img = cv2.resize(img, (16, 12))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    mean = np.mean(img)
    # Zero-pad the mean in the filename so lexicographic sorting matches brightness order
    output_file_name = "image-" + f"{mean:06.2f}".replace('.', '-') + ".jpg"
    output_file_name = os.path.join(output, output_file_name)
    print(output_file_name)
    cv2.imwrite(output_file_name, img)

The script assumes the images we want to convert to grayscale and resize are located in the local folder pics. The processed images are written to the folder small-pics-16x12, which the script creates if it does not already exist.

Step 3: Get a live stream from the webcam

On a high level a live stream from a webcam is given in the following diagram.

This process framework is given in the code below.

import cv2
import numpy as np


def process(frame):
    return frame


def main():
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    while True:
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Update the frame
        updated_frame = process(gray)

        # Show the frame in a window
        cv2.imshow('WebCam', updated_frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()


main()

The above code is just an empty shell; the call to process is where all the processing will happen. For now it just opens a window showing the grayscale stream.

Step 4: The simple video mosaic

We need to introduce two main things to create this simple video mosaic.

  1. Loading all the images we need to use (the 16×12 gray scale images).
  2. Fill out the processing of each frame, which replaces each 16×12 box of the frame with the best matching image.

The first step is preprocessing and should be done before we enter the main loop of the webcam capturing. The second part is done in each iteration inside the process function.

import cv2
import numpy as np
import glob
import os


def preprocess():
    path = "small-pics-16x12"
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        images.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    return np.stack(images)


def process(frame, images, box_height=12, box_width=16):
    height, width = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            mean = np.mean(roi[:, :])
            roi[:, :] = images[int((len(images)-1)*mean/256)]
    return frame


def main(images):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    while True:
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Update the frame
        mosaic_frame = process(gray, images)

        # Show the frame in a window
        cv2.imshow('Mosaic Video', mosaic_frame)
        cv2.imshow('Webcam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()



images = preprocess()
main(images)

The preprocessing function reads all the images, converts them to grayscale (to have only 1 channel per pixel), and returns them as a single NumPy array, which enables optimized code.

The process function breaks the frame down into blocks of 16×12 pixels, computes the average grayscale value of each block, and picks the estimated best match. Notice that the average (mean) value is a float; hence we can distinguish between more than 256 grayscale images.

In this example we used 1,885 images.
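
As a concrete example of the index mapping in process with these 1,885 images:

mean = 128.7                          # average gray value of one 16x12 block
index = int((1885 - 1) * mean / 256)  # = 947, so the block is replaced by images[947]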

A result can be seen here.

The result is decent but not good.

Step 5: Testing the performance and improving it with Numba

The performance seems quite good, but let us measure it.

We do that by using the time library.

First you need to import the time library.

import time

Then measure the time the process call actually uses. The new code is inserted in the main while loop:

        # Update the frame
        start = time.time()
        mosaic_frame = process(gray, images)
        print("Process time", time.time()- start, "seconds")

This will result in the following output.

Process time 0.02651691436767578 seconds
Process time 0.026834964752197266 seconds
Process time 0.025418996810913086 seconds
Process time 0.02562689781188965 seconds
Process time 0.025369882583618164 seconds
Process time 0.025450944900512695 seconds

These are a few lines of the output: about 0.025-0.027 seconds per frame.

Let's bring Numba into the equation. Numba is a just-in-time compiler for NumPy code; it compiles the Python code to machine code for speed. If you are new to Numba we recommend you read this tutorial.

import cv2
import numpy as np
import glob
import os
import time
from numba import jit


def preprocess():
    path = "small-pics-16x12"
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        images.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    return np.stack(images)


@jit(nopython=True)
def process(frame, images, box_height=12, box_width=16):
    height, width = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            mean = np.mean(roi[:, :])
            roi[:, :] = images[int((len(images)-1)*mean/256)]
    return frame


def main(images):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    while True:
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Update the frame
        start = time.time()
        mosaic_frame = process(gray, images)
        print("Process time", time.time()- start, "seconds")

        # Show the frame in a window
        cv2.imshow('Mosaic Video', mosaic_frame)
        cv2.imshow('Webcam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()



images = preprocess()
main(images)

This gives the following performance.

Process time 0.0014820098876953125 seconds
Process time 0.0013887882232666016 seconds
Process time 0.0015859603881835938 seconds
Process time 0.0016350746154785156 seconds
Process time 0.0018379688262939453 seconds
Process time 0.0016241073608398438 seconds

That is a speedup by a factor of 15-20.

Good enough for live streaming. But the quality of the result is still lacking.

Step 6: A more advanced video mosaic approach

The more advanced video mosaic approximates each replacement box of pixels by comparing the candidate replacement images pixel by pixel.

import cv2
import numpy as np
import glob
import os
import time
from numba import jit


def preprocess():
    path = "small-pics-16x12"
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        images.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    return np.stack(images)


@jit(nopython=True)
def process(frame, images, box_height=12, box_width=16):
    height, width = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            best_match = np.inf
            best_match_index = 0
            for k in range(1, images.shape[0]):
                total_sum = np.sum(np.where(roi > images[k], roi - images[k], images[k] - roi))
                if total_sum < best_match:
                    best_match = total_sum
                    best_match_index = k
            roi[:,:] = images[best_match_index]
    return frame


def main(images):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    while True:
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Update the frame
        start = time.time()
        mosaic_frame = process(gray, images)
        print("Process time", time.time()- start, "seconds")

        # Show the frame in a window
        cv2.imshow('Mosaic Video', mosaic_frame)
        cv2.imshow('Webcam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()


images = preprocess()
main(images)

There is one line to notice specifically.

total_sum = np.sum(np.where(roi > images[k], roi - images[k], images[k] - roi))

This construction is needed because we work with unsigned 8-bit integers, where a plain subtraction can wrap around. It calculates the absolute difference between each pixel in the region of interest (roi) and images[k]. This is a very expensive calculation, as we will see.
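
A quick demonstration of the wrap-around problem and the fix:

import numpy as np

a = np.array([1], dtype=np.uint8)
b = np.array([2], dtype=np.uint8)
print(a - b)                          # [255] - unsigned subtraction wraps around
print(np.where(a > b, a - b, b - a))  # [1]   - the correct absolute difference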

Measuring the performance shows the following.

Process time 7.030380010604858 seconds
Process time 7.034134149551392 seconds
Process time 7.105709075927734 seconds
Process time 7.138839960098267 seconds

Over 7 seconds per frame. The visual result is what you would expect from this number of images, but the performance is far too slow for a smooth live webcam stream.

The result can be seen here.

Step 7: Compromise options

There are various ways to trade quality for speed, and we will not investigate them all. Here are some.

  • Use fewer images in the collection (fewer than the 1,885 used here). Note that using half the images, say around 900, will only cut the processing time roughly in half (a one-line sketch follows this list).
  • Use bigger tile sizes, e.g. scaling up to 32×24 images. We still need to do a lot of processing per pixel, so the speedup might be smaller than expected.
  • Make a compromised version of the difference calculation (total_sum). This has great potential, but might have undesired effects.
  • Scale down the pixel estimation to do fewer calculations.
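
For the first option, almost no new code is needed; a one-line sketch, assuming the preprocess function from above:

images = preprocess()[::2]  # keep every other image - roughly halves the inner search loop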

We will try the last two.

First, let's try exchanging the calculation of total_sum, our distance function that measures how close a candidate image is. Say we use this:

                total_sum = np.sum(np.subtract(roi, images[k]))

This wraps around when a calculation like 1 - 2 yields 255, which is undesired. On the other hand, it will happen in roughly 50% of the cases on average, and it may skew the calculation fairly evenly across all images.

Let’s try.

Process time 1.857623815536499 seconds
Process time 1.7193729877471924 seconds
Process time 1.7445549964904785 seconds
Process time 1.707035779953003 seconds
Process time 1.6778359413146973 seconds

Wow. That is a speedup of a factor of 4-6 per frame. The quality is still fine, though you will notice a poorly matched image from time to time. Still, the result is close to the advanced video mosaic and far better than the first simple video mosaic.

Another change we could make is to estimate each box by only 4 pixels (scaling each 16×12 box down to 2×2). This should still be better than the simple video mosaic approach. The full code is given below.

import cv2
import numpy as np
import glob
import os
import time
from numba import jit


def preprocess():
    path = "small-pics-16x12"
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        images.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    return np.stack(images)


def preprocess2(images, scale_width=8, scale_height=6):
    scaled = []
    _, height, width = images.shape
    print("Dimensions", width, height)
    width //= scale_width
    height //= scale_height
    print("Scaled Dimensions", width, height)
    for i in range(images.shape[0]):
        scaled.append(cv2.resize(images[i], (width, height)))
    return np.stack(scaled)


@jit(nopython=True)
def process3(frame, frame_scaled, images, scaled, box_height=12, box_width=16, scale_width=8, scale_height=6):
    height, width = frame.shape
    width //= scale_width
    height //= scale_height
    box_width //= scale_width
    box_height //= scale_height
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame_scaled[i:i + box_height, j:j + box_width]
            best_match = np.inf
            best_match_index = 0
            for k in range(1, scaled.shape[0]):
                total_sum = np.sum(roi - scaled[k])
                if total_sum < best_match:
                    best_match = total_sum
                    best_match_index = k
            frame[i*scale_height:(i + box_height)*scale_height, j*scale_width:(j + box_width)*scale_width] = images[best_match_index]
    return frame


def main(images, scaled):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    while True:
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Update the frame
        start = time.time()
        gray_scaled = cv2.resize(gray, (640//8, 480//6))
        mosaic_frame = process3(gray, gray_scaled, images, scaled)
        print("Process time", time.time()- start, "seconds")

        # Show the frame in a window
        cv2.imshow('Mosaic Video', mosaic_frame)
        cv2.imshow('Webcam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()


images = preprocess()
scaled = preprocess2(images)
main(images, scaled)

Note the added preprocessing step (preprocess2), which scales down the collection images. The process time is now:

Process time 0.5559628009796143 seconds
Process time 0.5979928970336914 seconds
Process time 0.5543379783630371 seconds
Process time 0.5621011257171631 seconds

Which is okay, but still less than 2 frames per second.

The result can be seen here.

It is not all bad. It is still better than the simple video mosaic approach.

The result is not perfect. If you want to use it on a live webcam stream at 25-30 frames per second, you need to find further optimizations or live with the simple video mosaic approach.

Create a Line Drawing from Webcam Stream using OpenCV in Python

What will we cover in this tutorial?

How to convert a webcam stream into a black and white line drawing using OpenCV and Python. Also, how to adjust the parameters while running the live stream.

See result here.

The things you need to use

There are two things you need to use in order to get a good line drawing of your image.

  1. GaussianBlur to smooth out the image, as edge detection is sensitive to noise.
  2. Canny, which detects the edges (lines).

For the Gaussian blur a 5×5 filter is advised. Canny then takes two threshold parameters. To find the optimal values for your setup, we have inserted two trackbars where you can set them to any value and see the result.

You can read more about Canny Edge Detection here.

If you need to install OpenCV please read this tutorial.

The code is given below.

import cv2
import numpy as np

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)


def nothing(x):
    pass


canny = "Canny"
cv2.namedWindow(canny)
cv2.createTrackbar('Threshold 1', canny, 0, 255, nothing)
cv2.createTrackbar('Threshold 2', canny, 0, 255, nothing)

while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    frame = cv2.flip(frame, 1)

    t1 = cv2.getTrackbarPos('Threshold 1', canny)
    t2 = cv2.getTrackbarPos('Threshold 2', canny)
    gb = cv2.GaussianBlur(frame, (5, 5), 0)
    can = cv2.Canny(gb, t1, t2)

    cv2.imshow(canny, can)

    frame[np.where(can)] = 255
    cv2.imshow('WebCam', frame)
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

OpenCV + Python: Move Objects Around in a Live Webcam Stream Using Your Hands

What will we cover in this tutorial?

How do you detect movements in a webcam stream? Also, how do you insert objects in a live webcam stream? Further, how do you change the position of the object based on the movements?

We will learn all that in this tutorial. The end result can be seen in the video below.

The end result of this tutorial

Step 1: Understand the flow of webcam processing

A webcam stream is processed frame-by-frame.

Illustration: Webcam processing flow

As the above illustration shows, when the webcam captures the next frame, the actual processing often happens on a copy of the original frame. When all the updates and calculations are done, they are inserted in the original frame.

This is interesting. To extract information from the webcam frame we need to work with the frame and find the features we are looking for.

In our example, we need to find movement and based on that see if that movement is touching our object.

A simple flow without any processing would look like this.

import cv2


# Get the webcam (default webcam is 0)
cap = cv2.VideoCapture(0)
# If your webcam does not support 640 x 480, this will find another resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Loop forever (or until break)
while True:
    # Read a frame from the webcam
    _, frame = cap.read()
    # Flip the frame
    frame = cv2.flip(frame, 1)

    # Show the frame in a window
    cv2.imshow('WebCam', frame)

    # Check if q has been pressed to quit
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

The above code will create a direct stream from your webcam to a window.

Step 2: Insert a logo – do it with a class that we will extend later

Here we want to insert a logo at a fixed position in our webcam stream. This can be achieved by the following code. The main difference is the new class Object, which is defined and instantiated.

The object briefly explained

  • The object represents the logo we want to insert.
  • It keeps the current position (which is static so far).
  • It holds the logo itself.
  • It holds the mask used to insert the logo later (when insert_object is called).
  • The constructor (__init__(...)) does the work that is only needed once: it reads the logo (assuming a file named logo.png in the same folder), resizes it, creates a mask (by grayscaling and thresholding), and sets the initial position of the logo.

Before the while-loop the object obj is created. All that is needed at this stage is to insert the logo in each frame.

import cv2
import numpy as np


# Object class to insert logo
class Object:
    def __init__(self, start_x=100, start_y=100, size=50):
        self.logo_org = cv2.imread('logo.png')
        self.size = size
        self.logo = cv2.resize(self.logo_org, (size, size))
        img2gray = cv2.cvtColor(self.logo, cv2.COLOR_BGR2GRAY)
        _, logo_mask = cv2.threshold(img2gray, 1, 255, cv2.THRESH_BINARY)
        self.logo_mask = logo_mask
        self.x = start_x
        self.y = start_y

    def insert_object(self, frame):
        roi = frame[self.y:self.y + self.size, self.x:self.x + self.size]
        roi[np.where(self.logo_mask)] = 0
        roi += self.logo


# Get the webcam (default webcam is 0)
cap = cv2.VideoCapture(0)
# If your webcam does not support 640 x 480, this will find another resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# This will create an object
obj = Object()
# Loop forever (or until break)
while True:
    # Read a frame from the webcam
    _, frame = cap.read()
    # Flip the frame
    frame = cv2.flip(frame, 1)

    # Insert the object into the frame
    obj.insert_object(frame)

    # Show the frame in a window
    cv2.imshow('WebCam', frame)

    # Check if q has been pressed to quit
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

This results in the following output (with me in front of the webcam; if you run it yourself, expect to see you in the picture and not me, just to avoid any uncomfortable surprises when you show up in the window).

The logo at a fixed position.

For more details on how to insert a logo in a live webcam stream, you can read this tutorial.

Step 3: Detect movement in the frame

Detecting movement is not a simple task, although depending on your needs it can be solved quite simply. In this tutorial we only need to detect simple movement: if you sit still in the frame, we do not care to detect you; we only care about actual movement.

We can solve that problem by using the library function createBackgroundSubtractorMOG2(), which can “remove” the background from your frame. It is far from a perfect solution, but it is sufficient for what we want to achieve.

As we only want to know whether there is movement, not how large the difference from the previously detected background is, we use a threshold function to make the image black and white. We set the threshold quite high, as that also removes noise from the image.

Depending on your setup (lighting etc.) you may need to adjust that value. See the comments in the code for how to do that.

import cv2
import numpy as np


# Object class to insert logo
class Object:
    def __init__(self, start_x=100, start_y=100, size=50):
        self.logo_org = cv2.imread('logo.png')
        self.size = size
        self.logo = cv2.resize(self.logo_org, (size, size))
        img2gray = cv2.cvtColor(self.logo, cv2.COLOR_BGR2GRAY)
        _, logo_mask = cv2.threshold(img2gray, 1, 255, cv2.THRESH_BINARY)
        self.logo_mask = logo_mask
        self.x = start_x
        self.y = start_y

    def insert_object(self, frame):
        roi = frame[self.y:self.y + self.size, self.x:self.x + self.size]
        roi[np.where(self.logo_mask)] = 0
        roi += self.logo


# Get the webcam (default webcam is 0)
cap = cv2.VideoCapture(0)
# If your webcam does not support 640 x 480, this will find another resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# To detect movement (to get the background)
background_subtractor = cv2.createBackgroundSubtractorMOG2()

# This will create an object
obj = Object()
# Loop forever (or until break)
while True:
    # Read a frame from the webcam
    _, frame = cap.read()
    # Flip the frame
    frame = cv2.flip(frame, 1)

    # Get the foreground mask (it is gray scale)
    fg_mask = background_subtractor.apply(frame)
    # Convert the gray scale to black and white with a threshold
    # Change the 250 threshold fitting your webcam and needs
    # - Setting it lower will make it more sensitive (also to noise)
    _, fg_mask = cv2.threshold(fg_mask, 250, 255, cv2.THRESH_BINARY)

    # Insert the object into the frame
    obj.insert_object(frame)

    # Show the frame in a window
    cv2.imshow('WebCam', frame)
    # To see the foreground mask
    cv2.imshow('fg_mask', fg_mask)

    # Check if q has been pressed to quit
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

This results in the following output.

Output – again, don’t expect to see me when you run this example on your computer

As you can see, it does a decent job of detecting movement. Sometimes your movements leave a shadow behind, so it is not perfect.

Step 4: Detecting movement where the object is and move it accordingly

This is the tricky part. But let's break it down into simple steps.

  • We need to detect whether the mask we created in the previous step overlaps with the object (logo).
  • If so, we want to move the object (logo).

That is what we want to achieve.

How do we do that?

  • Detect whether there is an overlap by using the same mask we created for the logo and checking if it overlaps with any points on the movement mask.
  • If so, move the object by choosing a random movement and measuring how much overlap remains. Then choose another random movement and see if the overlap is smaller.
  • Repeat this a few times and choose the random movement with the least overlap.

By chance, this tends to move the object away from the overlapping areas. This is the power of introducing some randomness: it simplifies the algorithm a lot.

A more precise approach would be to calculate in which direction the movement mask overlaps the object (logo) the least. That becomes quite complicated and needs a lot of calculations. Hence we chose this simple approach, which gives both a speed element and a direction element and works fairly well.

All we need to do is add an update_position function to our class and call it before we insert the logo.

import cv2
import numpy as np


# Object class to insert logo
class Object:
    def __init__(self, start_x=100, start_y=100, size=50):
        self.logo_org = cv2.imread('logo.png')
        self.size = size
        self.logo = cv2.resize(self.logo_org, (size, size))
        img2gray = cv2.cvtColor(self.logo, cv2.COLOR_BGR2GRAY)
        _, logo_mask = cv2.threshold(img2gray, 1, 255, cv2.THRESH_BINARY)
        self.logo_mask = logo_mask
        self.x = start_x
        self.y = start_y
        self.on_mask = False

    def insert_object(self, frame):
        roi = frame[self.y:self.y + self.size, self.x:self.x + self.size]
        roi[np.where(self.logo_mask)] = 0
        roi += self.logo

    def update_position(self, mask):
        height, width = mask.shape

        # Check if object is overlapping with moving parts
        roi = mask[self.y:self.y + self.size, self.x:self.x + self.size]
        check = np.any(roi[np.where(self.logo_mask)])

        # If object has moving parts, then find new position
        if check:
            # To save the best possible movement
            best_delta_x = 0
            best_delta_y = 0
            best_fit = np.inf
            # Try 8 different positions
            for _ in range(8):
                # Pick a random position
                delta_x = np.random.randint(-15, 15)
                delta_y = np.random.randint(-15, 15)

                # Ensure we are inside the frame, if outside, skip and continue
                if self.y + self.size + delta_y > height or self.y + delta_y < 0 or \
                        self.x + self.size + delta_x > width or self.x + delta_x < 0:
                    continue

                # Calculate how much overlap
                roi = mask[self.y + delta_y:self.y + delta_y + self.size, self.x + delta_x:self.x + delta_x + self.size]
                check = np.count_nonzero(roi[np.where(self.logo_mask)])
                # If perfect fit (no overlap), just return
                if check == 0:
                    self.x += delta_x
                    self.y += delta_y
                    return
                # If a better fit found, save it
                elif check < best_fit:
                    best_fit = check
                    best_delta_x = delta_x
                    best_delta_y = delta_y

            # After for-loop, update to best fit (if any found)
            if best_fit < np.inf:
                self.x += best_delta_x
                self.y += best_delta_y
                return


# Get the webcam (default webcam is 0)
cap = cv2.VideoCapture(0)
# If your webcam does not support 640 x 480, this will find another resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# To detect movement (to get the background)
background_subtractor = cv2.createBackgroundSubtractorMOG2()

# This will create an object
obj = Object()
# Loop forever (or until break)
while True:
    # Read a frame from the webcam
    _, frame = cap.read()
    # Flip the frame
    frame = cv2.flip(frame, 1)
    # Get the foreground mask (it is gray scale)
    fg_mask = background_subtractor.apply(frame)
    # Convert the gray scale to black and white with a threshold
    # Change the 250 threshold fitting your webcam and needs
    # - Setting it lower will make it more sensitive (also to noise)
    _, fg_mask = cv2.threshold(fg_mask, 250, 255, cv2.THRESH_BINARY)

    # Find a new position for object (logo)
    # - fg_mask contains all moving parts
    # - updated position will be the one with least moving parts
    obj.update_position(fg_mask)
    # Insert the object into the frame
    obj.insert_object(frame)

    # Show the frame in a window
    cv2.imshow('WebCam', frame)
    # To see the fg_mask uncomment the line below
    # cv2.imshow('fg_mask', fg_mask)

    # Check if q has been pressed to quit
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Step 5: Test it

Well, this is the fun part. See a live demo in the video below.

The final result

What is next step?

I would be happy to hear any suggestions from you. I see a lot of potential improvements, but the conceptual idea is explained and shown in this tutorial.

Create Cartoon Characters in Live Webcam Stream with OpenCV and Python

What will we cover in this tutorial?

How to convert the foreground characters of a live webcam feed to become cartoons, while keeping the background as it is.

In this tutorial we will show how this can be done using OpenCV and Python in a few lines of code. The result can be seen in the YouTube video below.

Step 1: Find the moving parts

The big challenge is to identify what is the background and what is the foreground.

This can be done in various ways, but we want to keep it quite accurate and not just identify boxes around moving objects. We actually want the contours of the objects and to fill them all out.

While this sounds easy, it is a bit challenging. Still, we will try to do it as simply as possible.

The first step is to keep the last frame and subtract it from the current frame. This gives all the moving parts. It should be done on a grayscale image.

import cv2
import numpy as np

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Just a dummy frame, will be overwritten
last_foreground = np.zeros((480, 640), dtype='uint8')
while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    # Only needed if your webcam does not support 640x480
    frame = cv2.resize(frame, (640, 480))
    # Flip it to mirror you
    frame = cv2.flip(frame, 1)
    # Convert to gray scale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Keep the foreground
    foreground = gray
    # Take the absolute difference
    abs_diff = cv2.absdiff(foreground, last_foreground)
    # Update the last foreground image
    last_foreground = foreground

    cv2.imshow('WebCam (Mask)', abs_diff)
    cv2.imshow('WebCam (frame)', frame)
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

This results in the following output, with a grayscale contour of the moving parts of the image. If you need help installing OpenCV, read this tutorial.

Step 2: Using a threshold

To make the contour more visible you can use a threshold (cv2.threshold(…)).

import cv2
import numpy as np

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Just a dummy frame, will be overwritten
last_foreground = np.zeros((480, 640), dtype='uint8')
while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    # Only needed if your webcam does not support 640x480
    frame = cv2.resize(frame, (640, 480))
    # Flip it to mirror you
    frame = cv2.flip(frame, 1)
    # Convert to gray scale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Keep the foreground
    foreground = gray
    # Take the absolute difference
    abs_diff = cv2.absdiff(foreground, last_foreground)
    # Update the last foreground image
    last_foreground = foreground

    _, mask = cv2.threshold(abs_diff, 20, 255, cv2.THRESH_BINARY)
 
    cv2.imshow('WebCam (Mask)', mask)
    cv2.imshow('WebCam (frame)', frame)
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Resulting in this output

Using the threshold makes the image black and white, which makes it easier to detect the moving parts.

Step 3: Fill out the enclosed contours

To fill out the enclosed contours you can use morphologyEx with a closing operation. We have also used dilate to make the lines thicker, so they enclose the parts better.

import cv2
import numpy as np

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Just a dummy frame, will be overwritten
last_foreground = np.zeros((480, 640), dtype='uint8')
while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    # Only needed if your webcam does not support 640x480
    frame = cv2.resize(frame, (640, 480))
    # Flip it to mirror you
    frame = cv2.flip(frame, 1)
    # Convert to gray scale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Keep the foreground
    foreground = gray
    # Take the absolute difference
    abs_diff = cv2.absdiff(foreground, last_foreground)
    # Update the last foreground image
    last_foreground = foreground

    _, mask = cv2.threshold(abs_diff, 20, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=3)
    se = np.ones((85, 85), dtype='uint8')
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, se)

    cv2.imshow('WebCam (Mask)', mask)
    cv2.imshow('WebCam (frame)', frame)
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Resulting in the following output.

Me, happy, next to a white shadow ghost of myself

Step 4: Creating the cartoon effect and masking it into the foreground

The final step is to create a cartoon version of the frame (cv2.stylization()).

    frame_effect = cv2.stylization(frame, sigma_s=150, sigma_r=0.25)

Then we apply it where the foreground mask is set. This results in the following code.

import cv2
import numpy as np

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Just a dummy frame, will be overwritten
last_foreground = np.zeros((480, 640), dtype='uint8')
while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    # Only needed if your webcam does not support 640x480
    frame = cv2.resize(frame, (640, 480))
    # Flip it to mirror you
    frame = cv2.flip(frame, 1)
    # Convert to gray scale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Keep the foreground
    foreground = gray
    # Take the absolute difference
    abs_diff = cv2.absdiff(foreground, last_foreground)
    # Update the last foreground image
    last_foreground = foreground

    _, mask = cv2.threshold(abs_diff, 20, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=3)
    se = np.ones((85, 85), dtype='uint8')
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, se)

    frame_effect = cv2.stylization(frame, sigma_s=150, sigma_r=0.25)
    idx = (mask > 1)
    frame[idx] = frame_effect[idx]

    # cv2.imshow('WebCam (Mask)', mask)
    cv2.imshow('WebCam (frame)', frame)
    if cv2.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Step 5: Try it in real life

I must say the cartoon effect is heavy (slow). But other than that, it works fine.

Create Cartoon Background in Webcam Stream using OpenCV

What will we cover in this tutorial?

How to create this effect.

Create this effect in a few lines of code

The idea behind the code

The idea behind the above effect is simple. We will use a background subtractor, which will get the background of an image and make a mask of the foreground.

It then simply follows this structure.

  1. Capture a frame from the webcam.
  2. Get the foreground mask fg_mask.
  3. To get a greater effect, dilate the fg_mask.
  4. From the original frame, create a cartoon frame.
  5. Use the zero entries of fg_mask as an index to copy the cartoon frame into frame. That is, every pixel corresponding to a zero (black) value in fg_mask is overwritten with the cartoon value. As a result, we only get the cartoon effect in the background and not on the moving objects.
  6. Show the frame with background cartoon effect.

The code you need to create the above effect

This is all done using OpenCV. If you need help installing OpenCV, I suggest you read this tutorial. Otherwise, the code follows the steps above.

import cv2

backSub = cv2.createBackgroundSubtractorKNN(history=200)

cap = cv2.VideoCapture(0)

while True:
    _, frame = cap.read()

    fg_mask = backSub.apply(frame)
    fg_mask = cv2.dilate(fg_mask, None, iterations=2)

    _, cartoon = cv2.pencilSketch(frame, sigma_s=50, sigma_r=0.3, shade_factor=0.02)

    idx = (fg_mask < 1)
    frame[idx] = cartoon[idx]
    cv2.imshow('Frame', frame)
    cv2.imshow('FG Mask', fg_mask)

    keyboard = cv2.waitKey(1)
    if keyboard == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

OpenCV + Python + Webcam: Create a Simple Game (Avoid the falling logo)

What will we cover in this tutorial?

Is it a bird? Is it a plane? No, it is falling objects from the sky.

With OpenCV you can get a stream of frames from your webcam and process the data in Python to create an easy prototype of a simple game where you avoid the falling objects. We will cover how to build that in this tutorial.

Step 1: The game explained

The game is quite simple and is built based on a few ideas.

  1. Setup a live stream from your webcam.
  2. Insert falling objects starting from the top of the frame at a random vertical position.
  3. If the object hits you (the player), you lose 1 point from your score.
  4. On the other hand, if you avoid the object and it hits the bottom of the frame without touching you, you gain one point.
  5. Play until bored or tired.

Step 2: How can this game be built easily?

You basically need three components to create this game.

Firstly, we need a way to take and process each frame from the webcam. That is, create a live stream that can show you what is happening in the view of the frame. This will only require a way to read a frame from the webcam and show it on the screen. If this is done repeatedly, you have a live stream.

Secondly, we need something that makes objects fall down in the frame from a random starting position. That is, it should remember where the object was in the last frame and insert it at a new, lower position in the new frame.

Thirdly, we need something that detects where you are in the frame, so we can tell when the object and you are in the same position. In that case you lose a point from your score and a new object is created at the top.

The great news is that we can build all of this simply by using Python.

Step 3: Get a live stream from a webcam


A simple stream from the webcam can be created by the following code.

import cv2


# Capture the webcam
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)


while True:
    # Get a frame
    _, frame = cap.read()

    # Update the frame in the window
    cv2.imshow("Webcam", frame)

    # Check if q is pressed, terminate if so
    if cv2.waitKey(1) == ord('q'):
        break

# Release the webcam and destroy windows
cap.release()
cv2.destroyAllWindows()

If you have trouble installing OpenCV, please read this tutorial. You might need to change the width and height of the webcam. You can find all the resolutions your webcam supports by following this tutorial. The reason to lower the resolution is to reduce the processing time so the game does not slow down.

Another approach, where you can keep the full resolution, is to resize only the copies of the images you run the processing on; a sketch is given below.
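
A minimal sketch of that idea (the processing resolution is just an example):

import cv2

# Keep the webcam's native resolution for display
cap = cv2.VideoCapture(0)

while True:
    _, frame = cap.read()
    # Run the processing on a smaller copy; show the full-resolution frame
    small = cv2.resize(frame, (320, 240))
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    # ... motion detection on gray goes here ...

    cv2.imshow("Webcam", frame)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()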

Step 4: Motion detection

The idea behind a simple motion detector is to have a picture of the background. Then, for each frame, you subtract the background from the frame. This identifies all new objects in the frame.

To get a good picture of the background it might be an idea to let the webcam film for a few frames, as it often needs to adjust.

The idea is mapped out here.

import cv2
import numpy as np

# Capture the webcam
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# To capture the background - take a few iterations to stabilize view
while True:
    # Get the next frame
    _, bg_frame = cap.read()
    bg_frame = cv2.flip(bg_frame, 1)

    # Update the frame in the window
    cv2.imshow("Webcam", bg_frame)

    # Check if q is pressed, terminate if so
    if cv2.waitKey(1) &amp; 0xFF == ord('q'):
        break

# Processing of frames are done in gray
bg_gray = cv2.cvtColor(bg_frame, cv2.COLOR_BGR2GRAY)
# We blur it to minimize reaction to small details
bg_gray = cv2.GaussianBlur(bg_gray, (5, 5), 0)


# This is where the game loop starts
while True:
    # Get the next frame
    _, frame = cap.read()
    frame = cv2.flip(frame, 1)

    # Processing of frames are done in gray
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # We blur it to minimize reaction to small details
    gray = cv2.GaussianBlur(gray, (5, 5), 0)

    # Get the difference from last_frame
    delta_frame = cv2.absdiff(bg_gray, gray)
    # Have some threshold on what is enough movement
    thresh = cv2.threshold(delta_frame, 100, 255, cv2.THRESH_BINARY)[1]
    # This dilates with two iterations
    thresh = cv2.dilate(thresh, None, iterations=2)
    cv2.imshow("track", thresh)

    # Update the frame in the window
    cv2.imshow("Webcam", frame)

    # Check if q is pressed, terminate if so
    if cv2.waitKey(1) == ord('q'):
        break

# Release the webcam and destroy windows
cap.release()
cv2.destroyAllWindows()

First let the background be as you want it (empty, without you in it). Then press q to capture the background image that is subtracted in the second loop.

The output of the second loop could look similar to this (Maybe with you instead of me).

Output

For a more detailed explanation of a motion tracker, you can read this tutorial on how to make a motion detector.

Step 5: Adding falling objects

This will be done by inserting an object in our frame and simply moving it downwards frame-by-frame.

To keep all functionality related to the object in one place, I made a class Object.

The full code is here.

import cv2
import numpy as np

# Capture the webcam
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# To capture the background - take a few iterations to stabilize view
while True:
    # Get the next frame
    _, bg_frame = cap.read()
    bg_frame = cv2.flip(bg_frame, 1)

    # Update the frame in the window
    cv2.imshow("Webcam", bg_frame)

    # Check if q is pressed, terminate if so
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Processing of frames are done in gray
bg_gray = cv2.cvtColor(bg_frame, cv2.COLOR_BGR2GRAY)
# We blur it to minimize reaction to small details
bg_gray = cv2.GaussianBlur(bg_gray, (5, 5), 0)


# Object class for the falling logo
class Object:
    def __init__(self, size=50):
        self.logo_org = cv2.imread('logo.png')
        self.size = size
        self.logo = cv2.resize(self.logo_org, (size, size))
        img2gray = cv2.cvtColor(self.logo, cv2.COLOR_BGR2GRAY)
        _, logo_mask = cv2.threshold(img2gray, 1, 255, cv2.THRESH_BINARY)
        self.logo_mask = logo_mask
        self.speed = 15
        self.x = 100
        self.y = 0
        self.score = 0

    def insert_object(self, frame):
        roi = frame[self.y:self.y + self.size, self.x:self.x + self.size]
        roi[np.where(self.logo_mask)] = 0
        roi += self.logo

    def update_position(self, thresh):
        height, width = thresh.shape
        self.y += self.speed
        if self.y + self.size > height:
            self.y = 0
            self.x = np.random.randint(0, width - self.size - 1)
            self.score += 1

        # Check for collision
        roi = thresh[self.y:self.y + self.size, self.x:self.x + self.size]
        check = np.any(roi[np.where(self.logo_mask)])
        if check:
            self.score -= 1
            self.y = 0
            self.x = np.random.randint(0, width - self.size - 1)
            # self.speed += 1
        return check


# Let's create the object that will fall from the sky
obj = Object()

# This is where the game loop starts
while True:
    # Get the next frame
    _, frame = cap.read()
    frame = cv2.flip(frame, 1)

    # Processing of frames are done in gray
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # We blur it to minimize reaction to small details
    gray = cv2.GaussianBlur(gray, (5, 5), 0)

    # Get the difference from last_frame
    delta_frame = cv2.absdiff(bg_gray, gray)
    # Have some threshold on what is enough movement
    thresh = cv2.threshold(delta_frame, 100, 255, cv2.THRESH_BINARY)[1]
    # This dilates with two iterations
    thresh = cv2.dilate(thresh, None, iterations=2)
    # cv2.imshow("track", thresh)

    hit = obj.update_position(thresh)
    obj.insert_object(frame)

    # To make the screen white when you get hit
    if hit:
        frame[:, :, :] = 255

    text = f"Score: {obj.score}"
    cv2.putText(frame, text, (10, 20), cv2.FONT_HERSHEY_PLAIN, 2, (0, 255, 0), 2)
    # Update the frame in the window
    cv2.imshow("Webcam", frame)

    # Check if q is pressed, terminate if so
    if cv2.waitKey(1) == ord('q'):
        break

# Release the webcam and destroy windows
cap.release()
cv2.destroyAllWindows()

But does it work?

Trying the game

No, I am not too old for that stuff. Or maybe I was just having one of those days.

It works, but of course it could be improved in many ways.
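One improvement the code itself hints at is the commented-out self.speed += 1 line in update_position: increasing the speed over time. As a minimal sketch (my own variant, assuming the Object class from the listing above is in the same file), a subclass could raise the speed after every successful dodge:

# A variant of Object where the falling speed increases every time the
# object reaches the bottom, as the commented-out "self.speed += 1"
# line in the original code suggests.
class AcceleratingObject(Object):
    def update_position(self, thresh):
        old_score = self.score
        hit = super().update_position(thresh)
        # The parent increments score when the object reaches the bottom
        if self.score > old_score:
            self.speed += 1  # fall faster after every dodge
        return hit

Using obj = AcceleratingObject() instead of obj = Object() in the game loop should make the game gradually harder.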

OpenCV + Python + Webcam: Create a Ghost Effect

What will we cover in this tutorial?

A ghost effect is when multiple images are combined into one image. In this tutorial we will see how this effect can be made effectively and easily. We will add a trackbar, so the effect can be adjusted.

Step 1: Understand how to create the ghost effect

We will start simple, with only two images. Consider the following two images, which have the same background.

They can be combined using the OpenCV library, as the following code shows.

import cv2


img1 = cv2.imread("image1.png")
img2 = cv2.imread("image2.png")

img3 = cv2.addWeighted(src1=img1, alpha=0.5, src2=img2, beta=0.5, gamma=0.0)

cv2.imwrite("image3.png", img3)

The cv2.imread(…) calls read the images into the variables img1 and img2, where the above two images are named image1.png and image2.png, respectively.

Then cv2.addWeighted(…) is where the magic happens. The src1 and src2 parameters each take an image, while alpha and beta determine how much weight each image should have. Here we have chosen 50% (0.5) each. A good rule is to let the weights add up to 100% (1.0).

Hence, the resulting image will be a 50/50 weighted composition of the two input images. The result can be seen here.
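As a quick experiment with the same two files, you can shift the weights so one image dominates. As long as alpha and beta add up to 1.0, the overall brightness is preserved. A minimal sketch:

import cv2

img1 = cv2.imread("image1.png")
img2 = cv2.imread("image2.png")

# Give 70% weight to img1 and 30% to img2 - img1 will dominate the blend
img4 = cv2.addWeighted(src1=img1, alpha=0.7, src2=img2, beta=0.3, gamma=0.0)

cv2.imwrite("image4.png", img4)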

The function cv2.addWeighted(…) can be used to create a ghost effect in a live stream from a webcam.

Step 2: Understanding the webcam stream

To understand how the processing flow from a webcam works, it is easiest to illustrate it with some simple code. If you are new to OpenCV and need it installed, please read this tutorial.

import cv2


# Capture the webcam
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Pre-processing should be done here

while True:
    # Capture the frame from the webcam
    _, frame = cap.read()

    # Processing should be done here

    # Show the frame to a window
    cv2.imshow("Webcam", frame)

    # Check if q is pressed, terminate if so
    if cv2.waitKey(1) == ord('q'):
        break

# Release the webcam and destroy windows
cap.release()
cv2.destroyAllWindows()

The above code shows the simple flow of capturing a frame from the webcam and showing it in a window. It is important to notice that each image (or frame) from the webcam is handled individually.

This is handy when we want to process it.

Step 3: Adding a ghost effect in the processing pipeline

We know from Step 1 how to make a simple ghost effect with two images. Using that same approach frame by frame, saving the previous frame, gives a live ghost effect. Since the saved frame is itself already blended, the effect compounds and lasts more than one frame back.

import cv2


# Capture the webcam
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Pre-processing should be done here
_, last_frame = cap.read()
while True:
    # Capture the frame from the webcam
    _, frame = cap.read()

    # Processing
    if frame.shape == last_frame.shape:
        frame = cv2.addWeighted(src1=frame, alpha=0.5, src2=last_frame, beta=0.5, gamma=0.0)

    # Show the frame to a window
    cv2.imshow("Webcam", frame)

    # Update last_frame
    last_frame = frame

    # Check if q is pressed, terminate if so
    if cv2.waitKey(1) == ord('q'):
        break

# Release the webcam and destroy windows
cap.release()
cv2.destroyAllWindows()

That creates a simple ghost effect. It is not very strong, but you can change the values of alpha and beta to adjust it.

But we can actually add a trackbar to change the value on the fly.

Step 4: Adding a trackbar to the window

This is just an add-on to the above that enables you to change the ghost effect while you stream from your webcam. To add a trackbar we need a few things. First, we need a variable that can be accessed anywhere in the code to keep the state of the ghost effect (ghost_effect). We also need a named window (cv2.namedWindow(…)) so that the trackbar is set up in the same window we stream the webcam into.

Then we have the callback function on_ghost_trackbar(val) to update the value, both in the named window and in the global variable ghost_effect. The call to cv2.createTrackbar(…) registers on_ghost_trackbar as the callback. This ensures that every time you move the trackbar, on_ghost_trackbar is called with the new value, where you update the ghost_effect variable, which in turn is used in the cv2.addWeighted(…) call.

import cv2

# A global variable with the ghost effect
ghost_effect = 0.0
# Setup a window that can be referenced
window = "Webcam"
cv2.namedWindow(window)


# Used by the trackbar to change the ghost effect
def on_ghost_trackbar(val):
    global ghost_effect
    global window

    ghost_effect = val / 100.0
    cv2.setTrackbarPos("Ghost effect", window, val)


# Capture the webcam
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Create a trackbar
cv2.createTrackbar("Ghost effect", window, 0, 100, on_ghost_trackbar)


# Get the first frame
_, last_frame = cap.read()
while True:
    # Get the next frame
    _, frame = cap.read()

    # Add the ghost effect
    if frame.shape == last_frame.shape:
        frame = cv2.addWeighted(src1=frame, alpha=1 - ghost_effect, src2=last_frame, beta=ghost_effect, gamma=0.0)

    # Update the frame in the window
    cv2.imshow(window, frame)
    
    # Update last_frame
    last_frame = frame

    # Check if q is pressed, terminate if so
    if cv2.waitKey(1) == ord('q'):
        break

# Release the webcam and destroy windows
cap.release()
cv2.destroyAllWindows()

While the code becomes a bit more complex with the trackbar added to the window, the functionality is the same.

Step 5: Test the ghost effect

Now we just need to test it to see if we got the desired effect.

Have fun and play with that.

OpenCV + Python + Webcam: How to Track and Replace Object

What will we cover in this tutorial?

In this tutorial we will look into how you can track an object of a specific color and replace it with a new object. The inserted object will be scaled to the size of the tracked object. This will be done on a live stream from the webcam.

Understand the process from webcam and feeding it to a window

The first thing to understand is that when processing a live stream from a webcam, you are actually processing it frame by frame.

Hence, the base code is as follows.

import cv2

# Get the webcam
cap = cv2.VideoCapture(0)

while True:
    # Step 1: Capture the frame
    _, frame = cap.read()

    # Step 2: Show the frame with blurred background
    cv2.imshow("Webcam", frame)
    # If q is pressed terminate
    if cv2.waitKey(1) == ord('q'):
        break

# Release and destroy all windows
cap.release()
cv2.destroyAllWindows()

First we import the OpenCV library cv2. If you need help installing it, read this tutorial. Then you capture the webcam by calling cv2.VideoCapture(0), where we assume you have one webcam and it is the first one (index 0).

Then comes the while-loop, where you capture the video stream frame by frame. This is done by calling cap.read(), which returns a return code and the frame (we ignore the return code by assigning it to _).
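If you want to be a bit more defensive, you can check the return code instead of ignoring it. A minimal sketch of the same loop with that check added:

import cv2

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    # Stop gracefully if the webcam did not deliver a frame
    if not ret:
        break

    cv2.imshow("Webcam", frame)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()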

To show the frame we read from the webcam, we call cv2.imshow("Webcam", frame), which creates a window with the frame (the image from your webcam).

The final part of the while-loop checks whether the key q has been pressed; if so, we break out of the loop, release the webcam, and destroy all windows.

That is how processing a webcam flow works. The processing itself happens between step 1 and step 2 in the above code. Pre-processing and setup are most often done before the while-loop.

The process flow to identify and track an object and insert a scaled logo

In the last section we looked at how a webcam stream is processed. In this section we explain the process of identifying an object by color, scaling the logo we want to insert, and inserting it into the frame.

The process is depicted in the image below, followed by an explanation of all the steps.

The process of finding the area to insert the logo, masking it out, inserting it, and showing the frame.

The steps are described here.

  1. This is the step where we capture the raw frame from the webcam.
  2. To more easily identify a specific color object in the frame, we convert the image to the HSV color model. It consists of Hue, Saturation, and Value.
  3. Make a mask with all objects of the specific color. This is where the HSV color model makes it easy.
  4. To make it more visible and easier for detection, we dilate the mask.
  5. Then we find all the contours in the mask.
  6. We loop over all the contours found. Ideally we only find one, but there might be small objects, which we will discard.
  7. Based on the contour found, we get its size, which we use to scale (resize) the logo we want to insert.
  8. Resize the logo to fit the size of the contour.
  9. As the visible logo does not fill its whole (square) image, we need to create a mask to insert it.
  10. To insert it easily, we create an ROI (Region of Image) where the contour is. This is not strictly needed; it just avoids a lot of extra calculations. If you know NumPy, the ROI is a view into the frame (see the small sketch after this list).
  11. Then we insert the logo using the mask.
  12. Finally, time to show the frame.
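To see why modifying the ROI in step 11 also modifies the frame, here is a tiny standalone NumPy sketch demonstrating that a slice is a view, not a copy:

import numpy as np

frame = np.zeros((4, 4), dtype=np.uint8)
roi = frame[1:3, 1:3]  # a view into frame, not a copy

roi += 255             # modifying the view...
print(frame)           # ...also changes frame: the center 2x2 block is now 255

This is exactly what happens when the logo is inserted into the ROI in the implementation below.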

The implementation

The code following the steps described in the previous section is found here.

import cv2
import time
import imutils
import numpy as np

# Get the webcam
cap = cv2.VideoCapture(0)
# Setup the width and the height (your cam might not support these settings)
width = 640
height = 480
cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)

# Read the logo to use later
logo_org = cv2.imread('logo.png')

# Time is just used to get the Frames Per Second (FPS)
last_time = time.time()
while True:
    # Step 1: Capture the frame
    _, frame = cap.read()

    # Step 2: Convert to the HSV color space
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Step 3: Create a mask based on medium to high Saturation and Value
    # - Hue 8-10 is about orange, which we will use
    # - These values can be changed (the lower ones) to fit your environment
    mask = cv2.inRange(hsv, (8, 180, 180), (10, 255, 255))
    # Step 4: This dilates with two iterations (makes it more visible)
    thresh = cv2.dilate(mask, None, iterations=2)
    # Step 5: Finds contours and converts it to a list
    contours = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = imutils.grab_contours(contours)

    # Step 6: Loops over all objects found
    for contour in contours:
        # Skip if contour is small (can be adjusted)
        if cv2.contourArea(contour) < 750:
            continue

        # Step 7: Get the box boundaries
        (x, y, w, h) = cv2.boundingRect(contour)
        # Compute size
        size = (h + w)//2

        # Check if logo will be inside frame
        if y + size < height and x + size < width:
            # Step 8: Resize logo
            logo = cv2.resize(logo_org, (size, size))
            # Step 9: Create a mask of logo
            img2gray = cv2.cvtColor(logo, cv2.COLOR_BGR2GRAY)
            _, logo_mask = cv2.threshold(img2gray, 1, 255, cv2.THRESH_BINARY)

            # Step 10: Region of Image (ROI), where we want to insert logo
            roi = frame[y:y+size, x:x+size]

            # Step 11: Mask out logo region and insert
            roi[np.where(logo_mask)] = 0
            roi += logo

    # (Extra) Add a FPS label to image
    text = f"FPS: {int(1 / (time.time() - last_time))}"
    last_time = time.time()
    cv2.putText(frame, text, (10, 20), cv2.FONT_HERSHEY_PLAIN, 2, (0, 255, 0), 2)

    # Step 12: Show the frame
    cv2.imshow("Webcam", frame)
    # If q is pressed terminate
    if cv2.waitKey(1) == ord('q'):
        break

# Release and destroy all windows
cap.release()
cv2.destroyAllWindows()

Time to test it.

Testing the code

When using your webcam, you might need to change the colors. I used the following setting for the blue marker in my video.

    mask = cv2.inRange(hsv, (110, 120, 120), (130, 255, 255))

The two 3-tuples are the lower and upper bounds in the HSV color space. The first item of each tuple sets the Hue: here 110 and 130, meaning the color range we mask out is 110-130, which you can see is in the blue range (image below). The other two items bound the Saturation (120-255) and the Value (120-255). To fit your camera and light settings, you may need to change these ranges.

You can see the HSV color spectrum here.

HSV color space for OpenCV

You might need to choose different values.
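To take the guesswork out of finding the right bounds, a small helper (my own sketch, not part of the original code) can print the HSV value of any pixel you click in the webcam window:

import cv2

cap = cv2.VideoCapture(0)
hsv = None


def print_hsv(event, x, y, flags, param):
    # On a left click, print the HSV value of the pixel under the cursor
    if event == cv2.EVENT_LBUTTONDOWN and hsv is not None:
        print(f"HSV at ({x}, {y}): {hsv[y, x]}")


cv2.namedWindow("Webcam")
cv2.setMouseCallback("Webcam", print_hsv)

while True:
    _, frame = cap.read()
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    cv2.imshow("Webcam", frame)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Click the object you want to track and build your cv2.inRange bounds around the printed values.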

OpenCV + Python: A Simple Approach to Blur the Background from Webcam

Why is it difficult to identify background?

It is not a difficult task for a human to identify the background, so why is it difficult for a computer? Well, the computer needs to identify what is part of the foreground, and intended to be part of the picture, and what is part of the background and therefore irrelevant for the picture.

Challenges like these are difficult for a computer, while they seem obvious to humans. One approach could be to use machine learning to identify all humans in the picture and assume they are part of the foreground. But even that might not be right. People can be in the background of the picture and not relevant, like in this situation.

Are all the humans part of the foreground?

How to solve it simple?

Good question. How do we identify what is background and what is part of the foreground? It depends on the use case.

  • If we only focus on one picture, it can be done manually.
  • On the other hand, if we need to do it on a live stream, we need something automating the process.
  • How important is accuracy?
    • Is it a conference call where you just want to hide the mess in the background?
    • Or is it really important that nothing gets out except what you define as foreground?

There are more things to consider than the above. This just gives you an idea that it is not that simple to answer.

Here we will assume that we need to process the stream fast and that the goal is just to hide your background.

We would like something that can go from this.

With background

To this in a live stream from a webcam.

Blurred background

We are not aiming for perfection, but for something simple that can be used to blur out the background, including details like the writing in the background.

The overall process for blurring out the background

We will use the following pipeline of blurring out the background of an image.

  1. Capture the frame from the webcam.
  2. Convert it to the HSV color space (see this tutorial for details on why).
  3. Make a mask of pixels with medium to high saturation and value (this seems to capture the foreground, as the background has lower saturation and value in the HSV color space).
  4. Create a blurred image frame.
  5. Combine the blurred with original frame based on the mask.
  6. Show the new combined frame.

The key thing to notice is our assumption that foreground objects will have medium to high saturation and value in the HSV color space. This is obviously not correct in all cases, but as the example will show, it does a decent job in many cases.

The code that implements it

The code is available here.

import cv2
import time
import numpy as np


# Get the webcam
cap = cv2.VideoCapture(0)

# Time is just used to get the Frames Per Second (FPS)
last_time = time.time()
while True:
    # Step 1: Capture the frame
    _, frame = cap.read()

    # Step 2: Convert to the HSV color space
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Step 3: Create a mask based on medium to high Saturation and Value
    # - These values can be changed (the lower ones) to fit your environment
    mask = cv2.inRange(hsv, (0, 75, 40), (180, 255, 255))
    # We need to copy the mask into 3 channels to match the frame's shape
    mask_3d = np.repeat(mask[:, :, np.newaxis], 3, axis=2)
    # Step 4: Create a blurred frame using Gaussian blur
    blurred_frame = cv2.GaussianBlur(frame, (25, 25), 0)
    # Step 5: Combine the original with the blurred frame based on mask
    frame = np.where(mask_3d == (255, 255, 255), frame, blurred_frame)

    # Add a FPS label to image
    text = f"FPS: {int(1 / (time.time() - last_time))}"
    last_time = time.time()
    cv2.putText(frame, text, (10, 20), cv2.FONT_HERSHEY_PLAIN, 2, (0, 255, 0), 2)

    # Step 6: Show the frame with blurred background
    cv2.imshow("Webcam", frame)
    # If q is pressed terminate
    if cv2.waitKey(1) == ord('q'):
        break

# Release and destroy all windows
cap.release()
cv2.destroyAllWindows()

A result can be seen in the video here.
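If the boundary between the sharp foreground and the blurred background flickers, one optional tweak (my own addition, not covered above) is to clean the mask with morphological operations before building mask_3d:

import cv2
import numpy as np


def clean_mask(mask, kernel_size=5):
    # Opening removes small white speckles in the mask,
    # closing fills small black holes inside the foreground
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask

Calling mask = clean_mask(mask) right after the cv2.inRange(…) line in the loop above should give a steadier foreground mask.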