From Zero to Creating Photo Mosaic using Faces with OpenCV

What will we cover in this tutorial?

  1. Where and how to get images you can use without copyright issues.
  2. How to extract the faces of the images.
  3. Building a Photo Mosaic using the extracted images of faces.

Step 1: Where and how to get images

There exist many datasets of faces, but most come with restrictions on their use. A great place to find images is Pexels, as its photos are free to use (see license here).

Also, the Python library pexels-api makes it easy to download a lot of images. It can be installed with the following command.

pip install pexels-api

To use the Pexels API you need to register.

  1. Sign up as a user at Pexels.
  2. Accept the confirmation email sent to the address you provided.
  3. Request your API key here.

Then you can download the images matching a search query with this Python program.

from pexels_api import API
import requests
import os.path
from pathlib import Path

path = 'pics'
Path(path).mkdir(parents=True, exist_ok=True)
# To get a key: sign up for Pexels at https://www.pexels.com/join/
# Request a key at: https://www.pexels.com/api/
# - No need to set a URL
# - Accept the email sent to you
# - Refresh the API page or see your key here: https://www.pexels.com/api/new/
PEXELS_API_KEY = '--- INSERT YOUR API KEY HERE ---'
api = API(PEXELS_API_KEY)
query = 'person'
api.search(query)
# Get photo entries
photos = api.get_entries()
print("Search: ", query)
print("Total results: ", api.total_results)
MAX_PICS = 1000
print("Fetching max: ", MAX_PICS)
count = 0
while True:
    photos = api.get_entries()
    print(len(photos))
    if len(photos) == 0:
        break
    for photo in photos:
        # Print photographer
        print('Photographer: ', photo.photographer)
        # Print original size url
        print('Photo original size: ', photo.original)
        file = os.path.join(path, query + '-' + str(count).zfill(5) + '.' + photo.original.split('.')[-1])
        count += 1
        print(file)
        picture_request = requests.get(photo.original)
        if picture_request.status_code == 200:
            with open(file, 'wb') as f:
                f.write(picture_request.content)
        # Stop once MAX_PICS photos have been fetched (a function with return would avoid the double break)
        if count >= MAX_PICS:
            break
    if count >= MAX_PICS:
        break
    if not api.has_next_page:
        print("Last page: ", api.page)
        break
    # Search next page
    api.search_next_page()

The program above stops after 1,000 photos (MAX_PICS); you can change that if you like. It downloads the photos returned for the query person. Feel free to change that too.

It takes some time to download all the images and will take up some space.

Step 2: Extract the faces from the photos

This is where OpenCV comes in. It provides a pre-trained face detection model based on the Haar Cascade Classifier. Install the OpenCV library with the following command.

pip install opencv-python

The trained model we use is bundled with the library, but it is not easy to locate on disk. Therefore we suggest you download it from here (it should be named haarcascade_frontalface_default.xml) and put it in the folder you work from. Alternatively, the opencv-python package exposes the folder containing the bundled cascades as cv2.data.haarcascades.

We use it to detect faces, crop them, and save them in a folder for later use.

import cv2
import numpy as np
import glob
import os
from pathlib import Path

def preprocess(box_width=12, box_height=16):
    path = "pics"
    output = "small-faces"
    Path(output).mkdir(parents=True, exist_ok=True)
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
    images = []
    cnt = 0
    for filename in files:
        print("Processing...", filename)
        frame = cv2.imread(filename)
        if frame is None:
            # Skip files that OpenCV cannot read as images
            continue
        frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame_gray = cv2.equalizeHist(frame_gray)
        faces = face_cascade.detectMultiScale(frame_gray, scaleFactor=1.3, minNeighbors=10, minSize=(350, 350), flags=cv2.CASCADE_SCALE_IMAGE)
        for (x, y, w, h) in faces:
            # Crop the detected face and shrink it to the box size
            roi = frame[y:y+h, x:x+w]
            img = cv2.resize(roi, (box_width, box_height))
            images.append(img)
            output_file_name = "face-" + str(cnt).zfill(5) + ".jpg"
            output_file_name = os.path.join(output, output_file_name)
            cv2.imwrite(output_file_name, img)
            # Give every face its own file name
            cnt += 1
    return np.stack(images)

preprocess(box_width=12, box_height=16)

It will create a folder called small-faces with small images of the identified faces.

Notice that the Haar Cascade Classifier is not perfect. It will miss many faces and produce false positives. It is a good idea to look manually through all the images and delete the false positives (images that do not contain a face).

Step 3: Building our first mosaic photo

The approach is to divide the photo into equal-sized boxes and, for each box, find the face image that fits best as a replacement.
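The box-by-box search can be sketched in plain NumPy, without Numba. This is a minimal sketch under our own assumptions: the face images are stacked in one uint8 array of shape (n, h, w) or (n, h, w, 3), the photo is already cropped to a multiple of the tile size, and best_match_index and build_mosaic are hypothetical helper names. Each tile is scored by the sum of absolute pixel differences, casting to a signed type first so the subtraction cannot wrap around.

```python
import numpy as np

def best_match_index(box, tiles):
    """Index of the tile with the smallest sum of absolute differences to box."""
    # Cast to a signed type so that subtracting uint8 values cannot wrap around
    diff = tiles.astype(np.int32) - box.astype(np.int32)
    # One score per tile: sum |diff| over all pixels (and channels)
    scores = np.abs(diff).reshape(len(tiles), -1).sum(axis=1)
    return int(np.argmin(scores))

def build_mosaic(photo, tiles):
    """Replace every tile-sized box of photo with its best matching tile."""
    tile_h, tile_w = tiles.shape[1:3]
    h, w = photo.shape[:2]
    mosaic = photo.copy()
    for i in range(0, h - tile_h + 1, tile_h):
        for j in range(0, w - tile_w + 1, tile_w):
            box = photo[i:i + tile_h, j:j + tile_w]
            mosaic[i:i + tile_h, j:j + tile_w] = tiles[best_match_index(box, tiles)]
    return mosaic

# Toy example: a dark tile and a bright tile, a photo that is dark left and bright right
tiles = np.stack([np.zeros((2, 2), np.uint8), np.full((2, 2), 200, np.uint8)])
photo = np.zeros((4, 4), np.uint8)
photo[:, :2] = 10
photo[:, 2:] = 190
mosaic = build_mosaic(photo, tiles)
```

The left half of the toy photo is replaced by the dark tile and the right half by the bright tile.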

To improve performance of the process function we use Numba, which is a just-in-time compiler that is designed to optimize NumPy code in for-loops.

import cv2
import numpy as np
import glob
import os
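from numba import jit

def load_images(box_width, box_height):
    # Load the face images created in Step 2 and make sure they have the box size
    files = glob.glob(os.path.join("small-faces", "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        if img is not None:
            images.append(cv2.resize(img, (box_width, box_height)))
    return np.stack(images)
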
@jit(nopython=True)
def process(photo, images, box_width=24, box_height=32):
    height, width, _ = photo.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = photo[i:i + box_height, j:j + box_width]
            best_match = np.inf
            best_match_index = 0
            # Compare the box against every face image (index 0 included)
            for k in range(images.shape[0]):
                total_sum = np.sum(np.where(roi > images[k], roi - images[k], images[k] - roi))
                if total_sum < best_match:
                    best_match = total_sum
                    best_match_index = k
            photo[i:i + box_height, j:j + box_width] = images[best_match_index]
    return photo

def main():
    photo = cv2.imread("rune.jpg")
    box_width = 12
    box_height = 16
    height, width, _ = photo.shape
    # Make sure the photo can be sliced into whole boxes
    width = (width//box_width) * box_width
    height = (height//box_height) * box_height
    photo = cv2.resize(photo, (width, height))
    # Load all the images of the faces
    images = load_images(box_width, box_height)
    # Create the mosaic
    mosaic = process(photo.copy(), images, box_width, box_height)
    cv2.imshow("Original", photo)
    cv2.imshow("Result", mosaic)
    cv2.waitKey(0)

main()

To test it, we used a photo of Rune.

This version reuses the same face images as often as needed. The result is decent, but if you want to avoid the strong repeating patterns caused by reused images, you can change the code to prevent reuse.

The example above uses 606 small images. If you avoid reuse, you quickly run out of candidate images, so you need a much bigger collection or the result becomes questionable.

No reuse of face images to create the Photo Mosaic

The photo mosaic above was created from a downscaled photo, but even so it does not give a good result when images are not reused. This would require a much larger set of images to work from.
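One way to prevent reuse is to mark each face as used once it has been placed. The sketch below is a hypothetical greedy variant (build_mosaic_without_reuse is our own name): it fills the boxes in reading order, gives each box the best unused tile by sum of absolute differences, and raises an error when there are fewer tiles than boxes. Greedy assignment is not optimal, since the first boxes take the best tiles; an optimal assignment could be computed with the Hungarian algorithm, for example via scipy.optimize.linear_sum_assignment.

```python
import numpy as np

def build_mosaic_without_reuse(photo, tiles):
    """Greedy mosaic in which every tile is used at most once."""
    tile_h, tile_w = tiles.shape[1:3]
    rows, cols = photo.shape[0] // tile_h, photo.shape[1] // tile_w
    if rows * cols > len(tiles):
        raise ValueError(f"need {rows * cols} tiles, only {len(tiles)} available")
    # Flatten every tile once; int32 avoids uint8 wraparound in the subtraction
    flat_tiles = tiles.reshape(len(tiles), -1).astype(np.int32)
    used = np.zeros(len(tiles), dtype=bool)
    mosaic = photo.copy()
    for r in range(rows):
        for c in range(cols):
            i, j = r * tile_h, c * tile_w
            box = photo[i:i + tile_h, j:j + tile_w].reshape(-1).astype(np.int32)
            scores = np.abs(flat_tiles - box).sum(axis=1).astype(np.float64)
            scores[used] = np.inf  # exclude tiles that are already placed
            k = int(np.argmin(scores))
            used[k] = True
            mosaic[i:i + tile_h, j:j + tile_w] = tiles[k]
    return mosaic

# Four flat 2x2 tiles and a uniform gray photo with room for exactly four tiles
tiles = np.stack([np.full((2, 2), v, np.uint8) for v in (0, 50, 100, 150)])
photo = np.full((4, 4), 60, np.uint8)
mosaic = build_mosaic_without_reuse(photo, tiles)
```

Even though the photo is uniform, every box gets a different tile, which illustrates why a small collection quickly gives questionable results without reuse.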

Video Mosaic on Live Webcam Stream with OpenCV and Numba

What will we cover in this tutorial?

We will investigate whether we can create a decent video mosaic effect on a live webcam stream using OpenCV, Numba and Python. First we will learn the simple way to create a video mosaic and investigate its performance. Then we will extend it to create a better quality video mosaic and try to improve its performance by lowering the quality.

Step 1: How does simple photo mosaic work?

A photographic mosaic is a photo composed of many other small images. A black and white example is given here.

The example above is not perfect, as it was generated with speed in mind so that it runs smoothly on a webcam stream. It is also done in gray scale to improve performance.

The idea is to reproduce the original image (photograph) as a mosaic of many smaller sample images. In the example above, the original frame is 640×480 pixels and the mosaic is built from small images of 16×12 pixels, so each frame consists of 40 × 40 = 1,600 small images.

The first thing we want to achieve is a simple mosaic. In a simple mosaic, the original image is scaled down and each pixel is then replaced by a small image with the same (or closest) average color. This is simple and efficient to do.

On a high level this is the process.

  1. Have a collection C of small images used to create the photographic mosaic
  2. Scale down the photo P you want to create a mosaic of.
  3. For each pixel in photo P, find the image I from C whose average color is closest to the pixel. Insert image I to represent that pixel.
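The three steps can be sketched in NumPy for gray scale images. This is a minimal sketch under our own assumptions: the collection C is a uint8 array of shape (n, tile_h, tile_w), scaling down is done by averaging each tile-sized box (so each pixel of the scaled photo corresponds to one box of the original), and simple_mosaic is a hypothetical name. The tutorial's own implementation in Step 4 uses a faster index formula instead of searching for the closest mean.

```python
import numpy as np

def simple_mosaic(photo, tiles):
    """Gray scale simple mosaic.

    photo: (H, W) uint8 array; tiles: (n, tile_h, tile_w) uint8 array.
    Rows/columns beyond a whole number of tiles are cropped away.
    """
    n, tile_h, tile_w = tiles.shape
    rows, cols = photo.shape[0] // tile_h, photo.shape[1] // tile_w
    # Step 2: scale the photo down to one value per box (the box's mean)
    boxes = photo[:rows * tile_h, :cols * tile_w].reshape(rows, tile_h, cols, tile_w)
    small = boxes.mean(axis=(1, 3))                                  # (rows, cols)
    # Average gray value of every small image in the collection C
    tile_means = tiles.reshape(n, -1).mean(axis=1)                   # (n,)
    # Step 3: for each pixel of the small photo, pick the tile with the closest mean
    nearest = np.abs(small[:, :, None] - tile_means).argmin(axis=2)  # (rows, cols)
    # Put the chosen tiles together into a full-size image
    chosen = tiles[nearest]                          # (rows, cols, tile_h, tile_w)
    return chosen.transpose(0, 2, 1, 3).reshape(rows * tile_h, cols * tile_w)

# Three flat 16x12 tiles (black, gray, white) and a 32x24 photo with four quadrants
tiles = np.stack([np.full((12, 16), v, np.uint8) for v in (0, 128, 255)])
photo = np.zeros((24, 32), np.uint8)
photo[:12, :16] = 10
photo[:12, 16:] = 120
photo[12:, :16] = 200
photo[12:, 16:] = 250
mosaic = simple_mosaic(photo, tiles)
```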

This explains the simple approach. The next question is: will it be efficient enough to process a live webcam stream?

Step 2: Create a collection of small images

To optimize performance, we have chosen to work in gray scale. The first step is to collect the images you want to use. They can be any pictures.

We have used photos from Pexels, which are free to use.

We need to convert them all to gray scale and resize them to fit our purpose.

import cv2
import glob
import os
import numpy as np
output = "small-pics-16x12"
path = "pics"
files = glob.glob(os.path.join(path, "*"))
for file_name in files:
    print(file_name)
    img = cv2.imread(file_name)
    img = cv2.resize(img, (16, 12))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    mean = np.mean(img)
    # Zero-pad the mean (e.g. 009-500) so that sorting the file names sorts by brightness
    output_file_name = "image-" + f"{mean:07.3f}".replace('.', '-') + ".jpg"
    output_file_name = os.path.join(output, output_file_name)
    print(output_file_name)
    cv2.imwrite(output_file_name, img)

The script assumes that the images we want to convert to gray scale and resize are located in the local folder pics. Further, it assumes that the output images (the processed images) will be put in an already existing folder small-pics-16x12.
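The simple mosaic in Step 4 relies on files.sort() returning the images ordered by brightness, which only works if the file names sort in numeric order. The sketch below (mean_to_file_name is a hypothetical helper) shows that names built with str(mean) sort as text, so 120.25 comes before 9.5, while a zero-padded, fixed-width name sorts correctly.

```python
def mean_to_file_name(mean, padded):
    """Build a file name like the script above, optionally zero-padding the mean."""
    text = f"{mean:07.3f}" if padded else str(mean)
    return "image-" + text.replace(".", "-") + ".jpg"

means = [9.5, 120.25, 35.0]

# Names built from str(mean) are compared character by character, not as numbers
unpadded_order = sorted(mean_to_file_name(m, padded=False) for m in means)
# ['image-120-25.jpg', 'image-35-0.jpg', 'image-9-5.jpg']

# With a fixed width of 7 characters ("009.500"), text order equals numeric order
padded_order = sorted(mean_to_file_name(m, padded=True) for m in means)
# ['image-009-500.jpg', 'image-035-000.jpg', 'image-120-250.jpg']
```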

Step 3: Get a live stream from the webcam

At a high level, processing a live stream from a webcam works as shown in the following diagram.

The code below implements this process framework.

import cv2
import numpy as np

def process(frame):
    return frame

def main():
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    while True:
        # Read a frame from the webcam and stop if none could be read
        ret, frame = cap.read()
        if not ret:
            break
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Update the frame
        updated_frame = process(gray)
        # Show the frame in a window
        cv2.imshow('WebCam', updated_frame)
        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break
    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()

main()

The code above is just an empty shell: all the processing will happen in the process function. For now, it simply opens a window that shows the gray scale webcam stream.

Step 4: The simple video mosaic

We need to introduce two main things to create this simple video mosaic.

  1. Loading all the images we need to use (the 16×12 gray scale images).
  2. Fill out the processing of each frame, which replaces each 16×12 box of the frame with the best matching image.

The first step is preprocessing and should be done before we enter the main webcam capture loop. The second part is done in each iteration inside the process function.

import cv2
import numpy as np
import glob
import os

def preprocess():
    path = "small-pics-16x12"
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        images.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    return np.stack(images)

def process(frame, images, box_height=12, box_width=16):
    height, width = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            mean = np.mean(roi[:, :])
            roi[:, :] = images[int((len(images)-1)*mean/256)]
    return frame

def main(images):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    while True:
        # Read a frame from the webcam and stop if none could be read
        ret, frame = cap.read()
        if not ret:
            break
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Update the frame
        mosaic_frame = process(gray, images)
        # Show the frame in a window
        cv2.imshow('Mosaic Video', mosaic_frame)
        cv2.imshow('Webcam', frame)
        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break
    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()

images = preprocess()
main(images)

The preprocessing function reads all the images, converts them to gray scale (to have only 1 channel per pixel), and returns them stacked in one NumPy array, which the process function can index efficiently.

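The process function breaks the frame into blocks of 16×12 pixels, computes the average gray value of each block, and picks the estimated best match. The estimate assumes that the images are sorted by their mean gray value and that these means are spread roughly evenly between 0 and 255, so the index is proportional to the block's mean. Notice that the mean is a float; hence, the collection can contain more than 256 gray scale images.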

In this example we used 1,885 images.

A result can be seen here.

The result is decent but not good.
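The index formula in process only picks the closest image if the image means are spread evenly between 0 and 255. If the collection is skewed, for example towards dark photos, looking up the truly closest mean may match better. The sketch below, with hypothetical helpers proportional_index and nearest_index, compares the two using np.searchsorted on an ascending array of the image means (in practice computed from the stacked images and used to reorder them with np.argsort). This is a swapped-in technique, not the method used above, and we have not measured its effect on the mosaic.

```python
import numpy as np

def proportional_index(block_mean, n):
    """The lookup used in process(): assumes means spread evenly over 0..255."""
    return int((n - 1) * block_mean / 256)

def nearest_index(block_mean, sorted_means):
    """Index of the value in sorted_means (ascending) closest to block_mean."""
    idx = int(np.searchsorted(sorted_means, block_mean))
    # searchsorted returns the insertion point; the closest value is there or just before it
    if idx == 0:
        return 0
    if idx == len(sorted_means):
        return len(sorted_means) - 1
    before, after = sorted_means[idx - 1], sorted_means[idx]
    return idx - 1 if block_mean - before <= after - block_mean else idx

# A collection skewed towards dark images
sorted_means = np.array([10.0, 20.0, 30.0, 200.0, 250.0])
print(proportional_index(100.0, len(sorted_means)))  # 1 -> mean 20
print(nearest_index(100.0, sorted_means))            # 2 -> mean 30
```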

Step 5: Testing the performance and improving it with Numba

The performance seems quite good, but let us measure it.

We do that with the time module, which first needs to be imported.

import time

Then we measure how long the process call takes by inserting the following code in the main while loop.

        # Update the frame
        start = time.time()
        mosaic_frame = process(gray, images)
        print("Process time", time.time()- start, "seconds")

This will result in the following output.

Process time 0.02651691436767578 seconds
Process time 0.026834964752197266 seconds
Process time 0.025418996810913086 seconds
Process time 0.02562689781188965 seconds
Process time 0.025369882583618164 seconds
Process time 0.025450944900512695 seconds

These are a few lines of the output: about 0.025-0.027 seconds per frame.
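Single time.time() samples are noisy. A slightly more robust measurement can be sketched as follows, assuming you can call process outside the loop on a captured frame. time_call is a hypothetical helper: it uses time.perf_counter(), the clock Python provides for measuring short durations, skips warm-up calls (a Numba function compiles on its first call), and reports the median of several runs. Since process modifies the frame in place, later runs work on an already processed frame, so treat the number as an approximation.

```python
import statistics
import time

def time_call(func, *args, repeats=20, warmup=1):
    """Return the median duration in seconds of func(*args) over several runs."""
    # Warm-up calls are not measured
    for _ in range(warmup):
        func(*args)
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)

# Stand-in workload; in the webcam code this could be time_call(process, gray, images)
median_seconds = time_call(sorted, list(range(10_000, 0, -1)))
print(f"median: {median_seconds:.6f} s")
```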

Let’s bring Numba into the equation. Numba is a just-in-time compiler for NumPy code, which means it compiles Python code to machine code for speed. If you are new to Numba, we recommend you read this tutorial.

import cv2
import numpy as np
import glob
import os
import time
from numba import jit

def preprocess():
    path = "small-pics-16x12"
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        images.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    return np.stack(images)

@jit(nopython=True)
def process(frame, images, box_height=12, box_width=16):
    height, width = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            mean = np.mean(roi[:, :])
            roi[:, :] = images[int((len(images)-1)*mean/256)]
    return frame

def main(images):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    while True:
        # Read a frame from the webcam and stop if none could be read
        ret, frame = cap.read()
        if not ret:
            break
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Update the frame
        start = time.time()
        mosaic_frame = process(gray, images)
        print("Process time", time.time()- start, "seconds")
        # Show the frame in a window
        cv2.imshow('Mosaic Video', mosaic_frame)
        cv2.imshow('Webcam', frame)
        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break
    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()

images = preprocess()
main(images)

This gives the following performance.

Process time 0.0014820098876953125 seconds
Process time 0.0013887882232666016 seconds
Process time 0.0015859603881835938 seconds
Process time 0.0016350746154785156 seconds
Process time 0.0018379688262939453 seconds
Process time 0.0016241073608398438 seconds

That is a speedup of roughly a factor 14 to 19.

This is fast enough for live streaming, but the result still does not look good.

Step 6: A more advanced video mosaic approach

The more advanced video mosaic compares each box with every replacement image pixel by pixel and picks the image that approximates the box best.

import cv2
import numpy as np
import glob
import os
import time
from numba import jit

def preprocess():
    path = "small-pics-16x12"
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        images.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    return np.stack(images)

@jit(nopython=True)
def process(frame, images, box_height=12, box_width=16):
    height, width = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            best_match = np.inf
            best_match_index = 0
            # Compare the box against every image (index 0 included)
            for k in range(images.shape[0]):
                total_sum = np.sum(np.where(roi > images[k], roi - images[k], images[k] - roi))
                if total_sum < best_match:
                    best_match = total_sum
                    best_match_index = k
            roi[:,:] = images[best_match_index]
    return frame

def main(images):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    while True:
        # Read a frame from the webcam and stop if none could be read
        ret, frame = cap.read()
        if not ret:
            break
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Update the frame
        start = time.time()
        mosaic_frame = process(gray, images)
        print("Process time", time.time()- start, "seconds")
        # Show the frame in a window
        cv2.imshow('Mosaic Video', mosaic_frame)
        cv2.imshow('Webcam', frame)
        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break
    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()

images = preprocess()
main(images)

There is one line to notice specifically.

total_sum = np.sum(np.where(roi > images[k], roi - images[k], images[k] - roi))

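This is needed because we work with unsigned 8-bit integers, where a plain subtraction wraps around instead of going negative. The line computes the absolute difference between each pixel in the region of interest (roi) and images[k] by always subtracting the smaller value from the larger one, and then sums these differences. This is a very expensive calculation: each frame has 1,600 boxes, and each box is compared against 1,885 images of 192 pixels, which is about 579 million pixel differences per frame.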
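The small example below shows why the line is needed, using toy arrays rather than real frames: a plain uint8 subtraction wraps around, the np.where expression returns the true absolute difference, and casting to a signed integer type first (our alternative, not used in the tutorial code) gives the same result.

```python
import numpy as np

roi = np.array([[10, 200]], dtype=np.uint8)
tile = np.array([[20, 190]], dtype=np.uint8)

# Plain uint8 subtraction wraps around modulo 256: 10 - 20 becomes 246
wrapped = roi - tile

# The np.where trick: subtract the smaller value from the larger one, pixel by pixel
abs_where = np.where(roi > tile, roi - tile, tile - roi)

# An equivalent alternative: convert to a signed type before subtracting
abs_cast = np.abs(roi.astype(np.int16) - tile.astype(np.int16))

print(wrapped)    # [[246  10]]
print(abs_where)  # [[10 10]]
print(abs_cast)   # [[10 10]]
```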

The performance is as follows.

Process time 7.030380010604858 seconds
Process time 7.034134149551392 seconds
Process time 7.105709075927734 seconds
Process time 7.138839960098267 seconds

That is over 7 seconds per frame. The result is what can be expected from this number of images, but the performance is far too slow for a smooth live webcam stream.

The result can be seen here.

Step 7: Compromise options

There are various ways to trade quality for speed, and we will not investigate all of them. Here are some.

  • Use fewer images in our collection (fewer than 1,885). Notice that using half the images, say 900 images, will only roughly halve the processing time.
  • Bigger image sizes, for example 32×24 images. This gives 4 times fewer boxes, but each comparison covers 4 times as many pixels, so the total number of pixel differences per frame stays about the same. Hence, the speedup is likely to be smaller than expected.
  • Use a cheaper, approximate version of the difference calculation (total_sum). This has great potential, but might have undesired effects.
  • Compare downscaled versions of the boxes and images, so that each comparison involves fewer pixels.

We will try the last two.

First, let’s replace the calculation of total_sum, which is our distance function measuring how close an image is to the box. Say we use this instead.

                total_sum = np.sum(np.subtract(roi, images[k]))

With unsigned 8-bit integers this wraps around, so a calculation like 1 - 2 gives 255 instead of -1, which is undesired. On the other hand, this might happen in roughly 50% of the cases, and maybe it skews the calculation evenly for all images.

Let’s try.

Process time 1.857623815536499 seconds
Process time 1.7193729877471924 seconds
Process time 1.7445549964904785 seconds
Process time 1.707035779953003 seconds
Process time 1.6778359413146973 seconds

Wow. That is a speedup of roughly a factor 4 per frame (from about 7 seconds to about 1.7 seconds). The quality is still fine, but you will notice a poorly matched image from time to time. Still, the result is close to the advanced video mosaic and far better than the first simple video mosaic.
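The poorly matched images come from the wraparound. A toy example (cheap_score and abs_score are hypothetical helpers) shows how it can invert the ranking: a tile that is just 1 gray level brighter than the box gets a large cheap score, while a very different tile scores better.

```python
import numpy as np

box = np.full(4, 10, dtype=np.uint8)
close_tile = np.full(4, 11, dtype=np.uint8)   # almost identical to the box
far_tile = np.full(4, 200, dtype=np.uint8)    # very different from the box

def cheap_score(a, b):
    # The compromise: uint8 subtraction wraps around (10 - 11 -> 255), then sum
    return int(np.sum(np.subtract(a, b)))

def abs_score(a, b):
    # The exact sum of absolute differences, computed in a signed type
    return int(np.sum(np.abs(a.astype(np.int16) - b.astype(np.int16))))

print(cheap_score(box, close_tile), cheap_score(box, far_tile))  # 1020 264
print(abs_score(box, close_tile), abs_score(box, far_tile))      # 4 760
```

With the cheap score, the far tile would be chosen; with the absolute difference, the close tile wins as expected.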

Another option is to estimate each box from only 4 pixels (2×2) instead of all 192 (16×12). This should still be better than the simple video mosaic approach. The full code is given below.

import cv2
import numpy as np
import glob
import os
import time
from numba import jit

def preprocess():
    path = "small-pics-16x12"
    files = glob.glob(os.path.join(path, "*"))
    files.sort()
    images = []
    for filename in files:
        img = cv2.imread(filename)
        images.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    return np.stack(images)

def preprocess2(images, scale_width=8, scale_height=6):
    scaled = []
    _, height, width = images.shape
    print("Dimensions", width, height)
    width //= scale_width
    height //= scale_height
    print("Scaled Dimensions", width, height)
    for i in range(images.shape[0]):
        scaled.append(cv2.resize(images[i], (width, height)))
    return np.stack(scaled)

@jit(nopython=True)
def process3(frame, frame_scaled, images, scaled, box_height=12, box_width=16, scale_width=8, scale_height=6):
    height, width = frame.shape
    width //= scale_width
    height //= scale_height
    box_width //= scale_width
    box_height //= scale_height
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame_scaled[i:i + box_height, j:j + box_width]
            best_match = np.inf
            best_match_index = 0
            # Compare the box against every scaled image (index 0 included)
            for k in range(scaled.shape[0]):
                total_sum = np.sum(roi - scaled[k])
                if total_sum < best_match:
                    best_match = total_sum
                    best_match_index = k
            frame[i*scale_height:(i + box_height)*scale_height, j*scale_width:(j + box_width)*scale_width] = images[best_match_index]
    return frame

def main(images, scaled):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    while True:
        # Read a frame from the webcam and stop if none could be read
        ret, frame = cap.read()
        if not ret:
            break
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Update the frame
        start = time.time()
        gray_scaled = cv2.resize(gray, (640//8, 480//6))
        mosaic_frame = process3(gray, gray_scaled, images, scaled)
        print("Process time", time.time()- start, "seconds")
        # Show the frame in a window
        cv2.imshow('Mosaic Video', mosaic_frame)
        cv2.imshow('Webcam', frame)
        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break
    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()

images = preprocess()
scaled = preprocess2(images)
main(images, scaled)

This adds a preprocessing step (preprocess2) that downscales all the small images. The process time is now:

Process time 0.5559628009796143 seconds
Process time 0.5979928970336914 seconds
Process time 0.5543379783630371 seconds
Process time 0.5621011257171631 seconds

This is better, but still fewer than 2 frames per second.

The result can be seen here.

It is not all bad. It is still better than the simple video mosaic approach.
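preprocess2 and the main loop shrink the images and the frame with cv2.resize, whose default is bilinear interpolation; when shrinking by large factors it may only look at a few source pixels per output pixel, and OpenCV's documentation recommends cv2.INTER_AREA for shrinking. The NumPy sketch below shows the averaging idea directly: block_average (a hypothetical helper) reduces a 16×12 image to the 2×2 representation used by process3, so each comparison involves 4 pixels instead of 192, which is 48 times fewer. We have not measured whether averaging changes the visual result.

```python
import numpy as np

def block_average(img, factor_h, factor_w):
    """Downscale a 2-D image by averaging non-overlapping factor_h x factor_w blocks."""
    h, w = img.shape
    if h % factor_h or w % factor_w:
        raise ValueError("image size must be divisible by the scale factors")
    blocks = img.reshape(h // factor_h, factor_h, w // factor_w, factor_w)
    return blocks.mean(axis=(1, 3))

# A 16x12 image (12 rows, 16 columns) whose four 8x6 quadrants have different gray values
tile = np.zeros((12, 16), dtype=np.uint8)
tile[:6, 8:] = 50
tile[6:, :8] = 100
tile[6:, 8:] = 150
# Factors 6 (height) and 8 (width), as in process3; the result holds the quadrant means
small = block_average(tile, 6, 8)
```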

The result is not perfect. If you want to use it on a live webcam stream at 25-30 frames per second, you need to find further optimizations or live with the simple video mosaic approach.

Using Numba for Efficient Frame Modifications in OpenCV

What will we cover in this tutorial?

We will compare the speed of calculations and modifications on frames from an OpenCV video stream with and without Numba optimization.

In this tutorial we will divide each frame into boxes of the same size and calculate the average color of each box. Then we fill each box with its average color.

See the effect in the video below. These calculations are expensive in Python, so we will compare the performance with and without Numba.

Step 1: Understand the process requirements

Each video frame from OpenCV is an image represented by a NumPy array. In this example we will use the webcam to capture a video stream and do the calculations and modifications live on the stream. This puts strict requirements on the processing time of each frame.

To keep a fluid motion picture, we need to show 25 frames per second. That leaves at most 1/25 = 0.04 seconds for each frame, covering capture, processing, and updating the window with the video stream.

Since capturing the frame and updating the window also take time, it is uncertain exactly how fast the frame processing (calculations and modifications) must be, but an upper bound is 0.04 seconds per frame.

Step 2: The calculations and modifications on each frame

Let’s have some fun. The calculations and modifications we want to apply to each frame are as follows.

  • Calculations. We divide each frame into small areas of 6×16 pixels (height × width) and calculate the average color of each area. To get the average color, we calculate the average of each channel (BGR).
  • Modification. We fill each area entirely with its average color.

This can be done by adding this function to process each frame.

def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame

The frame is divided into areas of the box size (box_height × box_width). For each box (roi: region of interest), we compute the average (mean) value of each of the 3 color channels (b_mean, g_mean, r_mean) and overwrite the box with that average color.
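Since 480 and 640 are multiples of 6 and 16, the same effect can also be computed without Python loops by reshaping the frame so that each box gets its own axes. This is a swapped-in NumPy technique, not the approach of this tutorial, and we have not benchmarked it against the Numba version; process_vectorized is a hypothetical name. It truncates the mean to uint8, which is also what assigning the float mean to the uint8 frame does in process.

```python
import numpy as np

def process_vectorized(frame, box_height=6, box_width=16):
    """Fill every box of a (H, W, 3) uint8 frame with the box's average color.

    Assumes H and W are multiples of the box size (true for 480x640 with 6x16 boxes).
    """
    h, w, c = frame.shape
    # View the frame as (rows, box_height, cols, box_width, channels)
    blocks = frame.reshape(h // box_height, box_height, w // box_width, box_width, c)
    # Mean color per box, shape (rows, cols, channels), truncated to uint8
    means = blocks.mean(axis=(1, 3)).astype(np.uint8)
    # Expand each mean back to a full box
    return np.repeat(np.repeat(means, box_height, axis=0), box_width, axis=1)

frame = np.zeros((6, 32, 3), dtype=np.uint8)
frame[:, :16] = (10, 20, 30)
frame[0, 0] = (106, 20, 30)  # raises the blue mean of the first box to exactly 11
result = process_vectorized(frame)
print(result[0, 0], result[5, 31])  # [11 20 30] [0 0 0]
```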

Step 3: Testing performance for this frame process

To estimate the time spent in the function process, the cProfile module is quite good. It profiles the time spent in each function call, which gives us a measure of how much time is spent in process.

We can accomplish that by running this code.

import cv2
import numpy as np
import cProfile

def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame

def main(iterations=300):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    for _ in range(iterations):
        # Read a frame from the webcam and stop if none could be read
        ret, frame = cap.read()
        if not ret:
            break
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        frame = process(frame)
        # Show the frame in a window
        cv2.imshow('WebCam', frame)
        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break
    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()
cProfile.run("main()")

The interesting output line is this one.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      300    7.716    0.026   50.184    0.167 TEST2.py:8(process)

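This says that process spends 0.026 seconds per call in its own code (tottime). That would be good if the overhead from the other calls in the main loop added up to less than 0.014 seconds (0.04 - 0.026). Note that tottime excludes the time spent in functions called from process, such as the roughly 9,600 np.mean calls per frame (3,200 boxes × 3 channels). The cumtime column, which includes them, shows 0.167 seconds per call, although cProfile's per-call overhead inflates that number.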

Let us look further at the other calls.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      300    5.132    0.017    5.132    0.017 {method 'read' of 'cv2.VideoCapture' objects}
      300    0.073    0.000    0.073    0.000 {resize}
      300    2.848    0.009    2.848    0.009 {waitKey}
      300    0.120    0.000    0.120    0.000 {flip}
      300    0.724    0.002    0.724    0.002 {imshow}

This gives an overhead of approximately 0.028 seconds (0.017 + 0.009 + 0.002) per iteration from the read, resize, flip, imshow and waitKey calls. Added to the 0.026 seconds of process, that is at least 0.054 seconds per frame, or at most about 18.5 frames per second (FPS).

This is too slow to make it running smooth.

Please notice that cProfile itself adds some overhead to the measured times.
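Instead of reading through the full cProfile.run output, the report can be sorted and trimmed with the pstats module. A minimal sketch, where profile_report is a hypothetical helper and work is a stand-in for the main loop: sorting by cumulative time shows the total cost of a function including what it calls (such as np.mean), while sorting by tottime shows only the time spent in each function's own code.

```python
import cProfile
import io
import pstats

def profile_report(func, *args, sort_by="cumulative", limit=10):
    """Profile func(*args) and return the top `limit` rows of the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stream = io.StringIO()
    # "cumulative" sorts by cumtime (includes sub-calls), "tottime" by own time only
    pstats.Stats(profiler, stream=stream).sort_stats(sort_by).print_stats(limit)
    return stream.getvalue()

def work(n):
    return sum(i * i for i in range(n))

report = profile_report(work, 100_000)
print(report)
```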

Step 4: Introducing the Numba to optimize performance

The Numba library is designed to just-in-time compile code to make NumPy loops faster. Wow. That is just what we need here. Let’s jump right into it and see how it does.

import cv2
import numpy as np
from numba import jit
import cProfile

@jit(nopython=True)
def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame

def main(iterations=300):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    for _ in range(iterations):
        # Read a frame from the webcam and stop if none could be read
        ret, frame = cap.read()
        if not ret:
            break
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))
        frame = process(frame)
        # Show the frame in a window
        cv2.imshow('WebCam', frame)
        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break
    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()
main(iterations=1)
cProfile.run("main(iterations=300)")

Notice that we first call main with a single iteration. This calls the process function once before we measure the performance, because Numba compiles the code on the first call and keeps the compiled version for later calls.

The result is as follows.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      300    1.187    0.004    1.187    0.004 TEST2.py:7(pixels)

This estimates 0.004 seconds per call, giving a total of about 0.032 seconds per iteration (0.028 + 0.004), or roughly 31 frames per second (FPS). That is sufficient to keep the performance above 24 FPS.

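Measured by tottime, this improves the performance by a factor of 6.5 (7.716 / 1.187). Since the tottime of the plain Python version excludes the time spent in np.mean, the actual gain is likely larger.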

Conclusion

We got the desired speedup to have a live stream from the webcam and process it frame by frame by using Numba. The speedup was approximately 6.5 times.