Using Numba for Efficient Frame Modifications in OpenCV

What will we cover in this tutorial?

We will compare the speed of calculations and modifications on frames from a video stream in OpenCV, with and without Numba optimization.

In this tutorial we will divide each frame into equally sized boxes and calculate the average color of each box. Then we produce a frame in which each box is filled with that average color.

See the effect in the video below. These calculations are expensive in pure Python, hence we will compare the performance with and without Numba.

Step 1: Understand the process requirements

Each video frame from OpenCV is an image represented by a NumPy array. In this example we will use the webcam to capture a video stream and do the calculations and modifications live on the stream. This places strict requirements on the processing time of each frame.

To keep the motion fluid we need to show each frame within 1/25 of a second. That leaves at most 0.04 seconds per frame for capturing, processing, and updating the window with the video stream.

Since capturing and updating the window take time themselves, it is uncertain exactly how fast the frame processing (calculations and modifications) needs to be, but the upper bound is 0.04 seconds per frame.
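The budget arithmetic above can be sketched as a quick calculation. The 0.028 s overhead figure is illustrative here; it matches the value measured with cProfile later in this tutorial:

```python
# Per-frame time budget at a target frame rate of 25 FPS
target_fps = 25
budget = 1 / target_fps  # seconds available per frame

# Whatever capture and display cost, processing must fit in the remainder.
# 0.028 s is an illustrative overhead figure (read + waitKey + imshow).
overhead = 0.028
processing_budget = budget - overhead

print(f"total budget:      {budget:.3f} s")
print(f"processing budget: {processing_budget:.3f} s")
```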

Step 2: The calculations and modifications on each frame

Let’s have some fun. The calculations and modifications we want to apply to each frame are as follows.

  • Calculation. We divide each frame into small areas of 6×16 pixels and calculate the average color of each area. The average color is the mean of each channel (BGR).
  • Modification. We fill each area entirely with its average color.

This can be done by adding this function to process each frame.

import numpy as np


def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    # Step through the frame one box at a time
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            # Region of interest: one box of the frame
            roi = frame[i:i + box_height, j:j + box_width]
            # Average of each color channel (OpenCV uses BGR order)
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            # Fill the box with its average color
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame

The frame is divided into areas of the box size (box_height × box_width). For each box (roi: Region of Interest) the average (mean) value of each of the 3 color channels (b_mean, g_mean, r_mean) is calculated, and the box is overwritten with that average color.
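To see what the function does, here is a quick sanity check on a small synthetic frame. The 12×32 random array is a made-up stand-in for a webcam capture; nothing here comes from the tutorial except the process function itself:

```python
import numpy as np


def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            roi[:, :, 0] = np.mean(roi[:, :, 0])
            roi[:, :, 1] = np.mean(roi[:, :, 1])
            roi[:, :, 2] = np.mean(roi[:, :, 2])
    return frame


# A tiny 12x32 synthetic "frame" standing in for a webcam capture
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(12, 32, 3), dtype=np.uint8)
out = process(frame)

# After processing, every 6x16 box holds a single uniform color
first_box = out[:6, :16]
print(np.all(first_box == first_box[0, 0]))  # True
```

Note that process modifies the frame in place, which is fine for the live stream since the original frame is no longer needed once displayed.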

Step 3: Testing performance for this frame process

To get an estimate of the time spent in the function process, the cProfile library is quite good. It profiles the time spent in each function call. This is great, since it gives us a measure of how much time is spent in the function process.

We can accomplish that by running this code.

import cv2
import numpy as np
import cProfile


def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame


def main(iterations=300):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    for _ in range(iterations):
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))

        frame = process(frame)

        # Show the frame in a window
        cv2.imshow('WebCam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()

cProfile.run("main()")

The interesting line of the output is this one.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      300    7.716    0.026   50.184    0.167 TEST2.py:8(process)

This says we spend 0.026 seconds per call in the process function. That is fine, provided the accumulated overhead from the other functions in the main loop is at most 0.014 seconds.

If we investigate the other calls further, we get the following.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      300    5.132    0.017    5.132    0.017 {method 'read' of 'cv2.VideoCapture' objects}
      300    0.073    0.000    0.073    0.000 {resize}
      300    2.848    0.009    2.848    0.009 {waitKey}
      300    0.120    0.000    0.120    0.000 {flip}
      300    0.724    0.002    0.724    0.002 {imshow}

This gives an overhead of approximately 0.028 seconds (0.017 + 0.009 + 0.002) from the read, resize, flip, imshow, and waitKey calls in each iteration. It adds up to a total of 0.054 seconds per frame, or a frame rate of 18.5 frames per second (FPS).
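The per-frame arithmetic can be reproduced directly from the profiler figures above (resize and flip round to 0.000 s per call and are left out):

```python
# Reconstructing the frame-rate estimate from the cProfile output
process_time = 0.026              # percall of process, measured above
overhead = 0.017 + 0.009 + 0.002  # read + waitKey + imshow, per call

total = process_time + overhead   # seconds per frame
fps = 1 / total                   # resulting frame rate

print(f"{total:.3f} s per frame -> {fps:.1f} FPS")
```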

This is too slow for the stream to run smoothly.

Please notice that cProfile adds some overhead of its own when measuring the time.

Step 4: Introducing Numba to optimize performance

The Numba library is designed to just-in-time compile code to make NumPy loops faster. Wow. That is just what we need here. Let’s jump right in and see how it does.

import cv2
import numpy as np
from numba import jit
import cProfile


@jit(nopython=True)
def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame


def main(iterations=300):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    for _ in range(iterations):
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))

        frame = process(frame)

        # Show the frame in a window
        cv2.imshow('WebCam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()

main(iterations=1)
cProfile.run("main(iterations=300)")

Notice that we first call the main loop with one iteration. This is done to call the process function once before we measure the performance, since Numba compiles the function on the first call and reuses the compiled code afterwards.

The result is as follows.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      300    1.187    0.004    1.187    0.004 TEST2.py:7(pixels)

This estimates 0.004 seconds per call, which gives a total time of 0.032 seconds per iteration (0.028 + 0.004). That is sufficient to keep the performance above 24 frames per second (FPS).

Also, this improves the performance of the process function by a factor of 6.5 (7.716 / 1.187).

Conclusion

By using Numba we got the speedup needed to capture a live stream from the webcam and process it frame by frame. The speedup was approximately a factor of 6.5.
