OpenCV: Understand and Implement a Motion Tracker

What will we cover in this tutorial?

We will build and explain how a simple motion tracker works using OpenCV.

The resulting program will be able to track objects you define in a live webcam stream.

Step 1: Understand the color histograms

An image in OpenCV is represented as a NumPy array. If you are new to NumPy arrays, they are basically fixed-size arrays with a fixed element type. You can get a short introduction in this tutorial.

For simplicity let’s look at an example here.

import cv2

img = cv2.imread("pics/smile-00000.jpeg")
print(img.shape)
print(img)

This will print the shape of the NumPy array (its dimensions) as well as the array itself. Part of the output is given below.

(6000, 4000, 3)
[[[ 99 102  93]
  [ 87 104  91]
  [ 84 103  82]
  ...

Here we see that the picture is 6000×4000 pixels, and each pixel is represented by 3 integers. OpenCV uses the BGR representation: 3 integers in the range 0-255 giving the intensity of Blue, Green, and Red, respectively.

A histogram of an image counts how many occurrences there are of each pixel value. A histogram of the image above would normally be represented by three graphs, one for each of the colors Blue, Green, and Red.
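To make the counting concrete, here is a small sketch using plain NumPy on a made-up 2×2 image (cv2.calcHist, used later in this tutorial, does the same job more efficiently):

```python
import numpy as np

# A tiny hypothetical 2x2 "image" with 3 channels (BGR), just to illustrate the counting
img = np.array([[[10, 20, 30], [10, 20, 40]],
                [[10, 25, 30], [50, 20, 30]]], dtype=np.uint8)

# Count occurrences of each intensity 0-255 in the blue channel (channel 0 in BGR)
blue_hist = np.bincount(img[:, :, 0].ravel(), minlength=256)
print(blue_hist[10])  # 3 pixels have blue intensity 10
print(blue_hist[50])  # 1 pixel has blue intensity 50
```

Doing this for each of the three channels gives the three graphs mentioned above.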

It turns out that such a representation can act much like a fingerprint of the object in the picture. If we later see a similar fingerprint, it could be the same object.

To optimize the process, colors can be represented in other ways. In this tutorial we will use HSV (Hue, Saturation, Value). This has the advantage that the color (hue) information is stored in the first coordinate, which means we can get a decent fingerprint of an image from only one graph instead of three (as with the BGR representation).

The above is an example of a histogram.

Step 2: How does the motion tracker work

At a high level, the process is shown in the following image.

First, notice that there is a pre-processing part and a continuous-processing part. The pre-processing is done once to capture a histogram of the object we want to track, while the continuous processing is done for each frame coming from the webcam and uses that histogram to find the object in the new frame.

  1. The first thing is to capture the frame from the webcam.
  2. The frame is converted to HSV to get a simple histogram. The object to track is identified and framed in a box.
  3. A histogram based on the framed object is calculated and normalized.
  4. The continuous processing starts the same way as the pre-processing. Capture a frame from the webcam.
  5. Convert that frame into HSV.
  6. Take the HSV-converted frame and use the histogram from the pre-processing to back project (basically the reverse of making a histogram), then find the best match in the neighborhood of the object's former location using mean shift.
  7. Finally, update the box around the current detected position of the object.

Continue steps 4 to 7 until you get bored.

Ready to implement it?

Step 3: Implementation of the motion tracker

The numbering from the steps above is referenced in the code comments.

import numpy as np
import cv2


def main(width=640, height=480):
    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)

    # Step 1: Capture the first frame of the webcam
    _, frame = cap.read()
    # For ease, let's flip it
    frame = cv2.flip(frame, 1)

    # Step 2: First we frame the object
    x, y, w, h = 300, 200, 100, 50
    track_window = (x, y, w, h)
    # set up the ROI for tracking
    roi = frame[y:y+h, x:x+w]

    # Step 2 (continued): Change the color space
    # HSV (Hue, Saturation, Value) stores the color information in the hue channel,
    # which is what the histogram below is computed over
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)

    # Step 2 (continued): Mask out pixels with low saturation or value,
    # since the hue of dark or washed-out pixels is unreliable
    # For HSV, the hue range is [0,179], saturation and value ranges are [0,255]
    mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))

    # Step 3: Calculate the histogram
    roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
    # Arguments of calcHist:
    # - images: source arrays; all must have the same depth (CV_8U, CV_16U or CV_32F)
    #   and the same size; each can have an arbitrary number of channels
    # - channels: list of the channels used to compute the histogram; the first
    #   array's channels are numbered from 0 to images[0].channels()-1, the
    #   second array's channels continue from images[0].channels(), and so on
    # - mask: optional mask; if not empty, it must be an 8-bit array of the same
    #   size as images[i], and its non-zero elements mark the pixels counted
    # - histSize: array of histogram sizes in each dimension
    # - ranges: array of the histogram bin boundaries in each dimension

    # Normalize the histogram to the range 0 - 255 (needed for calcBackProject)
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
    # Setup the termination criteria, either 10 iteration or move by at least 1 pt
    # - Needed for meanShift to know when to terminate
    termination_criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    while True:
        # Step 4: Capture the next frame
        _, frame = cap.read()
        frame = cv2.flip(frame, 1)

        # Step 5: Change the color space to HSV
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

        # Step 6: Basically the reverse process of Histogram
        dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        # Apply meanShift to get the new location
        _, track_window = cv2.meanShift(dst, track_window, termination_criteria)

        # Step 7: Draw it on image
        x, y, w, h = track_window
        frame = cv2.rectangle(frame, (x, y), (x+w, y+h), 255, 2)

        # Update the frame
        cv2.imshow('Tracking Frame', frame)
        # Exit on the Esc key
        k = cv2.waitKey(30)
        if k == 27:
            break

    # Release the webcam and destroy the window
    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    main()

Let’s see how it works (in poor lighting in my living room).
