OpenCV + Python: Move Objects Around in a Live Webcam Stream Using Your Hands

What will we cover in this tutorial?

How do you detect movements in a webcam stream? Also, how do you insert objects in a live webcam stream? Further, how do you change the position of the object based on the movements?

We will learn all that in this tutorial. The end result can be seen in the video below.

The end result of this tutorial

Step 1: Understand the flow of webcam processing

A webcam stream is processed frame-by-frame.

Illustration: Webcam processing flow

As the above illustration shows, when the webcam captures the next frame, the actual processing often happens on a copy of the original frame. When all the updates and calculations are done, the results are inserted back into the original frame.
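
This copy-process-merge pattern can be sketched with plain NumPy (no webcam needed – the small array stands in for a frame, and the values are just dummies for illustration):

```python
import numpy as np

# A dummy 4x4 gray "frame" standing in for a webcam frame
frame = np.zeros((4, 4), dtype=np.uint8)

# Work on a copy so the original frame stays untouched
work = frame.copy()
work[1:3, 1:3] = 255  # some processing on the copy

# The original is unchanged while we process the copy
assert frame.max() == 0

# When the calculations are done, insert the result into the original
frame[1:3, 1:3] = work[1:3, 1:3]
```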

To extract information from the webcam frame, we work on the copy and look for the features we need.

In our example, we need to detect movement and, based on that, check whether the movement touches our object.

A simple flow without any processing would look like this.

import cv2

# Get the webcam (default webcam is 0)
cap = cv2.VideoCapture(0)
# If your webcam does not support 640 x 480, this will find another resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
# Loop forever (or until break)
while True:
    # Read a frame from the webcam
    _, frame = cap.read()
    # Flip the frame
    frame = cv2.flip(frame, 1)
    # Show the frame in a window
    cv2.imshow('WebCam', frame)
    # Check if q has been pressed to quit
    if cv2.waitKey(1) == ord('q'):
        break
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

The above code will create a direct stream from your webcam to a window.

Step 2: Insert a logo – do it with a class that we will extend later

Here we want to insert a logo at a fixed position in our webcam stream. This can be achieved by the following code. The main difference is the new class Object, which is defined and instantiated.

The object briefly explained

  • The object will represent the logo we want to insert.
  • It will keep the current position (which is static so far)
  • The logo itself.
  • The mask used to insert it later (when insert_object is called).
  • The constructor (__init__(…)) does the work that is only needed once: reading the logo (it assumes you have a file named logo.png in the same folder), resizing it, creating a mask (by gray-scaling and thresholding), and setting the initial position of the logo.

Before the while-loop the object obj is created. All that is needed at this stage is to insert the logo in each frame.

import cv2
import numpy as np

# Object class to insert logo
class Object:
    def __init__(self, start_x=100, start_y=100, size=50):
        self.logo_org = cv2.imread('logo.png')
        self.size = size
        self.logo = cv2.resize(self.logo_org, (size, size))
        img2gray = cv2.cvtColor(self.logo, cv2.COLOR_BGR2GRAY)
        _, logo_mask = cv2.threshold(img2gray, 1, 255, cv2.THRESH_BINARY)
        self.logo_mask = logo_mask
        self.x = start_x
        self.y = start_y
    def insert_object(self, frame):
        roi = frame[self.y:self.y + self.size, self.x:self.x + self.size]
        roi[np.where(self.logo_mask)] = 0
        roi += self.logo

# Get the webcam (default webcam is 0)
cap = cv2.VideoCapture(0)
# If your webcam does not support 640 x 480, this will find another resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
# This will create an object
obj = Object()
# Loop forever (or until break)
while True:
    # Read a frame from the webcam
    _, frame = cap.read()
    # Flip the frame
    frame = cv2.flip(frame, 1)
    # Insert the object into the frame
    obj.insert_object(frame)
    # Show the frame in a window
    cv2.imshow('WebCam', frame)
    # Check if q has been pressed to quit
    if cv2.waitKey(1) == ord('q'):
        break
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

This will result in the following output (with me in front of the webcam – when you run it, expect to see yourself in the window instead of me).

The logo at a fixed position.

For more details on how to insert a logo in a live webcam stream, you can read this tutorial.

Step 3: Detect movement in the frame

Detecting movement is not a simple task, but depending on your needs, it can be solved quite simply. In this tutorial we only need to detect simple movement. That is, if you sit still in the frame, we do not need to detect you – only the actual movement.

We can solve that problem by using the library function createBackgroundSubtractorMOG2(), which can “remove” the background from your frame. It is far from a perfect solution, but it is sufficient for what we want to achieve.

As we only want to know whether there is movement, not how much the frame differs from the previously detected background, we apply a threshold function to turn the image black and white. We set the threshold quite high, as that also removes noise from the image.

Depending on your setup (lighting etc.), you may need to adjust that value. See the comments in the code for how to do that.
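
To see exactly what the thresholding does without a webcam, here is the same operation expressed in plain NumPy on a tiny dummy mask (values strictly above the threshold become 255, everything else 0 – which mirrors what cv2.threshold with THRESH_BINARY computes):

```python
import numpy as np

# A fake gray-scale foreground mask: low values are noise,
# high values are real movement
fg_mask = np.array([[10, 240, 251],
                    [255, 0, 249]], dtype=np.uint8)

threshold = 250
# Equivalent of cv2.threshold(fg_mask, 250, 255, cv2.THRESH_BINARY):
# only values strictly above the threshold survive
binary = np.where(fg_mask > threshold, 255, 0).astype(np.uint8)

print(binary)
# [[  0   0 255]
#  [255   0   0]]
```

Lowering the threshold keeps weaker differences (and more noise) in the mask, which is exactly the sensitivity trade-off mentioned in the code comments.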

import cv2
import numpy as np

# Object class to insert logo
class Object:
    def __init__(self, start_x=100, start_y=100, size=50):
        self.logo_org = cv2.imread('logo.png')
        self.size = size
        self.logo = cv2.resize(self.logo_org, (size, size))
        img2gray = cv2.cvtColor(self.logo, cv2.COLOR_BGR2GRAY)
        _, logo_mask = cv2.threshold(img2gray, 1, 255, cv2.THRESH_BINARY)
        self.logo_mask = logo_mask
        self.x = start_x
        self.y = start_y
    def insert_object(self, frame):
        roi = frame[self.y:self.y + self.size, self.x:self.x + self.size]
        roi[np.where(self.logo_mask)] = 0
        roi += self.logo

# Get the webcam (default webcam is 0)
cap = cv2.VideoCapture(0)
# If your webcam does not support 640 x 480, this will find another resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
# To detect movement (to get the background)
background_subtractor = cv2.createBackgroundSubtractorMOG2()
# This will create an object
obj = Object()
# Loop forever (or until break)
while True:
    # Read a frame from the webcam
    _, frame = cap.read()
    # Flip the frame
    frame = cv2.flip(frame, 1)
    # Get the foreground mask (it is gray scale)
    fg_mask = background_subtractor.apply(frame)
    # Convert the gray scale to black and white with a threshold
    # Change the 250 threshold fitting your webcam and needs
    # - Setting it lower will make it more sensitive (also to noise)
    _, fg_mask = cv2.threshold(fg_mask, 250, 255, cv2.THRESH_BINARY)
    # Insert the object into the frame
    obj.insert_object(frame)
    # Show the frame in a window
    cv2.imshow('WebCam', frame)
    # To see the foreground mask
    cv2.imshow('fg_mask', fg_mask)
    # Check if q has been pressed to quit
    if cv2.waitKey(1) == ord('q'):
        break
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

This results in the following output.

Output – again, don’t expect to see me when you run this example on your computer

As you can see, it does a decent job of detecting movement. Sometimes your movements leave a shadow behind, so it is not perfect.

Step 4: Detecting movement where the object is and move it accordingly

This is the tricky part, but let's break it down into simple steps.

  • We need to detect if the mask, we created in previous step, is overlapping with the object (logo).
  • If so, we want to move the object (logo).

That is what we want to achieve.

How do we do that?

  • Detect whether there is an overlap by using the same mask we created for the logo and checking if it overlaps with any points on the movement mask.
  • If so, we move the object by choosing a random movement and measuring how much overlap it leaves. Then we choose another random movement and see if the overlap is smaller.
  • We continue this a few times and choose the random movement with the least overlap.

By chance, this turns out to move the object away from the overlapping areas. That is the power of introducing some randomness: it simplifies the algorithm a lot.

A more precise approach would be to calculate in which direction the movement mask is closest to the object (logo). That becomes quite complicated and requires a lot of calculations. Hence, we chose this simple approach, which has both a speed element and a direction element and works fairly well.
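
The random search can be sketched in isolation. The toy sizes, the fixed seed, and the simplified square footprint (instead of the logo mask) are assumptions for this demo; the real version below uses the logo mask and the live foreground mask:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 40x40 movement mask: the left half is "moving"
mask = np.zeros((40, 40), dtype=np.uint8)
mask[:, :20] = 255

# Object footprint: a 10x10 square at (x, y)
x, y, size = 12, 15, 10

def overlap(mask, x, y, size):
    # Number of moving pixels under the object's footprint
    return np.count_nonzero(mask[y:y + size, x:x + size])

best_dx, best_dy, best_fit = 0, 0, np.inf
# Try 8 random offsets and keep the one with the least overlap
for _ in range(8):
    dx = rng.integers(-15, 16)
    dy = rng.integers(-15, 16)
    # Skip candidates that would leave the frame
    if not (0 <= x + dx and x + dx + size <= 40 and
            0 <= y + dy and y + dy + size <= 40):
        continue
    fit = overlap(mask, x + dx, y + dy, size)
    if fit < best_fit:
        best_dx, best_dy, best_fit = dx, dy, fit

x, y = x + best_dx, y + best_dy
# Over repeated frames, the object tends to drift toward the quiet half
```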

All we need to do is add an update_position function to our class and call it before we insert the logo.

import cv2
import numpy as np

# Object class to insert logo
class Object:
    def __init__(self, start_x=100, start_y=100, size=50):
        self.logo_org = cv2.imread('logo.png')
        self.size = size
        self.logo = cv2.resize(self.logo_org, (size, size))
        img2gray = cv2.cvtColor(self.logo, cv2.COLOR_BGR2GRAY)
        _, logo_mask = cv2.threshold(img2gray, 1, 255, cv2.THRESH_BINARY)
        self.logo_mask = logo_mask
        self.x = start_x
        self.y = start_y
        self.on_mask = False
    def insert_object(self, frame):
        roi = frame[self.y:self.y + self.size, self.x:self.x + self.size]
        roi[np.where(self.logo_mask)] = 0
        roi += self.logo
    def update_position(self, mask):
        height, width = mask.shape
        # Check if object is overlapping with moving parts
        roi = mask[self.y:self.y + self.size, self.x:self.x + self.size]
        check = np.any(roi[np.where(self.logo_mask)])
        # If object has moving parts, then find new position
        if check:
            # To save the best possible movement
            best_delta_x = 0
            best_delta_y = 0
            best_fit = np.inf
            # Try 8 different positions
            for _ in range(8):
                # Pick a random position
                delta_x = np.random.randint(-15, 15)
                delta_y = np.random.randint(-15, 15)
                # Ensure we are inside the frame, if outside, skip and continue
                if self.y + self.size + delta_y > height or self.y + delta_y < 0 or \
                        self.x + self.size + delta_x > width or self.x + delta_x < 0:
                    continue
                # Calculate how much overlap
                roi = mask[self.y + delta_y:self.y + delta_y + self.size, self.x + delta_x:self.x + delta_x + self.size]
                check = np.count_nonzero(roi[np.where(self.logo_mask)])
                # If perfect fit (no overlap), just return
                if check == 0:
                    self.x += delta_x
                    self.y += delta_y
                    return
                # If a better fit found, save it
                elif check < best_fit:
                    best_fit = check
                    best_delta_x = delta_x
                    best_delta_y = delta_y
            # After for-loop, update to best fit (if any found)
            if best_fit < np.inf:
                self.x += best_delta_x
                self.y += best_delta_y
                return

# Get the webcam (default webcam is 0)
cap = cv2.VideoCapture(0)
# If your webcam does not support 640 x 480, this will find another resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
# To detect movement (to get the background)
background_subtractor = cv2.createBackgroundSubtractorMOG2()
# This will create an object
obj = Object()
# Loop forever (or until break)
while True:
    # Read a frame from the webcam
    _, frame = cap.read()
    # Flip the frame
    frame = cv2.flip(frame, 1)
    # Get the foreground mask (it is gray scale)
    fg_mask = background_subtractor.apply(frame)
    # Convert the gray scale to black and white with a threshold
    # Change the 250 threshold fitting your webcam and needs
    # - Setting it lower will make it more sensitive (also to noise)
    _, fg_mask = cv2.threshold(fg_mask, 250, 255, cv2.THRESH_BINARY)
    # Find a new position for object (logo)
    # - fg_mask contains all moving parts
    # - updated position will be the one with least moving parts
    obj.update_position(fg_mask)
    # Insert the object into the frame
    obj.insert_object(frame)
    # Show the frame in a window
    cv2.imshow('WebCam', frame)
    # To see the fg_mask uncomment the line below
    # cv2.imshow('fg_mask', fg_mask)
    # Check if q has been pressed to quit
    if cv2.waitKey(1) == ord('q'):
        break
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Step 5: Test it

Well, this is the fun part. See a live demo in the video below.

The final result

What is next step?

I would be happy to hear any suggestions from you. I see a lot of potential improvements, but the conceptual idea is explained and shown in this tutorial.

Create Cartoon Characters in Live Webcam Stream with OpenCV and Python

What will we cover in this tutorial?

How to convert the foreground of a live webcam feed into a cartoon, while keeping the background as it is.

In this tutorial we will show how this can be done using OpenCV and Python in a few lines of code. The result can be seen in the YouTube video below.

Step 1: Find the moving parts

The big challenge is to identify what is the background and what is the foreground.

This can be done in various ways, but we want to keep it quite accurate and not just identify boxes around moving objects. We actually want the contours of the objects, filled out completely.

While this sounds easy, it is a bit challenging. Still, we will try to do it as simple as possible.

The first step is to keep the last frame and subtract it from the current frame. This gives all the moving parts. It should be done on a gray-scale image.
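
Note that cv2.absdiff is not the same as subtracting two uint8 arrays directly: plain uint8 subtraction wraps around for negative results. A small NumPy comparison on dummy pixel values shows the difference (the cast to a wider signed type below is just to reproduce what cv2.absdiff handles for you):

```python
import numpy as np

a = np.array([200, 50], dtype=np.uint8)   # current gray frame
b = np.array([180, 90], dtype=np.uint8)   # last gray frame

# Plain uint8 subtraction wraps around for negative results
wrong = a - b
print(wrong)  # [ 20 216]  (50 - 90 wrapped around to 216)

# What cv2.absdiff(a, b) computes: the true absolute difference
right = np.abs(a.astype(np.int16) - b.astype(np.int16)).astype(np.uint8)
print(right)  # [20 40]
```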

import cv2
import numpy as np
# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
# Just a dummy frame, will be overwritten
last_foreground = np.zeros((480, 640), dtype='uint8')
while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    # Only needed if your webcam does not support 640x480
    frame = cv2.resize(frame, (640, 480))
    # Flip it to mirror you
    frame = cv2.flip(frame, 1)
    # Convert to gray scale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Keep the foreground
    foreground = gray
    # Take the absolute difference
    abs_diff = cv2.absdiff(foreground, last_foreground)
    # Update the last foreground image
    last_foreground = foreground
    cv2.imshow('WebCam (Mask)', abs_diff)
    cv2.imshow('WebCam (frame)', frame)
    if cv2.waitKey(1) == ord('q'):
        break
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

This results in the following output with a gray scale contour of the moving part of the image. If you need help installing OpenCV read this tutorial.

Step 2: Using a threshold

To make the contour more visible you can use a threshold (cv2.threshold(…)).

import cv2
import numpy as np
# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
# Just a dummy frame, will be overwritten
last_foreground = np.zeros((480, 640), dtype='uint8')
while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    # Only needed if your webcam does not support 640x480
    frame = cv2.resize(frame, (640, 480))
    # Flip it to mirror you
    frame = cv2.flip(frame, 1)
    # Convert to gray scale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Keep the foreground
    foreground = gray
    # Take the absolute difference
    abs_diff = cv2.absdiff(foreground, last_foreground)
    # Update the last foreground image
    last_foreground = foreground
    _, mask = cv2.threshold(abs_diff, 20, 255, cv2.THRESH_BINARY)
 
    cv2.imshow('WebCam (Mask)', mask)
    cv2.imshow('WebCam (frame)', frame)
    if cv2.waitKey(1) == ord('q'):
        break
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Resulting in this output

Using the threshold makes the image black and white, which makes the moving parts easier to detect.

Step 3: Fill out the enclosed contours

To fill out the enclosed contours you can use morphologyEx. We also use dilate to make the lines thicker so they enclose the parts better.
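
To get a feel for what the closing operation (dilation followed by erosion) does, here is a rough NumPy-only sketch on a tiny binary mask. This is only an illustration of the effect – cv2.morphologyEx is the real, fast implementation – and the pad-with-1 choice in the erosion (so the image border is not eaten away) is an assumption of this demo:

```python
import numpy as np

def dilate(img, k=3):
    # Each pixel becomes the maximum over its k x k neighborhood
    p = k // 2
    padded = np.pad(img, p, constant_values=0)
    out = np.zeros_like(img)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

def erode(img, k=3):
    # Each pixel becomes the minimum over its k x k neighborhood
    # (padding with 1 so the image border is not eaten away)
    p = k // 2
    padded = np.pad(img, p, constant_values=1)
    out = np.ones_like(img)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out

def close(img, k=3):
    # Closing = dilation followed by erosion: fills small gaps and holes
    return erode(dilate(img, k), k)

# A horizontal line with a one-pixel gap in the middle
mask = np.zeros((7, 7), dtype=np.uint8)
mask[3, 1:3] = 1
mask[3, 4:7] = 1

closed = close(mask)
# The gap at (3, 3) has been bridged into one solid line
```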

import cv2
import numpy as np
# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
# Just a dummy frame, will be overwritten
last_foreground = np.zeros((480, 640), dtype='uint8')
while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    # Only needed if your webcam does not support 640x480
    frame = cv2.resize(frame, (640, 480))
    # Flip it to mirror you
    frame = cv2.flip(frame, 1)
    # Convert to gray scale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Keep the foreground
    foreground = gray
    # Take the absolute difference
    abs_diff = cv2.absdiff(foreground, last_foreground)
    # Update the last foreground image
    last_foreground = foreground
    _, mask = cv2.threshold(abs_diff, 20, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=3)
    se = np.ones((85, 85), dtype='uint8')
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, se)
    cv2.imshow('WebCam (Mask)', mask)
    cv2.imshow('WebCam (frame)', frame)
    if cv2.waitKey(1) == ord('q'):
        break
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Resulting in the following output.

Me, happy, next to a white shadow ghost of myself

Step 4: Creating cartoon effect and mask it into the foreground

The final step is to create a cartoon version of the frame (cv2.stylization()).

    frame_effect = cv2.stylization(frame, sigma_s=150, sigma_r=0.25)

Then we mask it in using the foreground mask. This results in the following code.

import cv2
import numpy as np
# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
# Just a dummy frame, will be overwritten
last_foreground = np.zeros((480, 640), dtype='uint8')
while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    # Only needed if your webcam does not support 640x480
    frame = cv2.resize(frame, (640, 480))
    # Flip it to mirror you
    frame = cv2.flip(frame, 1)
    # Convert to gray scale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Keep the foreground
    foreground = gray
    # Take the absolute difference
    abs_diff = cv2.absdiff(foreground, last_foreground)
    # Update the last foreground image
    last_foreground = foreground
    _, mask = cv2.threshold(abs_diff, 20, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=3)
    se = np.ones((85, 85), dtype='uint8')
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, se)
    frame_effect = cv2.stylization(frame, sigma_s=150, sigma_r=0.25)
    idx = (mask > 1)
    frame[idx] = frame_effect[idx]
    # cv2.imshow('WebCam (Mask)', mask)
    cv2.imshow('WebCam (frame)', frame)
    if cv2.waitKey(1) == ord('q'):
        break
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Step 5: Try it in real life

I must say the cartoon effect is heavy (i.e., slow). But other than that, it works fine.

Create Cartoon Background in Webcam Stream using OpenCV

What will we cover in this tutorial?

How to create this effect.

Create this effect in a few lines of code

The idea behind the code

The idea behind the above effect is simple. We will use a background subtractor, which will get the background of an image and make a mask of the foreground.

It then simply follows this structure.

  1. Capture a frame from the webcam.
  2. Get the foreground mask fg_mask.
  3. To get a greater effect, dilate fg_mask.
  4. From the original frame, create a cartoon frame.
  5. Use the zero entries of fg_mask as an index to copy the cartoon frame into frame. That is, every pixel corresponding to a zero (black) value in fg_mask is overwritten with the cartoon value. As a result, only the background gets the cartoon effect, not the moving objects.
  6. Show the frame with background cartoon effect.
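
Step 5 above – copying the cartoon frame into the original through a boolean index – can be sketched on tiny dummy arrays (no webcam or OpenCV needed; the pixel values are made up for illustration):

```python
import numpy as np

# Tiny stand-ins: a 2x2 "frame" and its "cartoon" version
frame = np.array([[10, 20],
                  [30, 40]], dtype=np.uint8)
cartoon = np.array([[99, 99],
                    [99, 99]], dtype=np.uint8)

# Foreground mask: non-zero means a moving object is there
fg_mask = np.array([[0, 255],
                    [0, 0]], dtype=np.uint8)

# Where the mask is zero (background), take the cartoon pixels;
# where it is non-zero (foreground), keep the original frame
idx = (fg_mask < 1)
frame[idx] = cartoon[idx]

print(frame)
# [[99 20]
#  [99 99]]
```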

The code you need to create the above effect

This is all done by using OpenCV. If you need help to install OpenCV I suggest you read this tutorial. Otherwise the code follows the above steps.

import cv2
backSub = cv2.createBackgroundSubtractorKNN(history=200)
cap = cv2.VideoCapture(0)
while True:
    _, frame = cap.read()
    fg_mask = backSub.apply(frame)
    fg_mask = cv2.dilate(fg_mask, None, iterations=2)
    _, cartoon = cv2.pencilSketch(frame, sigma_s=50, sigma_r=0.3, shade_factor=0.02)
    idx = (fg_mask < 1)
    frame[idx] = cartoon[idx]
    cv2.imshow('Frame', frame)
    cv2.imshow('FG Mask', fg_mask)
    keyboard = cv2.waitKey(1)
    if keyboard == ord('q'):
        break
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()