ASCII Art of Live Webcam Stream with OpenCV

What will we cover in this tutorial?

Create ASCII art from a live webcam stream using OpenCV with Python. To improve performance, we will use Numba.

The result can look like the video below.

Step 1: A webcam flow with OpenCV in Python

If you are installing OpenCV for the first time, we suggest you read this tutorial.

A basic webcam loop in Python looks like the following code.

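import cv2

# Set up the webcam (device 0 is usually the built-in camera)
cap = cv2.VideoCapture(0)
# Request a smaller resolution (some cameras may not support it)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

while True:
    # Capture frame-by-frame; stop if no frame could be read
    ret, frame = cap.read()
    if not ret:
        break
    # Flip horizontally so the image behaves like a mirror
    frame = cv2.flip(frame, 1)

    cv2.imshow("Webcam", frame)

    # Quit when 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()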

This displays the live stream from your webcam in a window; press q to quit. That is too easy not to enjoy.

Step 2: Prepare the letters to be used for ASCII art

There are many ways to create ASCII art. For simplicity, we render every letter as a small grayscale image (a white letter on a black background). You could print the letters directly in the terminal, but that seems to be slower than mapping the small letter images onto one big image that represents the ASCII art.
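The idea of mapping small letter images onto one big image can be sketched with NumPy alone. Assuming a letter bank of shape (number of letters, box height, box width) and a grid holding one chosen letter index per box, fancy indexing followed by a transpose and a reshape assembles the big image without Python loops. The function build_mosaic and the two-letter toy bank are hypothetical, for illustration only.

```python
import numpy as np


def build_mosaic(letter_bank, indices):
    """Tile letter images into one big image.

    letter_bank: uint8 array of shape (n_letters, box_h, box_w)
    indices: int array of shape (rows, cols), one letter index per box
    """
    _, box_h, box_w = letter_bank.shape
    rows, cols = indices.shape
    # Fancy indexing gives shape (rows, cols, box_h, box_w)
    tiles = letter_bank[indices]
    # Reorder to (rows, box_h, cols, box_w) so each pixel row of the output
    # is contiguous, then flatten into (rows * box_h, cols * box_w)
    return tiles.transpose(0, 2, 1, 3).reshape(rows * box_h, cols * box_w)


# Toy letter bank: letter 0 is blank, letter 1 is a vertical bar in column 7
bank = np.zeros((2, 12, 16), np.uint8)
bank[1, :, 7] = 255

grid = np.array([[0, 1], [1, 0]])
mosaic = build_mosaic(bank, grid)
print(mosaic.shape)  # (24, 32)
```

Box (r, c) of the grid ends up at rows r*12 to r*12+11 and columns c*16 to c*16+15 of the mosaic.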

We use OpenCV to create all the letters.

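import cv2
import numpy as np

def generate_ascii_letters():
    images = []
    #letters = "# $%&\\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~"
    letters = " \\ '(),-./:;[]_`{|}~"
    for letter in letters:
        # Black image with height 12 and width 16
        img = np.zeros((12, 16), np.uint8)
        # Draw the letter in white; (0, 11) is the bottom-left corner of the text
        img = cv2.putText(img, letter, (0, 11), cv2.FONT_HERSHEY_SIMPLEX, 0.5, 255)
        images.append(img)
    # Stack into one array of shape (number of letters, 12, 16)
    return np.stack(images)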

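The list images collects all the letter images we create. In the return statement, np.stack turns them into a single NumPy array of shape (number of letters, 12, 16). We do this for speed: Numba works well with NumPy arrays but has only limited support for Python lists of arrays, so the letters are passed to the Numba-compiled function as one array.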

If you like, you can use the full set of characters by switching to the commented-out letters string instead of the shorter one with only special characters. We found that the result looks better with the limited set.

Each image is simply a black NumPy array of shape (12, 16), that is, height 12 and width 16. Then we draw the letter on it with cv2.putText(…).

Step 3: Transforming the webcam frame to only outline the objects

To get a decent result, we found it works best to convert each frame so that only the outlines of the objects remain. This can be achieved with Canny edge detection (cv2.Canny(…)). On a live webcam stream, it is advisable to apply a Gaussian blur first, which reduces noise that would otherwise show up as spurious edges.

import cv2

# Set up the camera
cap = cv2.VideoCapture(0)
# Request a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

while True:
    # Capture frame-by-frame; stop if no frame could be read
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.flip(frame, 1)

    # Blur first to suppress noise, then detect edges.
    # Canny uses the smaller threshold (31) for edge linking
    # and the larger one (127) to find strong edges.
    gb = cv2.GaussianBlur(frame, (5, 5), 0)
    can = cv2.Canny(gb, 127, 31)

    cv2.imshow('Canny edge detection', can)
    cv2.imshow("Webcam", frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()

This would result in something like this.

Step 4: Converting the Canny edge detection to ASCII art

This is where all the magic happens. We will take the Canny edge detected image and convert it to ASCII art.

First, recall that we have a NumPy array of all the letter images we want to use.

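def to_ascii_art(frame, images, box_height=12, box_width=16):
    height, width = frame.shape
    # Walk over the frame in boxes the size of one letter image
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            # roi is a view into frame, so writing to it changes frame
            roi = frame[i:i + box_height, j:j + box_width]
            best_match = np.inf
            best_match_index = 0
            # Find the letter image that differs least from this box
            for k in range(1, images.shape[0]):
                # Approximate difference: uint8 subtraction wraps around (see below)
                total_sum = np.sum(np.absolute(np.subtract(roi, images[k])))
                if total_sum < best_match:
                    best_match = total_sum
                    best_match_index = k
            # Replace the box with the best matching letter
            roi[:,:] = images[best_match_index]
    return frame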

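The height and width of the frame are read, and then we iterate over the frame in boxes the size of one letter image. This assumes the frame size is a multiple of the box size, as it is here (480 = 40 × 12 and 640 = 40 × 16); otherwise the boxes along the right and bottom edges would be smaller than the letter images and the subtraction would fail with a shape mismatch.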
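The nested loops visit a regular grid of boxes: for a 480×640 frame that is 40 rows and 40 columns, or 1600 boxes. As a sketch, assuming the frame size is an exact multiple of the box size, the same grid can be exposed as a NumPy view with a reshape and an axis swap. The helper to_boxes is hypothetical; it does not replace the loops in the tutorial, it only shows which regions they visit.

```python
import numpy as np


def to_boxes(frame, box_height=12, box_width=16):
    """Return a view of shape (rows, cols, box_height, box_width).

    Assumes frame.shape is an exact multiple of the box size, as with 480x640.
    """
    height, width = frame.shape
    rows, cols = height // box_height, width // box_width
    # (rows, box_h, cols, box_w) -> (rows, cols, box_h, box_w)
    return frame.reshape(rows, box_height, cols, box_width).swapaxes(1, 2)


frame = np.arange(480 * 640, dtype=np.int32).reshape(480, 640)
boxes = to_boxes(frame)
print(boxes.shape)  # (40, 40, 12, 16)

# Box (r, c) is the region the nested loops visit at i = r*12, j = c*16
r, c = 3, 5
assert np.array_equal(boxes[r, c], frame[r * 12:(r + 1) * 12, c * 16:(c + 1) * 16])
```

Because boxes is a view, writing into boxes[r, c] changes frame, just as assigning to roi does in to_ascii_art.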

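Each box is taken as a region of interest (roi). Then we loop over all the candidate letters and pick the best match. The match is not computed exactly: both roi and the letter images are np.uint8, so np.subtract wraps around instead of producing negative numbers (0 - 255 becomes 1), and np.absolute has no effect. The value in total_sum is therefore only an approximation of the true difference. We accept this because the exact calculations are somewhat more expensive.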
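A tiny example makes the wraparound visible. The sketch below compares two one-row uint8 arrays, one standing in for a box of the edge image and one for a letter image (values made up for illustration). With 0/255 images the approximate measure becomes asymmetric: an edge pixel that the letter misses costs 255, while a letter pixel with no edge underneath costs only 1.

```python
import numpy as np

edge = np.array([[255, 0]], np.uint8)    # box: an edge pixel, then background
letter = np.array([[0, 255]], np.uint8)  # letter: background, then a stroke pixel

# As in to_ascii_art: 255 - 0 = 255, but 0 - 255 wraps around to 1
approx = np.absolute(np.subtract(edge, letter))
print(approx.tolist())  # [[255, 1]]

# Widening to a signed type first gives the true absolute difference
exact = np.absolute(edge.astype(np.int16) - letter.astype(np.int16))
print(exact.tolist())   # [[255, 255]]
```

In practice this means the approximate version mostly rewards letters that cover the edges and barely penalizes letters that draw extra strokes.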

The correct calculation would be:

total_sum = np.sum(np.where(roi > images[k], np.subtract(roi, images[k]), np.subtract(images[k], roi)))

Alternatively, you could convert the arrays to np.int16 before subtracting, since the unsigned np.uint8 type is what causes the problem here. Finally, cv2.norm(…) with cv2.NORM_L1 would also compute the exact value, but it cannot be used here because OpenCV functions are not supported inside Numba-compiled code.
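A quick way to convince yourself that the two exact formulations agree is to compare them on random binary boxes. The sketch below computes the np.where version, the np.int16 version, and, for reference, the approximate uint8 version used in to_ascii_art; the random 0/255 arrays are test data made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
roi = (rng.integers(0, 2, size=(12, 16)) * 255).astype(np.uint8)
letter = (rng.integers(0, 2, size=(12, 16)) * 255).astype(np.uint8)

# Exact: subtract the smaller value from the larger one, so the kept branch never wraps
where_sum = np.sum(np.where(roi > letter, np.subtract(roi, letter), np.subtract(letter, roi)))

# Exact: widen to int16 first so negative differences are representable
int16_sum = np.sum(np.absolute(roi.astype(np.int16) - letter.astype(np.int16)))

# Approximate: uint8 wraparound, as in to_ascii_art
approx_sum = np.sum(np.absolute(np.subtract(roi, letter)))

print(where_sum == int16_sum)   # True
print(approx_sum < int16_sum)   # True for this 0/255 data: wrapped pixels count 1, not 255
```

Note that np.sum accumulates small integer types in the platform integer, so the int16 version does not overflow even though a full 12×16 box can differ by more than 32767 in total.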

Step 5: Putting it all together and using Numba

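Now we can put all the code together and try it out. We also apply Numba's @jit decorator in nopython mode to the to_ascii_art function to speed it up. Note that the first call to a jitted function includes the compilation time, so the first frame takes noticeably longer. If you are new to Numba, we recommend this tutorial.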

import cv2
import numpy as np
from numba import jit


@jit(nopython=True)
def to_ascii_art(frame, images, box_height=12, box_width=16):
    height, width = frame.shape
    # Walk over the edge image in boxes the size of one letter image
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            best_match = np.inf
            best_match_index = 0
            for k in range(1, images.shape[0]):
                # Approximate difference (uint8 subtraction wraps around)
                total_sum = np.sum(np.absolute(np.subtract(roi, images[k])))
                if total_sum < best_match:
                    best_match = total_sum
                    best_match_index = k
            # Overwrite the box in place with the best matching letter
            roi[:,:] = images[best_match_index]
    return frame


def generate_ascii_letters():
    images = []
    #letters = "# $%&\\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~"
    letters = " \\ '(),-./:;[]_`{|}~"
    for letter in letters:
        img = np.zeros((12, 16), np.uint8)
        img = cv2.putText(img, letter, (0, 11), cv2.FONT_HERSHEY_SIMPLEX, 0.5, 255)
        images.append(img)
    return np.stack(images)


# Set up the camera
cap = cv2.VideoCapture(0)
# Request a smaller resolution; 640x480 divides evenly into 16-wide, 12-tall boxes
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

images = generate_ascii_letters()

while True:
    # Capture frame-by-frame; stop if no frame could be read
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.flip(frame, 1)

    # Blur to reduce noise, then keep only the edges
    gb = cv2.GaussianBlur(frame, (5, 5), 0)
    can = cv2.Canny(gb, 127, 31)

    # can is overwritten in place with the ASCII art
    ascii_art = to_ascii_art(can, images)

    cv2.imshow('ASCII ART', ascii_art)
    cv2.imshow("Webcam", frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()

This will give the following result (if you put me in front of the camera).

Also, try a different character set, for example the full one in the commented-out line of the code above.
