How to Get Started with YOLO in Python

What will we cover in this tutorial?

How do you get started with YOLO in Python, and what do you need to download? This tutorial covers the setup and gives a simple guide to using YOLO in Python. The code is kept as simple as possible, with explanations along the way.

Step 1: Download the YOLO files

The easy way to get things working is to just download the repository from GitHub as a zip file. You find the darknet repository here.

You can also download it as a zip file directly from here. The zip file should be unpacked in the folder where you develop your code. I renamed the resulting folder to yolo.
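If you prefer to script this step, here is a minimal sketch that downloads and unpacks the repository using only the Python standard library. Note that the exact archive URL and the unpacked folder name (darknet-master) are assumptions based on GitHub's standard zip links, so adjust them if needed.

import io
import os
import urllib.request
import zipfile

# Assumed archive URL for the pjreddie/darknet repository on GitHub
REPO_ZIP = "https://github.com/pjreddie/darknet/archive/master.zip"

# Download the zip into memory and unpack it next to your code
with urllib.request.urlopen(REPO_ZIP) as response:
    zipfile.ZipFile(io.BytesIO(response.read())).extractall(".")

# GitHub unpacks the archive as 'darknet-master' - rename it to 'yolo'
os.rename("darknet-master", "yolo")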

The next thing you need is the trained model, which you find on https://pjreddie.com/darknet/yolo/. Look for the table of pre-trained models on the page and click on the weights link.

We will use YOLOv3-tiny, which you can also get directly from here.

The downloaded file should be placed in the folder where you develop your code.
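You can script this download as well. The direct link below is an assumption based on the download links listed on pjreddie.com, so double-check it against the page.

import urllib.request

# Assumed direct link to the YOLOv3-tiny weights
WEIGHTS_URL = "https://pjreddie.com/media/files/yolov3-tiny.weights"

# Save the weights in the folder where you develop your code
urllib.request.urlretrieve(WEIGHTS_URL, "yolov3-tiny.weights")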

Step 2: Load the network and apply it on an image

The code below is structured as follows. First, you configure the location of the downloaded repository. Remember, I put it in the folder where I run my program and renamed it to yolo.

It then loads the labels of the possible objects, which are located in a file called coco.names. This is needed because the network outputs class indices, which map into the names in coco.names. Further, it assigns a random color to each label, such that different labels have different colors.

After that it reads the network and determines its output layers. This step is a bit unintuitive, but in the case of yolov3-tiny.cfg there are only two output layers, and that is what it extracts there.
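As a quick sanity check (hypothetical, assuming the network and the layer list ln are set up as in the code below), printing the list for yolov3-tiny typically shows the two YOLO detection layers:

# Hypothetical check after loading the network as shown below
print(ln)  # typically ['yolo_16', 'yolo_23'] for yolov3-tiny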

Finally, it loads the image (from the repository), transforms it into a blob the network understands, and runs the network on it.

import numpy as np
import time
import cv2
import os


DARKNET_PATH = 'yolo'

# Read the labels of the objects the network can detect
labels = open(os.path.join(DARKNET_PATH, "data", "coco.names")).read().splitlines()
# Make random colors with a seed, such that they are the same next time
np.random.seed(0)
colors = np.random.randint(0, 255, size=(len(labels), 3)).tolist()

# Give the configuration and weight files for the model and load the network.
net = cv2.dnn.readNetFromDarknet(os.path.join(DARKNET_PATH, "cfg", "yolov3-tiny.cfg"), "yolov3-tiny.weights")
# Determine the output layers - this piece is not intuitive
ln = net.getLayerNames()
# getUnconnectedOutLayers() returns a flat array in newer OpenCV versions
# and an Nx1 array in older ones; flatten() handles both
ln = [ln[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

# Load the image
image = cv2.imread(os.path.join(DARKNET_PATH, "data", "dog.jpg"))
# Get the shape
h, w = image.shape[:2]
# Load it as a blob and feed it to the network
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
start = time.time()
# Get the output
layer_outputs = net.forward(ln)
end = time.time()

Then we need to parse the result in layer_outputs.

Step 3: Parse the result from layer_outputs (YOLO output)

This is a bit tricky at first. You first need to understand the overall flow.

First, you run through all the results in the layers (we have two output layers). Second, you remove overlapping results, as there might be multiple boxes that identify the same object with slightly different bounding boxes. Third, and finally, you draw the remaining boxes with labels (and colors) on the image.

To go through that process we need three lists to keep track of the findings: one for the actual boxes that enclose the identified objects (boxes), one for the corresponding confidences (confidences), that is, how sure the algorithm is, and one for the class ids, which index into the names we have in labels (class_ids).

Each detection is a vector: the first 4 entries hold the position and size of the identified object, the fifth entry is an overall objectness score, and the remaining entries contain the confidence scores for each of the possible classes the network knows.
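To make this concrete, here is a small sketch that inspects what came back, assuming ln, layer_outputs, start and end from Step 2 are still in scope. The exact row counts depend on the input size.

# Assumes ln, layer_outputs, start and end from Step 2 are in scope
print("The forward pass took {:.3f} seconds".format(end - start))
for name, output in zip(ln, layer_outputs):
    # Each output is a 2D array with one row per detection and, for the
    # 80 COCO classes, 85 columns (4 box values + 1 objectness + 80 scores)
    print(name, output.shape)  # typically (507, 85) and (2028, 85) at 416x416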

# Initialize the lists we need to interpret the results
boxes = []
confidences = []
class_ids = []

# Loop over the layers
for output in layer_outputs:
    # For the layer loop over all detections
    for detection in output:
        # Entries 0-3 are the box (center and size), entry 4 is the
        # objectness score, and entries 5 onwards are the class scores
        scores = detection[5:]
        # Take the class with the maximal score
        class_id = np.argmax(scores).item()
        # The maximal score is the confidence
        confidence = scores[class_id].item()

        # Ensure we have some reasonable confidence, else ignore the detection
        if confidence > 0.3:
            # The first four entries have the location and size (center, size)
            # It needs to be scaled up as the result is given in relative size (0.0 to 1.0)
            box = detection[0:4] * np.array([w, h, w, h])
            center_x, center_y, width, height = box.astype(int).tolist()

            # Calculate the upper corner
            x = center_x - width//2
            y = center_y - height//2

            # Add our findings to the lists
            boxes.append([x, y, width, height])
            confidences.append(confidence)
            class_ids.append(class_id)

# Non-maximum suppression: only keep the best box of the overlapping ones
# (0.3 is used both as the score threshold and as the overlap threshold)
idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.3, 0.3)

# Ensure at least one detection exists - needed otherwise flatten will fail
if len(idxs) > 0:
    # Loop over the indexes we are keeping
    for i in idxs.flatten():
        # Get the box information
        x, y, w, h = boxes[i]

        # Make a rectangle
        cv2.rectangle(image, (x, y), (x + w, y + h), colors[class_ids[i]], 2)
        # Make and add text
        text = "{}: {:.4f}".format(labels[class_ids[i]], confidences[i])
        cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, colors[class_ids[i]], 2)

# Write the image with boxes and text
cv2.imwrite("example.png", image)
Resulting image: dog.jpg with the detected objects boxed and labeled (written to example.png).
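If you would rather display the result in a window than open the written file, a minimal addition at the end of the script could be:

# Show the annotated image in a window; press any key to close it
cv2.imshow("YOLO detections", image)
cv2.waitKey(0)
cv2.destroyAllWindows()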

The full code together

Below is the full source code put together.

import numpy as np
import time
import cv2
import os


DARKNET_PATH = 'yolo'

# Read the labels of the objects the network can detect
labels = open(os.path.join(DARKNET_PATH, "data", "coco.names")).read().splitlines()
# Make random colors with a seed, such that they are the same next time
np.random.seed(0)
colors = np.random.randint(0, 255, size=(len(labels), 3)).tolist()

# Give the configuration and weight files for the model and load the network.
net = cv2.dnn.readNetFromDarknet(os.path.join(DARKNET_PATH, "cfg", "yolov3-tiny.cfg"), "yolov3-tiny.weights")
# Determine the output layers - this piece is not intuitive
ln = net.getLayerNames()
# getUnconnectedOutLayers() returns a flat array in newer OpenCV versions
# and an Nx1 array in older ones; flatten() handles both
ln = [ln[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

# Load the image
image = cv2.imread(os.path.join(DARKNET_PATH, "data", "dog.jpg"))
# Get the shape
h, w = image.shape[:2]
# Load it as a blob and feed it to the network
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
start = time.time()
# Get the output
layer_outputs = net.forward(ln)
end = time.time()


# Initialize the lists we need to interpret the results
boxes = []
confidences = []
class_ids = []

# Loop over the layers
for output in layer_outputs:
    # For the layer loop over all detections
    for detection in output:
        # Entries 0-3 are the box (center and size), entry 4 is the
        # objectness score, and entries 5 onwards are the class scores
        scores = detection[5:]
        # Take the class with the maximal score
        class_id = np.argmax(scores).item()
        # The maximal score is the confidence
        confidence = scores[class_id].item()

        # Ensure we have some reasonable confidence, else ignore the detection
        if confidence > 0.3:
            # The first four entries have the location and size (center, size)
            # It needs to be scaled up as the result is given in relative size (0.0 to 1.0)
            box = detection[0:4] * np.array([w, h, w, h])
            center_x, center_y, width, height = box.astype(int).tolist()

            # Calculate the upper corner
            x = center_x - width//2
            y = center_y - height//2

            # Add our findings to the lists
            boxes.append([x, y, width, height])
            confidences.append(confidence)
            class_ids.append(class_id)

# Non-maximum suppression: only keep the best box of the overlapping ones
# (0.3 is used both as the score threshold and as the overlap threshold)
idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.3, 0.3)

# Ensure at least one detection exists - needed otherwise flatten will fail
if len(idxs) > 0:
    # Loop over the indexes we are keeping
    for i in idxs.flatten():
        # Get the box information
        x, y, w, h = boxes[i]

        # Make a rectangle
        cv2.rectangle(image, (x, y), (x + w, y + h), colors[class_ids[i]], 2)
        # Make and add text
        text = "{}: {:.4f}".format(labels[class_ids[i]], confidences[i])
        cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, colors[class_ids[i]], 2)

# Write the image with boxes and text
cv2.imwrite("example.png", image)
