What will we cover in this tutorial?
How do you get started with YOLO in Python? What do you need to download? This tutorial covers a simple guide to using it in Python. The code is kept as simple as possible, with explanations along the way.
Step 1: Download the YOLO repository and model
The easy way to get things working is to download the repository from GitHub as a zip file. You find the darknet repository here.

You can also download it as a zip directly from here. The zip file should be unpacked into the folder where you develop your code. I renamed the resulting folder to yolo.
The next thing you need is the trained model, which you can find on https://pjreddie.com/darknet/yolo/. Look for the following on the page and click on the weights.

We will use YOLOv3-tiny, which you can also get directly from here.
The downloaded file should be placed in the folder where you develop your code.
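If you prefer to fetch the weights from a script, something like the following should work. This is a minimal sketch; it assumes the weights are still hosted at https://pjreddie.com/media/files/yolov3-tiny.weights, so adjust the URL if the hosting has moved.

```python
import os
import urllib.request

# Assumed download location; adjust if the file has moved.
WEIGHTS_URL = "https://pjreddie.com/media/files/yolov3-tiny.weights"

def download_weights(url=WEIGHTS_URL, dest="yolov3-tiny.weights"):
    """Download the weights file once; skip the download if it already exists."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```

Calling download_weights() in the folder where you develop your code leaves yolov3-tiny.weights right where the rest of the tutorial expects it.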
Step 2: Load the network and apply it on an image
The code below is structured as follows. First you configure the location of the downloaded repository. Remember, I put it in the folder where I run my program and renamed it to yolo.
It then loads the labels of the possible objects, which are located in a file called coco.names. This is because the network outputs class indices, which map to the names in coco.names. It also assigns a random color to each label, so that different labels are drawn in different colors.
After that it reads the network and determines its output layers. This part is a bit unintuitive, but in the case of yolov3-tiny.cfg there are only two output layers, which we retrieve there.
Finally, it loads the image (from the repository), transforms it into a blob that the network understands, and runs the network on it.
import numpy as np
import time
import cv2
import os
DARKNET_PATH = 'yolo'
# Read the labels of the objects the network can detect
labels = open(os.path.join(DARKNET_PATH, "data", "coco.names")).read().splitlines()
# Make random colors with a seed, such that they are the same next time
np.random.seed(0)
colors = np.random.randint(0, 255, size=(len(labels), 3)).tolist()
# Give the configuration and weight files for the model and load the network.
net = cv2.dnn.readNetFromDarknet(os.path.join(DARKNET_PATH, "cfg", "yolov3-tiny.cfg"), "yolov3-tiny.weights")
# Determine the output layers; this piece is not intuitive
ln = net.getLayerNames()
# flatten() handles both old and new OpenCV versions, where getUnconnectedOutLayers()
# returns either an Nx1 array or a flat array of indices
ln = [ln[i - 1] for i in net.getUnconnectedOutLayers().flatten()]
# Load the image
image = cv2.imread(os.path.join(DARKNET_PATH, "data", "dog.jpg"))
# Get the shape
h, w = image.shape[:2]
# Load it as a blob and feed it to the network
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
start = time.time()
# Get the output
layer_outputs = net.forward(ln)
end = time.time()
Then we need to parse the result in layer_outputs.
Step 3: Parse the result from layer_outputs (the YOLO output)
This is a bit tricky at first. You need to understand the overall flow.
First, you run through all the results in the layers (we have two output layers). Second, you remove overlapping results, as there may be multiple boxes that identify the same object, just with slightly different bounding boxes. Third, and finally, you draw the remaining boxes with labels (and colors) on the image.
To go through that process we need three lists to keep track of things. One for the actual boxes that enclose the identified objects (boxes). Then the corresponding confidences (confidences), that is, how sure the algorithm is. Finally, the class ids, which index into the names we have in labels (class_ids).
Each detection is a vector whose first 4 entries hold the position and size of the identified object. Entry 4 holds an objectness score, and the remaining entries contain the confidence scores for all the classes the network knows, which is why the code below slices from index 5.
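To make that layout concrete, here is a toy detection vector parsed by hand. The numbers are made up for illustration, for a hypothetical 3-class network.

```python
import numpy as np

# A made-up detection row for a hypothetical 3-class network:
# [center_x, center_y, width, height, objectness, score_0, score_1, score_2]
# Positions and sizes are relative (0.0 to 1.0).
detection = np.array([0.5, 0.5, 0.2, 0.4, 0.9, 0.1, 0.7, 0.2])

scores = detection[5:]                # per-class scores start at index 5
class_id = np.argmax(scores).item()   # best class
confidence = scores[class_id].item()  # its score

w, h = 640, 480                       # image size used to scale the box
box = detection[0:4] * np.array([w, h, w, h])
center_x, center_y, width, height = box.astype(int).tolist()
x = center_x - width // 2             # upper-left corner
y = center_y - height // 2

print(class_id, confidence, (x, y, width, height))
# prints: 1 0.7 (256, 144, 128, 192)
```

The real loop below does exactly this for every detection in every output layer.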
# Initialize the lists we need to interpret the results
boxes = []
confidences = []
class_ids = []
# Loop over the output layers
for output in layer_outputs:
    # For each layer, loop over all detections
    for detection in output:
        # Entries 5 onward hold the per-class scores
        scores = detection[5:]
        # Take the class with the maximal score
        class_id = np.argmax(scores).item()
        # The maximal score is the confidence
        confidence = scores[class_id].item()
        # Ensure we have some reasonable confidence, else ignore
        if confidence > 0.3:
            # The first four entries hold the location and size (center, size)
            # They need to be scaled up, as the result is given in relative size (0.0 to 1.0)
            box = detection[0:4] * np.array([w, h, w, h])
            center_x, center_y, width, height = box.astype(int).tolist()
            # Calculate the upper-left corner
            x = center_x - width // 2
            y = center_y - height // 2
            # Add our findings to the lists
            boxes.append([x, y, width, height])
            confidences.append(confidence)
            class_ids.append(class_id)
# Only keep the best boxes of the overlapping ones
idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.3, 0.3)
# Ensure at least one detection exists - needed otherwise flatten will fail
if len(idxs) > 0:
    # Loop over the indexes we are keeping
    for i in idxs.flatten():
        # Get the box information
        x, y, w, h = boxes[i]
        # Draw a rectangle
        cv2.rectangle(image, (x, y), (x + w, y + h), colors[class_ids[i]], 2)
        # Make and add text
        text = "{}: {:.4f}".format(labels[class_ids[i]], confidences[i])
        cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, colors[class_ids[i]], 2)
# Write the image with boxes and text
cv2.imwrite("example.png", image)

The full code together
Here is the full source code put together:
import numpy as np
import time
import cv2
import os
DARKNET_PATH = 'yolo'
# Read the labels of the objects the network can detect
labels = open(os.path.join(DARKNET_PATH, "data", "coco.names")).read().splitlines()
# Make random colors with a seed, such that they are the same next time
np.random.seed(0)
colors = np.random.randint(0, 255, size=(len(labels), 3)).tolist()
# Give the configuration and weight files for the model and load the network.
net = cv2.dnn.readNetFromDarknet(os.path.join(DARKNET_PATH, "cfg", "yolov3-tiny.cfg"), "yolov3-tiny.weights")
# Determine the output layers; this piece is not intuitive
ln = net.getLayerNames()
# flatten() handles both old and new OpenCV versions, where getUnconnectedOutLayers()
# returns either an Nx1 array or a flat array of indices
ln = [ln[i - 1] for i in net.getUnconnectedOutLayers().flatten()]
# Load the image
image = cv2.imread(os.path.join(DARKNET_PATH, "data", "dog.jpg"))
# Get the shape
h, w = image.shape[:2]
# Load it as a blob and feed it to the network
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
start = time.time()
# Get the output
layer_outputs = net.forward(ln)
end = time.time()
# Initialize the lists we need to interpret the results
boxes = []
confidences = []
class_ids = []
# Loop over the output layers
for output in layer_outputs:
    # For each layer, loop over all detections
    for detection in output:
        # Entries 5 onward hold the per-class scores
        scores = detection[5:]
        # Take the class with the maximal score
        class_id = np.argmax(scores).item()
        # The maximal score is the confidence
        confidence = scores[class_id].item()
        # Ensure we have some reasonable confidence, else ignore
        if confidence > 0.3:
            # The first four entries hold the location and size (center, size)
            # They need to be scaled up, as the result is given in relative size (0.0 to 1.0)
            box = detection[0:4] * np.array([w, h, w, h])
            center_x, center_y, width, height = box.astype(int).tolist()
            # Calculate the upper-left corner
            x = center_x - width // 2
            y = center_y - height // 2
            # Add our findings to the lists
            boxes.append([x, y, width, height])
            confidences.append(confidence)
            class_ids.append(class_id)
# Only keep the best boxes of the overlapping ones
idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.3, 0.3)
# Ensure at least one detection exists - needed otherwise flatten will fail
if len(idxs) > 0:
    # Loop over the indexes we are keeping
    for i in idxs.flatten():
        # Get the box information
        x, y, w, h = boxes[i]
        # Draw a rectangle
        cv2.rectangle(image, (x, y), (x + w, y + h), colors[class_ids[i]], 2)
        # Make and add text
        text = "{}: {:.4f}".format(labels[class_ids[i]], confidences[i])
        cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, colors[class_ids[i]], 2)
# Write the image with boxes and text
cv2.imwrite("example.png", image)