## What will we cover?

Think about the following challenge. Given a circular linked. Create a queue with constant time operations for enqueue and dequeue. Also, you can only have one pointer to the Circular Linked List.

Consider the above diagram. How will you insert a new element? Okay, that seems easy. But how do you remove the oldest element from it? With constant time!

Yes, that is not that trivial.

## Step 1: Create the node class used to represent the circular linked list

There are of course many ways to go about how to create a linked list, and more specifically a circular linked list.

Some implement the classes together, that is both the node and the linked list together. I like to split them from each other. That way, you keep things simple.

Hence, to represent the circular linked list, I suggest the following simple class for a node. Then have the logic of the circular linked list in a different class.

```class Node:
def __init__(self, element=None, next_node=None):
self.element = element
self.next_node = next_node
```

Notice, that the above Node could also be used in a linked list alone. It is a simple building block, which can be used as you please.

## Step 2: Create the Queue using a circular linked list

You could create a class for a circular linked list, but we will use it closely from the queue.

The art is to create a queue with only one pointer to the circular linked list structure. The circular linked list can be represented with the Node class above.

The first operation we need is to enqueue an element in the queue. That is to insert an element in the queue.

The special case of inserting the first element, would result in a circular linked list with only one Node, which points at itself.

It is first when we insert the second element it becomes tricky. Which Node should the Queue pointer point at?

This is interesting. We need to be able to insert the element in constant time. Hence, if the queue has more element, it should not need to go all the way around the circular linked list. That makes it necessary, for us to be able to find the last element added with constant number of operations.

Also, the element in the circular linked list point in the direction the were inserted.

This results to that we need to point at the last inserted element.

Assuming in the above that the elements are inserted in the order 1, 2, 3, and 4.

Then we can insert the the elements with the following code.

```class Queue:
def __init__(self):
self.pointer = None

def enqueue(self, element):
if self.pointer is None:
node = Node(element)
self.pointer = node
self.pointer.next_node = node
else:
node = Node(element, self.pointer.next_node)
self.pointer.next_node = node
self.pointer = node
```

The code of enqueue shows first the base case, where the circular linked list is empty. Then the case where minimum one Node already exists.

Let’s follow the code of above. Before we call enqueue let’s assume it looks as follows.

First thing is to create a new Node (with the new element, below we use 5) and point it at the queues pointer next_node.

Point the queue pointers next_node at the new Node we created.

Then we update the queue pointer to point at the new Node.

That is it. We still have a circular linked list and only one pointer from the Queue. Notice, that this all happens in constant number of operations. That is, it is independent of how many Nodes are in the circular linked list. If we have 6,000 Nodes, we still only use the 3 operations described above. Hence, enqueue has a O(1) worst case run time.

Next step will see if we can dequeue correctly from the structure, while keeping it intact.

## Step 3: How to dequeue from the circular linked list

This seems tricky at first.

But look at the following diagram, where we have 4 elements. They have been inserted by the enqueue function given in Step 2. The first element was 1, then 2, then 3, and finally element 4.

Hence, when we dequeue, we want to return and remove element 1. And the circular linked list should look like this afterwards.

Also, the dequeue operation should be done in constant number of operations.

Obviously, there is a special case, when we only have one element in the queue. Then we remove it and make the queue pointer point at None.

```class Queue:
# Continued...

def dequeue(self):
if self.pointer is None:
return None
if self.pointer.next_node == self.pointer:
element = self.pointer.element
self.pointer = None
return element

element = self.pointer.next_node.element
self.pointer.next_node = self.pointer.next_node.next_node
return element

```

Otherwise, we get the element from the queue pointers next_node.

Then we update the queue pointers node to point at next_node.next_node. Notice, that this will keep the structure intact. Also, we keep the dequeue function at a constant number of operations, hence, O(1).

## The full code with a print queue function

Here is the full code including a representation to print the queue.

```class Node:
def __init__(self, element=None, next_node=None):
self.element = element
self.next_node = next_node

class Queue:
def __init__(self):
self.pointer = None

def enqueue(self, element):
if self.pointer is None:
node = Node(element)
self.pointer = node
self.pointer.next_node = node
else:
node = Node(element, self.pointer.next_node)
self.pointer.next_node = node
self.pointer = node

def dequeue(self):
if self.pointer is None:
return None
if self.pointer.next_node == self.pointer:
element = self.pointer.element
self.pointer = None
return element

element = self.pointer.next_node.element
self.pointer.next_node = self.pointer.next_node.next_node
return element

def __str__(self):
if self.pointer is None:
return "empty"
first = self.pointer
node = first.next_node
result = "["
while node != first:
result += str(node.element) + '->'
node = node.next_node
return result + str(node.element) + '->...]'

queue = Queue()
for i in range(20):
queue.enqueue(i)
print(queue)

for _ in range(21):
queue.dequeue()
print(queue)

for i in range(5):
queue.enqueue(i)
print(queue)
```

The beginning of the output will be as follows.

```[0->...]
[0->1->...]
[0->1->2->...]
[0->1->2->3->...]
[0->1->2->3->4->...]
[0->1->2->3->4->5->...]
```

## What will we cover in this tutorial?

We will create a heap, or more specifically, a max-heap. A max-heap is a tree structure where the node value of every parent is greater or equal to the children.

In this tutorial we will implement a max-heap with a binary tree and use a randomized approach to keep it balanced.

You might be wondering why to make it randomized. Simply, said, to keep it simple and keep operations on average O(log(n)).

## Step 1: Recall what a max-heap is

A max-heap is a tree structure where the node value of every parent is greater or equal to the children.

A heap will have two primary functions.

1. Insert an element and still keep the max-heap structure.
2. Get and remove maximum element (the element at the root) and still keep the max-heap structure.

The goal is to be able to do these operations in O(log(n)) time.

## Step 2: Understand what a randomized algorithm can do for you

Randomized algorithms help you achieve great performance on average while keeping the algorithms simple.

To get keep the operations of a heap at worst-case O(log(n)), you need to keep the binary tree structure balanced. This requires complex ways to ensure that.

Instead, just put in the leaves in the three randomly, and you will get the same result with very high probability. Hence, you will end up with an average time of O(log(n)) for the operations.

## Step 3: Insert into a max-heap

Now to the fun part. The code.

Let’s start simple and create a Node to represent the nodes in the binary tree, which will represent the max-heap.

```class Node:
def __init__(self, element):
self.element = element
self.left = None
self.right = None
```

The node needs to be able to keep the element (which should be comparable), and a left and right child.

From the above Node class you can create an arbitrary binary tree.

The max-heap insert function can be implemented by a recursive and randomized approach in the following manner.

```import random

class Heap:
def __init__(self):

def _insert(self, element, node):
# if element is larger then node.element, switch
if element > node.element:
element, node.element = node.element, element

# check if available node
if node.left is None:
node.left = Node(element)
return
if node.right is None:
node.right = Node(element)
return

# Choose a random node (here is the randomness hidden)
if random.randint(0, 1) == 0:
self._insert(element, node.left)
else:
self._insert(element, node.right)

def insert(self, element):
else:
```

The function insert(…) checks for the special case, if there are no nodes in the binary tree so far and inserts if so. Otherwise, it will forward the call to the recursive and randomized function _insert(….), which also takes the head (root) of the tree as argument.

A recursive function in this case could be at any node, but starting from the head (root) node. It will do the same all the way down.

1. Check if element of node is smaller than element to insert. If so, switch them.
2. Check if node has a free child (left or right). If so, use it to insert new node with element and return.
3. If none of the above, choose a random child (left or right) and call recursive down.

That is it. It will most likely create a well balanced binary tree.

See example here.

```               +---------------36---------------+
+-------29-------+              +-------34-------+
+---27---+      +---20---+      +---32---+      +---33---+
+- 3-+  +-13-+  +- 6-+  +- 2-+  +-24-+  +-31-+  +-25-+  +-25-+
0   1           4              16               6
```

In simple ascii representation of the binary tree representing the max-heap. That the binary tree keeps a balance like that ensures that the insertion will be O(log(n)) on average.

## Step 4: Delete the maximum element from the heap (and return it)

Deleting the maximum element will remove the root (head) of the binary tree. Then we need to take the larger child and move it up. That obviously makes an empty space in the child below. Hence, we need to do the same operation below.

This sounds recursive, doesn’t it?

```import random

class Heap:
def __init__(self):

def _insert(self, element, node):
# if element is larger than node.element, switch
if element > node.element:
element, node.element = node.element, element

# check if available node
if node.left is None:
node.left = Node(element)
return
if node.right is None:
node.right = Node(element)
return

if random.randint(0, 1) == 0:
self._insert(element, node.left)
else:
self._insert(element, node.right)

def insert(self, element):
else:

def get_max(self):

def _delete_max(self, node):
if node.left is None and node.right is None:
return None

if node.left is None:
return node.right

if node.right is None:
return node.left

if node.right.element > node.left.element:
node.element = node.right.element
node.right = self._delete_max(node.right)
return node
else:
node.element = node.left.element
node.left = self._delete_max(node.left)
return node

def delete_max(self):
return None

return max_element
```

The delete_max function takes care of the special case where there are no elements (or nodes) in the binary tree. Then it takes the largest element and calls the recursive _delete_max(…) function with the head (root) as argument.

The _delete_max(…) function does the following.

1. Checks for special case where node has no children. If so, return None.
2. Check if one child is not there, if so return the existing child.
3. Otherwise, take the child with the larger element. Take the larger element and assign it to node (remember, we have removed the element form the calling node), and call recursive down with larger child on _delete_max(…) and assign result to larger child node.

That can be a bit confusing at first. But try it out.

This operation also only has O(log(n)) performance on average. And as elements are put randomly, then removing them in order (maximum elements), will remove elements randomly and keep the binary tree balanced on average case.

## Step 5: The full code and a simple print function of the tree

The full code can be found here.

```import random

class Node:
def __init__(self, element):
self.element = element
self.left = None
self.right = None

class Heap:
def __init__(self):

def _insert(self, element, node):
# if element is larger than node.element, switch
if element > node.element:
element, node.element = node.element, element

# check if available node
if node.left is None:
node.left = Node(element)
return
if node.right is None:
node.right = Node(element)
return

if random.randint(0, 1) == 0:
self._insert(element, node.left)
else:
self._insert(element, node.right)

def insert(self, element):
else:

def get_max(self):

def _delete_max(self, node):
if node.left is None and node.right is None:
return None

if node.left is None:
return node.right

if node.right is None:
return node.left

if node.right.element > node.left.element:
node.element = node.right.element
node.right = self._delete_max(node.right)
return node
else:
node.element = node.left.element
node.left = self._delete_max(node.left)
return node

def delete_max(self):
return None

return max_element

def _get_depth(self, node):
if node is None:
return 0
left = self._get_depth(node.left)
right = self._get_depth(node.right)
if left > right:
return 1 + left
else:
return 1 + right

def get_depth(self):

def _print_heap(self, current_level, request_level, depth, node):
characters_per_level = 4*2**depth
characters_per_node = characters_per_level // (2**(current_level + 1))
if current_level == request_level:
if node is not None:
space_fill = characters_per_node // 4 - 1
if request_level == depth - 1:
print(' '*space_fill + ' ' + ' '*space_fill + f'{node.element:2d}' + ' '*space_fill + ' ' + ' '*space_fill, end='')
else:
print(' '*space_fill + '+' + '-'*space_fill + f'{node.element:2d}' + '-'*space_fill + '+' + ' '*space_fill, end='')
else:
print(' '*characters_per_node, end='')
else:
if node is not None:
self._print_heap(current_level + 1, request_level, depth, node.left)
self._print_heap(current_level + 1, request_level, depth, node.right)
else:
self._print_heap(current_level + 1, request_level, depth, None)
self._print_heap(current_level + 1, request_level, depth, None)

def print_heap(self):
for i in range(depth):
print()
```

Notice that the print function also is recursive.

## What will we cover in this tutorial?

Given an input of a maze, how can we solve it?

That is – given an input like this the following.

```#############
S #     #   #
# #  ##   # #
# #  ## ### #
#       #   #
######### # #
#         # E
#############
```

How can we find a path through the maze starting from S and ending at E.

```#############
S.#.....#...#
#.#. ##...#.#
#.#. ## ###.#
#...    #  .#
######### #.#
#         #.E
#############
```

We will create a program that can do that in Python.

## Step 1: Reading and representing the Maze

An good representation of the maze makes it easier to understand the code. Hence, the first part is to figure out how to represent the maze.

To clarify, imagine that this maze.

```v
```

Was represented by one long string, like this.

```maze = "#############S #     #   ## #  ##   # ## #  ## ### ##       #   ########## # ##         # E#############"
```

While this is possible, if you also know the dimensions (8 rows and 13 columns), it makes things difficult to navigate in. Say, you are somewhere in the maze, and you want to see what is right above you. How do you do that?

Yes, you can do it, but it is complex.

Now image you have the following representation.

```maze = [['#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#'],
['S', ' ', '#', ' ', ' ', ' ', ' ', ' ', '#', ' ', ' ', ' ', '#'],
['#', ' ', '#', ' ', ' ', '#', '#', ' ', ' ', ' ', '#', ' ', '#'],
['#', ' ', '#', ' ', ' ', '#', '#', ' ', '#', '#', '#', ' ', '#'],
['#', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '#', ' ', ' ', ' ', '#'],
['#', '#', '#', '#', '#', '#', '#', '#', '#', ' ', '#', ' ', '#'],
['#', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '#', ' ', 'E'],
['#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#']]
```

This will make it easier to navigate in the maze.

Then maze represents row 1 (2nd row) and column 0 (1st column), which is S.

If we assume that the maze is represented in a file, then we can read that file and convert it with the following code.

```def load_maze(file_name):
f = open(file_name)
f.close()
return maze

def convert_maze(maze):
converted_maze = []
lines = maze.splitlines()
for line in lines:
converted_maze.append(list(line))
return converted_maze

maze = convert_maze(maze)
print(maze)
```

## Step 2: Creating a print function of the maze

As you probably noticed, the print statement of the maze (print(maze)) did not do a very good job.

```[['#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#'], ['S', ' ', '#', ' ', ' ', ' ', ' ', ' ', '#', ' ', ' ', ' ', '#'], ['#', ' ', '#', ' ', ' ', '#', '#', ' ', ' ', ' ', '#', ' ', '#'], ['#', ' ', '#', ' ', ' ', '#', '#', ' ', '#', '#', '#', ' ', '#'], ['#', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '#', ' ', ' ', ' ', '#'], ['#', '#', '#', '#', '#', '#', '#', '#', '#', ' ', '#', ' ', '#'], ['#', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '#', ' ', 'E'], ['#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#']]
```

In order to be able to follow what happens in the code later on, it would be nice with a better representation of the code.

```def print_maze(maze):
for row in maze:
for item in row:
print(item, end='')
print()

print_maze(maze)
```

This will print the maze in a more human readable way.

```#############
S #     #   #
# #  ##   # #
# #  ## ### #
#       #   #
######### # #
#         # E
#############
```

## Step 3: Finding the starting point of the maze

First thing first. We need to find where we enter the maze. This is where the S is located.

We could have some assumptions here, say, that the starting point is always located on the outside borders of the maze. To make it more general, we will assume that the starting point, S, can be located anywhere in the maze.

This makes the algorithm to find it easy, but of course not effective in worst case, as we need to check all locations in the maze.

```def find_start(maze):
for row in range(len(maze)):
for col in range(len(maze)):
if maze[row][col] == 'S':
return row, col

start = find_start(maze)
print(start)
```

Which will go through all rows and columns one by one and check if it is the starting point. When found, it will return it immediately.

Notice, that we do assume that we have a maze with at least one row, also that the starting point exists. If not, it will not return anything.

## Step 4: Find if there is a path in the maze

This is where a stack comes in handy. If you are new to what a stack is, then check out this tutorial.

For the purpose here, we are not looking for performance, we just need the concept of a stack.

To see if there is a way through the maze, you need to check all paths. As some leads to the same points, we need to keep track of where we have been. If we reach a dead end, where we have been, we need to backtrack to the last point with options, and check them out.

Wow. That was a lot of talking. How can we understand that a bit better.

Let’s make a simple maze and see it.

```###
S E
###
```

When you are at start, S, you have (in theory) 4 possible ways to go. Up, Down, Right, Left. In the above example you can only go Right. Let’s do that, and mark the position we just left.

```###
[email protected]
###
```

Now we have marked where we have been with X and where we are with @. Then we can check all possible ways. Up, Down, Right, Left. Again we can only go Right, as Left we have already been there.

Luckily, going Right is the exit E and we are done.

Now that was a simple maze. Now how does a stack come into the picture? Well, we used it secretly and in a very simple case.

So let’s try a bit more complex case.

```#####
S # E
#   #
# # #
#####
```

Now let’s also have a stack with the positions we can visit. We will use the coordinate system as the representation of the list of lists we use to represent the maze. That means that the upper left corner is (0, 0) and the lower right corner is (5, 5) in this case.

The starting point is (1 , 0).

Now, let’s have a stack in the picture and put the staring point on that stack.

```#####
S # E
#   #
# # #
#####      stack -> (1, 0)
```

When we enter we look for points on the stack. Yes, we have one point (1, 0). Then we pop it off and check all possible positions. Up, Down, Right, Left. Only one is possible, Left, which is (1, 1). Updating where we been, then it looks like this.

```#####
X # E
#   #
# # #
#####      stack -> (1, 1)
```

Notice, that the stack changed and we marked the first point with X.

We do the same. Pop of the stack and get (1, 1) and check Up, Down, Right, Left. Only down is possible, which leaves us in (2, 1). Now the picture looks like this.

```#####
XX# E
#   #
# # #
#####      stack -> (2, 1)
```

Then we do the same. We pop of the stack and get (2, 1) and check Up, Down, Right, Left. Now we can go both Down and Right, which is (3, 1) and (2, 2), respectively. Hence, we push both of them on the stack.

```#####
XX# E
#X  #
# # #               (3, 1)
#####      stack -> (2, 2)
```

Depending on the order we put thins on stack, it will look like that.

Now we pop of the top element (3, 1) and check Up, Down, Right, Left. But there are no way to go from there, it is a dead end. Then we mark that spot and do not push anything on the stack.

```#####
XX# E
#X  #
#X# #
#####      stack -> (2, 2)
```

Now we pop (2, 2) of the stack, as this is the first possibility we have to go another way in the maze. Then check Up, Down, Right, Left. We can only go Right. After marking pushing we have.

```#####
XX# E
#XX #
#X# #
#####      stack -> (3, 2)
```

Now we can both go up and down and add that to the stack.

```#####
XX# E
#XXX#
#X# #               (3, 2)
#####      stack -> (3, 4)
```

Now we pop (3, 2) of the stack. Then we check all and see we can only go Right.

```#####
XX#XE
#XXX#
#X# #               (4, 2)
#####      stack -> (3, 4)
```

Then we pop (4, 2) of the stack and see it is the exit, E. That is just what we needed.

The stack keeps track on all the possible choices we have along the way. If we reach a dead end (also, if we have visited all around us and marked it by X), we can look at the last possible choice we skipped, which is on the top of the stack.

That actually means, that we do not need to move backwards along the road we went, we can go directly to the point on the stack. We know we can reach that point, as we have already visited a neighboring point, which is connected. Also, as we mark everything along the way, we do not go in endless loops. We only visit every point at most once.

Now, if there was no way to the exit, E, this will eventually terminate when there is no points on the stack.

## Step 5: Implementing the algorithm

The above can be made into a simple algorithm to determine whether there is a way through the maze.

The pseudo code for the algorithm could be something like this.

```stack.push(start)
while stack is not empty
location = stack.pop()

if location is marked continue to next location in loop

if location is marked as exit - return True (we found the exit)

for loc as Up, Down, Left, Right:
if loc is valid and not marked add to stack

return False (if all locations are tried (stack empty) there is not way to the exit)
```

Now let’s turn that into code.

```# This is only a helper function to see if we have a valid positino.
def is_valid_position(maze, pos_r, pos_c):
if pos_r < 0 or pos_c < 0:
return False
if pos_r >= len(maze) or pos_c >= len(maze):
return False
if maze[pos_r][pos_c] in ' E':
return True
return False

def solve_maze(maze, start):
# We use a Python list as a stack - then we have push operations as append, and pop as pop.
stack = []

# Add the entry point (as a tuple)
stack.append(start)

# Go through the stack as long as there are elements
while len(stack) > 0:
pos_r, pos_c = stack.pop()

if maze[pos_r][pos_c] == 'E':
print("GOAL")
return True

if maze[pos_r][pos_c] == 'X':
continue

# Mark position as visited
maze[pos_r][pos_c] = 'X'
# Check for all possible positions and add if possible
if is_valid_position(maze, pos_r - 1, pos_c):
stack.append((pos_r - 1, pos_c))
if is_valid_position(maze, pos_r + 1, pos_c):
stack.append((pos_r + 1, pos_c))
if is_valid_position(maze, pos_r, pos_c - 1):
stack.append((pos_r, pos_c - 1))
if is_valid_position(maze, pos_r, pos_c + 1):
stack.append((pos_r, pos_c + 1))

# We didn't find a path, hence we do not need to return the path
return False
```

## The full code

Here we present the full code. It prints out the state through the processing. It expects a file called maze.txt with the maze in it. See below for a possible file content.

```def load_maze(file_name):
f = open(file_name)
f.close()
return maze

def convert_maze(maze):
converted_maze = []
lines = maze.splitlines()
for line in lines:
converted_maze.append(list(line))
return converted_maze

def print_maze(maze):
for row in maze:
for item in row:
print(item, end='')
print()

def find_start(maze):
for row in range(len(maze)):
for col in range(len(maze)):
if maze[row][col] == 'S':
return row, col

from time import sleep

# This is only a helper function to see if we have a valid positino.
def is_valid_position(maze, pos_r, pos_c):
if pos_r < 0 or pos_c < 0:
return False
if pos_r >= len(maze) or pos_c >= len(maze):
return False
if maze[pos_r][pos_c] in ' E':
return True
return False

def solve_maze(maze, start):
# We use a Python list as a stack - then we have push operations as append, and pop as pop.
stack = []

# Add the entry point (as a tuple)
stack.append(start)

# Go through the stack as long as there are elements
while len(stack) > 0:
pos_r, pos_c = stack.pop()

print("Current position", pos_r, pos_c)

if maze[pos_r][pos_c] == 'E':
print("GOAL")
return True

if maze[pos_r][pos_c] == 'X':
continue

# Mark position as visited
maze[pos_r][pos_c] = 'X'
# Check for all possible positions and add if possible
if is_valid_position(maze, pos_r - 1, pos_c):
stack.append((pos_r - 1, pos_c))
if is_valid_position(maze, pos_r + 1, pos_c):
stack.append((pos_r + 1, pos_c))
if is_valid_position(maze, pos_r, pos_c - 1):
stack.append((pos_r, pos_c - 1))
if is_valid_position(maze, pos_r, pos_c + 1):
stack.append((pos_r, pos_c + 1))

print('Stack:' , stack)
print_maze(maze)

# We didn't find a path, hence we do not need to return the path
return False

maze = convert_maze(maze)
print_maze(maze)
start = find_start(maze)
print(start)
print(solve_maze(maze, start))
```

Then for some possible content of file maze.txt needed by the program above. To make the above code work, save the content below in a file called maze.txt in the same folder from where you run the code.

```#############
S #     #   #
# #  ##   # #
# #  ## ### #
#       #   #
######### # #
#         # E
#############
```

## Further work

The above code only answers if there is a path from S to E in the maze. It would also be interesting to print a route out. Further improvements could be finding the shortest path.

## What will we cover in this tutorial?

We will continue the work of this tutorial (Create a Moving Photo Slideshow with Weighted Transitions in OpenCV). The challenge is that construction is that we pre-load all the photos we need. The reason for that, is that loading the photos in each iteration would affect the performance of the slideshows.

The solution we present in this tutorial is to load photos in a background thread. This is not straightforward as we need to ensure the communication between the main thread and the background photo loading thread is done correctly.

The result will be similar.

## Already done so far and the challenge

In the previous tutorial we made great progress, creating a nice slideshow. The challenge was the long pre-loading time of the photos.

If we did not pre-load the photos, then we would need to load the photos in each iteration. Say, in the beginning of the loop, we would need to load the next photo. This would require disk access, which is quite slow. As the frame is updated quite often, this loading time will not let the new position of the photo to be updated for a fraction of a second (or more, depending the photo size and the speed of the processor). This will make the movement of the photo lacking and not run smoothly.

Said differently, in one thread, the processor can only make one thing at the time. When you tell the processor to load a photo, it will stop all other work in this program, and do that first, before updating the frame. As this can be a big task, it will take long time. As the program needs to update the frame continuously, to make the photo move, then this will be visible no matter when you tell the processor to load the image.

So how can we deal with this?

Using another thread to load the image. Having multiple threads will make it possible to do more than one thing at the time. If you have two threads, you can do two things at the same time.

A Python program is by default run in one thread. Hence, it can only do one thing at the time. If you need to do more than one thing at the time, you need to use threading.

Now this sound simple, but it introduces new problems.

When working with threading a lock is a good tool to know. Basically, a lock is similar to a lock. You can enter and lock the door after you, such that no one else can enter. When you are done, you can unlock the door and leave. Then someone else can enter.

This is the same principle with a lock with threading. You can take a lock, and ensure you only enter the code after the lock, if no-one else (another thread) is using the lock. Then when you are done, you release the lock.

We need a stack of photos that can load new photos when needed.

```class ImageStack:
def __init__(self, filenames, size=3):
if size > len(filenames):
raise Exception("Not enough file names")
self.size = size
self.filenames = filenames
self.stack = []
while len(self.stack) < self.size:
filename = self.filenames[random.randrange(0, len(self.filenames))]
if any(item == filename for item in self.stack):
continue
self.stack.append((filename, Image(filename)))
# Lock used for accessing the stack

def get_image(self):
self.stack_lock.acquire()
filename, img = self.stack.pop()
print(f"Get image {filename} (stack size:{len(self.stack)})")
self.stack_lock.release()
return img

filename = self.filenames[random.randrange(0, len(self.filenames))]
self.stack_lock.acquire()
while any(item == filename for item in self.stack):
filename = self.filenames[random.randrange(0, len(self.filenames))]
self.stack_lock.release()
img = Image(filename)
self.stack_lock.acquire()
self.stack.append((filename, img))
print(f"Add image {filename} (stack size: {len(self.stack)})")
self.stack_lock.release()
```

The above is an image stack which has two locks. One for accessing the stack and one for adding images.

The lock for stack is to ensure that only one thread is accessing the stack. Hence if we have the code.

```stack = ImageStack(filenames)
stack.get_image()
```

The code above will create an ImageStack (notice that filenames is not defined here). Then on the second line it will start a new process to add a new image. After that it will try to get an image. But here the lock comes into the picture. If the thread with add_image has acquired the stack lock, then get_image call cannot start (it will be waiting in the first line to acquire stack lock).

There are more possible situations where the lock hits in. If the 3rd line with stack.get_image acquires the stack lock before that the call to add_image reaches the lock, then add_image needs to wait until the lock is released by the stack.get_image call.

Threading is a lot of fun but you need to understand how locks work and how to avoid deadlocks.

## Full code

Below you will find the full code using a threading approach to load photos in the background.

```import cv2
import glob
import os
import random

class Image:
def __init__(self, filename, time=500, size=500):
self.filename = filename
self.size = size
self.time = time
self.shifted = 0.0
height, width, _ = img.shape
if width < height:
self.height = int(height*size/width)
self.width = size
self.img = cv2.resize(img, (self.width, self.height))
self.shift = self.height - size
self.shift_height = True
else:
self.width = int(width*size/height)
self.height = size
self.shift = self.width - size
self.img = cv2.resize(img, (self.width, self.height))
self.shift_height = False
self.delta_shift = self.shift/self.time
self.reset()

def reset(self):
if random.randint(0, 1) == 0:
self.shifted = 0.0
self.delta_shift = abs(self.delta_shift)
else:
self.shifted = self.shift
self.delta_shift = -abs(self.delta_shift)

def get_frame(self):
if self.shift_height:
roi = self.img[int(self.shifted):int(self.shifted) + self.size, :, :]
else:
roi = self.img[:, int(self.shifted):int(self.shifted) + self.size, :]
self.shifted += self.delta_shift
if self.shifted > self.shift:
self.shifted = self.shift
if self.shifted < 0:
self.shifted = 0
return roi

class ImageStack:
def __init__(self, filenames, size=3):
if size > len(filenames):
raise Exception("Not enough file names")
self.size = size
self.filenames = filenames
self.stack = []
while len(self.stack) < self.size:
filename = self.filenames[random.randrange(0, len(self.filenames))]
if any(item == filename for item in self.stack):
continue
self.stack.append((filename, Image(filename)))
# Lock used for accessing the stack

def get_image(self):
self.stack_lock.acquire()
filename, img = self.stack.pop()
print(f"Get image {filename} (stack size:{len(self.stack)})")
self.stack_lock.release()
return img

filename = self.filenames[random.randrange(0, len(self.filenames))]
self.stack_lock.acquire()
while any(item == filename for item in self.stack):
filename = self.filenames[random.randrange(0, len(self.filenames))]
self.stack_lock.release()
img = Image(filename)
self.stack_lock.acquire()
self.stack.append((filename, img))
print(f"Add image {filename} (stack size: {len(self.stack)})")
self.stack_lock.release()

def process():
path = "pics"
filenames = glob.glob(os.path.join(path, "*"))

buffer = ImageStack(filenames)

prev_image = buffer.get_image()
current_image = buffer.get_image()

while True:
for i in range(100):
alpha = i/100
beta = 1.0 - alpha
dst = cv2.addWeighted(current_image.get_frame(), alpha, prev_image.get_frame(), beta, 0.0)

cv2.imshow("Slide", dst)
if cv2.waitKey(1) == ord('q'):
return

for _ in range(300):
cv2.imshow("Slide", current_image.get_frame())
if cv2.waitKey(1) == ord('q'):
return

prev_image = current_image
current_image = buffer.get_image()

process()
```

## What will we cover in this tutorial?

In this tutorial you will learn how to make a slideshow of your favorite photos moving across the screen with weighted transitions. This will be done in Python with OpenCV.

See the result in the video below.

## Step 1: A simple approach without moving effect

If you want to build something great, start with something simple first. The reason for that is that you will learn along the way. It is difficult to understand all aspects from the beginning.

Start small. Start simple. Learn from each step.

Here we assume that you have all your favorite photos in a folder called pics. In the first run, you just want to show them on your screen one-by-one.

```import cv2
import glob
import os

def process():
path = "pics"
filenames = glob.glob(os.path.join(path, "*"))

for filename in filenames:
print(filename)

cv2.imshow("Slideshow", img)

if cv2.waitKey(1000) == ord('q'):
return

process()
```

As you will realize, this will show the photos in the size they stored. Hence, when photos change, the dimensions of the window will change as well (unless the two consecutive photos have the exact same dimensions). This will not make a good user experience. Also, if the photos dimensions are larger than your screen resolution, they will not be fully visible.

## Step 2: Scaling images to fit inside the screen

We want the window size where we show the photos to have fixed size. This is not as simple as it sounds.

Image a photo has dimensions 1000 x 2000 pixels. Then the next one has 2000 x 1000. How would you scale it down? If you scale it down to 500 x 500 by default, then the objects in the images will be flatten or narrowed together.

Hence, what we do in our first attempt, is to scale it based on the dimensions. That is, a photo with dimensions 1000 x 2000 will become 500 x 1000. And a photo of dimensions 2000 x 1000 will become 1000 x 500. Then we will crop it to fit the 500 x 500 dimension. The cropping will take the middle of the photo.

```import cv2
import glob
import os

def process():
path = "pics"
filenames = glob.glob(os.path.join(path, "*"))

for filename in filenames:
print(filename)

height, width, _ = img.shape
if width < height:
height = int(height*500/width)
width = 500
img = cv2.resize(img, (width, height))
shift = height - 500
img = img[shift//2:-shift//2,:,:]
else:
width = int(width*500/height)
height = 500
shift = width - 500
img = cv2.resize(img, (width, height))
img = img[:,shift//2:-shift//2,:]

cv2.imshow("Slideshow", img)

if cv2.waitKey(1000) == ord('q'):
return

process()
```

This gives a better experience, but not perfect.

## Step 3: Make a weighted transition between image switches

To make a weighted transition we make it by adding a transition phase of the photos.

```import cv2
import glob
import os
import numpy as np

def process():
path = "pics"
filenames = glob.glob(os.path.join(path, "*"))

prev_image = np.zeros((500, 500, 3), np.uint8)
for filename in filenames:
print(filename)

height, width, _ = img.shape
if width < height:
height = int(height*500/width)
width = 500
img = cv2.resize(img, (width, height))
shift = height - 500
img = img[shift//2:-shift//2,:,:]
else:
width = int(width*500/height)
height = 500
shift = width - 500
img = cv2.resize(img, (width, height))
img = img[:,shift//2:-shift//2,:]

for i in range(101):
alpha = i/100
beta = 1.0 - alpha
dst = cv2.addWeighted(img, alpha, prev_image, beta, 0.0)

cv2.imshow("Slideshow", dst)
if cv2.waitKey(1) == ord('q'):
return

prev_image = img

if cv2.waitKey(1000) == ord('q'):
return

process()
```

Notice the prev_image variable that is needed. It is set to a black image when it enters the loop. The transition is made by using cv2.addWeighted(…) to get the effect.

## Step 4: Make the photos move while showing

The idea is to let the photo move. Say, if the photo is scaled to dimension 500 x 1000. Then we want to create a view of that photo of size 500 x 500 that slides from one end to the other while it is showing.

This requires that we have a state for the photo, which stores where we are in of the current view.

For this purpose we create a class to represent a photo that keeps the current view. It also includes the resizing.

```import cv2
import numpy as np
import glob
import os
import random

class Image:
def __init__(self, filename, time=500, size=500):
self.size = size
self.time = time
self.shifted = 0.0
self.height, self.width, _ = self.img.shape
if self.width < self.height:
self.height = int(self.height*size/self.width)
self.width = size
self.img = cv2.resize(self.img, (self.width, self.height))
self.shift = self.height - size
self.shift_height = True
else:
self.width = int(self.width*size/self.height)
self.height = size
self.shift = self.width - size
self.img = cv2.resize(self.img, (self.width, self.height))
self.shift_height = False
self.delta_shift = self.shift/self.time

def reset(self):
if random.randint(0, 1) == 0:
self.shifted = 0.0
self.delta_shift = abs(self.delta_shift)
else:
self.shifted = self.shift
self.delta_shift = -abs(self.delta_shift)

def get_frame(self):
if self.shift_height:
roi = self.img[int(self.shifted):int(self.shifted) + self.size, :, :]
else:
roi = self.img[:, int(self.shifted):int(self.shifted) + self.size, :]
self.shifted += self.delta_shift
if self.shifted > self.shift:
self.shifted = self.shift
if self.shifted < 0:
self.shifted = 0
return roi

def process():
path = "pics"
filenames = glob.glob(os.path.join(path, "*"))

cnt = 0
images = []
for filename in filenames:
print(filename)

img = Image(filename)

images.append(img)
if cnt > 300:
break
cnt += 1

prev_image = images[random.randrange(0, len(images))]
prev_image.reset()

while True:
while True:
img = images[random.randrange(0, len(images))]
if img != prev_image:
break
img.reset()

for i in range(100):
alpha = i/100
beta = 1.0 - alpha
dst = cv2.addWeighted(img.get_frame(), alpha, prev_image.get_frame(), beta, 0.0)

cv2.imshow("Slide", dst)
if cv2.waitKey(1) == ord('q'):
return

prev_image = img
for _ in range(300):
cv2.imshow("Slide", img.get_frame())
if cv2.waitKey(1) == ord('q'):
return

process()
```

This results in a nice way where the photos slowly move through the view. It also has added some randomness. First of all, it takes a random photo. Also, the direction is set to be random.

This is a good start of having a nice slideshow of your favorite photos.

## What will we cover in this tutorial?

How to make a simple live graph that updates into a live webcam stream by using OpenCV.

The result can be seen in the video below.

## Step 1: A basic webcam flow with OpenCV

If you need to install OpenCV for the first time we suggest you read this tutorial.

A normal webcam flow in Python looks like the following code.

```import cv2

# Setup webcam camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

while True:
# Capture frame-by-frame
frame = cv2.flip(frame, 1)

cv2.imshow("Webcam", frame)

if cv2.waitKey(1) == ord('q'):
break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
```

This will make a live webcam stream from your webcam to a window. That is too easy not to enjoy.

## Step 2: Create an object to represent the graph

There are many ways to create a graph. Here we will make an object which will have a representation of the graph. Then it will have a function to update the value and update the graph image.

```class Graph:
def __init__(self, width, height):
self.height = height
self.width = width
self.graph = np.zeros((height, width, 3), np.uint8)

def update_frame(self, value):
if value < 0:
value = 0
elif value >= self.height:
value = self.height - 1
new_graph = np.zeros((self.height, self.width, 3), np.uint8)
new_graph[:,:-1,:] = self.graph[:,1:,:]
new_graph[self.height - value:,-1,:] = 255
self.graph = new_graph

def get_graph(self):
return self.graph
```

This object is a simple object that keeps the graph as a OpenCV image (Numpy array).

The update function first verifies that the value of inside the graph size.

Then it creates a a new graph (new_graph) and copies the old values from previous graph, but shifted one position. Then it will update the new value by white color.

## Step 3: Putting it all together

The Graph object created in last step needs a value. This value can be anything. Here we make a simple measure of how much movement is in the frame.

This is simply done by comparing the current frame with the previous frame. This could be done straight forward, but to minimize noise we use a gray scaled images, which we use Gaussian blur on. Then the absolute difference from last frame is used, and summing it up.

The value used to scale down is highly dependent on the settings your webcam is in. Also, if you use another resolution, then it will affect it. Hence, if the graph is all low (zero) or high (above height) then adjust this graph.update_frame(int(difference/42111)) to some other integer value in the division.

```import cv2
import numpy as np

class Graph:
def __init__(self, width, height):
self.height = height
self.width = width
self.graph = np.zeros((height, width, 3), np.uint8)

def update_frame(self, value):
if value < 0:
value = 0
elif value >= self.height:
value = self.height - 1
new_graph = np.zeros((self.height, self.width, 3), np.uint8)
new_graph[:,:-1,:] = self.graph[:,1:,:]
new_graph[self.height - value:,-1,:] = 255
self.graph = new_graph

def get_graph(self):
return self.graph

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

graph = Graph(100, 60)

prev_frame = np.zeros((480, 640), np.uint8)
while True:
# Capture frame-by-frame
frame = cv2.flip(frame, 1)
frame = cv2.resize(frame, (640, 480))

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (25, 25), None)
diff = cv2.absdiff(prev_frame, gray)
difference = np.sum(diff)
prev_frame = gray

graph.update_frame(int(difference/42111))
roi = frame[-70:-10, -110:-10,:]
roi[:] = graph.get_graph()

cv2.putText(frame, "...wanted a live graph", (20, 430), cv2.FONT_HERSHEY_PLAIN, 1.8, (200, 200, 200), 2)
cv2.putText(frame, "...measures motion in frame", (20, 460), cv2.FONT_HERSHEY_PLAIN, 1.8, (200, 200, 200), 2)
cv2.imshow("Webcam", frame)

if cv2.waitKey(1) == ord('q'):
break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
```

## What will we cover in this tutorial?

Create ASCII Art on a live webcam stream using OpenCV with Python. To improve performance we will use Numba.

The result can look like the video below.

## Step 1: A webcam flow with OpenCV in Python

If you need to install OpenCV for the first time we suggest you read this tutorial.

A normal webcam flow in Python looks like the following code.

```import cv2

# Setup webcam camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

while True:
# Capture frame-by-frame
frame = cv2.flip(frame, 1)

cv2.imshow("Webcam", frame)

if cv2.waitKey(1) == ord('q'):
break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
```

This will make a live webcam stream from your webcam to a window. That is too easy not to enjoy.

## Step 2: Prepare the letters to be used for ASCII art

There are many ways to achieve the ASCII art. For ease, we will create all the letters in a small gray scale (only with black and white) images. You could print the letters directly in the terminal, but it seems to be slower than just mapping the small images into a big image representing the ASCII art.

We use OpenCV to create all the letters.

```import numpy as np

def generate_ascii_letters():
images = []
#letters = "# \$%&amp;\\'()*+,-./0123456789:;<=>[email protected][]^_`abcdefghijklmnopqrstuvwxyz{|}~"
letters = " \\ '(),-./:;[]_`{|}~"
for letter in letters:
img = np.zeros((12, 16), np.uint8)
img = cv2.putText(img, letter, (0, 11), cv2.FONT_HERSHEY_SIMPLEX, 0.5, 255)
images.append(img)
return np.stack(images)
```

The list images appends all the images we create. At the end (in the return statement) we convert them to a Numpy array of images. This is done for speed as lists do not work with Numba, it needs the objects to be Numpy arrays.

If you like, you can use all the letters, by using the commented out letters string instead of the smaller with only special characters. We found the result looking better with the limited amount of letters.

A images is created simply by a black Numpy array of size 12×16 (that is width 16 and height 12). Then we add the text on the image by using cv2.putText(…).

## Step 3: Transforming the webcam frame to only outline the objects

To get a decent result we found that converting the frames to only outline the object in the original frame. This can be achieved by using Canny edge detection (cv2.Canny(…)). To capture that from the live webcam stream it is advised to use Gaussian blur before.

```import cv2

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

while True:
# Capture frame-by-frame
frame = cv2.flip(frame, 1)

gb = cv2.GaussianBlur(frame, (5, 5), 0)
can = cv2.Canny(gb, 127, 31)

cv2.imshow('Canny edge detection', can)
cv2.imshow("Webcam", frame)

if cv2.waitKey(1) == ord('q'):
break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
```

This would result in something like this.

## Step 4: Converting the Canny edge detection to ASCII art

This is where all the magic happens. We will take the Canny edge detected image and convert it to ASCII art.

First remember, we have a Numpy array of all the letters we want to use.

```def to_ascii_art(frame, images, box_height=12, box_width=16):
height, width = frame.shape
for i in range(0, height, box_height):
for j in range(0, width, box_width):
roi = frame[i:i + box_height, j:j + box_width]
best_match = np.inf
best_match_index = 0
for k in range(1, images.shape):
total_sum = np.sum(np.absolute(np.subtract(roi, images[k])))
if total_sum < best_match:
best_match = total_sum
best_match_index = k
roi[:,:] = images[best_match_index]
return frame
```

The height and the width of the frame is take and then we iterate over it in small boxes of the size of the letters.

Each box is captured in a region of interest (roi). Then we loop over all possible letters and find the best match. This is not done with perfect calculation, as they are quite expensive. Hence we use the approximate calculation done in total_sum.

The correct calculation would be.

```total_sum = np.sum(np.where(roi > images[k], np.subtract(roi, images[k]), np.subtract(images[k], roi)))
```

Alternatively, you could turn it into np.int16 instead of using np.uint8, which are causing all the problems here. Finally, notice that the cv2.norm(…) would also solve the problem, but as we need to optimize the code with Numba, this is not possible as it is not supported in Numba.

## Step 5: Adding it all together and use Numba

Now we can add all the code together at try it out. We will also use Numba on the to_ascii_art function to speed it up. If you are new to Numba we can recommend this tutorial.

```import cv2
import numpy as np
from numba import jit

@jit(nopython=True)
def to_ascii_art(frame, images, box_height=12, box_width=16):
height, width = frame.shape
for i in range(0, height, box_height):
for j in range(0, width, box_width):
roi = frame[i:i + box_height, j:j + box_width]
best_match = np.inf
best_match_index = 0
for k in range(1, images.shape):
total_sum = np.sum(np.absolute(np.subtract(roi, images[k])))
if total_sum < best_match:
best_match = total_sum
best_match_index = k
roi[:,:] = images[best_match_index]
return frame

def generate_ascii_letters():
images = []
#letters = "# \$%&amp;\\'()*+,-./0123456789:;<=>[email protected][]^_`abcdefghijklmnopqrstuvwxyz{|}~"
letters = " \\ '(),-./:;[]_`{|}~"
for letter in letters:
img = np.zeros((12, 16), np.uint8)
img = cv2.putText(img, letter, (0, 11), cv2.FONT_HERSHEY_SIMPLEX, 0.5, 255)
images.append(img)
return np.stack(images)

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

images = generate_ascii_letters()

while True:
# Capture frame-by-frame
frame = cv2.flip(frame, 1)

gb = cv2.GaussianBlur(frame, (5, 5), 0)
can = cv2.Canny(gb, 127, 31)

ascii_art = to_ascii_art(can, images)

cv2.imshow('ASCII ART', ascii_art)
cv2.imshow("Webcam", frame)

if cv2.waitKey(1) == ord('q'):
break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
```

This will give the following result (if you put me in front of the camera).

Also, try to use different character set. For example the full one also given in the code above.

## What will we cover?

How to create a video like the one below using Pandas + GeoPandas + OpenCV in Python.

1. How to collect newest COVID-19 data in Python using Pandas.
2. Prepare data and calculate values needed to create Choropleth map
3. Get Choropleth map from GeoPandas and prepare to combine it
4. Get the data frame by frame to the video
5. Combine it all to a video using OpenCV

## Step 1: Get the daily reported COVID-19 data world wide

This data is available from the European Centre for Disease Prevention and Control and can be found here.

All we need is to download the csv file, which has all the historic data from all the reported countries.

This can be done as follows.

```import pandas as pd

# Just to get more rows, columns and display width
pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 300)
pd.set_option('display.width', 1000)

# Get the updated data

print(table)
```

This will give us an idea of how the data is structured.

```          dateRep  day  month  year  cases  deaths countriesAndTerritories geoId countryterritoryCode  popData2019 continentExp  Cumulative_number_for_14_days_of_COVID-19_cases_per_100000
0      01/10/2020    1     10  2020     14       0             Afghanistan    AF                  AFG   38041757.0         Asia                                           1.040961
1      30/09/2020   30      9  2020     15       2             Afghanistan    AF                  AFG   38041757.0         Asia                                           1.048847
2      29/09/2020   29      9  2020     12       3             Afghanistan    AF                  AFG   38041757.0         Asia                                           1.114565
3      28/09/2020   28      9  2020      0       0             Afghanistan    AF                  AFG   38041757.0         Asia                                           1.343261
4      27/09/2020   27      9  2020     35       0             Afghanistan    AF                  AFG   38041757.0         Asia                                           1.540413
...           ...  ...    ...   ...    ...     ...                     ...   ...                  ...          ...          ...                                                ...
46221  25/03/2020   25      3  2020      0       0                Zimbabwe    ZW                  ZWE   14645473.0       Africa                                                NaN
46222  24/03/2020   24      3  2020      0       1                Zimbabwe    ZW                  ZWE   14645473.0       Africa                                                NaN
46223  23/03/2020   23      3  2020      0       0                Zimbabwe    ZW                  ZWE   14645473.0       Africa                                                NaN
46224  22/03/2020   22      3  2020      1       0                Zimbabwe    ZW                  ZWE   14645473.0       Africa                                                NaN
46225  21/03/2020   21      3  2020      1       0                Zimbabwe    ZW                  ZWE   14645473.0       Africa                                                NaN

[46226 rows x 12 columns]
```

First we want to convert the dateRep to a date object (cannot be seen in the above, but the dates are represented by a string). Then use that as index for easier access later.

```import pandas as pd

# Just to get more rows, columns and display width
pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 300)
pd.set_option('display.width', 1000)

# Get the updated data

# Convert dateRep to date object
table['date'] = pd.to_datetime(table['dateRep'], format='%d/%m/%Y')
# Use date for index
table = table.set_index('date')
```

## Step 2: Prepare data and compute values needed for plot

What makes sense to plot?

Good question. In a Choropleth map you will color according to a value. Here we will color in darker red the higher the value a country is represented with.

If we plotted based on number new COVID-19 cases, this would be high for countries with high populations. Hence, the number of COVID-19 cases per 100,000 people is used.

Using new COVID-19 cases per 100,000 people can be volatile and change drastic from day to day. To even that out, a 7 days rolling sum can be used. That is, you take the sum of the last 7 days and continue that process through your data.

To make it even less volatile, the average of the last 14 days of the 7 days rolling sum is used.

And no, it is not just something invented by me. It is used by the authorities in my home country to calculate rules of which countries are open for travel or not.

This can by the data above be calculated by computing that data.

```def get_stat(country_code, table):
data = table.loc[table['countryterritoryCode'] == country_code]
data = data.reindex(index=data.index[::-1])
data['7 days sum'] = data['cases'].rolling(7).sum()
data['7ds/100000'] = data['7 days sum'] * 100000 / data['popData2019']
data['14 mean'] = data['7ds/100000'].rolling(14).mean()
return data
```

The above function takes the table we returned from Step 1 and extract a country based on a country code. Then it reverses the data to have the dates in chronological order.

After that, it computes the 7 days rolling sum. Then computes the new cases by the population in the country in size of 100,000 people. Finally, it computes the 14 days average (mean) of it.

## Step 3: Get the Choropleth map data and prepare it

GeoPandas is an amazing library to create Choropleth maps. But it does need your attention when you combine it with other data.

Here we want to combine it with the country codes (ISO_A3). If you inspect the data, some of the countries are missing that data.

Other than that the code is straight forward.

```import pandas as pd
import geopandas

# Just to get more rows, columns and display width
pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 300)
pd.set_option('display.width', 1000)

# Get the updated data

# Convert dateRep to date object
table['date'] = pd.to_datetime(table['dateRep'], format='%d/%m/%Y')
# Use date for index
table = table.set_index('date')

def get_stat(country_code, table):
data = table.loc[table['countryterritoryCode'] == country_code]
data = data.reindex(index=data.index[::-1])
data['7 days sum'] = data['cases'].rolling(7).sum()
data['7ds/100000'] = data['7 days sum'] * 100000 / data['popData2019']
data['14 mean'] = data['7ds/100000'].rolling(14).mean()
return data

# Read the data to make a choropleth map
world = world[(world.pop_est > 0) &amp; (world.name != "Antarctica")]

# Store data per country to make it easier
data_by_country = {}

for index, row in world.iterrows():
# The world data is not fully updated with ISO_A3 names
if row['iso_a3'] == '-99':
country = row['name']
if country == "Norway":
world.at[index, 'iso_a3'] = 'NOR'
row['iso_a3'] = "NOR"
elif country == "France":
world.at[index, 'iso_a3'] = 'FRA'
row['iso_a3'] = "FRA"
elif country == 'Kosovo':
world.at[index, 'iso_a3'] = 'XKX'
row['iso_a3'] = "XKX"
elif country == "Somaliland":
world.at[index, 'iso_a3'] = '---'
row['iso_a3'] = "---"
elif country == "N. Cyprus":
world.at[index, 'iso_a3'] = '---'
row['iso_a3'] = "---"

# Add the data for the country
data_by_country[row['iso_a3']] = get_stat(row['iso_a3'], table)
```

This will create a dictionary (data_by_country) with the needed data for each country. Notice, we do it like this, because not all countries have the same number of data points.

## Step 4: Create a Choropleth map for each date and save it as an image

This can be achieved by using matplotlib.

The idea is to go through all dates and look for each country if they have data for that date and use it if they have.

```import pandas as pd
import geopandas
import matplotlib.pyplot as plt

# Just to get more rows, columns and display width
pd.set_option('display.max_rows', 300)
pd.set_option('display.max_columns', 300)
pd.set_option('display.width', 1000)

# Get the updated data

# Convert dateRep to date object
table['date'] = pd.to_datetime(table['dateRep'], format='%d/%m/%Y')
# Use date for index
table = table.set_index('date')

def get_stat(country_code, table):
data = table.loc[table['countryterritoryCode'] == country_code]
data = data.reindex(index=data.index[::-1])
data['7 days sum'] = data['cases'].rolling(7).sum()
data['7ds/100000'] = data['7 days sum'] * 100000 / data['popData2019']
data['14 mean'] = data['7ds/100000'].rolling(14).mean()
return data

# Read the data to make a choropleth map
world = world[(world.pop_est > 0) &amp; (world.name != "Antarctica")]

# Store data per country to make it easier
data_by_country = {}

for index, row in world.iterrows():
# The world data is not fully updated with ISO_A3 names
if row['iso_a3'] == '-99':
country = row['name']
if country == "Norway":
world.at[index, 'iso_a3'] = 'NOR'
row['iso_a3'] = "NOR"
elif country == "France":
world.at[index, 'iso_a3'] = 'FRA'
row['iso_a3'] = "FRA"
elif country == 'Kosovo':
world.at[index, 'iso_a3'] = 'XKX'
row['iso_a3'] = "XKX"
elif country == "Somaliland":
world.at[index, 'iso_a3'] = '---'
row['iso_a3'] = "---"
elif country == "N. Cyprus":
world.at[index, 'iso_a3'] = '---'
row['iso_a3'] = "---"

# Add the data for the country
data_by_country[row['iso_a3']] = get_stat(row['iso_a3'], table)

# Create an image per date
for day in pd.date_range('12-31-2019', '10-01-2020'):
print(day)
world['number'] = 0.0
for index, row in world.iterrows():
if day in data_by_country[row['iso_a3']].index:
world.at[index, 'number'] = data_by_country[row['iso_a3']].loc[day]['14 mean']

world.plot(column='number', legend=True, cmap='OrRd', figsize=(15, 5))
plt.title(day.strftime("%Y-%m-%d"))
plt.savefig(f'image-{day.strftime("%Y-%m-%d")}.png')
plt.close()
```

This will create an image for each day. These images will be combined.

## Step 5: Create a video from images with OpenCV

Using OpenCV to create a video from a sequence of images is quite easy. The only thing you need to ensure is that it reads the images in the correct order.

```import cv2
import glob

img_array = []
filenames = glob.glob('image-*.png')
filenames.sort()
for filename in filenames:
print(filename)
height, width, layers = img.shape
size = (width, height)
img_array.append(img)

out = cv2.VideoWriter('covid.avi', cv2.VideoWriter_fourcc(*'DIVX'), 15, size)

for i in range(len(img_array)):
out.write(img_array[i])
out.release()
```

Where we use the VideoWriter from OpenCV.

This results in this video.

## What will we cover in this tutorial?

We will look at how the Birthday Paradox is used when estimating how collision resistance a hash function is. This tutorial will show that a good estimate is that a n-bit hash function will have collision by chance with n/2-bit random hash values.

## Step 1: Understand a hash function

A hash function is a one-way function with a fixed output size. That is, the output has the same size and it is difficult to find two distinct input chucks, which give the same output.

hash function is any function that can be used to map data of arbitrary size to fixed-size values.

https://en.wikipedia.org/wiki/Hash_function

Probably the best know example of a hash-function is the MD5. It was designed to be used as a cryptographic hash function, but has been found to have many vulnerabilities.

Does this mean you should not use the MD5 hash function?

That depends. If you use it in a cryptographic setup, the answer is Do not use.

On the other hand, hash function are often used to calculate identifiers. For that purpose, it also depends if you should use it or not.

This is where the Birthday Paradox comes in.

## Step 2: How are hash functions and the Birthday Paradox related?

Good question. First recall what the Birthday Paradox states.

…in a random group of 23 people, there is about a 50 percent chance that two people have the same birthday

How can that be related to hash functions? There is something about collisions, right?

Given 23 people, we have 50% chance of collision (two people with the same birthday).

Hence, if we have that our hash functions maps data to a day in the calendar year. That is, it maps hash(data) -> [0, 364], then given 23 hash values, we have 50% chance for collision.

But you also know that our hash function maps to more than 365 distinct values. Actually, the MD5 maps to 2^128 distinct values.

An example would be appreciated now. Let us make a simplified hash function, call it MD5′ (md5-prime), which maps like the MD5, but only uses the first byte of the result.

That is, we have MD5′(data) -> [0, 255].

Surely, by the pigeonhole principle we would run out of possible values after 256 distinct data input to MD5′ and have a collision.

```import hashlib
import os

lookup_table = {}
collision_count = 0
for _ in range(256):
random_binary = os.urandom(16)
result = hashlib.md5(random_binary).digest()
result = result[:1]
if result in lookup_table:
print("Collision")
print(random_binary, result)
print(lookup_table[result], result)
collision_count += 1
else:
lookup_table[result] = random_binary

print("Number of collisions:", collision_count)
```

The lookup_table is used to store the already seen hash values. We will iterate over the 256 (one less than possible values of our MD5′ hash function). Take some random data and hash it with md5 and only use first byte (8 bits). If result already exists in lookup_table we have a collision, otherwise add it to our lookup_table.

For a random run of this I got 87 collisions. Expected? I would say so.

Let us try to use the Birthday Paradox to estimate how many hash values we need to get a collision of our MD5′ hash function.

A rough estimate that is widely used, is that the square root of the number of possible outcomes will give a 50% chance of collision (see wikipedia for approximation).

That is, for MD5′(data) -> [0, 255] it is, sqrt(256) = 16. Let’s try that.

```import hashlib
import os

collision = 0
for _ in range(1000):
lookup_table = {}
for _ in range(16):
random_binary = os.urandom(16)
result = hashlib.md5(random_binary).digest()
result = result[:1]
if result not in lookup_table:
lookup_table[result] = random_binary
else:
collision += 1
break

print("Number of collisions:", collision, "out of", 1000)
```

Which gives some like this.

```Number of collisions: 391 out of 1000
```

That is in the lower end, but still a reasonable approximation.

## Step 3: Use a correct data structure to lookup in

Just to clarify. We will not find collisions on the full MD5 hash function, but we will try to see if the estimate of collision is reasonable.

This requires to do a lot of calculations and we want to ensure that we are not having a bottleneck with using a wrong data structure.

The Python dict should be a hash table with expected insert and lookup O(1). Still the worst case is O(n) for these operations, which would be a big overhead to cary along the way. Hence, we will first test, that the dictionary has O(1) insert and lookup time for the use cases we have of it here.

```import time
import matplotlib.pyplot as plt

def dict_size(size):
start = time.time()
dict = {}
for i in range(size):
if i in dict:
print("HIT")
else:
dict[i] = 0

return time.time() - start

x = []
y = []
for i in range(0, 2**20, 2**12):
performance = dict_size(i)
x.append(i)
y.append(performance)

plt.scatter(x, y, alpha=0.1)
plt.xlabel("Size")
plt.ylabel("Time (sec)")
plt.show()
```

Resulting in something like this.

What does that tell us? That the dict in Python has a approximately linear insert and lookup time, that is O(1). But there some overhead at some sizes, e.g. a bit before 3,000,000. It is not exactly linear, but close enough not to expect a exponential run time.

This step is not necessary, but it is nice to know how the function grows in time, when we want to check for collisions. If the above time complexity grew exponentially (or not linearly), then it can suddenly become hard to estimate the runtime if we run for a bigger space.

## Step 4: Validating if square root of the bit size is a good estimate for collision

We will continue our journey with our modified MD5′ hash function, where the output space will be reduced.

We will then for various output space sizes see if the estimate for 50% collision of the hash functions is decent. That is, if we need approximately sqrt(space_size) of hash values to have an approximately 50% chance of a collision.

This can be done by the following code.

```import hashlib
import os
import time
import matplotlib.pyplot as plt

def main(bit_range):
start = time.time()
collision_count = 0
# Each space_size counts for 4 bits, hence we have
space_size = bit_range//4
for _ in range(100):
lookup_table = {}
# Searching half the sqrt of the space for collision
# sqrt(2**bit_range) = 2**(bit_range//2)
for _ in range(2**(bit_range//2)):
random_binary = os.urandom(16)
result = hashlib.md5(random_binary).hexdigest()
result = result[:space_size]
if result in lookup_table:
collision_count += 1
break
else:
lookup_table[result] = random_binary

return time.time() - start, collision_count

x = []
y1 = []
y2 = []
for i in range(4, 44, 4):
performance, count = main(i)
x.append(i)
y1.append(performance)
y2.append(count)

_, ax1 = plt.subplots()
plt.xlabel("Size")
plt.ylabel("Time (sec)")
ax1.scatter(x, y1)
ax2 = ax1.twinx()
ax2.bar(x, y2, align='center', alpha=0.5, color='red')
ax2.set_ylabel("Collision rate (%)", color='red')
ax2.set_ylim([0, 100])

plt.show()
```

The estimated collision rate is very rough, as it only runs 100 trials for each space size.

The result are shown in the graph below.

Interestingly, it seems to be in the 30-50% range for most cases.

As a note, it might confuse that the run-time (the dots), does not seem to be linear. That is because for each bit-size we increase, we double the space. Hence, the x-axis is a logarithmic scale.

## Step 5: What does that all mean?

This has high impact on using hash functions for creating unique identifiers. If you want a short identifier with the least number of bits, then you need to consider the Birthday Paradox.

Assume you created the following service.

```import hashlib
import base64

def get_uid(text):
result = hashlib.md5(text.encode()).digest()
result = base64.b64encode(result)
return result[:4]

uid = get_uid("my text")
print(uid)
```

If the input text can be considered random, how resistant is get_uid(…) function against collision.

Well, it returns 4 base64 characters. That is 6*4 = 24 bits of information (each base 64 character contains 6 bits of information). The rough estimate is that if you use it sprt(2^24) = 2^12 = 4,096 times you will have a high risk of collision (approximately 50% chance).

Let’s try.

```import hashlib
import os
import base64

def get_uid(text):
result = hashlib.md5(text).digest()
result = base64.b64encode(result)
return result[:4]

lookup_table = {}
for _ in range(4096):
text = os.urandom(16)
uid = get_uid(text)
if uid in lookup_table:
print("Collision detected")
else:
lookup_table[uid] = text
```

It does not give collision every time, but run it a few times and you will get.

```Collision detected
```

Hence, it seems to be valid. The above code was run 1000 times and gave collision 497 times, which is close to 50% of the time.

The Birthday Paradox is presented as follows.

…in a random group of 23 people, there is about a 50 percent chance that two people have the same birthday

This is also referred to as the Birthday Problem in probability theory.

First question: What is a paradox?

…is a logically self-contradictory statement or a statement that runs contrary to one’s expectation

Wikipedia

What does that mean? A logically self-contradictory statement‚ means that there should be a contradiction somewhere in the Birthday Paradox. This is not the case.

Then a statement that runs contrary to one’s expectations, could be open for discussion. As we will see, by example, in this post, it is not contrary to one’s expectation for an informed person.

## Step 1: Run some examples

The assumption is that we have 23 random people. This assumes further, that the birthday of each one of these people is random.

To validate that this is true, let’s try to implement it in Python.

```import random

stat = {'Collision': 0, 'No-collision': 0}

for _ in range(10000):
days = []
for _ in range(23):
day = random.randint(0, 365)
days.append(day)

if len(days) == len(set(days)):
stat['No-collision'] += 1
else:
stat['Collision'] += 1

print("Probability for at least 2 with same birthday in a group of 23")
print("P(A) =", stat['Collision']/(stat['Collision'] + stat['No-collision']))
```

This will output different results from run to run, but something around 0.507.

```Probability for at least 2 with same birthday in a group of 23
P(A) = 0.5026
```

A few comments to the code. It keeps record of how many times of choosing 23 random birthdays, we will end with at least two of them being the same day. We run the experiment 10,000 times to have some idea if it is just pure luck.

The check if len(days) == len(set(days)) tests whether we did not have the same brirthday. If function set(…) takes all the unique days in the list. Hence, if we have two the same days days of the year, then the len (length) will be the same for the list and the set of days.

## Step 2: The probability theory behind it

This is where it becomes a bit more technical. The above shows it behaves like it says. That if we take a group of 23 random people, with probability 50%, two of them will have the same birthday.

Is this contrary to one’s expectations? Hence, is it a paradox?

Before we answer that, let’s see if we can nail the probability theory behind this.

Do it step by step.

If we have 1 person, what is the probability that anyone in this group of 1 person has the same birthday? Yes, it sounds strange. The probability is obviously 0.

If we have 2 persons, what is the probability that any of the 2 people have the same birthday? Then they need to have the same birthday. Hence, the probability become 1/365.

How do you write that as an equation?

What we often do in probability theory, is, that we calculate the opposite probability.

Hence, we calculate the probability of now having two the same birthdays in a group. This is easier to calculate. In the first case, we have all possibilities open.

P(1) = 1

Where P(1) is the probability that given a group of one person, what is the probability of that person not having the same birthday as anyone in the group.

P(2) = 1 x (364 / 365)

Because, the first birthday is open for any birthday, then the second, only has 364 left of 365 possible birthdays.

This continues.

P(n) = 1 x (364 / 365) x (363 / 365) x … x ((365 – n + 1) / 365)

Which makes the probability of picking 23 random people without anyone with the same birthday to be.

P(23) = 1 x (364 / 365) x (363 / 365) x … x (343 / 365) = 0.493

Or calculated in Python.

```def prop(n):
if n == 1:
return 1
else:
return (365 - n + 1) / 365 * prop(n - 1)

print("Probability for at no-one with same birthday in a group of 23")
print("P(A') =",  prop(23))
```

Which results in.

```Probability for at no-one with same birthday in a group of 23
P(A') = 0.4927027656760144
```

This formula can be rewritten (see wikipedia), but for our purpose the above is fine for our purpose.

The probability we look for is given by.

P(A) = 1 – P(A’)

## Step 3: Make a graph of how likely a collision is based on a group size

This is great news. We can now calculate the theoretical probability of two people having the same birthday in a group of n random people.

This can be achieved by the following code.

```from matplotlib import pyplot as plt

def prop(n):
if n == 1:
return 1
else:
return (365 - n + 1) / 365 * prop(n - 1)

X = []
Y = []
for i in range(1, 90):
X.append(i)
Y.append(1 - prop(i))

plt.scatter(X, Y, color='blue')
plt.xlabel("Number of people")
plt.ylabel("Probability of collision")
plt.axis([0, 90, 0, 1])
plt.show()
```

Which results in the following plot.

Where you can see that about 23 people, we have 50% chance of having a pair with the same birthday (called collision).

## Conclusion

Is it a paradox? Well, there is no magic in it. You can see the above are just simple calculations. But is the following contrary to one’s expectation?

6 weeks are 6*7*24*60*60 seconds = 3,628,800 seconds.

And 10! = 10*9*8*7*6*5*4*3*2*1 = 3,628,800.

Well, the first time you calculate it might be. But does that make it a paradox?

No, it is just a surprising fact the first time you see it. Does it mean that seconds are related to faculty? No, of course not. It is just one strange thing that connects in a random way.

The same with the Birthday Paradox, it is just surprising the first time you see it.

It seems surprising for people that you only need 23 people to have 50% chance of a pair with the same birthday, but it is not a paradox for people that work with numbers.