Automate a Quotation Image for Twitter in Python in 5 Easy Steps

The challenge

You have a quote and an image.

    quote = "Mostly, when you see programmers, they aren’t doing anything.  One of the attractive things about programmers is that you cannot tell whether or not they are working simply by looking at them.  Very often they’re sitting there seemingly drinking coffee and gossiping, or just staring into space.  What the programmer is trying to do is get a handle on all the individual and unrelated ideas that are scampering around in his head."
    quote_by = "Charles M. Strauss"

len(quote) = 430

The quote is long (430 chars) and the picture might be too bright to write in white text. Also, it will take time to find the right font size.

Can you automate that getting a result that fits the Twitter recommended picture size?

Notice the following.

  • The picture is dimmed to make the white text easy readable.
  • The font size is automatically adjusted to fit the picture.
  • The picture is cropped to fit recommended Twitter size (1024 x 512).
  • There is added a logo by your choice.

If this is what you are looking for, then this will automate that.

Step 1: Find your background picture and quote

In this tutorial I will use the background image and quote given above. But you can change that to your needs.

If you chose a shorter quote the font size will adjust accordingly.

Example given here.

Notice that the following.

  • The font size of the “quoted by” (here Learn Python With Rune) is adjusted to not fill more that half of the picture width.
  • You can modify the margins as you like – say you want more space between the text and the logo.

Step 2: Install the PILLOW library

The what? Install the pillow library. You can find installation documentation in their docs.

Or just type

pip install pillow

The pillow library is the PIL fork, and PIL is the Python Imaging Library. Basically, you need it for processing images.

Step 3: Download a font you want to use

You can find various fonts in font.google.com.

For this purpose, I used the Balsamiq Sans font.

I have located the bold and the italic version in a font folder.

Step 4: The actual Python code

This is the fun part. The actual code.

from PIL import Image, ImageDraw, ImageFont, ImageEnhance

# picture setup - it is set up for Twitter recommendations
WIDTH = 1024
HEIGHT = 512
# the margin are set by my preferences
MARGIN = 50
MARGIN_TOP = 50
MARGIN_BOTTOM = 150
LOGO_MARGIN = 25

# font variables
FONT_SIZES = [110, 100, 90, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20]
FONT_QUOTE = 'font-text'
FONT_QUOTED_BY = 'font-quoted-by'
FONT_SIZE = 'font-size'
FONT_QUOTED_BY_SIZE = 'font-quoted-by-size'

# Font colors
WHITE = 'rgb(255, 255, 255)'
GREY = 'rgb(200, 200, 200)'

# output text
OUTPUT_QUOTE = 'quote'
OUTPUT_QUOTED_BY = 'quoted-by'
OUTPUT_LINES = 'lines'


def text_wrap_and_font_size(output, font_style, max_width, max_height):
    for font_size in FONT_SIZES:
        output[OUTPUT_LINES] = []
        font = ImageFont.truetype(font_style[FONT_QUOTE], size=font_size, encoding="unic")
        output[OUTPUT_QUOTE] = " ".join(output[OUTPUT_QUOTE].split())
        if font.getsize(output[OUTPUT_QUOTE])[0] <= max_width:
            output[OUTPUT_LINES].append(output[OUTPUT_QUOTE])
        else:
            words = output[OUTPUT_QUOTE].split()
            line = ""
            for word in words:
                if font.getsize(line + " " + word)[0] <= max_width:
                    line += " " + word
                else:
                    output[OUTPUT_LINES].append(line)
                    line = word
            output[OUTPUT_LINES].append(line)
        line_height = font.getsize('lp')[1]

        quoted_by_font_size = font_size
        quoted_by_font = ImageFont.truetype(font_style[FONT_QUOTED_BY], size=quoted_by_font_size, encoding="unic")
        while quoted_by_font.getsize(output[OUTPUT_QUOTED_BY])[0] > max_width//2:
            quoted_by_font_size -= 1
            quoted_by_font = ImageFont.truetype(font_style[FONT_QUOTED_BY], size=quoted_by_font_size, encoding="unic")

        if line_height*len(output[OUTPUT_LINES]) + quoted_by_font.getsize('lp')[1] < max_height:
            font_style[FONT_SIZE] = font_size
            font_style[FONT_QUOTED_BY_SIZE] = quoted_by_font_size
            return True

    # we didn't succeed find a font size that would match within the block of text
    return False


def draw_text(image, output, font_style):

    draw = ImageDraw.Draw(image)
    lines = output[OUTPUT_LINES]
    font = ImageFont.truetype(font_style[FONT_QUOTE], size=font_style[FONT_SIZE], encoding="unic")
    line_height = font.getsize('lp')[1]

    y = MARGIN_TOP
    for line in lines:
        x = (WIDTH - font.getsize(line)[0]) // 2
        draw.text((x, y), line, fill=WHITE, font=font)

        y = y + line_height

    quoted_by = output[OUTPUT_QUOTED_BY]
    quoted_by_font = ImageFont.truetype(font_style[FONT_QUOTED_BY], size=font_style[FONT_QUOTED_BY_SIZE], encoding="unic")
    # position the quoted_by in the far right, but within margin
    x = WIDTH - quoted_by_font.getsize(quoted_by)[0] - MARGIN
    draw.text((x, y), quoted_by, fill=GREY, font=quoted_by_font)
    return image


def generate_image_with_quote(input_image, quote, quote_by, font_style, output_image):
    image = Image.open(input_image)

    # darken the image to make output more visible
    enhancer = ImageEnhance.Brightness(image)
    image = enhancer.enhance(0.5)

    # resize the image to fit Twitter
    image = image.resize((WIDTH, HEIGHT))

    # set logo on image
    logo_im = Image.open("pics/logo.png")
    l_width, l_height = logo_im.size
    image.paste(logo_im, (WIDTH - l_width - LOGO_MARGIN, HEIGHT - l_height - LOGO_MARGIN), logo_im)

    output = {OUTPUT_QUOTE: quote, OUTPUT_QUOTED_BY: quote_by}

    # we should check if it returns true, but it is ignorred here
    text_wrap_and_font_size(output, font_style, WIDTH - 2*MARGIN, HEIGHT - MARGIN_TOP - MARGIN_BOTTOM)

    # now it is time to draw the quote on our image and save it
    image = draw_text(image, output, font_style)
    image.save(output_image)


def main():
    # setup input and output image
    input_image = "pics/background.jpg"
    output_image = "quote_of_the_day.png"

    # setup font type
    font_style = {FONT_QUOTE: "font/BalsamiqSans-Bold.ttf", FONT_QUOTED_BY: "font/BalsamiqSans-Italic.ttf"}

    quote = "Mostly, when you see programmers, they aren’t doing anything.  One of the attractive things about programmers is that you cannot tell whether or not they are working simply by looking at them.  Very often they’re sitting there seemingly drinking coffee and gossiping, or just staring into space.  What the programmer is trying to do is get a handle on all the individual and unrelated ideas that are scampering around in his head."
    # quote = "YOU ARE AWESOME!"
    quote_by = "Charles M. Strauss"
    # quote_by = "Learn Python With Rune"

    # generates the quote image
    generate_image_with_quote(input_image, quote, quote_by, font_style, output_image)


if __name__ == "__main__":
    main()

Notice that you can change the input and output image names and locations in the main() function. Also, there you can setup the font for the quote and the quoted-by.

Finally, and obviously, you can change the quote and quote-by in the main() function.

https://www.learnpythonwithrune.org/beginnerpython/

Learn How to Add Text to an Image in Python – 4 Easy Steps

Step 1: Install the Pillow library

We need the Pillow library in order to manipulate images. It can be installed by using pip.

pip install Pillow

This enables us to use PIL (Python Imaging Library) library.

Step 2: Download a font to use.

You can browse fonts free to download from Google Font Library. In this example I use the Balsamic Sans.

Place the download in the same folder as you want to work from.

Step 3: Find an awesome picture

I use Pexels to find free awesome pictures to use.

And place that image in your same location.

Step 4: Create you Python program to add your awesome text

from PIL import Image, ImageDraw, ImageFont

# Open the image (change name and location if needed)
image = Image.open('pics/background.jpg')
draw = ImageDraw.Draw(image)

# Import the font(change name and location if needed, also change font size if needed)
font = ImageFont.truetype('font/BalsamiqSans-Bold.ttf', size=200)
black = 'rgb(0, 0, 0)'  # black color

# Draw the text - change position if needed
message = "This is great!"
draw.text((1000, 200), message, fill=black, font=font)

# Draw the text - change position if needed
name = 'You are AWESOME!'
draw.text((1100, 500), name, fill=black, font=font)

image.save('greeting_card.png')

What is a Substitution Cipher and is it Secure?

What will you learn?

  • What is a Substitution Cipher?
  • How to implement the Substitution Cipher in Python
  • Understand the key space of Substitution Cipher
  • The weakness and how to break the Substitution Cipher

What is a Substitution Cipher

Imagine you receive this message: IQL WPV WCVJQEV

What does it mean?

You were told it is a Substitution Cipher, but how will that help you?

First of all, we need to understand what a Substitution Cipher is. Basically, it is just rearranging the characters. That is every time you write an A, you exchange that with, say, Q. And B with G. C with, hey, let’s keep the C.

See the mapping in the picture below.

Substitution Cipher Example
Substitution Cipher Example

That seems pretty solid. Right?

But can we figure out what your message means?

First, let’s try to implement a Substitution Cipher.

Implementing Substitution Cipher in Python

We will use the random library to generate random keys. We’ll get back to how many keys are there.

import random


def generate_key():
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    chars = list(alphabet)
    key = {}
    for c in alphabet:
        key[c] = chars.pop(random.randint(0, len(chars) - 1))
    return key


def print_key(key):
    for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        print(c, key[c])


def encrypt(key, message):
    cipher = ""
    for c in message:
        if c in key:
            cipher += key[c]
        else:
            cipher += " "
    return cipher


def get_decrypt_key(key):
    dkey = {}
    for k in key:
        dkey[key[k]] = k
    return dkey


key = generate_key()
print(key)
cipher = encrypt(key, "YOU ARE AWESOME")
print(cipher)
dkey = get_decrypt_key(key)
message = encrypt(dkey, cipher)
print(message)

Well, let’s understand the key-space. How secure is the Substitution Cipher?

The key-space of the Substitution Cipher

Let’s examine how the keys are generated.

Substitution Cipher with key-space calculation
Substitution Cipher with key-space calculation

The first letter, A, can be mapped to 26 possibilities (including A itself). The next letter, B, can be mapped to 25 possibilities. The third letter, C, can be mapped to (difficult to guess) 24 possibilities. And so forth.

This generates a space of (read the large number in the picture above). That is right. It is more than 88 bits of security.

That is a lot.

88 bits of security? That is you need to try (more than) 309485009821345068724781056 possibilities.

A whole lot. And that, I promise you, should be considered secure.

But wait, Substitution Cipher is not used any more, why? Read on, and you will know in a moment.

The weakness of Substitution Cipher

If the underlying language is English, then you can make a simple frequency analysis of how often the letters occur on average in English.

It turns out to be quite simple.

letter_freq = {'a': 0.0817, 'b': 0.0150, 'c': 0.0278, 'd': 0.0425, 'e': 0.1270, 'f': 0.0223,
               'g': 0.0202, 'h': 0.0609, 'i': 0.0697, 'j': 0.0015, 'k': 0.0077, 'l': 0.0403,
               'm': 0.0241, 'n': 0.0675, 'o': 0.0751, 'p': 0.0193, 'q': 0.0010, 'r': 0.0599,
               's': 0.0633, 't': 0.0906, 'u': 0.0276, 'v': 0.0098, 'w': 0.0236, 'x': 0.0015,
               'y': 0.0197, 'z': 0.0007}

That is, the letter ‘a’ occurs 8.17% percent probability. ‘b’ with 1.5%. ‘c’ with 2.78%.

Hence, given the text: IQL WPV WCVJQEV

Well, we are out of luck, because it is too short to have any frequency analysis to have any significance.

But the following text.

lrvmnir bpr sumvbwvr jx bpr lmiwv yjeryrkbi jx qmbm wi
bpr xjvni mkd ymibrut jx irhx wi bpr riirkvr jx
ymbinlmtmipw utn qmumbr dj w ipmhh but bj rhnvwdmbr bpr
yjeryrkbi jx bpr qmbm mvvjudwko bj yt wkbrusurbmbwjk
lmird jk xjubt trmui jx ibndt
  wb wi kjb mk rmit bmiq bj rashmwk rmvp yjeryrkb mkd wbi
iwokwxwvmkvr mkd ijyr ynib urymwk nkrashmwkrd bj ower m
vjyshrbr rashmkmbwjk jkr cjnhd pmer bj lr fnmhwxwrd mkd
wkiswurd bj invp mk rabrkb bpmb pr vjnhd urmvp bpr ibmbr
jx rkhwopbrkrd ywkd vmsmlhr jx urvjokwgwko ijnkdhrii
ijnkd mkd ipmsrhrii ipmsr w dj kjb drry ytirhx bpr xwkmh
mnbpjuwbt lnb yt rasruwrkvr cwbp qmbm pmi hrxb kj djnlb
bpmb bpr xjhhjcwko wi bpr sujsru msshwvmbwjk mkd
wkbrusurbmbwjk w jxxru yt bprjuwri wk bpr pjsr bpmb bpr
riirkvr jx jqwkmcmk qmumbr cwhh urymwk wkbmvb

This has enough letters to make an analysis of the letters. Let’s try.

cipher = """lrvmnir bpr sumvbwvr jx bpr lmiwv yjeryrkbi jx qmbm wi
bpr xjvni mkd ymibrut jx irhx wi bpr riirkvr jx
ymbinlmtmipw utn qmumbr dj w ipmhh but bj rhnvwdmbr bpr
yjeryrkbi jx bpr qmbm mvvjudwko bj yt wkbrusurbmbwjk
lmird jk xjubt trmui jx ibndt
  wb wi kjb mk rmit bmiq bj rashmwk rmvp yjeryrkb mkd wbi
iwokwxwvmkvr mkd ijyr ynib urymwk nkrashmwkrd bj ower m
vjyshrbr rashmkmbwjk jkr cjnhd pmer bj lr fnmhwxwrd mkd
wkiswurd bj invp mk rabrkb bpmb pr vjnhd urmvp bpr ibmbr
jx rkhwopbrkrd ywkd vmsmlhr jx urvjokwgwko ijnkdhrii
ijnkd mkd ipmsrhrii ipmsr w dj kjb drry ytirhx bpr xwkmh
mnbpjuwbt lnb yt rasruwrkvr cwbp qmbm pmi hrxb kj djnlb
bpmb bpr xjhhjcwko wi bpr sujsru msshwvmbwjk mkd
wkbrusurbmbwjk w jxxru yt bprjuwri wk bpr pjsr bpmb bpr
riirkvr jx jqwkmcmk qmumbr cwhh urymwk wkbmvb"""

alphabet = "abcdefghijklmnopqrstuvwxyz"
freq = {}
for c in alphabet:
    freq[c] = 0

cnt = 0
for c in cipher:
    if c in alphabet:
        freq[c] += 1
        cnt += 1

for c in freq:
    freq[c] = round(freq[c]/cnt, 4)

print(freq)

Will give you the following output.

{'a': 0.0077, 'b': 0.1053, 'c': 0.0077, 'd': 0.0356, 'e': 0.0077, 'f': 0.0015, 'g': 0.0015, 'h': 0.0356, 'i': 0.0635, 'j': 0.0743, 'k': 0.0759, 'l': 0.0124, 'm': 0.096, 'n': 0.0263, 'o': 0.0108, 'p': 0.0464, 'q': 0.0108, 'r': 0.13, 's': 0.0263, 't': 0.0201, 'u': 0.0372, 'v': 0.0341, 'w': 0.0728, 'x': 0.031, 'y': 0.0294, 'z': 0.0}

This gives you some hints on how the letters are mapped.

Bubble Sort Explained, Implemented in Python and Time Complexity Analysis

What will we cover?

  • What is Bubble sort and how does it work?
  • How to implement it in Python
  • How to evaluate the time complexity of the Bubble sort algorithm

What is Bubble sort and how does it work?

In this video the Bubble sort algorithm is explained. On a high level, it takes an unsorted list and returns it sorted.

How to implement Bubble sort in Python

Well, notice the above description. It is straight forward and simple to implement. That is one of the beauties of the algorithm. Simple to understand. Simple to implement.

def bubble_sort(my_list):
    for i in range(len(my_list), 1, -1):
        for j in range(1, i):
            if my_list[j-1] > my_list[j]:
                my_list[j-1], my_list[j] = my_list[j], my_list[j-1]

An simple illustration of how you can use it.

import random


def generate_random_list(n):
    my_list = []
    for i in range(n):
        my_list.append(random.randint(0, n))
    return my_list


def bubble_sort(my_list):
    for i in range(len(my_list), 1, -1):
        for j in range(1, i):
            if my_list[j-1] > my_list[j]:
                my_list[j-1], my_list[j] = my_list[j], my_list[j-1]



my_list = generate_random_list(20)
print(my_list)
bubble_sort(my_list)
print(my_list)

Evaluating the performance of Bubble sort

First of the algorithm has two for-loops. An outer and an inner for-loop. With an input of a list of length N, the loops will be iterated the following number of times.

  • Outer for-loop: N times
  • Inner for-loop: N-1 + N-2 + N-3 + … + 1 + 0 times

Knowing the formula

  • N-1 + N-2 + … + 1 + 0 = (N-1)*N/2

We can add that the time complexity of Bubble sort is O(n^2).

But let’s try to experiment with real running data, to see if we can confirm that complexity.

To get run-times from the algorithm the cProfile library comes in handy. It is easy to use and gives good insights. A simple way to get run-times is to set it like this.

import random
import cProfile


def generate_random_list(n):
    return [random.randint(0, 4*n) for i in range(n)]


def bubble_sort(my_list):
    for i in range(len(my_list), 1, -1):
        for j in range(1, i):
            if my_list[j-1] > my_list[j]:
                my_list[j-1], my_list[j] = my_list[j], my_list[j-1]


def profile_bubble_sort(n):
    my_list = generate_random_list(n)
    bubble_sort(my_list)


cProfile.run("profile_bubble_sort(10000)")

It will result in an output in the following manner.

         56372 function calls in 11.056 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   11.056   11.056 <string>:1(<module>)
        1    0.000    0.000   11.055   11.055 BubbleSortProfiling.py:16(profile_bubble_sort)
        1    0.000    0.000    0.034    0.034 BubbleSortProfiling.py:5(generate_random_list)
        1    0.008    0.008    0.034    0.034 BubbleSortProfiling.py:6(<listcomp>)
        1   11.021   11.021   11.021   11.021 BubbleSortProfiling.py:9(bubble_sort)
    10000    0.010    0.000    0.022    0.000 random.py:200(randrange)
    10000    0.005    0.000    0.027    0.000 random.py:244(randint)
    10000    0.007    0.000    0.012    0.000 random.py:250(_randbelow_with_getrandbits)
        1    0.000    0.000   11.056   11.056 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.len}
    10000    0.001    0.000    0.001    0.000 {method 'bit_length' of 'int' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    16364    0.003    0.000    0.003    0.000 {method 'getrandbits' of '_random.Random' objects}

You get the time spend in Bubble sort by looking the highlighted line. In the column cumtime you get the time spend in total in the function.

By collecting the run-time for various sizes of lists, we get the following graph.

The graph has a O(n^2) growth as expected.

How to Implement a Stack in Python and Check the Run-time Performance

We will cover the following in this article

  • What is a Stack – a short introduction
  • How to implement a Stack in Python
  • Investigate the run-time performance

What is a Stack

A Stack is a useful concept that is used in daily life, and hence, a concept that is important to understand and master as a programmer.

To understand Stacks just think of a stack of plates. There are two main operations you can do. First, you can add a plate on top of the stack. That operation is called push adds the element on top of the stack. Second, you can remove the top plate of the stack. That operation is called pop, and returns the top element of the stack.

In the diagram below a Stack is pictured. It contains of a Stack of element on the left side. The operation push of the element 03 is executed and results is pictured on the right side. Notice, that the push operation puts the element on top of the stack.

Below the operation pop is executed. Notice, that the pop operation takes from the top of the stack.

Implementation of a Stack in Python

It is a good idea to have a helper class Node that represents the elements on the stack.

class Node:
    def __init__(self, element=None, next_node=None):
        self.element = element
        self.next_node = next_node

The actual functionality of the Stack is kept in a Stack class.

class Stack:
    def __init__(self):
        self.stack = None

    def push(self, element):
        self.stack = Node(element, self.stack)

    def pop(self):
        element = self.stack.element
        self.stack = self.stack.next_node
        return element

    def is_empty(self):
        return self.stack is None

Now you can use your stack. Like the example below.

s = Stack()
for i in range(5):
    s.push(i)
while not s.is_empty():
    print(s.pop())

Will give the following output.

4
3
2
1
0

Notice the order of the element being removed from the stack by pop.

Run-time Performance

If we look at how the stack perform in order of the data size. To investigate the run-time performance the cProfile library is a good choice and simple to use. The following piece of code will help you investigate the performance.

import cProfile

def profile_stack(n):
    s = Stack()
    for i in range(n):
        s.push(i)
    while not s.is_empty():
        s.pop()


cProfile.run("profile_stack(100000)")

See the following graph.

As you see, the push and pop operations are constant, O(1), resulting in a linear performance of n push and pop operations as in the above experiment.

Comparing Performance of Python list as a Stack – How a wrong implementation can ruin performance

A Stack?

A Stack is using the principle first-in-last-out.

It is like a stack of plates. The last one you put on the top is the first one you take.

How can you implement them in Python? Well, we are in luck, you can use a Stack, and if done correctly, you will have the same performance as an actual Stack implementation will have.

But first, how can you do it wrong?

Well, you might think that the first element of the list is the top of your stack, hence in you will insert the elements on the first position, and, hence, remove them from the first position as well.

# Create a list as a stack
s = []

# Insert into the first position.
element = 7
s.insert(0, element)

# Remove from the first position.
s.pop(0)

Sounds about right?

Let’s test that and compare it with a different approach. To add the newest element to the end of the list, and, hence, remove them from the end of the list.

# Create a list and use it as stack
s = []

# Insert element in last postion
element = 7
s.append(element)

# Remove from the last position
s.pop()

Let’s check the performance of those two approaches.

Comparing the performance of the two approaches

How do you compare. You can use cProfile library. It is easy to use and informative results

See the sample code below, which compares the two approaches by create a stack each and inserting n elements to it and removing them afterwards.

import cProfile


def profile_list_as_queue_wrong(n):
    s = []
    for i in range(n):
        s.insert(0, i)
    while len(s) > 0:
        s.pop(0)


def profile_list_as_queue_correct(n):
    s = []
    for i in range(n):
        s.append(i)
    while len(s) > 0:
        s.pop()


def profile(n):
    profile_list_as_queue_wrong(n)
    profile_list_as_queue_correct(n)


cProfile.run("profile(100000)")

The results are given here.

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.842    5.842 <string>:1(<module>)
        1    0.078    0.078    0.107    0.107 Stack.py:12(profile_list_as_queue_correct)
        1    0.000    0.000    5.842    5.842 Stack.py:20(profile)
        1    0.225    0.225    5.735    5.735 Stack.py:4(profile_list_as_queue_wrong)
   200002    0.017    0.000    0.017    0.000 {len}
   100000    0.007    0.000    0.007    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000    3.547    0.000    3.547    0.000 {method 'insert' of 'list' objects}
   200000    1.954    0.000    1.954    0.000 {method 'pop' of 'list' objects}
        2    0.014    0.007    0.014    0.007 {range}

Observe that the “wrong” implementation takes over 5 seconds and the “correct” takes approximately 0.1 second. Over a factor 50 in difference.

Looking into the details

If we look at the complexities given by Python, it explains it all.

The Python lists amortised complexities are given on this page.

And you notice that the append and pop (last element) are O(1), which means constant time. Constant time, means that the operations are independent on the size of the lists. That means the correct implementation gives O(n) time complexity.

On the other hand, the insert and pop(0) have linear performance. That basically means that we with the wrong implementation end up with O(n^2) time complexity.

How to Implement a Queue in Python and Compare Performance with a Python list

What will we cover in this article

  • What is a Queue?
  • Implement a Queue in Python
  • Make performance testing of it
  • Compare it with performance of a Python list

What is a Queue?

We all know what a queue is. You go to the grocery store and get spinach, strawberry and bananas for your shake. Then you see a long line of people in front of the register. That line is a queue.

The same holds in programming. You create queues to process data or input of any kind.

How to implement a Queue in Python

It is easier than you think.

First you create a Node class to represent each node in a queue. A node is an abstraction to represent a point to the next node and the actual element.

class Node:
    def __init__(self, element=None, next_node=None):
        self.element = element
        self.next_node = next_node


Then you create the class for the Queue.

class Queue:
    def __init__(self):
        self.head = None
        self.tail = None

    def enqueue(self, element):
        if self.head is None:
            self.head = self.tail = Node(element)
        else:
            n = Node(element, self.tail)
            self.tail.next_node = n
            self.tail = n

    def dequeue(self):
        element = self.head.element
        if self.tail == self.head:
            self.tail = self.head = None
        else:
            self.head = self.head.next_node
        return element

    def is_empty(self):
        return self.head is None

How does it work. Let’s make a simple example.

q = Queue()
for i in range(10):
    q.enqueue(i)

while not q.is_empty():
    print(q.dequeue())

Which will output.

0
1
2
3
4
5
6
7
8
9

Yes! You guessed it.

How do we test performance?

I like to use the cProfile library. It is easy to use and gives informative results.

So how do you test performance? You simply import the cProfile library and use the cProfile.run(…) call.

You also need to do some operations to see how your Queue performs. See the code as an example.

import cProfile


def profile_queue(n):
    q = Queue()
    for i in range(n):
        q.enqueue(i)
    while not q.is_empty():
        q.dequeue()


def profile(n):
    profile_queue(n)


cProfile.run("profile(100000)")

Which will result in the following output.

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.819    0.819 <string>:1(<module>)
   100000    0.310    0.000    0.351    0.000 Queue.py:11(enqueue)
   100000    0.308    0.000    0.308    0.000 Queue.py:19(dequeue)
   100000    0.041    0.000    0.041    0.000 Queue.py:2(__init__)
   100001    0.021    0.000    0.021    0.000 Queue.py:27(is_empty)
        1    0.132    0.132    0.819    0.819 Queue.py:34(profile_queue)
        1    0.000    0.000    0.819    0.819 Queue.py:42(profile)
        1    0.000    0.000    0.000    0.000 Queue.py:7(__init__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.008    0.008    0.008    0.008 {range}

The interesting line is line 9, which tells us how much time is spend in the call to profile_queue.

But is the result good?

We need to compare it to other implementations.

Performance testing the Queue with a Python list

Python lists are used for anything. Can we use a Python list as a Queue. Of course. Let’s try to implement that and compare it to our Queue.

import cProfile


def profile_queue(n):
    q = Queue()
    for i in range(n):
        q.enqueue(i)
    while not q.is_empty():
        q.dequeue()


def profile_list_as_queue(n):
    q = []
    for i in range(n):
        q.insert(0,i)
    while len(q) > 0:
        q.pop()


def profile(n):
    profile_queue(n)
    profile_list_as_queue(n)


cProfile.run("profile(100000)")

How does that compare? Let’s see.

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.680    3.680 <string>:1(<module>)
   100000    0.295    0.000    0.331    0.000 Queue.py:11(enqueue)
   100000    0.298    0.000    0.298    0.000 Queue.py:19(dequeue)
   100000    0.036    0.000    0.036    0.000 Queue.py:2(__init__)
   100001    0.019    0.000    0.019    0.000 Queue.py:27(is_empty)
        1    0.104    0.104    0.756    0.756 Queue.py:34(profile_queue)
        1    0.101    0.101    2.924    2.924 Queue.py:42(profile_list_as_queue)
        1    0.000    0.000    3.680    3.680 Queue.py:50(profile)
        1    0.000    0.000    0.000    0.000 Queue.py:7(__init__)
   100001    0.005    0.000    0.005    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000    2.805    0.000    2.805    0.000 {method 'insert' of 'list' objects}
   100000    0.012    0.000    0.012    0.000 {method 'pop' of 'list' objects}
        2    0.004    0.002    0.004    0.002 {range}

Wow. Our Queue is way faster than the Python list.

But how is it comparing in general?

Comparing the performance of the Queue and a Python list as a Queue.

While it is difficult to see, the performance of the Queue is O(n) (linear) while the performance of the Python list as a Queue is O(n^2).

Hence, the Queue will outperform the Python list for this use case.

Explaining and Solving the Balancing Bracket Problem and Analysing the Performance and Run-time

We are going to answer the following questions.

  • What is the Balancing Bracket Problem?
  • Why is the Balancing Bracket Problem interesting?
  • How a Stack can help you solve the Balancing Bracket Problem efficiently?
  • What is the time complexity and do our implantation have that performance?

What is the Balancing Bracket Problem and how do you solve it?

See the video below to get a good introduction to the problem.

How to solve the problem in Python

You need a stack. You could use a Python list as a stack, while the append and pop last element in the Python list are amortised O(1) time, it is not guaranteed to get the performance you need.

Implementing your own stack will give the the worst case O(1) time complexity. So let us begin by implementing a Stack in Python.

It is more simple than you think.

class Node:
    def __init__(self, element=None, next_node=None):
        self.element = element
        self.next_node = next_node


class Stack:
    def __init__(self):
        self.stack = None

    def push(self, element):
        self.stack = Node(element, self.stack)

    def pop(self):
        self.stack = self.stack.next_node

    def is_empty(self):
        return self.stack is None

If you want to read more about stacks also check out this post.

Then given that stack solving the Balancing Bracket Problems becomes easy.

def balancing_bracket(s):
    stack = Stack()
    for c in s:
        if c in "([{":
            stack.push(c)
        elif c in ")]}":
            if stack.is_empty():
                return False
            e = stack.pop()
            if e == "(" and c == ")":
                continue
            elif e == "[" and c == "]":
                continue
            elif e == "{" and c == "}":
                continue
            else:
                return False
        else:
            continue
    if not stack.is_empty():
        return False
    else:
        return True

Time complexity analysis of our solution

Well, the idea with the solution is that it should be O(n), that is, linear in complexity. That means, that a problem of double size should take double time to solve.

The naive solution takes O(n^2), which means a problem of double size takes 4 times longer time.

But let us try to investigate the time performance of our solution. A good tool for that is the cProfile library provided by Python.

But first we need to be able to create random data. Also notice, that the random data we create should be balancing brackets to have worst case performance on our implementation.

To generate random balancing brackets string you can use the following code.

import random

def create_balanced_string(n):
    map_brackets = {"(": ")", "[": "]", "{": "}"}
    s = Stack()
    result = ""
    while n > 0 and n > s.get_size():
        if s.is_empty() or random.randint(0, 1) == 0:
            bracket = "([{"[random.randint(0, 2)]
            result += bracket
            s.push(bracket)
        else:
            result += map_brackets[s.pop()]
        n -= 1
    while not s.is_empty():
        result += map_brackets[s.pop()]
    return result

Back to the cProfile, which can be called as follows.

import cProfile

cProfile.run("balancing_bracket((create_balanced_string(1000000)))")

That will generate an output like the following.

         14154214 function calls in 6.988 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    6.988    6.988 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 BalacingBracketProblem.py:11(__init__)
  1000000    0.678    0.000    0.940    0.000 BalacingBracketProblem.py:15(push)
  1000000    0.522    0.000    0.522    0.000 BalacingBracketProblem.py:19(pop)
  1500002    0.233    0.000    0.233    0.000 BalacingBracketProblem.py:25(is_empty)
   998355    0.153    0.000    0.153    0.000 BalacingBracketProblem.py:28(get_size)
        1    0.484    0.484    1.249    1.249 BalacingBracketProblem.py:32(balancing_bracket)
  1000000    0.262    0.000    0.262    0.000 BalacingBracketProblem.py:5(__init__)
        1    1.639    1.639    5.739    5.739 BalacingBracketProblem.py:57(create_balanced_string)
  1498232    1.029    0.000    2.411    0.000 random.py:200(randrange)
  1498232    0.606    0.000    3.017    0.000 random.py:244(randint)
  1498232    0.924    0.000    1.382    0.000 random.py:250(_randbelow_with_getrandbits)
        1    0.000    0.000    6.988    6.988 {built-in method builtins.exec}
  1498232    0.148    0.000    0.148    0.000 {method 'bit_length' of 'int' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  2662922    0.310    0.000    0.310    0.000 {method 'getrandbits' of '_random.Random' objects}

Where we find our run-time on the highlighted line. It is interesting to notice, that the main time is spend creating a string. And diving deeper into that, you can see, that it is the calls to create random integers that are expensive.

Well, to figure out whether our code is linear in performance, we need to create data points for various input sizes.

That looks pretty linear, O(n), as expected.

Good job.