Comparing Performance of Python list as a Stack – How a wrong implementation can ruin performance

A Stack?

A Stack follows the last-in-first-out (LIFO) principle.

It is like a stack of plates. The last one you put on the top is the first one you take.

How can you implement one in Python? Well, we are in luck: you can use a Python list, and if done correctly, you will get the same performance as an actual Stack implementation would give you.

But first, how can you do it wrong?

Well, you might think that the first element of the list is the top of your stack, so you insert new elements at the first position and, hence, remove them from the first position as well.

# Create a list as a stack
s = []

# Insert into the first position.
element = 7
s.insert(0, element)

# Remove from the first position.
s.pop(0)

Sounds about right?

Let’s test that and compare it with a different approach: adding the newest element to the end of the list and, hence, removing elements from the end of the list as well.

# Create a list and use it as stack
s = []

# Insert element in the last position
element = 7
s.append(element)

# Remove from the last position
s.pop()

Let’s check the performance of those two approaches.

Comparing the performance of the two approaches

How do you compare them? You can use the cProfile library. It is easy to use and gives informative results.

See the sample code below, which compares the two approaches by creating a stack for each, inserting n elements, and removing them all again afterwards.

import cProfile


def profile_list_as_stack_wrong(n):
    s = []
    for i in range(n):
        s.insert(0, i)
    while len(s) > 0:
        s.pop(0)


def profile_list_as_stack_correct(n):
    s = []
    for i in range(n):
        s.append(i)
    while len(s) > 0:
        s.pop()


def profile(n):
    profile_list_as_stack_wrong(n)
    profile_list_as_stack_correct(n)


cProfile.run("profile(100000)")

The results are given here.

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.842    5.842 <string>:1(<module>)
        1    0.078    0.078    0.107    0.107 Stack.py:12(profile_list_as_stack_correct)
        1    0.000    0.000    5.842    5.842 Stack.py:20(profile)
        1    0.225    0.225    5.735    5.735 Stack.py:4(profile_list_as_stack_wrong)
   200002    0.017    0.000    0.017    0.000 {len}
   100000    0.007    0.000    0.007    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000    3.547    0.000    3.547    0.000 {method 'insert' of 'list' objects}
   200000    1.954    0.000    1.954    0.000 {method 'pop' of 'list' objects}
        2    0.014    0.007    0.014    0.007 {range}

Observe that the “wrong” implementation takes over 5 seconds while the “correct” one takes approximately 0.1 seconds: more than a factor of 50 in difference.

Looking into the details

If we look at the complexities documented for Python list operations, it explains it all.

The Python lists amortised complexities are given on this page.

And you notice that append and pop (of the last element) are amortised O(1), which means constant time. Constant time means that the operation is independent of the size of the list. The correct implementation therefore handles all n insertions and removals in O(n) total time.

On the other hand, insert(0, element) and pop(0) take linear time, because every element in the list has to be shifted. With the wrong implementation we therefore end up with O(n^2) total time complexity.
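A quick way to see this on your own machine is to time both variants with the standard timeit module. This is a minimal sketch; the absolute numbers vary by machine, only the gap between the two matters.

```python
import timeit


def stack_fast(n):
    s = []
    for i in range(n):
        s.append(i)     # amortised O(1)
    while s:
        s.pop()         # O(1)


def stack_slow(n):
    s = []
    for i in range(n):
        s.insert(0, i)  # O(n): shifts every element
    while s:
        s.pop(0)        # O(n): shifts every element


fast = timeit.timeit(lambda: stack_fast(10000), number=1)
slow = timeit.timeit(lambda: stack_slow(10000), number=1)
print(f"append/pop():     {fast:.4f}s")
print(f"insert(0)/pop(0): {slow:.4f}s")
```

Even at n = 10000 the insert(0)/pop(0) variant is clearly slower, and the gap widens as n grows.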

How to Implement a Queue in Python and Compare Performance with a Python list

What will we cover in this article

  • What is a Queue?
  • Implement a Queue in Python
  • Make performance testing of it
  • Compare it with performance of a Python list

What is a Queue?

We all know what a queue is. You go to the grocery store and get spinach, strawberry and bananas for your shake. Then you see a long line of people in front of the register. That line is a queue.

The same holds in programming. You create queues to process data or input of any kind.

How to implement a Queue in Python

It is easier than you think.

First you create a Node class to represent each node in the queue. A node is an abstraction holding the actual element and a pointer to the next node.

class Node:
    def __init__(self, element=None, next_node=None):
        self.element = element
        self.next_node = next_node


Then you create the class for the Queue.

class Queue:
    def __init__(self):
        self.head = None
        self.tail = None

    def enqueue(self, element):
        if self.head is None:
            self.head = self.tail = Node(element)
        else:
            n = Node(element)  # the new tail; its next_node stays None
            self.tail.next_node = n
            self.tail = n

    def dequeue(self):
        element = self.head.element
        if self.tail == self.head:
            self.tail = self.head = None
        else:
            self.head = self.head.next_node
        return element

    def is_empty(self):
        return self.head is None

How does it work? Let’s look at a simple example.

q = Queue()
for i in range(10):
    q.enqueue(i)

while not q.is_empty():
    print(q.dequeue())

Which will output the following.

0
1
2
3
4
5
6
7
8
9

Yes! You guessed it.

How do we test performance?

I like to use the cProfile library. It is easy to use and gives informative results.

So how do you test performance? You simply import the cProfile library and use the cProfile.run(…) call.

You also need to do some operations to see how your Queue performs. See the code as an example.

import cProfile


def profile_queue(n):
    q = Queue()
    for i in range(n):
        q.enqueue(i)
    while not q.is_empty():
        q.dequeue()


def profile(n):
    profile_queue(n)


cProfile.run("profile(100000)")

Which will result in the following output.

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.819    0.819 <string>:1(<module>)
   100000    0.310    0.000    0.351    0.000 Queue.py:11(enqueue)
   100000    0.308    0.000    0.308    0.000 Queue.py:19(dequeue)
   100000    0.041    0.000    0.041    0.000 Queue.py:2(__init__)
   100001    0.021    0.000    0.021    0.000 Queue.py:27(is_empty)
        1    0.132    0.132    0.819    0.819 Queue.py:34(profile_queue)
        1    0.000    0.000    0.819    0.819 Queue.py:42(profile)
        1    0.000    0.000    0.000    0.000 Queue.py:7(__init__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.008    0.008    0.008    0.008 {range}

The interesting line is the one for profile_queue (Queue.py:34), which tells us how much time is spent in that call.

But is the result good?

We need to compare it to other implementations.

Performance testing the Queue with a Python list

Python lists are used for almost everything. Can we use a Python list as a Queue? Of course. Let’s try that and compare it to our Queue.

import cProfile


def profile_queue(n):
    q = Queue()
    for i in range(n):
        q.enqueue(i)
    while not q.is_empty():
        q.dequeue()


def profile_list_as_queue(n):
    q = []
    for i in range(n):
        q.insert(0, i)
    while len(q) > 0:
        q.pop()


def profile(n):
    profile_queue(n)
    profile_list_as_queue(n)


cProfile.run("profile(100000)")

How does that compare? Let’s see.

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.680    3.680 <string>:1(<module>)
   100000    0.295    0.000    0.331    0.000 Queue.py:11(enqueue)
   100000    0.298    0.000    0.298    0.000 Queue.py:19(dequeue)
   100000    0.036    0.000    0.036    0.000 Queue.py:2(__init__)
   100001    0.019    0.000    0.019    0.000 Queue.py:27(is_empty)
        1    0.104    0.104    0.756    0.756 Queue.py:34(profile_queue)
        1    0.101    0.101    2.924    2.924 Queue.py:42(profile_list_as_queue)
        1    0.000    0.000    3.680    3.680 Queue.py:50(profile)
        1    0.000    0.000    0.000    0.000 Queue.py:7(__init__)
   100001    0.005    0.000    0.005    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000    2.805    0.000    2.805    0.000 {method 'insert' of 'list' objects}
   100000    0.012    0.000    0.012    0.000 {method 'pop' of 'list' objects}
        2    0.004    0.002    0.004    0.002 {range}

Wow. Our Queue is way faster than the Python list.

But how is it comparing in general?

Comparing the performance of the Queue and a Python list as a Queue

While it is difficult to see from a single run, processing n elements with the Queue takes O(n) (linear) time, while the Python list used as a Queue takes O(n^2).

Hence, the Queue will outperform the Python list for this use case.
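For completeness: Python’s standard library already ships collections.deque, a double-ended queue with O(1) appends and pops at both ends, which is the usual choice when you need a queue in practice. A minimal sketch:

```python
from collections import deque

# A deque supports O(1) operations at both ends, so it works
# directly as a queue: append() to enqueue, popleft() to dequeue.
q = deque()
for i in range(5):
    q.append(i)        # enqueue at the right end

result = []
while q:
    result.append(q.popleft())  # dequeue from the left end, O(1)

print(result)  # [0, 1, 2, 3, 4]
```

Unlike a list, popleft() does not shift the remaining elements, so the FIFO pattern stays linear overall.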

Explaining and Solving the Balancing Bracket Problem and Analysing the Performance and Run-time

We are going to answer the following questions.

  • What is the Balancing Bracket Problem?
  • Why is the Balancing Bracket Problem interesting?
  • How can a Stack help you solve the Balancing Bracket Problem efficiently?
  • What is the time complexity, and does our implementation achieve it?

What is the Balancing Bracket Problem and how do you solve it?

In short: given a string that may contain the brackets (, ), [, ], { and }, determine whether every opening bracket is matched by a closing bracket of the same type in the correct, properly nested order. For example, "([]{})" is balanced, while "([)]" is not.

How to solve the problem in Python

You need a stack. You could use a Python list as a stack, but while append and pop of the last element are amortised O(1), a single operation is not guaranteed constant time, so you are not guaranteed the performance you need.

Implementing your own stack gives you worst-case O(1) time per operation. So let us begin by implementing a Stack in Python.

It is simpler than you think.

class Node:
    def __init__(self, element=None, next_node=None):
        self.element = element
        self.next_node = next_node


class Stack:
    def __init__(self):
        self.stack = None
        self.size = 0

    def push(self, element):
        self.stack = Node(element, self.stack)
        self.size += 1

    def pop(self):
        # Return the removed element so callers can inspect it.
        element = self.stack.element
        self.stack = self.stack.next_node
        self.size -= 1
        return element

    def is_empty(self):
        return self.stack is None

    def get_size(self):
        return self.size

If you want to read more about stacks also check out this post.

Then, given that stack, solving the Balancing Bracket Problem becomes easy.

def balancing_bracket(s):
    stack = Stack()
    for c in s:
        if c in "([{":
            stack.push(c)
        elif c in ")]}":
            if stack.is_empty():
                return False
            e = stack.pop()
            if e == "(" and c == ")":
                continue
            elif e == "[" and c == "]":
                continue
            elif e == "{" and c == "}":
                continue
            else:
                return False
        else:
            continue
    if not stack.is_empty():
        return False
    else:
        return True
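To sanity-check the logic on a few inputs, here is a compact standalone variant that mirrors balancing_bracket but uses a plain list as the stack, so the snippet runs on its own:

```python
# Standalone variant of balancing_bracket using a plain list as the
# stack, so this example is self-contained. The logic is the same:
# push opening brackets, pop and match on closing brackets.
def is_balanced(s):
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for c in s:
        if c in "([{":
            stack.append(c)
        elif c in pairs:
            if not stack or stack.pop() != pairs[c]:
                return False
    return not stack


print(is_balanced("([]{})"))  # True
print(is_balanced("([)]"))    # False
print(is_balanced("((("))     # False
```

Note the two failure modes: a closing bracket with no matching opener (or the wrong opener), and leftover openers at the end of the string.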

Time complexity analysis of our solution

Well, the idea with the solution is that it should be O(n), that is, linear in complexity. That means, that a problem of double size should take double time to solve.

The naive solution takes O(n^2), which means a problem of double size takes 4 times longer.

But let us try to investigate the time performance of our solution. A good tool for that is the cProfile library provided by Python.

But first we need to be able to create random test data. Also notice that the random data we create should be balanced bracket strings, so we hit the worst case of our implementation: the entire string is scanned.

To generate a random balanced bracket string you can use the following code.

import random

def create_balanced_string(n):
    map_brackets = {"(": ")", "[": "]", "{": "}"}
    s = Stack()
    result = ""
    while n > 0 and n > s.get_size():
        if s.is_empty() or random.randint(0, 1) == 0:
            bracket = "([{"[random.randint(0, 2)]
            result += bracket
            s.push(bracket)
        else:
            result += map_brackets[s.pop()]
        n -= 1
    while not s.is_empty():
        result += map_brackets[s.pop()]
    return result

Back to cProfile, which can be called as follows.

import cProfile

cProfile.run("balancing_bracket(create_balanced_string(1000000))")

That will generate an output like the following.

         14154214 function calls in 6.988 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    6.988    6.988 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 BalacingBracketProblem.py:11(__init__)
  1000000    0.678    0.000    0.940    0.000 BalacingBracketProblem.py:15(push)
  1000000    0.522    0.000    0.522    0.000 BalacingBracketProblem.py:19(pop)
  1500002    0.233    0.000    0.233    0.000 BalacingBracketProblem.py:25(is_empty)
   998355    0.153    0.000    0.153    0.000 BalacingBracketProblem.py:28(get_size)
        1    0.484    0.484    1.249    1.249 BalacingBracketProblem.py:32(balancing_bracket)
  1000000    0.262    0.000    0.262    0.000 BalacingBracketProblem.py:5(__init__)
        1    1.639    1.639    5.739    5.739 BalacingBracketProblem.py:57(create_balanced_string)
  1498232    1.029    0.000    2.411    0.000 random.py:200(randrange)
  1498232    0.606    0.000    3.017    0.000 random.py:244(randint)
  1498232    0.924    0.000    1.382    0.000 random.py:250(_randbelow_with_getrandbits)
        1    0.000    0.000    6.988    6.988 {built-in method builtins.exec}
  1498232    0.148    0.000    0.148    0.000 {method 'bit_length' of 'int' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  2662922    0.310    0.000    0.310    0.000 {method 'getrandbits' of '_random.Random' objects}

We find our run-time on the line for balancing_bracket (BalacingBracketProblem.py:32). It is interesting to notice that most of the time is spent creating the string, and, diving deeper into that, that the calls generating random integers are the expensive part.

Well, to figure out whether our code is linear in performance, we need to create data points for various input sizes and plot the run-times against n.
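One way to collect such data points is sketched below. It uses a compact list-based version of the checker and a simple generator of balanced strings so the snippet runs standalone; the absolute timings depend on the machine, only the growth trend matters.

```python
import random
import timeit


def is_balanced(s):
    # Compact list-based bracket checker (same logic as balancing_bracket).
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for c in s:
        if c in "([{":
            stack.append(c)
        elif c in pairs:
            if not stack or stack.pop() != pairs[c]:
                return False
    return not stack


def balanced_string(n):
    # Simple generator: concatenate n random matching bracket pairs.
    # This always yields a balanced string by construction.
    opens, closes = "([{", ")]}"
    out = []
    for _ in range(n):
        k = random.randint(0, 2)
        out.append(opens[k] + closes[k])
    return "".join(out)


# Collect run-times for doubling input sizes.
for n in (100_000, 200_000, 400_000):
    s = balanced_string(n)
    t = timeit.timeit(lambda: is_balanced(s), number=1)
    print(f"n={n}: {t:.4f}s")
```

If the checker is linear, doubling n should roughly double the measured time.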

Plotting those data points, the run-time looks pretty linear, O(n), as expected.

Good job.