List Comprehension in Python made Easy with Comparisons

What will we cover in this tutorial?

  • Understand how list comprehension works in Python.
  • Understand the memory difference between updating an existing list and creating a new one.
  • Test the performance difference between list comprehension and updating a list through a for-loop.

Step 1: Understand what list comprehension is

On wikipedia.org it is defined as follows.

list comprehension is a syntactic construct available in some programming languages for creating a list based on existing lists

Wikipedia.org

So how does that translate into Python? Or rather, how does Python translate that into code?

If this is the first time you hear about list comprehension, you may still have stumbled upon code like this if you have been programming in Python for some time.

l1 = ['1', '2', '3', '4']
l2 = [int(s) for s in l1]
print(l2)

This results in a list of integers in l2.

[1, 2, 3, 4]

The construction of l2 is based on l1. If you look closely, you can see a for-loop inside the square brackets. You could move the for-loop outside and get the same effect.

l1 = ['1', '2', '3', '4']
l2 = []
for s in l1:
    l2.append(int(s))
print(l2)
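The same translation works for any expression, not only int(s). As a sketch, here are two hypothetical helpers, with_comprehension and with_loop (names chosen for illustration), that apply a function to every item: once as a list comprehension and once as the equivalent for-loop with append.

```python
def with_comprehension(func, items):
    # [expression for item in iterable]
    return [func(item) for item in items]


def with_loop(func, items):
    # The same thing spelled out as an explicit for-loop with append
    result = []
    for item in items:
        result.append(func(item))
    return result


l1 = ['1', '2', '3', '4']
print(with_comprehension(int, l1))  # [1, 2, 3, 4]
print(with_loop(int, l1))           # [1, 2, 3, 4]
```

Both build a brand-new list and leave l1 untouched, which matters for Step 2.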

Nice.

Step 2: Updating a list vs. creating a new one

Sometimes you see code like this.

l1 = [1, 2, 3, 4, 5, 6, 7]
l2 = [i + 1 for i in l1]
print(l2)

And you also notice that l1 is not used afterwards.

So what is the problem?

Let’s see an alternative way to do it.

l = [1, 2, 3, 4, 5, 6, 7]
for i in range(len(l)):
    l[i] += 1
print(l)

This has the same effect. So what is the difference?

The first one, using list comprehension, creates a new list, while the second one updates the values of the existing list in place.

Not convinced? Investigate this piece of code.

def list_comprehension(l):
    return [i + 1 for i in l]

def update_loop(l):
    for i in range(len(l)):
        l[i] += 1
    return l

l1 = [1, 2, 3, 4, 5, 6, 7]
l2 = list_comprehension(l1)
print(l1, l2)

l1 = [1, 2, 3, 4, 5, 6, 7]
l2 = update_loop(l1)
print(l1, l2)

Which results in the following output.

[1, 2, 3, 4, 5, 6, 7] [2, 3, 4, 5, 6, 7, 8]
[2, 3, 4, 5, 6, 7, 8] [2, 3, 4, 5, 6, 7, 8]

As you can see, the first one (list comprehension) creates a new list, while the other one updates the values in the existing list.
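Another way to confirm this is the `is` operator, which checks whether two names refer to the same object. A small sketch, with the two functions redefined so the snippet runs on its own:

```python
def list_comprehension(l):
    return [i + 1 for i in l]


def update_loop(l):
    for i in range(len(l)):
        l[i] += 1
    return l


a = [1, 2, 3]
b = list_comprehension(a)
print(b is a, a)  # False [1, 2, 3]  (new list, original untouched)

c = [1, 2, 3]
d = update_loop(c)
print(d is c, c)  # True [2, 3, 4]  (same list, modified in place)
```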

From a memory perspective, this can be an issue with extremely large lists: while the new list is being built, both the old and the new list have to fit in memory. But what about performance?
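To see the memory aspect directly, one option is the standard tracemalloc module. The sketch below measures the peak amount of memory newly allocated while each function runs. It assumes CPython, where small integers (-5 to 256) are cached; keeping the values small means the measurement mainly reflects the new list's storage rather than new int objects. Exact byte counts vary by platform and Python version, but the comprehension's peak should be far larger, since it allocates a second list of the same length.

```python
import tracemalloc


def list_comprehension(l):
    return [i + 1 for i in l]


def update_loop(l):
    for i in range(len(l)):
        l[i] += 1
    return l


def peak_bytes(func, data):
    """Return the peak number of bytes newly allocated while func(data) runs."""
    tracemalloc.start()
    try:
        func(data)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Values 0..99 keep every i + 1 inside CPython's small-int cache,
# so the numbers mostly reflect list storage, not new int objects.
data = [i % 100 for i in range(100_000)]

comp_peak = peak_bytes(list_comprehension, list(data))
loop_peak = peak_bytes(update_loop, list(data))
print(f"list comprehension peak: {comp_peak:,} bytes")
print(f"update loop peak:        {loop_peak:,} bytes")
```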

Step 3: Performance comparison between the two methods

This is interesting. To compare the run-time (performance) of the two functions, we can use cProfile from the Python standard library.

import cProfile
import random


def list_comprehension(l):
    return [i + 1 for i in l]


def update_loop(l):
    for i in range(len(l)):
        l[i] += 1
    return l


def test(n, it):
    l = [random.randint(0, n) for i in range(n)]
    for i in range(it):
        list_comprehension(l)

    l = [random.randint(0, n) for i in range(n)]
    for i in range(it):
        update_loop(l)


cProfile.run('test(10000, 10000)')

This results in the following output.

         152917 function calls in 16.837 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   16.837   16.837 <string>:1(<module>)
        1    0.869    0.869   16.837   16.837 TEST.py:15(test)
        1    0.008    0.008    0.040    0.040 TEST.py:16(<listcomp>)
        1    0.003    0.003    0.023    0.023 TEST.py:20(<listcomp>)
    10000    0.013    0.000    4.739    0.000 TEST.py:5(list_comprehension)
    10000    4.726    0.000    4.726    0.000 TEST.py:6(<listcomp>)
    10000   11.164    0.001   11.166    0.001 TEST.py:9(update_loop)
    20000    0.019    0.000    0.041    0.000 random.py:200(randrange)
    20000    0.010    0.000    0.052    0.000 random.py:244(randint)
    20000    0.014    0.000    0.022    0.000 random.py:250(_randbelow_with_getrandbits)
        1    0.000    0.000   16.837   16.837 {built-in method builtins.exec}
    10000    0.002    0.000    0.002    0.000 {built-in method builtins.len}
    20000    0.002    0.000    0.002    0.000 {method 'bit_length' of 'int' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    32911    0.006    0.000    0.006    0.000 {method 'getrandbits' of '_random.Random' objects}

Here we can see that the accumulated time spent in list_comprehension is 4.739 seconds, while the accumulated time spent in update_loop is 11.166 seconds.

Wait, is it faster to create a new list than to update an existing one?

Let’s do some more testing.

[Figure: Performance of list comprehension vs updating a list]

There seems to be no doubt about it.
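The code behind the chart is not shown. As an assumption, a similar sweep over several list sizes could be run with the standard timeit module; this sketch reports the best of several runs per size, and like the original test it lets update_loop keep modifying the same list between runs. Absolute numbers will depend on your machine and Python version.

```python
import random
import timeit
from functools import partial


def list_comprehension(l):
    return [i + 1 for i in l]


def update_loop(l):
    for i in range(len(l)):
        l[i] += 1
    return l


def compare(sizes, repeat=5, number=100):
    """Return {size: (comprehension_seconds, loop_seconds)}, best of `repeat` runs."""
    results = {}
    for n in sizes:
        data = [random.randint(0, n) for _ in range(n)]
        comp = min(timeit.repeat(partial(list_comprehension, data), repeat=repeat, number=number))
        loop = min(timeit.repeat(partial(update_loop, data), repeat=repeat, number=number))
        results[n] = (comp, loop)
    return results


for n, (comp, loop) in compare([10, 100, 1_000, 10_000]).items():
    print(f"n={n:>6}: comprehension {comp:.4f}s, update loop {loop:.4f}s")
```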

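Let’s remember that Python is interpreted, so every bytecode instruction executed has a cost. A list comprehension compiles to a tight loop that appends each result with a dedicated instruction, while the index-based update loop does more work per element: it steps through range, reads l[i], adds 1, and stores the result back into l[i]. The update loop is more flexible, but it pays for that with more instructions to interpret.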
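In CPython you can look at this yourself with the standard dis module. In the comprehension, the per-element work ends in a LIST_APPEND instruction, while the loop loads l[i], adds 1, and stores it back with STORE_SUBSCR. Opcode names and layout vary between Python versions (for example, since Python 3.12 comprehensions are inlined into the enclosing function instead of being a separate code object), so treat the output as an illustration rather than a fixed listing.

```python
import dis


def list_comprehension(l):
    return [i + 1 for i in l]


def update_loop(l):
    for i in range(len(l)):
        l[i] += 1
    return l


# dis.dis also disassembles nested code objects (e.g. <listcomp> before 3.12)
dis.dis(list_comprehension)
print("-" * 40)
dis.dis(update_loop)
```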

Step 4 (Bonus): Use list comprehension with a function

One limitation of list comprehension is that each element is produced by a single expression, while the for-loop construct is more flexible and can contain any statements.

But wait: what if you call a function inside the list comprehension? Then you should be able to regain a lot of that flexibility.
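For example, a try/except is a statement, so it cannot be written directly inside a comprehension, but a function can wrap it. A minimal sketch with a hypothetical helper, to_int_or_default, that extends the string-to-int example from Step 1:

```python
def to_int_or_default(s, default=0):
    # Hypothetical helper: the try/except lives in the function,
    # so the comprehension itself stays a single expression
    try:
        return int(s)
    except ValueError:
        return default


raw = ['1', 'x', '3', '']
print([to_int_or_default(s) for s in raw])  # [1, 0, 3, 0]
```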

Let’s try to see how that affects the performance.

import cProfile
import random

def add_one(v):
    return v + 1

def list_comprehension(l):
    return [add_one(i) for i in l]


def update_loop(l):
    for i in range(len(l)):
        l[i] += 1
    return l


def test(n, it):
    l = [random.randint(0, n) for i in range(n)]
    for i in range(it):
        list_comprehension(l)

    l = [random.randint(0, n) for i in range(n)]
    for i in range(it):
        update_loop(l)


cProfile.run('test(1000, 10000)')

Giving the following output.

         10050065 function calls in 15.826 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   15.826   15.826 <string>:1(<module>)
    10000    3.960    0.000    3.964    0.000 main.py:11(update_loop)
        1    0.296    0.296   15.826   15.826 main.py:17(test)
        1    0.001    0.001    0.005    0.005 main.py:18(<listcomp>)
        1    0.004    0.004    0.008    0.008 main.py:22(<listcomp>)
 10000000    4.389    0.000    4.389    0.000 main.py:4(add_one)
    10000    0.077    0.000   11.554    0.001 main.py:7(list_comprehension)
    10000    7.088    0.001   11.476    0.001 main.py:8(<listcomp>)
     2000    0.003    0.000    0.006    0.000 random.py:200(randrange)
     2000    0.002    0.000    0.008    0.000 random.py:244(randint)
     2000    0.002    0.000    0.003    0.000 random.py:250(_randbelow_with_getrandbits)
        1    0.000    0.000   15.826   15.826 {built-in method builtins.exec}
    10000    0.004    0.000    0.004    0.000 {built-in method builtins.len}
     2000    0.000    0.000    0.000    0.000 {method 'bit_length' of 'int' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     2059    0.000    0.000    0.000    0.000 {method 'getrandbits' of '_random.Random' objects}

Oh no. Now list_comprehension takes 11.554 seconds, compared to 3.964 seconds for update_loop.

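This is hard for the interpreter to optimize, as it does not inline add_one and cannot predict what the function does. Calling it in each iteration of the list creation adds a full Python function call per element, which is a big overhead. Note also that cProfile adds its own overhead to every function call it records, and here it records 10,000,000 calls to add_one, so the profile likely exaggerates the gap.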
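To get numbers without the profiler's per-call overhead, a sketch using the standard timeit module can compare three variants side by side. The names comprehension_inline and comprehension_with_call are hypothetical labels for the Step 3 and Step 4 versions; add_one and update_loop are as above. Results will vary by machine and Python version.

```python
import timeit
from functools import partial


def add_one(v):
    return v + 1


def comprehension_inline(l):
    return [i + 1 for i in l]


def comprehension_with_call(l):
    return [add_one(i) for i in l]


def update_loop(l):
    for i in range(len(l)):
        l[i] += 1
    return l


data = list(range(1000))
timings = {}
for func in (comprehension_inline, comprehension_with_call, update_loop):
    # Best of 5 runs, each calling func(data) 200 times, with no profiler attached
    timings[func.__name__] = min(timeit.repeat(partial(func, data), repeat=5, number=200))

for name, seconds in timings.items():
    print(f"{name:<25} {seconds:.4f} s")
```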

Conclusion

Can we conclude that list comprehension always beats updating an existing list? Not really. As Step 4 showed, calling a function for each element can make the comprehension slower, and there is also the memory dimension. If lists are big, if memory is a scarce resource, or if you want to avoid extra allocation and cleanup work for Python, then updating the list in place might be a better option.
