What will we cover in this tutorial?

• Understand how list comprehension works in Python.
• Updating and creation of new list is a memory aspect.
• Test the performance difference between list comprehension and updating a list through a for-loop.

Step 1: Understand what is list comprehension

On wikipedia.org it is defined as follows.

list comprehension is a syntactic construct available in some programming languages for creating a list based on existing lists

Wikipedia.org

Then how does that translate into Python? Or is it, how does Python translate that into code?

If this is the first time you hear about list comprehension, but you have been programming for some time in Python and stumbled upon code pieces like this.

```l1 = ['1', '2', '3', '4']
l2 = [int(s) for s in l1]
print(l2)
```

Which will result in a list of integers in l2.

```[1, 2, 3, 4]
```

The construction for l2 is based on l1. Inspecting it closely, you can see a for-loop inside the creation of the square brackets. You could take the for-loop outside and have the same effect.

```l1 = ['1', '2', '3', '4']
l2 = []
for s in l1:
l2.append(int(s))
print(l2)
```

Nice.

Step 2: Updating and creation

Sometimes you see code like this.

```l1 = [1, 2, 3, 4, 5, 6, 7]
l2 = [i + 1 for i in l1]
print(l2)
```

And you also notice that the l1 is not used after.

So what is the problem?

Let’s see an alternative way to do it.

```l = [1, 2, 3, 4, 5, 6, 7]
for i in range(len(l)):
l[i] += 1
print(l)
```

Which will result in the same effect. So what is the difference?

The first one, with list comprehension, creates a new list, while the second one updates the values of the list.

Not convinced? Investigate this piece of code.

```def list_comprehension(l):
return [i + 1 for i in l]

def update_loop(l):
for i in range(len(l)):
l[i] += 1
return l

l1 = [1, 2, 3, 4, 5, 6, 7]
l2 = list_comprehension(l1)
print(l1, l2)

l1 = [1, 2, 3, 4, 5, 6, 7]
l2 = update_loop(l1)
print(l1, l2)
```

Which results in the following output.

```[1, 2, 3, 4, 5, 6, 7] [2, 3, 4, 5, 6, 7, 8]
[2, 3, 4, 5, 6, 7, 8] [2, 3, 4, 5, 6, 7, 8]
```

As you see, the first one (list comprehension) creates a new list, while the other one updates the values in the existing.

From a memory perspective, this can be an issue with extremely large lists. But what about performance?

Step 3: Performance comparison between the two methods

This is interesting. To compare the run-time (performance) of the two functions we can use the cProfile standart Python library.

```import cProfile
import random

def list_comprehension(l):
return [i + 1 for i in l]

def update_loop(l):
for i in range(len(l)):
l[i] += 1
return l

def test(n, it):
l = [random.randint(0, n) for i in range(n)]
for i in range(it):
list_comprehension(l)

l = [random.randint(0, n) for i in range(n)]
for i in range(it):
update_loop(l)

cProfile.run('test(10000, 100000)')
```

This results in the following output.

```         152917 function calls in 16.837 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.000    0.000   16.837   16.837 <string>:1(<module>)
1    0.869    0.869   16.837   16.837 TEST.py:15(test)
1    0.008    0.008    0.040    0.040 TEST.py:16(<listcomp>)
1    0.003    0.003    0.023    0.023 TEST.py:20(<listcomp>)
10000    0.013    0.000    4.739    0.000 TEST.py:5(list_comprehension)
10000    4.726    0.000    4.726    0.000 TEST.py:6(<listcomp>)
10000   11.164    0.001   11.166    0.001 TEST.py:9(update_loop)
20000    0.019    0.000    0.041    0.000 random.py:200(randrange)
20000    0.010    0.000    0.052    0.000 random.py:244(randint)
20000    0.014    0.000    0.022    0.000 random.py:250(_randbelow_with_getrandbits)
1    0.000    0.000   16.837   16.837 {built-in method builtins.exec}
10000    0.002    0.000    0.002    0.000 {built-in method builtins.len}
20000    0.002    0.000    0.002    0.000 {method 'bit_length' of 'int' objects}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
32911    0.006    0.000    0.006    0.000 {method 'getrandbits' of '_random.Random' objects}
```

Where we can see that the accumulated time spend in list_comprehension is 4.739 seconds, while the accumulated time spend in update_loop is 11.166 seconds.

Wait? Is it faster to create a new list than update an existing one?

Let’s do some more testing.

Seems to be no doubt about it.

Let’s just remember that Python is an interpreter and each instruction is highly optimized. Hence, keeping the code as list comprehension, can be highly optimized, while updating the loop is more flexible and takes more lines of interpretation.

Step 4 (Bonus): Use list comprehension with function

One aspect of list comprehension, is that it limits the possibility, while the for-loop construct is more flexible.

But wait, what if you use a function inside the list comprehension construction, then you should be able to regain a lot of that flexibility.

Let’s try to see how that affects the performance.

```import cProfile
import random

def add_one(v):
return v + 1

def list_comprehension(l):
return [add_one(i) for i in l]

def update_loop(l):
for i in range(len(l)):
l[i] += 1
return l

def test(n, it):
l = [random.randint(0, n) for i in range(n)]
for i in range(it):
list_comprehension(l)

l = [random.randint(0, n) for i in range(n)]
for i in range(it):
update_loop(l)

cProfile.run('test(1000, 10000)')
```

Giving the following output.

```         10050065 function calls in 15.826 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.000    0.000   15.826   15.826 <string>:1(<module>)
10000    3.960    0.000    3.964    0.000 main.py:11(update_loop)
1    0.296    0.296   15.826   15.826 main.py:17(test)
1    0.001    0.001    0.005    0.005 main.py:18(<listcomp>)
1    0.004    0.004    0.008    0.008 main.py:22(<listcomp>)
10000000    4.389    0.000    4.389    0.000 main.py:4(add_one)
10000    0.077    0.000   11.554    0.001 main.py:7(list_comprehension)
10000    7.088    0.001   11.476    0.001 main.py:8(<listcomp>)
2000    0.003    0.000    0.006    0.000 random.py:200(randrange)
2000    0.002    0.000    0.008    0.000 random.py:244(randint)
2000    0.002    0.000    0.003    0.000 random.py:250(_randbelow_with_getrandbits)
1    0.000    0.000   15.826   15.826 {built-in method builtins.exec}
10000    0.004    0.000    0.004    0.000 {built-in method builtins.len}
2000    0.000    0.000    0.000    0.000 {method 'bit_length' of 'int' objects}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
2059    0.000    0.000    0.000    0.000 {method 'getrandbits' of '_random.Random' objects}
```

Oh no. It takes list_comprehension 11.554 seconds compared to update_loop 3.964 seconds.

This is obviously hard to optimize for the interpreter as it cannot predict the effect of the function (add_one). Adding that call in each iteration of the creation of the list add a big overhead in performance.

Conclusion

Can we conclude that list comprehension always beats updating an existing list? Not really. There is the memory dimension. If lists are big or memory is a sparse resource or you want to avoid too much memory cleanup by Python, then updating the list might be a better option.

What will we cover in this tutorial?

• How can you sort a list of strings containing integers by the integer value?
• Or what if it contains both strings containing integers and integers?
• Finally, also how if only a substring contains integers?

Why sort on a list of integers represented in strings fails

First of, we need to understand why it is not trivial to solve by just calling sort on the list.

Let’s just try with an example.

```l = ['4', '8', '12', '23', '4']
l.sort()
print(l)
```

Which will result in the following list.

```['12', '23', '4', '4', '8']
```

Where you see the list is sorted lexicographical order and not by the numeric value the strings represent.

How to solve this

Solving this is quite straight forward if you know your way around Python. You look in the documentation and see that it takes a key as argument. Okay, you are new to this, so what does it mean.

key specifies a function of one argument that is used to extract a comparison key from each list element

Python docs.

Still not comfortable about it. Let’s try to figure it out together. If you are new to Python, you might not know that you can send functions as arguments like any other value.

The key argument is a function that will be applied on every item in the list. The output of that function will be used to make a simple comparison and order it by that.

That is great news. Why?

I am glad you asked. If we just use the int() function as argument, it should cast the string to an integer and use that for comparison and our problem is solved.

Let’s try.

```l = ['4', '8', '12', '23', '4']
l.sort(key=int)
print(l)
```

Resulting to the following list.

```['4', '4', '8', '12', '23']
```

How simple is that?

What if my list is a mixture of integers and strings of integers?

What is your wild guess?

```l = ['4', '8', 12, '23', 4]
l.sort(key=int)
print(l)
```

Notice that some integers are not strings any more. Let see the output.

```['4', 4, '8', 12, '23']
```

It works. This is why we love Python!

But what if it is more complex?

A complex examples of sorting

Say we have a list of of strings like this one.

```l = ['4 dollars', '8 dollars', '12 dollars', '23 dollars', '4 dollars']
```

The story is something like this. You ask a lot of providers how much it will cost to give a specific service. The answers are given in the list and you want to investigate them in order of lowest price.

We can just do the same, right?

```l = ['4 dollars', '8 dollars', '12 dollars', '23 dollars', '4 dollars']
l.sort(key=int)
print(l)
```

Wrong!

```Traceback (most recent call last):
File "main.py", line 2, in <module>
l.sort(key=int)
ValueError: invalid literal for int() with base 10: '4 dollars'
```

The string is not just an integer. It contains more information.

The good luck is that we can send any function. Let’s try to create one.

```def comp(o):
return int(o.split()[0])

l = ['4 dollars', '8 dollars', '12 dollars', '23 dollars', '4 dollars']
l.sort(key=comp)
print(l)
```

And the output is as desired.

```['4 dollars', '4 dollars', '8 dollars', '12 dollars', '23 dollars']
```

Too fast? Let’s just analyse our function comp. It contains only one return statement. Try to read it from inside out.

o.split() splits the string up in a list of items contain word by word. Hence, the call of ‘4 dollars’.split() will result in [‘4’, ‘dollars’].

Then o.split()[0] will return the first item of that list, i.e. ‘4’.

Finally, we cast it to an integer by int(o.split()[0]).

Remember that the comparison is done by the output of the function, that is what the function returns, which in this case is the integer represented by the first item in the string.

What about lambda?

Lambda? Yes, lambda functions is also a hot subject.

A lambda function is just a smart way to write simple functions you send as arguments to other functions. Like in this case a sorting function.

Let’s try if we can do that.

```l = ['4 dollars', '8 dollars', '12 dollars', '23 dollars', '4 dollars']
l.sort(key=lambda o: int(o.split()[0]))
print(l)
```

Resulting in the same output.

```['4 dollars', '4 dollars', '8 dollars', '12 dollars', '23 dollars']
```

A bit magic with lambda functions? We advice you to read this tutorial on the subject.

How to profile a program in Python

In this video we will see how cProfile (default Python library) can help you to get run-times from your Python program.

Queue vs Python lists

In this video we will compare the performance of a simple Queue implemented directly into Python (no optimisations) with the default Python list.

Can it compare with it on performance?

This is where time complexity analysis come into the picture. A Queue insert and deletion is O(1) time complexity. A Python list used as a queue has O(n) time complexity.

But does the performance and run-time show the same? Here we compare the run-time by using cProfile in Python.

Want to learn more about Linked-lists, Stacks and Queues?