Deleting Elements of a Python List while Iterating

What will we cover in this tutorial?

  • Understand the challenge with deleting elements while iterating over a Python list.
  • How to delete element from a Python list while iterating over it.

Step 1: What happens when you just delete elements from a Python list while iterating over it?

Let’s first try this simple example to understand the challenge of deleting element in a Python list while iterating over it.

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for e in a:
  a.remove(e)
print(a)

Now, looking at this piece of code, it would seem to be intended to delete all elements. But that is not happening. See, the output is.

[1, 3, 5, 7, 9]

Seems like every second element is deleted. Right?

Let’s try to understand that. When we enter the the loop we see the following view.

for e (= 0, first element) in a (= [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]):
  a.remove(e)

Then the first element is removed on the second line, then the view is.

for e (= 0, first element) in a (= [1, 2, 3, 4, 5, 6, 7, 8, 9]):
  a.remove(e) (a = [1, 2, 3, 4, 5, 6, 7, 8, 9])

Going into the second iteration it looks like this.

for e (= 2, second element) in a (= [1, 2, 3, 4, 5, 6, 7, 8, 9]):
  a.remove(e)

Hence, we see that the iterator takes the second element, which now is the number 2.

This explains why the every second number is deleted from the list.

Step 2: What if we use index instead

Good idea. Let’s see what happens.

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for i, e in enumerate(a):
  a.pop(i)
print(a)

Which results in the same.

[1, 3, 5, 7, 9]

What if we iterate directly over the index by using the length of the list.

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for i in range(len(a)):
  a.pop(i)
print(a)

Oh, no.

Traceback (most recent call last):
  File "main.py", line 3, in <module>
    a.pop(i)
IndexError: pop index out of range

I get it. It is because the len(a) is invoked in the first iteration and results to 10. Then when we reach i = 5, we have already pop’ed 5 elements and have only 5 elements left. Hence, out of bound.

Not convinced?

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for i in range(len(a)):
  print(i, len(a), a)
  a.pop(i)
print(a)

Resulting to.

0 10 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 9 [1, 2, 3, 4, 5, 6, 7, 8, 9]
2 8 [1, 3, 4, 5, 6, 7, 8, 9]
3 7 [1, 3, 5, 6, 7, 8, 9]
4 6 [1, 3, 5, 7, 8, 9]
5 5 [1, 3, 5, 7, 9]
Traceback (most recent call last):
  File "main.py", line 4, in <module>
    a.pop(i)
IndexError: pop index out of range

But what to do?

Step 3: How to delete elements while iterating over a list

The problem we want to solve is not to delete all the element. It is to delete entries based on their values or some conditions, where we need to interpret the values of the elements.

How can we do that?

By using list comprehension or by making a copy. Or is it the same, as list comprehension is creating a new copy, right?

Okay, one step at the time. Just see the following example.

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
a = [i for i in a if i % 2 == 0]
print(a)

Resulting in a copy of the the original list with only the even elements.

[0, 2, 4, 6, 8]

To see it is a copy you can evaluate the following code.

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = a
a = [i for i in a if i % 2 == 0]
print(a)
print(b)

Resulting in the following, where you see the variable a get’s a new copy of it and the variable b refers to the original (and unmodified version).

[0, 2, 4, 6, 8]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Hence, the effect of the list comprehension construction above is as the following code shows.

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# a = [i for i in a if i % 2 == 0]
c = []
for i in a:
    if i % 2 == 0:
        c.append(i)
a = c
print(a)

Getting the what you want.

[0, 2, 4, 6, 8]

Next steps

You can make the criteria more advanced by making the criteria by a function call.

def criteria(v):
  # some advanced code that returns True of False

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
a = [i for i in a if criteria(i)]

And if you want to keep a state of all previous criteria, then you can even use an Object to keep that stored.

class State:
  # ...
  def criteria(self, v):
    # ...

s = State()
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
a = [i for i in a if s.criteria(i)]

Also, check out this tutorial that makes some observations on performance on list comprehensions.

How to Reverse a Python List in-place and Understand Common Mistakes

Understand the difference

What does in-place mean? When reversing a list it means to not create a new list.

In-place reversing can be done by calling reverse().

a = [i for i in range(19)]
b = a

print("Before reverse")
print(a)
print(b)
a.reverse()

print("After reverse")
print(a)
print(b)

Will result in the following.

Before reverse
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
After reverse
[18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
[18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

While using the slicing to get the list reversed will create a copy.

a = [i for i in range(19)]
b = a

print("Before reverse")
print(a)
print(b)

a = a[::-1] # Reverse using slicing
print("After reverse")
print(a)
print(b)

Resulting in.

Before reverse
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
After reverse
[18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

It creates a new copy of the list a and keeps the original, which b points at, untouched.

Correct method: Swapping in a loop

Reversing a list in-place can be achieved with a call to reverse() as was shown above. A common question to get asked during a job-interview is to actually show an implementation of reversing.

As you might guess, you can achieve that by swapping elements.

[0, 1, 2, 3, 4, 5, 6]
# Swap first and last
[6, 1, 2, 3, 4, 5, 0]
# Swap second and second last
[6, 5, 2, 3, 4, 1, 0]
# Swap third and third last
[6, 5, 4, 3, 2, 1, 0]
# Swap the middle with, ah, itself (just kidding, this step is not needed)
[6, 5, 4, 3, 2, 1, 0]

How can you implement that? Let’s try.

a = [i for i in range(19)]
b = a

print(a)
print(b)

for i in range(len(a)//2):
    a[i], a[len(a) - i - 1] = a[len(a) - i - 1], a[i]

print(a)
print(b)

Resulting in the following expected output.

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
[18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
[18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Again notice, that I make b point to a, in order to ensure and convince you that we make the reversing in-place and not of a copy. If the print(b) would not print the identical output as print(a), we would have a problem to explain.

Pitfall: List comprehension (incorrect)

But wait a minute? Doesn’t list comprehension mean making a new list based on an existing list?

Correct!

But we can actually circumvent that by using some syntax (or can we? Recommend you read it all).

a = [i for i in range(19)]
b = a

print(a)
print(b)

a[:] = [a[len(a) - i - 1] for i in range(len(a))]
print(a)
print(b)

The slice assignment a[:] enforces Python to do assign, or override, the original values of a. Even if the output shows the following.

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
[18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
[18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

The catch is that [a[len(a) – i – 1] for i in range(len(a))] creates a new list, but the slice assignment ensures to override the values of a.

Bummer.

Conclusion

It is easy to make false conclusions like the example above shows. One of the beauties of Python is the extreme power on a high level. The price is that many things are hard to understand unless you dive deep into it.

List Comprehension in Python made Easy with Comparisons

What will we cover in this tutorial?

  • Understand how list comprehension works in Python.
  • Updating and creation of new list is a memory aspect.
  • Test the performance difference between list comprehension and updating a list through a for-loop.

Step 1: Understand what is list comprehension

On wikipedia.org it is defined as follows.

list comprehension is a syntactic construct available in some programming languages for creating a list based on existing lists

Wikipedia.org

Then how does that translate into Python? Or is it, how does Python translate that into code?

If this is the first time you hear about list comprehension, but you have been programming for some time in Python and stumbled upon code pieces like this.

l1 = ['1', '2', '3', '4']
l2 = [int(s) for s in l1]
print(l2)

Which will result in a list of integers in l2.

[1, 2, 3, 4]

The construction for l2 is based on l1. Inspecting it closely, you can see a for-loop inside the creation of the square brackets. You could take the for-loop outside and have the same effect.

l1 = ['1', '2', '3', '4']
l2 = []
for s in l1:
  l2.append(int(s))
print(l2)

Nice.

Step 2: Updating and creation

Sometimes you see code like this.

l1 = [1, 2, 3, 4, 5, 6, 7]
l2 = [i + 1 for i in l1]
print(l2)

And you also notice that the l1 is not used after.

So what is the problem?

Let’s see an alternative way to do it.

l = [1, 2, 3, 4, 5, 6, 7]
for i in range(len(l)):
  l[i] += 1
print(l)

Which will result in the same effect. So what is the difference?

The first one, with list comprehension, creates a new list, while the second one updates the values of the list.

Not convinced? Investigate this piece of code.

def list_comprehension(l):
  return [i + 1 for i in l]

def update_loop(l):
  for i in range(len(l)):
    l[i] += 1
  return l

l1 = [1, 2, 3, 4, 5, 6, 7]
l2 = list_comprehension(l1)
print(l1, l2)

l1 = [1, 2, 3, 4, 5, 6, 7]
l2 = update_loop(l1)
print(l1, l2)

Which results in the following output.

[1, 2, 3, 4, 5, 6, 7] [2, 3, 4, 5, 6, 7, 8]
[2, 3, 4, 5, 6, 7, 8] [2, 3, 4, 5, 6, 7, 8]

As you see, the first one (list comprehension) creates a new list, while the other one updates the values in the existing.

From a memory perspective, this can be an issue with extremely large lists. But what about performance?

Step 3: Performance comparison between the two methods

This is interesting. To compare the run-time (performance) of the two functions we can use the cProfile standart Python library.

import cProfile
import random


def list_comprehension(l):
    return [i + 1 for i in l]


def update_loop(l):
    for i in range(len(l)):
        l[i] += 1
    return l


def test(n, it):
    l = [random.randint(0, n) for i in range(n)]
    for i in range(it):
        list_comprehension(l)

    l = [random.randint(0, n) for i in range(n)]
    for i in range(it):
        update_loop(l)


cProfile.run('test(10000, 100000)')

This results in the following output.

         152917 function calls in 16.837 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   16.837   16.837 <string>:1(<module>)
        1    0.869    0.869   16.837   16.837 TEST.py:15(test)
        1    0.008    0.008    0.040    0.040 TEST.py:16(<listcomp>)
        1    0.003    0.003    0.023    0.023 TEST.py:20(<listcomp>)
    10000    0.013    0.000    4.739    0.000 TEST.py:5(list_comprehension)
    10000    4.726    0.000    4.726    0.000 TEST.py:6(<listcomp>)
    10000   11.164    0.001   11.166    0.001 TEST.py:9(update_loop)
    20000    0.019    0.000    0.041    0.000 random.py:200(randrange)
    20000    0.010    0.000    0.052    0.000 random.py:244(randint)
    20000    0.014    0.000    0.022    0.000 random.py:250(_randbelow_with_getrandbits)
        1    0.000    0.000   16.837   16.837 {built-in method builtins.exec}
    10000    0.002    0.000    0.002    0.000 {built-in method builtins.len}
    20000    0.002    0.000    0.002    0.000 {method 'bit_length' of 'int' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    32911    0.006    0.000    0.006    0.000 {method 'getrandbits' of '_random.Random' objects}

Where we can see that the accumulated time spend in list_comprehension is 4.739 seconds, while the accumulated time spend in update_loop is 11.166 seconds.

Wait? Is it faster to create a new list than update an existing one?

Let’s do some more testing.

Performance of list comprehension vs updating a list

Seems to be no doubt about it.

Let’s just remember that Python is an interpreter and each instruction is highly optimized. Hence, keeping the code as list comprehension, can be highly optimized, while updating the loop is more flexible and takes more lines of interpretation.

Step 4 (Bonus): Use list comprehension with function

One aspect of list comprehension, is that it limits the possibility, while the for-loop construct is more flexible.

But wait, what if you use a function inside the list comprehension construction, then you should be able to regain a lot of that flexibility.

Let’s try to see how that affects the performance.

import cProfile
import random

def add_one(v):
  return v + 1

def list_comprehension(l):
    return [add_one(i) for i in l]


def update_loop(l):
    for i in range(len(l)):
        l[i] += 1
    return l


def test(n, it):
    l = [random.randint(0, n) for i in range(n)]
    for i in range(it):
        list_comprehension(l)

    l = [random.randint(0, n) for i in range(n)]
    for i in range(it):
        update_loop(l)


cProfile.run('test(1000, 10000)')

Giving the following output.

         10050065 function calls in 15.826 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   15.826   15.826 <string>:1(<module>)
    10000    3.960    0.000    3.964    0.000 main.py:11(update_loop)
        1    0.296    0.296   15.826   15.826 main.py:17(test)
        1    0.001    0.001    0.005    0.005 main.py:18(<listcomp>)
        1    0.004    0.004    0.008    0.008 main.py:22(<listcomp>)
 10000000    4.389    0.000    4.389    0.000 main.py:4(add_one)
    10000    0.077    0.000   11.554    0.001 main.py:7(list_comprehension)
    10000    7.088    0.001   11.476    0.001 main.py:8(<listcomp>)
     2000    0.003    0.000    0.006    0.000 random.py:200(randrange)
     2000    0.002    0.000    0.008    0.000 random.py:244(randint)
     2000    0.002    0.000    0.003    0.000 random.py:250(_randbelow_with_getrandbits)
        1    0.000    0.000   15.826   15.826 {built-in method builtins.exec}
    10000    0.004    0.000    0.004    0.000 {built-in method builtins.len}
     2000    0.000    0.000    0.000    0.000 {method 'bit_length' of 'int' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     2059    0.000    0.000    0.000    0.000 {method 'getrandbits' of '_random.Random' objects}

Oh no. It takes list_comprehension 11.554 seconds compared to update_loop 3.964 seconds.

This is obviously hard to optimize for the interpreter as it cannot predict the effect of the function (add_one). Adding that call in each iteration of the creation of the list add a big overhead in performance.

Conclusion

Can we conclude that list comprehension always beats updating an existing list? Not really. There is the memory dimension. If lists are big or memory is a sparse resource or you want to avoid too much memory cleanup by Python, then updating the list might be a better option.

Sort a Python List with String of Integers or a Mixture

What will we cover in this tutorial?

  • How can you sort a list of strings containing integers by the integer value?
  • Or what if it contains both strings containing integers and integers?
  • Finally, also how if only a substring contains integers?

Why sort on a list of integers represented in strings fails

First of, we need to understand why it is not trivial to solve by just calling sort on the list.

Let’s just try with an example.

l = ['4', '8', '12', '23', '4']
l.sort()
print(l)

Which will result in the following list.

['12', '23', '4', '4', '8']

Where you see the list is sorted lexicographical order and not by the numeric value the strings represent.

How to solve this

Solving this is quite straight forward if you know your way around Python. You look in the documentation and see that it takes a key as argument. Okay, you are new to this, so what does it mean.

key specifies a function of one argument that is used to extract a comparison key from each list element

Python docs.

Still not comfortable about it. Let’s try to figure it out together. If you are new to Python, you might not know that you can send functions as arguments like any other value.

The key argument is a function that will be applied on every item in the list. The output of that function will be used to make a simple comparison and order it by that.

That is great news. Why?

I am glad you asked. If we just use the int() function as argument, it should cast the string to an integer and use that for comparison and our problem is solved.

Let’s try.

l = ['4', '8', '12', '23', '4']
l.sort(key=int)
print(l)

Resulting to the following list.

['4', '4', '8', '12', '23']

How simple is that?

What if my list is a mixture of integers and strings of integers?

What is your wild guess?

l = ['4', '8', 12, '23', 4]
l.sort(key=int)
print(l)

Notice that some integers are not strings any more. Let see the output.

['4', 4, '8', 12, '23']

It works. This is why we love Python!

But what if it is more complex?

A complex examples of sorting

Say we have a list of of strings like this one.

l = ['4 dollars', '8 dollars', '12 dollars', '23 dollars', '4 dollars']

The story is something like this. You ask a lot of providers how much it will cost to give a specific service. The answers are given in the list and you want to investigate them in order of lowest price.

We can just do the same, right?

l = ['4 dollars', '8 dollars', '12 dollars', '23 dollars', '4 dollars']
l.sort(key=int)
print(l)

Wrong!

Traceback (most recent call last):
  File "main.py", line 2, in <module>
    l.sort(key=int)
ValueError: invalid literal for int() with base 10: '4 dollars'

The string is not just an integer. It contains more information.

The good luck is that we can send any function. Let’s try to create one.

def comp(o):
  return int(o.split()[0])

l = ['4 dollars', '8 dollars', '12 dollars', '23 dollars', '4 dollars']
l.sort(key=comp)
print(l)

And the output is as desired.

['4 dollars', '4 dollars', '8 dollars', '12 dollars', '23 dollars']

Too fast? Let’s just analyse our function comp. It contains only one return statement. Try to read it from inside out.

o.split() splits the string up in a list of items contain word by word. Hence, the call of ‘4 dollars’.split() will result in [‘4’, ‘dollars’].

Then o.split()[0] will return the first item of that list, i.e. ‘4’.

Finally, we cast it to an integer by int(o.split()[0]).

Remember that the comparison is done by the output of the function, that is what the function returns, which in this case is the integer represented by the first item in the string.

What about lambda?

Lambda? Yes, lambda functions is also a hot subject.

A lambda function is just a smart way to write simple functions you send as arguments to other functions. Like in this case a sorting function.

Let’s try if we can do that.

l = ['4 dollars', '8 dollars', '12 dollars', '23 dollars', '4 dollars']
l.sort(key=lambda o: int(o.split()[0]))
print(l)

Resulting in the same output.

['4 dollars', '4 dollars', '8 dollars', '12 dollars', '23 dollars']

A bit magic with lambda functions? We advice you to read this tutorial on the subject.