Performance comparison of Numba vs Vectorization vs Lambda function with NumPy

What will we cover in this tutorial?

We will continue our investigation of Numba from the previous tutorial.

Numba is a just-in-time compiler for Python that works very well with NumPy. As we saw in the last tutorial, NumPy's built-in vectorization can, depending on the case and the size of the input, be faster than Numba.

Here we will explore that further and also see how Numba compares with lambda functions. Lambda functions have the advantage that they can be passed as an argument down to a library that can optimize the performance, instead of depending on slow Python code.

Step 1: Example of Vectorization slower than Numba

In the previous tutorial we only investigated an example where vectorization was faster than Numba. Here we will see that this is not always the case.

import numpy as np
from numba import jit
import time

size = 100
x = np.random.rand(size, size)
y = np.random.rand(size, size)
iterations = 100000


@jit(nopython=True)
def add_numba(a, b):
    c = np.zeros(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i, j] = a[i, j] + b[i, j]
    return c


def add_vectorized(a, b):
    return a + b


# We call the function once, to precompile the code
z = add_numba(x, y)
start = time.time()
for _ in range(iterations):
    z = add_numba(x, y)
end = time.time()
print("Elapsed (numba, precompiled) = %s" % (end - start))

start = time.time()
for _ in range(iterations):
    z = add_vectorized(x, y)
end = time.time()
print("Elapsed (vectorized) = %s" % (end - start))

Varying the size of the NumPy arrays, we can compare the performance of the two approaches in the graph below, where it is clear that the vectorized approach is slower.

Step 2: Try a more complex example comparing vectorization and Numba

An if-then-else can be expressed in vectorized form using NumPy's where function.
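On a small array, `np.where(condition, value_if_true, value_if_false)` picks one of the two values for every element, which is exactly the if-then-else in the loop version. The array values below are made up for illustration.

```python
import numpy as np

a = np.array([[0.2, 0.7],
              [0.5, 0.1]])

# Evaluated element by element: 1 where a < 0.5, otherwise 0
mask = np.where(a < 0.5, 1, 0)
print(mask)
# [[1 0]
#  [0 1]]

# Using float literals gives a float64 result, like np.zeros(...) in a loop version
mask_float = np.where(a < 0.5, 1.0, 0.0)
print(mask_float.dtype)  # float64
```

Note that the condition `a < 0.5` is itself a vectorized operation producing a boolean array, so no Python-level loop is involved at any point.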

import numpy as np
from numba import jit
import time


size = 1000
x = np.random.rand(size, size)
iterations = 1000


@jit(nopython=True)
def numba(a):
    c = np.zeros(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            if a[i, j] < 0.5:
                c[i, j] = 1
    return c


def vectorized(a):
    return np.where(a < 0.5, 1, 0)


# We call the numba function to precompile it before we measure it
z = numba(x)
start = time.time()
for _ in range(iterations):
    z = numba(x)
end = time.time()
print("Elapsed (numba, precompiled) = %s" % (end - start))

start = time.time()
for _ in range(iterations):
    z = vectorized(x)
end = time.time()
print("Elapsed (vectorized) = %s" % (end - start))

This results in the following comparison.

That is close, but the vectorized approach is a bit faster.

Step 3: Compare Numba with lambda functions

I am very curious about this one. Lambda functions are controversial in Python, and many people dislike them because their syntax does not feel aligned with the rest of the language. On the other hand, lambda functions have the advantage that you can pass them down to a library that can optimize the for-loops.

import numpy as np
import pandas as pd  # .apply() is a pandas method; plain NumPy arrays do not have it
from numba import jit
import time

size = 1000
x = np.random.rand(size, size)
# The lambda version needs a pandas DataFrame, since NumPy arrays have no .apply()
df = pd.DataFrame(x)
iterations = 1000


@jit(nopython=True)
def numba(a):
    # Use the shape of the input instead of the global size variable
    c = np.zeros(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i, j] = a[i, j] + 1
    return c


def lambda_run(a):
    # DataFrame.apply calls the lambda once per column (a pandas Series)
    return a.apply(lambda x: x + 1)


# Call the numba function to precompile it before time measurement
z = numba(x)
start = time.time()
for _ in range(iterations):
    z = numba(x)
end = time.time()
print("Elapsed (numba, precompiled) = %s" % (end - start))

start = time.time()
for _ in range(iterations):
    z = lambda_run(df)
end = time.time()
print("Elapsed (lambda) = %s" % (end - start))
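The `.apply` method used in `lambda_run` is the pandas `DataFrame.apply`, which by default calls the function once per column and passes it the whole column as a `Series`. The small sketch below makes that visible; it uses a named function instead of a lambda only so it can record what it receives, and the 4x3 DataFrame is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 3))
calls = []


def add_one(col):
    # Record the type of what pandas passes in on each call
    calls.append(type(col).__name__)
    # col + 1 is itself a vectorized operation on the whole column
    return col + 1


out = df.apply(add_one)
print(set(calls))  # {'Series'}
```

So with a 1000x1000 DataFrame the lambda runs roughly once per column, not once per element, and each call does a vectorized addition on 1000 values. That is a large part of why such a simple lambda can stay competitive.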

This results in the following performance comparison.

This is again tight, but the lambda approach is still a bit faster.

Remember, this is a simple lambda function, and we cannot conclude that lambda functions in general are faster than Numba.
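One way to see why the result does not generalize: the same lambda behaves very differently when it is applied per element instead of to a whole array or column. The sketch below uses `np.vectorize`, which NumPy documents as a convenience wrapper that is essentially a Python for-loop, and compares it with a whole-array expression. The array size and repeat counts are illustrative.

```python
import timeit

import numpy as np

a = np.random.rand(300, 300)

# np.vectorize calls the lambda once per element from Python
add_one_elementwise = np.vectorize(lambda v: v + 1)


def add_one_whole_array(arr):
    # NumPy runs the loop in compiled code
    return arr + 1


t_elementwise = min(timeit.repeat(lambda: add_one_elementwise(a), number=3, repeat=3))
t_whole = min(timeit.repeat(lambda: add_one_whole_array(a), number=3, repeat=3))
print("np.vectorize lambda = %.4f s" % t_elementwise)
print("whole-array + 1     = %.4f s" % t_whole)
```

Both produce the same values, but the element-wise version pays the cost of one Python function call per element, which is exactly the overhead that Numba's compiled loops avoid.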

Conclusion

What we have learned since the last tutorial is that there are cases where simple vectorization is slower than Numba. This reinforces the conclusion that performance depends highly on the task. Further, the lambda function seems to give promising performance. Again, all of these approaches should be compared with the slow baseline: a plain Python for-loop without Numba's just-in-time compiled machine code.
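For reference, the slow baseline mentioned above is simply the Step 1 loop without the `@jit` decorator. A minimal sketch, with the function named `add_python` for illustration and far fewer iterations than in Step 1 because every element is processed by the Python interpreter:

```python
import time

import numpy as np


def add_python(a, b):
    # Same double loop as add_numba, but interpreted by CPython (no @jit)
    c = np.zeros(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i, j] = a[i, j] + b[i, j]
    return c


size = 100
x = np.random.rand(size, size)
y = np.random.rand(size, size)
iterations = 10  # much smaller than in Step 1, so compare time per call

start = time.perf_counter()
for _ in range(iterations):
    z = add_python(x, y)
end = time.perf_counter()
print("Elapsed (pure Python) = %s" % (end - start))
print("Per call = %s" % ((end - start) / iterations))
```

If `add_numba` from Step 1 is in scope, `add_numba.py_func` gives the same uncompiled Python function, which is a convenient way to time the baseline without duplicating code.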
