When to use Numba with Python NumPy: Vectorization vs Numba

What will we cover in this tutorial?

You just want your code to run fast, right? Numba is a just-in-time compiler for Python that works amazingly well with NumPy. Does that mean we should always use Numba?

Well, let’s try some examples out and learn. If you know about NumPy, you know you should use vectorization to get speed. Does Numba beat that?

Step 1: Let’s learn how Numba works

Numba compiles your Python code into machine code and runs it. What does just-in-time mean? It means that the first time you call the function you want turned into machine code, Numba compiles it and then runs it. On any later call, it just runs it, as it is already compiled.

Let’s try that.

import numpy as np
from numba import jit
import time


@jit(nopython=True)
def full_sum_numba(a):
    sum = 0.0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            sum += a[i, j]
    return sum


iterations = 1000
size = 10000
x = np.random.rand(size, size)

start = time.time()
full_sum_numba(x)
end = time.time()
print("Elapsed (Numba) = %s" % (end - start))

start = time.time()
full_sum_numba(x)
end = time.time()
print("Elapsed (Numba) = %s" % (end - start))

Which gives output like this.

Elapsed (Numba) = 0.41634082794189453
Elapsed (Numba) = 0.11176300048828125

Notice the difference in runtime between the two calls.

Did you catch what happened in the code? Well, if you put @jit(nopython=True) in front of a function, Numba will try to compile it and run it as machine code.

As you see above, the first call has an overhead in runtime, because Numba first compiles the function and then runs it. The second time, the function is already compiled and runs immediately.
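If you want to pay the compilation cost up front instead of on the first call, one option is to give Numba an explicit signature so it compiles eagerly when the decorator runs. This is a minimal sketch, not part of the benchmark in this tutorial.

import numpy as np
from numba import jit


# Assumption: an explicit signature makes Numba compile at decoration time,
# so the first call does not carry the compilation overhead.
@jit("float64(float64[:,:])", nopython=True)
def full_sum_eager(a):
    total = 0.0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            total += a[i, j]
    return total


x = np.random.rand(1000, 1000)
print(full_sum_eager(x))  # runs at full speed already on the first call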

Step 2: Compare Numba just-in-time code to native Python code

So let us compare how much you gain by using Numba just-in-time compilation (@jit) in our code.

import numpy as np
from numba import jit
import time


def full_sum(a):
    sum = 0.0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            sum += a[i, j]
    return sum


@jit(nopython=True)
def full_sum_numba(a):
    sum = 0.0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            sum += a[i, j]
    return sum


iterations = 1000
size = 10000
x = np.random.rand(size, size)

start = time.time()
full_sum(x)
end = time.time()
print("Elapsed (No Numba) = %s" % (end - start))

start = time.time()
full_sum_numba(x)
end = time.time()
print("Elapsed (Numba) = %s" % (end - start))

start = time.time()
full_sum_numba(x)
end = time.time()
print("Elapsed (Numba) = %s" % (end - start))

Here we added a native Python function without @jit in front and compare it with the jitted one. The output looks like this.

Elapsed (No Numba) = 38.08543515205383
Elapsed (Numba) = 0.41634082794189453
Elapsed (Numba) = 0.11176300048828125

That is some difference. Also, we have plotted a few more runs in the graph below.

It seems pretty evident.
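For reference, here is a minimal sketch of how such a timing comparison could be produced, assuming matplotlib is installed and reusing full_sum and full_sum_numba from above; the exact script behind the graph is not shown here.

import time
import numpy as np
import matplotlib.pyplot as plt

runs = 5
x = np.random.rand(2000, 2000)

full_sum_numba(x)  # warm-up call so compilation is not included in the timings

python_times, numba_times = [], []
for _ in range(runs):
    start = time.time()
    full_sum(x)
    python_times.append(time.time() - start)

    start = time.time()
    full_sum_numba(x)
    numba_times.append(time.time() - start)

plt.plot(python_times, label="Pure Python")
plt.plot(numba_times, label="Numba")
plt.xlabel("Run")
plt.ylabel("Seconds")
plt.legend()
plt.show()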

Step 3: Comparing it with Vectorization

If you don’t know what vectorization is, we can recommend this tutorial. The point of vectorization is to move the expensive for-loops out of Python and into the function call, where optimized code runs them.

That sounds a lot like what Numba can do. It can change the expensive for-loops into fast machine code.

But which one is faster?

Well, I think there are two parameters to try out. First, the size of the problem. Second, whether the number of iterations matters.

import numpy as np
from numba import jit
import time


@jit(nopython=True)
def full_sum_numba(a):
    sum = 0.0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            sum += a[i, j]
    return sum


def full_sum_vectorized(a):
    return a.sum()


iterations = 1000
size = 10000
x = np.random.rand(size, size)

start = time.time()
full_sum_vectorized(x)
end = time.time()
print("Elapsed (No Numba) = %s" % (end - start))

start = time.time()
full_sum_numba(x)
end = time.time()
print("Elapsed (No Numba) = %s" % (end - start))

start = time.time()
full_sum_numba(x)
end = time.time()
print("Elapsed (No Numba) = %s" % (end - start))

Plotting the runtime as a function of the problem size, it is interesting that Numba is faster for small problem sizes, while the vectorized approach seems to outperform Numba for bigger sizes.

And not surprisingly, the number of iterations only makes the difference bigger.

This is not surprising, as the code in a vectorized call can be more specifically optimized than the more general purpose Numba approach.
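To check both parameters yourself, a sketch like the following could sweep problem sizes and iteration counts, reusing full_sum_numba and full_sum_vectorized from above; the sizes and counts here are just illustrative assumptions, and the numbers will depend on your machine.

import time
import numpy as np

full_sum_numba(np.random.rand(10, 10))  # warm-up so the compile time is not measured

for size in [100, 1000, 2000, 5000]:
    x = np.random.rand(size, size)
    iterations = 10

    start = time.time()
    for _ in range(iterations):
        full_sum_numba(x)
    numba_time = time.time() - start

    start = time.time()
    for _ in range(iterations):
        full_sum_vectorized(x)
    vectorized_time = time.time() - start

    print("size=%d Numba=%.4f Vectorized=%.4f" % (size, numba_time, vectorized_time))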

Conclusion

Does that mean that Numba does not pay off?

No, not at all. First of all, we have only tried it against one vectorized operation, which was obviously very easy to optimize. Secondly, not all loops can be turned into vectorized code. In general it is difficult to carry state in a vectorized approach. Hence, if you need to keep track of some internal state inside a loop, it can be hard to find a vectorized equivalent.
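As an illustration (not from the benchmarks above), a loop where each step depends on the previous result, such as a simple exponential moving average, is hard to express as a single vectorized NumPy call but straightforward to speed up with Numba.

import numpy as np
from numba import jit


# Assumption: an illustrative recurrence where each value depends on the previous one.
@jit(nopython=True)
def ewma(a, alpha):
    result = np.empty_like(a)
    result[0] = a[0]
    for i in range(1, a.shape[0]):
        # the carried state (the previous result) is what makes this hard to vectorize
        result[i] = alpha * a[i] + (1 - alpha) * result[i - 1]
    return result


x = np.random.rand(1_000_000)
print(ewma(x, 0.1)[:5])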
