## What will we cover in this tutorial?

We will continue our investigation of **Numba** from the previous tutorial.

**Numba** is a just-in-time compiler for **Python** that works amazingly well with **NumPy**. As we saw in the last tutorial, built-in **vectorization** can, depending on the case and the size of the instance, be faster than **Numba**.

Here we will explore that further and also see how **Numba** compares with **lambda** functions. **Lambda** functions have the advantage that they can be passed as an argument down to a library, which can then optimize the performance and not depend on slow **Python** code.
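
To illustrate the idea of passing a lambda down to a library, here is a minimal sketch using the built-in `sorted` function: the sorting loop runs in C, and Python is only called back for the key.

```python
# A lambda passed as an argument down to a library function.
# sorted() runs its comparison loop in C and only calls the lambda
# once per element to compute the sort key.
data = [("banana", 3), ("apple", 7), ("cherry", 1)]

# Sort by the second element of each tuple
by_count = sorted(data, key=lambda item: item[1])
print(by_count)  # [('cherry', 1), ('banana', 3), ('apple', 7)]
```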

## Step 1: Example of Vectorization slower than Numba

In the previous tutorial we only investigated an example of vectorization that was faster than Numba. Here we will see that this is not always the case.

```python
import numpy as np
from numba import jit
import time

size = 100
x = np.random.rand(size, size)
y = np.random.rand(size, size)
iterations = 100000

@jit(nopython=True)
def add_numba(a, b):
    c = np.zeros(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i, j] = a[i, j] + b[i, j]
    return c

def add_vectorized(a, b):
    return a + b

# Call the Numba function once to trigger compilation before timing
z = add_numba(x, y)

start = time.time()
for _ in range(iterations):
    z = add_numba(x, y)
end = time.time()
print("Elapsed (numba, precompiled) = %s" % (end - start))

start = time.time()
for _ in range(iterations):
    z = add_vectorized(x, y)
end = time.time()
print("Elapsed (vectorized) = %s" % (end - start))
```

Varying the size of the NumPy array, we can compare the performance of the two approaches in the graph below.

Here it is clear that the vectorized approach is slower.
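
As a side note on methodology: timing with `time.time` in a loop is noisy. The standard library's `timeit` module is a common alternative; a minimal sketch for the vectorized add only (NumPy, no Numba required):

```python
import timeit
import numpy as np

x = np.random.rand(100, 100)
y = np.random.rand(100, 100)

# timeit.repeat runs the statement several rounds and reports each
# round's total time; taking the minimum filters out scheduler noise.
times = timeit.repeat(lambda: x + y, number=1000, repeat=5)
print("best of 5 rounds: %s seconds for 1000 calls" % min(times))
```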

## Step 2: Try some more complex example comparing vectorized and Numba

An if-then-else can be expressed in vectorized form using the **NumPy** **where** function.
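
To see what `np.where` computes, here is a tiny standalone example on a small array before the timed comparison:

```python
import numpy as np

a = np.array([0.2, 0.7, 0.4])
# np.where(condition, value_if_true, value_if_false), elementwise
print(np.where(a < 0.5, 1, 0))  # [1 0 1]
```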

```python
import numpy as np
from numba import jit
import time

size = 1000
x = np.random.rand(size, size)
iterations = 1000

@jit(nopython=True)
def numba(a):
    c = np.zeros(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            if a[i, j] < 0.5:
                c[i, j] = 1
    return c

def vectorized(a):
    return np.where(a < 0.5, 1, 0)

# Call the Numba function once to trigger compilation before timing
z = numba(x)

start = time.time()
for _ in range(iterations):
    z = numba(x)
end = time.time()
print("Elapsed (numba, precompiled) = %s" % (end - start))

start = time.time()
for _ in range(iterations):
    z = vectorized(x)
end = time.time()
print("Elapsed (vectorized) = %s" % (end - start))
```

This results in the following comparison.

That is close, but the vectorized approach is a bit faster.

## Step 3: Compare Numba with lambda functions

I am very curious about this. Lambda functions are controversial in Python, and many are not happy with them, as their syntax is not well aligned with the rest of Python. On the other hand, lambda functions have the advantage that you can send them down into a library that can optimize over the for-loops.

```python
import numpy as np
from numba import jit
import time

size = 1000
x = np.random.rand(size, size)
iterations = 1000

@jit(nopython=True)
def numba(a):
    c = np.zeros(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i, j] = a[i, j] + 1
    return c

def lambda_run(a, func=lambda v: v + 1):
    # The lambda is passed down and applied to the whole array at once,
    # so the elementwise loop runs inside NumPy instead of in Python
    return func(a)

# Call the Numba function once to trigger compilation before timing
z = numba(x)

start = time.time()
for _ in range(iterations):
    z = numba(x)
end = time.time()
print("Elapsed (numba, precompiled) = %s" % (end - start))

start = time.time()
for _ in range(iterations):
    z = lambda_run(x)
end = time.time()
print("Elapsed (lambda) = %s" % (end - start))
```

Resulting in the following performance comparison.

This is again tight, but the lambda approach is still a bit faster.

Remember, this is a simple lambda function, and we cannot conclude that lambda functions in general are faster than using Numba.

## Conclusion

The learning since the last tutorial is that we have found an example where simple vectorization is slower than Numba. This still leads to the conclusion that performance highly depends on the task. Further, the lambda function seems to give promising performance. Again, this should be compared to the slow approach of a Python for-loop without Numba's just-in-time compiled machine code.

I believe Numba is slower because of the `np.zeros` call inside the function, which puts it at a disadvantage compared to the vectorized version. In both cases where Numba is slower, you can improve its results by returning the result in place.

In the lambda case this is trivial, with `a += 1`; in the `where` case you could also replace `a` with the result of the comparison (as 0.0 or 1.0), or you could use a boolean type that does not require as much memory.
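
A minimal sketch of the in-place variants suggested in this comment (NumPy only; the Numba loop could likewise write into `a` directly instead of allocating `c`):

```python
import numpy as np

a = np.random.rand(4, 4)
# In-place add: no new array is allocated, the input buffer is reused
a += 1

# In-place threshold: write the elementwise comparison into a
# preallocated boolean array via the ufunc's out= argument
# (bools use 1 byte per element instead of 8 for float64)
b = np.random.rand(4, 4)
c = np.empty(b.shape, dtype=bool)
np.less(b, 0.5, out=c)
```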

Hi I.,

Good question.

I would say, it depends on what you measure.

What the Numba version does is the same as what the vectorized version does: it creates a new NumPy array of the same size and returns the result in it.

If you look for speed, then you can do it in place, as you suggest. But then I would argue that it is not a fair comparison.

Let me know what you think.

Cheers, Rune