NumPy vs Pandas

What will we cover in this tutorial

A high-level view of the differences between the NumPy and Pandas libraries in Python. We will also briefly explore the performance differences in a specific use case.

Top level differences between NumPy and Pandas

First of all, the purposes of these libraries are different.

  • NumPy is made to manage n-dimensional numerical data. Reach for it when you need to handle a lot of data that is all numerical and of the same type, organized in rows and columns (or more dimensions).
  • Pandas is made for tabular data. This could be data from an Excel spreadsheet, where the rows and columns hold various types of data.
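To make the difference concrete, here is a minimal sketch contrasting a 2-D NumPy array, where every element shares one numeric dtype, with a small DataFrame whose columns hold different types. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# A 2-D NumPy array: every element shares one numeric dtype
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
print(matrix.dtype)  # float64

# A DataFrame, like a small spreadsheet: each column has its own type
# (hypothetical column names and values, for illustration only)
df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [31, 45],
    "score": [88.5, 92.0],
})
print(df.dtypes)
```

Printing `df.dtypes` lists a separate type per column, for example an integer type for `age` and `float64` for `score`.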

There are also structural differences.

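  • NumPy is built around the ndarray type, which is created with a fixed shape and holds elements of a single type.
  • Pandas is built around Series and DataFrame objects, which are more flexible after creation; for example, you can add a column of a different type to an existing DataFrame.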
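A short sketch of what "fixed" means in practice, assuming a default integer array: assigning a float into an integer ndarray silently casts it to the array's dtype, and `np.append` returns a new array instead of growing the original. A DataFrame, by contrast, can gain a column of a different type in place.

```python
import numpy as np
import pandas as pd

a = np.arange(3)          # integer array: [0 1 2]
a[0] = 2.7                # the float is cast (truncated) to the array's integer dtype
print(a)                  # [2 1 2]

# np.append does not resize `a`; it returns a new, larger array
b = np.append(a, 10)
print(a.shape, b.shape)   # (3,) (4,)

df = pd.DataFrame({"x": [1, 2, 3]})
df["label"] = ["a", "b", "c"]  # add a column of a different type in place
print(df.shape)           # (3, 2)
```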

Performance comparison of NumPy and Pandas

If you had to guess, which one is faster? Pandas? Of course not. For the operation tested below, NumPy is considerably faster than Pandas, especially on small arrays.


Let us measure it.

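import time
import numpy as np
import pandas as pd

size = 100                        # number of elements per array
iterations = 100_000_000 // size  # keep the total number of multiplied elements constant

a = np.arange(size)

# Time element-wise multiplication on a NumPy array
start = time.perf_counter()
for _ in range(iterations):
    a2 = a * a
end = time.perf_counter()
print(f"NumPy:  {end - start:.3f} s")

# Time the same operation on a Pandas Series wrapping the same data
n = pd.Series(a)
start = time.perf_counter()
for _ in range(iterations):
    n2 = n * n
end = time.perf_counter()
print(f"Pandas: {end - start:.3f} s")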

This results in the following comparison.

[Figure: NumPy vs Pandas]
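The listing above times a single size. One way to produce a comparison over several sizes is to wrap the measurement in a function and loop over sizes. The sketch below assumes `timeit` as the timer, a much smaller total workload than the original, and a hypothetical set of sizes; it is not the exact setup behind the comparison.

```python
import timeit

import numpy as np
import pandas as pd


def time_multiply(size: int, total_elements: int = 1_000_000) -> tuple[float, float]:
    """Return (numpy_seconds, pandas_seconds) for repeated x * x.

    The repetition count is chosen so that roughly `total_elements`
    elements are multiplied for every size, following the same idea as
    the iterations formula in the listing above, with less total work.
    """
    iterations = max(1, total_elements // size)
    a = np.arange(size)
    s = pd.Series(a)
    numpy_time = timeit.timeit(lambda: a * a, number=iterations)
    pandas_time = timeit.timeit(lambda: s * s, number=iterations)
    return numpy_time, pandas_time


if __name__ == "__main__":
    # Hypothetical sizes; add more points to look for a crossover
    for size in (10, 1_000, 100_000):
        np_t, pd_t = time_multiply(size)
        print(f"size={size:>7}: NumPy {np_t:.3f} s, "
              f"Pandas {pd_t:.3f} s, ratio {pd_t / np_t:.1f}x")
```

Keeping the total element count fixed means each row of output does roughly the same amount of arithmetic, so differences between sizes mostly reflect per-call overhead.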

Interestingly, the comparison shows Pandas to be much slower than NumPy for small arrays. For larger arrays the picture seems to shift in Pandas' favor, but for the largest sizes NumPy still appears to come out ahead.

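The flexibility of Pandas comes at a cost. Each operation on a Series carries a fixed overhead (for example, checking and aligning the index and dispatching through Python-level code), and for small arrays that overhead dominates the actual arithmetic, as in the example above.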
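One part of that cost is index alignment: arithmetic between two Series matches values by index label rather than by position, so Pandas has to check the indexes on every operation. A minimal sketch with made-up labels:

```python
import numpy as np
import pandas as pd

# Hypothetical labels chosen so the two indexes only partly overlap
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Pandas matches values by label; labels present in only one Series give NaN
result = s1 * s2
print(result)  # a: NaN, b: 20.0, c: 60.0, d: NaN

# NumPy simply multiplies position by position
print(s1.to_numpy() * s2.to_numpy())  # [10. 40. 90.]
```

Calling `.to_numpy()` drops the index, which is one way to opt out of alignment, and its overhead, when positions are known to match.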

Next steps

Investigate further how NumPy and Pandas compare in performance for various functions.

Pandas and NumPy both support many functions in a vectorized way, which would be interesting to investigate. Do the restrictions of NumPy arrays (a fixed shape and a single element type) give the underlying C/C++ code a performance advantage?
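As a possible starting point, the sketch below times a few vectorized operations on the same data held as an ndarray and as a Series. The selection of operations, sizes, and repetition counts is hypothetical, not something measured in this tutorial.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical selection of vectorized operations: (NumPy version, Pandas version)
OPERATIONS = {
    "sum": (lambda a: a.sum(), lambda s: s.sum()),
    "sqrt": (lambda a: np.sqrt(a), lambda s: np.sqrt(s)),
    "cumsum": (lambda a: a.cumsum(), lambda s: s.cumsum()),
    "mean": (lambda a: a.mean(), lambda s: s.mean()),
}


def compare(size: int, number: int = 1_000) -> dict[str, tuple[float, float]]:
    """Time each operation `number` times on an ndarray and on a Series."""
    a = np.arange(size, dtype=np.float64)
    s = pd.Series(a)
    results = {}
    for name, (np_op, pd_op) in OPERATIONS.items():
        # Each timeit call runs immediately, so the loop variables are current
        np_t = timeit.timeit(lambda: np_op(a), number=number)
        pd_t = timeit.timeit(lambda: pd_op(s), number=number)
        results[name] = (np_t, pd_t)
    return results


if __name__ == "__main__":
    for size in (100, 1_000_000):
        for name, (np_t, pd_t) in compare(size, number=100).items():
            print(f"size={size:>9} {name:>6}: "
                  f"NumPy {np_t:.4f} s, Pandas {pd_t:.4f} s")
```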