NumPy vs Pandas

What will we cover in this tutorial?

A high-level view of the differences between the NumPy and Pandas libraries in Python. We will also briefly explore how their performance differs in one specific use case.

Top level differences between NumPy and Pandas

First of all, the two libraries serve different purposes.

• NumPy is made to manage n-dimensional numerical data. Reach for it when you need to handle a lot of numerical data that is all of the same type, for example values organized in rows and columns.
• Pandas is made for tabular data. This could be data from an Excel sheet, where the rows and columns hold various types of data.
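To make the distinction concrete, here is a minimal sketch of the same idea in code. The values and column names (name, price, in_stock) are made up for illustration; the point is only that an ndarray has a single element type, while each DataFrame column can have its own.

```python
import numpy as np
import pandas as pd

# NumPy: every element of an ndarray shares one dtype.
grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])
print(grid.shape)  # (2, 3)
print(grid.dtype)  # float64

# Pandas: each column of a DataFrame can have its own dtype.
# The column names and values below are hypothetical.
table = pd.DataFrame({
    "name": ["apple", "pear"],
    "price": [1.25, 0.80],
    "in_stock": [True, False],
})
print(table.dtypes)
```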

There are more differences.

• NumPy is built around the ndarray data type, which is created with fixed dimensions and holds only one element type.
• Pandas is built around Series and DataFrames, which are more dynamic after creation.
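The difference in flexibility can be sketched roughly as follows. This is an illustrative example with made-up data, not a complete picture of either API: growing a NumPy array produces a new array, while a DataFrame can take on new columns and rows after it has been created.

```python
import numpy as np
import pandas as pd

# NumPy: an existing array keeps its size.
a = np.arange(3)       # three integers: 0, 1, 2
b = np.append(a, 10)   # returns a NEW array; a is unchanged
print(a.shape, b.shape)

# Pandas: a DataFrame can grow after creation.
df = pd.DataFrame({"x": [1, 2, 3]})
df["y"] = ["a", "b", "c"]  # add a column of a different type
df.loc[3] = [4, "d"]       # add a row under the new label 3
print(df)
```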

Performance comparison of NumPy and Pandas

Which one would you guess is faster? Pandas? Not here. In the use case below, NumPy is far faster than Pandas.

Why?

Let us first examine it.

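import time
import numpy as np
import pandas as pd

# Keep the total number of element multiplications fixed at 100,000,000.
size = 100
iterations = 100_000_000 // size

# Time element-wise multiplication of a NumPy array.
a = np.arange(size)
start = time.perf_counter()
for _ in range(iterations):
    a2 = a * a
end = time.perf_counter()
print(end - start)

# Time the same operation on a Pandas Series wrapping the same data.
n = pd.Series(a)
start = time.perf_counter()
for _ in range(iterations):
    n2 = n * n
end = time.perf_counter()
print(end - start)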

Which results in the following comparison.