How to Concatenate Multiple CSV Files and Export to Excel from Python

What will we cover in this tutorial?

In this tutorial we will show how you can concatenate CSV files of the same format and export it to an Excel sheet. All done in Python by using Pandas.

This is something you want in your Python toolbox as it will make it easy for you to get quick wins helping people with manual work.

Step 0: Preparation

We need multiple CSV files in this tutorial. If you have CSV files with the same format, then you can skip to the next step.

First, let’s create some random data by using the power of NumPy. This can be done by using the uniform method in the NumPy random package.

import numpy as np
import pandas as pd


for file_name in ['data_0.csv', 'data_1.csv', 'data_2.csv']:
    df = pd.DataFrame(np.random.uniform(0, 100, size=(100, 4)), columns=list('ABCD'))
    df.to_csv(file_name)

As you see, we use the nice way to generate the random data by calling the random.uniform, which takes the lower bound and upper bound of the uniform interval to get numbers from, and the size parameter is the dimension of the NumPy array. Here it generates 100 rows in 4 columns.

The DataFrame constructor can convert the NumPy array to the DataFrame and setting the column names to A, B, C, and D, respectively. This is done by converting the the string ‘ABCD‘ to a list, resulting in [‘A’, ‘B’, ‘C’, ‘D’].

The content is stored to a csv file by simply calling to_csv on the DataFrame object.

The files will look similar to this (only the beginning of one here).

,A,B,C,D
0,59.74500080886668,59.63123820144265,47.53898343676622,50.31013442167491
1,31.00881083223196,30.91970178258706,93.2538048567809,41.95421609366061
2,32.03557607643186,96.34180274801857,70.8926221704894,90.79417905937711
3,22.44980451436659,61.25077305144971,24.013823071196082,31.401361361506062
4,71.03827853534764,20.527320250264925,29.379395377514662,94.89076442880472
5,76.1462091142596,20.509207361552917,66.57572207423806,69.38906836001392

Step 1: Read a CSV file with Pandas

Now it is time for the fun stuff. Read one csv file into a Pandas DataFrame.

This can be done by calling the read_csv method in the Pandas library.

import pandas as pd


df = pd.read_csv("data_0.csv")
print(df)

This gives the following output.

    Unnamed: 0          A          B          C          D
0            0  59.745001  59.631238  47.538983  50.310134
1            1  31.008811  30.919702  93.253805  41.954216
2            2  32.035576  96.341803  70.892622  90.794179
3            3  22.449805  61.250773  24.013823  31.401361
4            4  71.038279  20.527320  29.379395  94.890764

Or the beginning of it.

Now we are ready to read all the content files and concatenate them.

Step 2: Concatenate Multiple DataFrames into One DataFrame

First we need to read all the csv files. This can be done in many ways. Here we will iterate over them and append them to a list.

After that we will concatenate them all.

This can be done by using the concat method in the Pandas library. The concat function takes an iterable as the the first argument. That is great, because a list is an iterable.

import pandas as pd


df_list = []
for file_name in ['data_0.csv', 'data_1.csv', 'data_2.csv']:
    df = pd.read_csv("data_0.csv")
    df_list.append(df)

df_concat = pd.concat(df_list)
print(df_concat)

Which results in a single DataFrame with 300 rows.

    Unnamed: 0          A          B          C          D
0            0  59.745001  59.631238  47.538983  50.310134
1            1  31.008811  30.919702  93.253805  41.954216
..         ...        ...        ...        ...        ...
98          98  65.831197  47.070415  65.205620  64.701177
99          99  27.259875  97.514085   8.444346  65.991040

[300 rows x 5 columns]

Notice that the index is reset for each concatenation. We can handle that by adding a ignore_index=True in the concat call.

import pandas as pd


df_list = []
for file_name in ['data_0.csv', 'data_1.csv', 'data_2.csv']:
    df = pd.read_csv("data_0.csv")
    df_list.append(df)

df_concat = pd.concat(df_list, ignore_index=True)
print(df_concat)

Which results in the following result.

     Unnamed: 0          A          B          C          D
0             0  59.745001  59.631238  47.538983  50.310134
1             1  31.008811  30.919702  93.253805  41.954216
..          ...        ...        ...        ...        ...
298          98  65.831197  47.070415  65.205620  64.701177
299          99  27.259875  97.514085   8.444346  65.991040

[300 rows x 5 columns]

Now ending at 299 rows.

Step 3: Write the output to an Excel file

This is simply done by using the to_excel method on the DataFrame object. It takes the file name as argument, and it should be of the format xlsx.

Notice, some installations of Pandas do not include the XlsxWriter module. Then you need to install that. You will know when you run the code below, it will say it is missing. Then you can install it by using the pip.

pip install xlsxwriter

Or if you are in PyCharm, you can simply write import xlsxwriter on a line and hover over the red line of xlsxwriter and select install. Then it should install it for you.

Now to the full code.

import pandas as pd


df_list = []
for file_name in ['data_0.csv', 'data_1.csv', 'data_2.csv']:
    df = pd.read_csv("data_0.csv")
    df_list.append(df)

df_concat = pd.concat(df_list, ignore_index=True)
df_concat.to_excel("output.xlsx")

This results in an Excel file similar to this.

Resulting output.

Leave a Reply