How to Export Pandas DataFrame to Excel and Create a Trendline Graph of Scatter Plot

What will we cover in this tutorial?

We will have some data in a Pandas DataFrame, which we want to export to an Excel sheet. Then we want to create a Scatter plot graph and fit that to a Excel trendline.

Step 1: Get the data

You might have some data already that you want to use. It can be from a HTML page (example) or CSV file.

For this purpose here we just generate some random data to use. We will use NumPy’s uniform function to generate it.

import pandas as pd
import numpy as np


# Generate some random increasing data
data = pd.DataFrame(
    {'A': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)],
     'B': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)]}
)
print(data)

Which will generate some slightly increasing data, which is nice to fit a graph to.

The output could look something like this.

            A          B
0    0.039515   0.778077
1    0.451888   0.210705
2    0.992493   0.961428
3    0.317536   1.046444
4    1.220419   1.388086

Step 2: Create an Excel XlsxWriter engine

This step might require that you install the XlsxWriter library, which is needed from the Pandas library.

This can be done by the following command.

pip install xlsxwriter

Now we can create the engine in our code.

import pandas as pd
import numpy as np


# Generate some random increasing data
data = pd.DataFrame(
    {'A': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)],
     'B': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)]}
)

# Create a Pandas Excel writer using XlsxWriter
excel_file = 'output.xlsx'
sheet_name = 'Data set'
writer = pd.ExcelWriter(excel_file, engine='xlsxwriter')

This will setup a Excel writer engine and be ready to write to file output.xlsx.

Step 3: Write the data to Excel and create a scatter graph with a fitted Trendline

This can be done by the following code, which uses the add_series function to insert a graph.

import pandas as pd
import numpy as np


# Generate some random increasing data
data = pd.DataFrame(
    {'A': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)],
     'B': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)]}
)

# Create a Pandas Excel writer using XlsxWriter
excel_file = 'output.xlsx'
sheet_name = 'Data set'
writer = pd.ExcelWriter(excel_file, engine='xlsxwriter')
data.to_excel(writer, sheet_name=sheet_name)

# Access the XlsxWriter workbook and worksheet objects from the dataframe.
workbook = writer.book
worksheet = writer.sheets[sheet_name]

# Create a scatter chart object.
chart = workbook.add_chart({'type': 'scatter'})

# Get the number of rows and column index
max_row = len(data)
col_x = data.columns.get_loc('A') + 1
col_y = data.columns.get_loc('B') + 1

# Create the scatter plot, use a trendline to fit it
chart.add_series({
    'name':       "Samples",
    'categories': [sheet_name, 1, col_x, max_row, col_x],
    'values':     [sheet_name, 1, col_y, max_row, col_y],
    'marker':     {'type': 'circle', 'size': 4},
    'trendline': {'type': 'linear'},
})

# Set name on axis
chart.set_x_axis({'name': 'Concentration'})
chart.set_y_axis({'name': 'Measured',
                  'major_gridlines': {'visible': False}})

# Insert the chart into the worksheet in field D2
worksheet.insert_chart('D2', chart)

# Close and save the Excel file
writer.save()

Result

The result should be similar to this.

The resulting Excel sheet.

That is how it can be done.

Leave a Reply