Exporting a Pandas DataFrame to Excel Using openpyxl

Introduction

Exporting data from a Pandas DataFrame to Excel is a common task in data analysis and reporting. In this tutorial, you will learn how to export a Pandas DataFrame to an Excel file using the openpyxl library. We will cover naming the Excel workbook, naming the Excel sheet, adjusting column width, changing row height, and saving the final file. Let’s get started!

Step 1: Installing the Required Libraries:

Before we begin, make sure you have the necessary libraries installed. You can install them using pip, a Python package installer. Open your terminal or command prompt and execute the following command:

pip install pandas openpyxl

Step 2: Importing the Required Libraries:

To start, we need to import the pandas and openpyxl libraries. Add the following lines of code at the beginning of your Python script:

import pandas as pd
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

Step 3: Creating a Pandas DataFrame:

For demonstration purposes, let’s create a simple DataFrame. You can replace this with your own DataFrame or load data from an external source:

data = {
    'Name': ['John', 'Emma', 'Sophia', 'Michael', 'Olivia'],
    'Age': [28, 32, 25, 36, 30],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)
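
If your data lives in an external source instead, the DataFrame can be loaded rather than typed in. The sketch below assumes CSV input and uses an in-memory string (csv_text, a made-up stand-in for a real file) so that it runs without any file on disk:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """Name,Age,City
John,28,New York
Emma,32,London
"""

# pd.read_csv accepts a file path or any file-like object
df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(df_from_csv)
```

With a real file, you would pass its path to pd.read_csv in place of the StringIO object.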

Step 4: Creating an Excel Workbook and Sheet:

Next, we’ll create an Excel workbook and a sheet to hold our DataFrame. This allows us to customize various properties of the workbook and sheet. Add the following lines of code:

wb = Workbook()
sheet = wb.active
sheet.title = 'Data'

wb = Workbook(): This line creates a new Workbook object, which represents the Excel file. It initializes an empty workbook that can contain multiple sheets.

sheet = wb.active: This line assigns the active sheet of the Workbook object to the variable sheet. The active sheet is the sheet that is currently visible or selected within the Excel file. By default, when a new Workbook is created, it contains one sheet, and that becomes the active sheet.

sheet.title = 'Data': This line sets the title of the active sheet to ‘Data’. The title is the name given to a sheet and is displayed at the bottom of the Excel window. In this case, the active sheet is named ‘Data’.
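
To see these three lines in action, the short sketch below records the sheet's default name before renaming it and then lists the sheet names in the workbook. The variable default_title is introduced here only for illustration:

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active

# openpyxl names the first sheet of a new workbook "Sheet"
default_title = sheet.title
sheet.title = 'Data'

print(default_title)
print(wb.sheetnames)
```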

Step 5: Giving a Name to the Excel Workbook:

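You can give the workbook a title through wb.properties.title, which is stored in the file’s document properties. A Workbook object has no title attribute of its own that openpyxl writes to the file, and the file name itself is chosen later, when the workbook is saved. For example, let’s set the title to “Data Export”:

wb.properties.title = 'Data Export'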
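
openpyxl keeps workbook-level metadata such as the title on wb.properties, while the name of the file comes from the path passed to wb.save(). The sketch below checks that the title survives a save and reload. It assumes an in-memory buffer in place of a real file:

```python
import io

from openpyxl import Workbook, load_workbook

wb = Workbook()
wb.properties.title = 'Data Export'   # document metadata, not the file name

# Save to an in-memory buffer instead of disk, then read it back
buffer = io.BytesIO()
wb.save(buffer)
buffer.seek(0)

reloaded = load_workbook(buffer)
print(reloaded.properties.title)
```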

Step 6: Setting a Name for the Excel Sheet:

Similarly, you can set a name for the Excel sheet using the sheet.title attribute. Let’s rename it to “Data Sheet”:

sheet.title = 'Data Sheet'

Tip – Creating a new sheet in addition to the active sheet

extra_sheet = wb.create_sheet("simplifyx")

This line of code creates a new Excel worksheet named “simplifyx” within the workbook wb. The result is stored in a separate variable, extra_sheet, so that sheet keeps pointing at the sheet we fill in the following steps.

By default, a new workbook contains one sheet, which is the active sheet. create_sheet adds the new sheet after the existing ones, so “simplifyx” appears as the second sheet in the workbook, following the active sheet. Creating a sheet this way does not change which sheet wb.active returns.
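
The ordering can be checked with wb.sheetnames. The sketch below also passes the optional index argument of create_sheet to place a sheet at the front. The sheet name “Summary” and the variable names are made up for illustration:

```python
from openpyxl import Workbook

wb = Workbook()
wb.active.title = 'Data Sheet'

extra = wb.create_sheet('simplifyx')     # appended after the existing sheets
summary = wb.create_sheet('Summary', 0)  # optional index: insert at position 0

print(wb.sheetnames)
```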

Step 7: Populating the Excel Sheet with Data:

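To populate the sheet with the DataFrame, we can use the append method, which writes one list of values as the next row of the sheet. The column names are not part of the rows returned by iterrows(), so we append them first as a header row. Add the following code:

sheet.append(df.columns.tolist())  # header row
for _, row in df.iterrows():
    sheet.append(row.tolist())

for _, row in df.iterrows(): This line sets up a loop that iterates over each row in the DataFrame, df. Each iteration yields a pair of the row’s index label, which we ignore with _, and the row’s values as a Series, which tolist() converts to a plain list.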

The iterrows() function is a method available in Pandas DataFrames that allows you to iterate over the rows of the DataFrame. By using iterrows(), you can access and process each row of the DataFrame individually, typically in a loop. This is useful when you need to perform operations or calculations on each row or extract specific information from the DataFrame row-wise.
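
As an alternative to looping with iterrows(), openpyxl ships a helper, dataframe_to_rows, that yields the header and the data rows in a form ready for append. The sketch below uses a shortened two-row DataFrame for brevity:

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({
    'Name': ['John', 'Emma'],
    'Age': [28, 32],
})

wb = Workbook()
sheet = wb.active

# header=True emits the column names first; index=False skips the DataFrame index
for values in dataframe_to_rows(df, index=False, header=True):
    sheet.append(values)

print(sheet.max_row)
```

This keeps the header and the data in a single loop, which can be easier to read than a separate header line.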

Step 8: Adjusting Column Width and Row Height:

To change the column width and row height, we can use the column_dimensions and row_dimensions attributes of the sheet object, respectively. Let’s size each column to fit its longest value and set the height of the first row to 12:

# setting the column width
for col in sheet.columns:
    max_length = 0
    column = col[0].column_letter
    for cell in col:
        # str() is needed on both lines: len() on a number raises TypeError
        if len(str(cell.value)) > max_length:
            max_length = len(str(cell.value))
    adjusted_width = (max_length + 2) * 1.2
    sheet.column_dimensions[column].width = adjusted_width

# setting the row height
sheet.row_dimensions[1].height = 12

for col in sheet.columns: This starts a loop over the columns of the Excel sheet. In each iteration, the variable col holds the cells of a single column.

max_length = 0: This resets the running maximum, which tracks the length of the longest value seen so far in the current column.

column = col[0].column_letter: This line gets the letter of the current column (for example, 'A'), which will be used to set the column width later.

for cell in col: This line starts a nested loop that iterates over each cell in the current column.

if len(str(cell.value)) > max_length: This if statement checks whether the text length of the current cell’s value is greater than the previously recorded maximum length (max_length). If it is, the maximum length is updated to the length of the current cell’s value. The value is converted with str() first because numbers, such as the ages, have no length of their own.

  • adjusted_width = (max_length + 2) * 1.2: This line calculates the adjusted width for the column based on the maximum length of values plus a small buffer of 2 characters and a scaling factor of 1.2.
  • sheet.column_dimensions[column].width = adjusted_width: This line sets the width of the column (column) in the Excel sheet (sheet) to the calculated adjusted width.
  • sheet.row_dimensions[1].height = 12: This line sets the height of the first row in the Excel sheet to a specific value of 12.
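
The width logic above can also be wrapped in a small reusable function. The sketch below is one possible version: the helper name fit_column_widths and its padding and scale parameters are made up for illustration, and it uses get_column_letter to turn a column number into its letter:

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter


def fit_column_widths(sheet, padding=2, scale=1.2):
    """Size each column to its longest value (hypothetical helper)."""
    for index, column_cells in enumerate(sheet.iter_cols(), start=1):
        # Empty cells are skipped so they do not count as len("None")
        longest = max(
            (len(str(cell.value)) for cell in column_cells if cell.value is not None),
            default=0,
        )
        letter = get_column_letter(index)   # 1 -> 'A', 2 -> 'B', ...
        sheet.column_dimensions[letter].width = (longest + padding) * scale


wb = Workbook()
sheet = wb.active
sheet.append(['Name', 'Age', 'City'])
sheet.append(['Michael', 36, 'Tokyo'])
sheet.append(['Emma', 32, 'New York'])

fit_column_widths(sheet)
for letter in 'ABC':
    print(letter, sheet.column_dimensions[letter].width)
```

Column widths in openpyxl are expressed in approximate character units, so the padding and scale values are a matter of taste rather than exact measurements.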

Step 9: Exporting the Excel File:

Finally, we are ready to export the DataFrame to an Excel file. Choose a filename and add the following code:

filename = 'data_export.xlsx'
wb.save(filename)
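
To confirm that a saved file contains what you expect, you can read it back with Pandas. The sketch below writes a tiny workbook to a temporary directory (an assumption made so the example leaves no files behind) and loads it again with pd.read_excel:

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.title = 'Data Sheet'
sheet.append(['Name', 'Age'])
sheet.append(['John', 28])

# Write into a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data_export.xlsx')
    wb.save(path)
    # By default, the first row of the sheet becomes the column header
    restored = pd.read_excel(path, sheet_name='Data Sheet', engine='openpyxl')

print(restored)
```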

Final Code

import pandas as pd
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

# Creating a Pandas DataFrame
data = {
    'Name': ['John', 'Emma', 'Sophia', 'Michael', 'Olivia'],
    'Age': [28, 32, 25, 36, 30],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)

# Creating an Excel Workbook and Sheet
wb = Workbook()
sheet = wb.active
sheet.title = 'Data Sheet'

# Giving a Name to the Excel Workbook (stored in the document properties)
wb.properties.title = 'Data Export'

# Populating the Excel Sheet with Data
sheet.append(df.columns.tolist())  # header row
for _, row in df.iterrows():
    sheet.append(row.tolist())

# Adjusting Column Width and Row Height
for col in sheet.columns:
    max_length = 0
    column = col[0].column_letter
    for cell in col:
        # str() is needed on both lines: len() on a number raises TypeError
        if len(str(cell.value)) > max_length:
            max_length = len(str(cell.value))
    adjusted_width = (max_length + 2) * 1.2
    sheet.column_dimensions[column].width = adjusted_width

sheet.row_dimensions[1].height = 12

# Exporting the Excel File
filename = 'data_export.xlsx'
wb.save(filename)

Conclusion

In this tutorial, we have covered the process of exporting a Pandas DataFrame to an Excel file using the openpyxl library. We learned how to give a name to the Excel workbook, set a name for the Excel sheet, adjust column width, change row height, and export the file. This serves as a solid foundation for your data export needs.

In future tutorials, we will dive deeper into additional formatting and customizations that can be applied to the exported Excel file. These enhancements may include styling cells, applying conditional formatting, adding charts, and much more. Stay tuned for more advanced tutorials that will enable you to create professional and visually appealing Excel exports.

By mastering these techniques, you will have full control over the appearance and structure of your exported data, ensuring that it is presented in a clear and informative manner. Experiment with different options and explore the openpyxl documentation for further possibilities.

Now you are equipped with the knowledge to export your Pandas DataFrames to Excel with ease. Start using this valuable skill in your data analysis workflows and enhance your reporting capabilities. Happy coding!
