Dealing with data scattered across hundreds of files can be a daunting task. To produce meaningful insights and comprehensive analytics, merging these files into one dataset is essential. Today, we’ll look at an easy way to consolidate multiple Excel workbooks into one workbook using Python. It’s important to note that for this process to run smoothly, your Excel workbooks should share the same column structure.
The Challenge at Hand:
Let’s imagine overseeing the sales records of “GadgetGalaxy,” a fictional tech store, across three days. Each day’s sales data sits in its own Excel file. Our mission? Consolidate these files into one to analyze all sales seamlessly.
Tools We’ll Use:
First, ensure you have Python installed. Then, install the Pandas library. These are the tools we’ll use to merge our data.
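If Pandas isn’t set up yet, one common way to install it is with pip. Installing openpyxl alongside it is a good idea, since Pandas relies on it as the engine for reading .xlsx files:

```shell
# Install pandas, plus openpyxl (the engine pandas uses for .xlsx files)
pip install pandas openpyxl
```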
We’ll use Python’s os module to navigate through folders and find our Excel files. This automates the process, no matter how many files there are.
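To illustrate the idea, here is a small, self-contained sketch of that discovery step. It builds a throwaway folder with some dummy files (the file names are invented for this example) and keeps only the Excel workbooks:

```python
import os
import tempfile

# Create a throwaway folder with a mix of files to demonstrate filtering.
folder = tempfile.mkdtemp()
for name in ("day1.xlsx", "day2.xlsx", "notes.txt"):
    open(os.path.join(folder, name), "w").close()

# os.listdir() returns every entry in the folder;
# keep only the names that end in ".xlsx".
excel_files = sorted(f for f in os.listdir(folder) if f.endswith(".xlsx"))
print(excel_files)  # ['day1.xlsx', 'day2.xlsx']
```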
With Pandas’ pd.read_excel() and pd.concat() functions, we’ll read each Excel file’s contents and merge them into a single sheet. This merged sheet becomes our complete dataset for analysis.
Let’s Apply It: GadgetGalaxy’s Sales
In our example, we’ll merge “GadgetGalaxy” sales from three days: Day 1, Day 2, and Day 3. Each day’s sales data sits in its own Excel file; combining them gives us a single dataset and a full picture of sales trends.
import pandas as pd
import os

# Folder containing the daily sales workbooks, and folder for the output
input_file_path = "C:/Users/rc_ab/Desktop/Python Projects/Codes for blogs/sales data/"
output_path = "C:/Users/rc_ab/Desktop/Python Projects/Codes for blogs/"

# List every file in the input folder
excel_files_list = os.listdir(input_file_path)

# Initialize an empty dataframe to collect the combined data
df = pd.DataFrame()

for excel_file in excel_files_list:
    if excel_file.endswith(".xlsx"):
        # Read each workbook and append its rows to the running dataframe
        temp_df = pd.read_excel(input_file_path + excel_file)
        df = pd.concat([df, temp_df], ignore_index=True)

# Export the combined data as a single Excel workbook
df.to_excel(output_path + 'combined_sales_data.xlsx', index=False)