I am trying to combine multiple Excel files with pandas in Python. Some of the files have different headers from each other.
There is a similar question on Stack Overflow.
This is where it fails:
# Turn them into dataframes using pandas
frames = []
for excel in excels:
    frame = excel.parse(excel.sheet_names[0], index_col=None)
    frames.append(frame[['Charges', 'Amount', 'Taxes', 'Date', 'Discount Percent', 'Zipcode', 'Order Number']])
KeyError: "['Charges', 'Zipcode', 'Discount Percent'] not in index"
One Excel file might have a header that another file lacks, and this part of the code then fails. How can I make it either keep going when it encounters a header that is not present, or leave that field blank?
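A minimal sketch of one way the "leave that field blank" behaviour could be obtained, assuming NaN is an acceptable stand-in for a blank cell: `DataFrame.reindex(columns=...)` does not raise on labels that are missing from the frame and instead creates them filled with NaN. The helper name `select_columns` and the sample frame `partial` are hypothetical and only serve the illustration.

```python
import pandas as pd

# Columns wanted in the combined output (taken from the question).
WANTED = ['Charges', 'Amount', 'Taxes', 'Date',
          'Discount Percent', 'Zipcode', 'Order Number']


def select_columns(frame, wanted=WANTED):
    """Return `frame` restricted to `wanted`; absent columns come back as NaN."""
    # Unlike frame[wanted], reindex does not raise KeyError for missing labels.
    return frame.reindex(columns=wanted)


# Hypothetical stand-in for a sheet that lacks three of the headers.
partial = pd.DataFrame({
    'Amount': [10.0, 20.0],
    'Taxes': [0.8, 1.6],
    'Date': ['2016-01-01', '2016-01-02'],
    'Order Number': [1001, 1002],
})

selected = select_columns(partial)
print(selected.columns.tolist())
print(selected['Charges'].isna().all())
```

In the loop from the question, the bracket selection would then become `frames.append(select_columns(frame))`. Note that any column not listed in `WANTED` is dropped by this approach.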
The entire script, concat.py:
import pandas as pd
import os
excel_path = "C:\\Users\\khernandez\\Desktop\\compare-and-concat\\raw\\"
# File names to join
excel_names = [os.path.join(excel_path, f) for f in os.listdir(excel_path)]
excels = []
for name in excel_names:
    print("Loading File: " + name)
    excels.append(pd.ExcelFile(name))
# Turn them into dataframes using pandas
frames = []
for excel in excels:
    print("Converting to data frame")
    print(excel)
    frame = excel.parse(excel.sheet_names[0], index_col=None)
    frames.append(frame[['Charges', 'Amount', 'Taxes', 'Date', 'Discount Percent', 'Zipcode', 'Order Number']])
# # Delete the first row of the excel file
# print("Removing HEADERS")
# frames[1:] = [df[1:] for df in frames[1:]]
# Combine the dataframes
print("Combining frames")
combined = pd.concat(frames)
# Write them out to a file named concated.xlsx
combined.to_excel("concated.xlsx", header=True, index=False)
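As an alternative sketch, the column selection could be moved to after the combine step. `pd.concat` aligns frames on their column labels and, with its default `join='outer'`, fills the columns a frame lacks with NaN, so no KeyError arises while combining. The two small frames below are made-up stand-ins for parsed sheets, not data from the question.

```python
import pandas as pd

wanted = ['Charges', 'Amount', 'Taxes', 'Date',
          'Discount Percent', 'Zipcode', 'Order Number']

# Hypothetical sheet that has every wanted header.
full = pd.DataFrame({
    'Charges': [5.0], 'Amount': [10.0], 'Taxes': [0.8],
    'Date': ['2016-01-01'], 'Discount Percent': [15],
    'Zipcode': ['90210'], 'Order Number': [1001],
})

# Hypothetical sheet that is missing Charges, Discount Percent and Zipcode.
partial = pd.DataFrame({
    'Amount': [20.0], 'Taxes': [1.6],
    'Date': ['2016-01-02'], 'Order Number': [1002],
})

# Outer join on columns: the union of headers is kept, gaps become NaN.
combined = pd.concat([full, partial], ignore_index=True)

# Select and order the wanted columns once, after combining.
combined = combined.reindex(columns=wanted)
print(combined)
```

Compared with selecting inside the loop, this keeps the per-file code free of column handling; the trade-off is that every column of every sheet is held in memory until the final `reindex`.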