Pandas make row blank if header does not exist

Question

I am trying to combine multiple Excel files with Python pandas. Some files have different headers from each other.

This is where it fails:

# Turn them into dataframes using pandas
frames = []
for excel in excels:
  frame = excel.parse(excel.sheet_names[0],index_col=None)
  frames.append(frame[['Charges', 'Amount','Taxes','Date','Discount Percent', 'Zipcode', 'Order Number']])

KeyError: "['Charges', 'Zipcode', 'Discount Percent'] not in index"

One Excel file might have a header that another lacks, and this part of the code fails. How can I make it keep going, or leave the field blank, when it encounters a header that is not present?

The entire script: concat.py

import pandas as pd
import os

excel_path = "C:\\Users\\khernandez\\Desktop\\compare-and-concat\\raw\\"
# File names to join
excel_names = [excel_path + f for f in os.listdir('./raw')]

excels = []
for name in excel_names:
  print("Loading File: " + name)
  excels.append(pd.ExcelFile(name))

# Turn them into dataframes using pandas
frames = []
for excel in excels:
  print("Converting to data frame")
  print(excel)
  frame = excel.parse(excel.sheet_names[0],index_col=None)
  frames.append(frame[['Charges', 'Amount','Taxes','Date','Discount Percent', 'Zipcode', 'Order Number']])


# # Delete the first row of the excel file
# print("Removing HEADERS")
# frames[1:] = [df[1:] for df in frames[1:]]

# Combine the dataframes
print("Combining frames")
combined = pd.concat(frames)


# Write them out to a file named concated.xlsx
combined.to_excel("concated.xlsx", header=True, index=False)

Code Different · Accepted Answer · 2019-12-12 19:48:09Z

Typing this in the blind and not fully tested.

You have a fixed set of columns to extract from the source Excel files. Use an intersection to keep only the columns that exist, then reindex to add back the missing ones (if any):

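import numpy as np
import pandas as pd

frames = []
cols = ['Charges', 'Amount', 'Taxes', 'Date', 'Discount Percent', 'Zipcode', 'Order Number']
for excel in excels:
    ...
    # Keep only the wanted columns that this sheet actually has
    frames.append(frame[np.intersect1d(cols, frame.columns.values)])

# Reindex on the columns (not axis=0, which reindexes rows and blanks the data);
# columns missing from every file are added back as NaN
combined = pd.concat(frames, sort=False, ignore_index=True) \
                .reindex(columns=cols)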
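A minimal, self-contained sketch of the same idea, using two small in-memory frames in place of the parsed Excel sheets. The sample data, the extra column, and the names frame_a and frame_b are made up for illustration. It assumes that concatenating first and then reindexing on the columns is acceptable for your data:

```python
import pandas as pd

cols = ['Charges', 'Amount', 'Taxes', 'Date', 'Discount Percent', 'Zipcode', 'Order Number']

# Hypothetical stand-ins for frames parsed from two Excel files
frame_a = pd.DataFrame({
    'Amount': [10.0, 20.0],
    'Taxes': [1.0, 2.0],
    'Date': ['2019-12-01', '2019-12-02'],
    'Order Number': [1001, 1002],
    'Extra': ['x', 'y'],          # a column we do not want in the output
})
frame_b = pd.DataFrame({
    'Charges': [5.0],
    'Amount': [30.0],
    'Zipcode': ['90210'],
    'Order Number': [1003],
})

# concat aligns on column names and fills gaps with NaN;
# reindex(columns=...) then drops unwanted columns, adds any column
# that no file had, and fixes the column order
combined = (
    pd.concat([frame_a, frame_b], sort=False, ignore_index=True)
      .reindex(columns=cols)
)
print(combined)
```

With this approach no intersection step is needed, since reindexing on the columns handles both missing and extra headers.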

edited Dec 12, 2019 at 19:48

answered Dec 12, 2019 at 19:39

5 Comments

Kevin H Over a year ago

I mirrored your code into mine and got AttributeError: 'Series' object has no attribute 'intersection'. I looked up the documentation and it seems right, so why isn't it working?

Code Different Over a year ago

Sorry, my failure at reading documentation. Changed to np.intersect1d

Kevin H Over a year ago

The script runs but only writes the headers to my Excel file. Maybe something in pd.concat()?

Kevin H Over a year ago

I just removed everything but sort=False and it worked!

Code Different Over a year ago

Glad it worked for you. I thought it was the axis value in reindex.
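
For readers following the thread, a small sketch of what the axis value in reindex changes, which is probably why the output contained only headers. The frame here is a made-up example, not the asker's data:

```python
import pandas as pd

df = pd.DataFrame({'Amount': [10, 20], 'Taxes': [1, 2]})
cols = ['Charges', 'Amount', 'Taxes']

# axis=0 treats cols as ROW labels; none of them exist in the
# integer index, so every row of the result is NaN
by_rows = df.reindex(cols, axis=0)

# columns=cols (equivalent to axis=1) treats cols as COLUMN labels;
# existing data is kept and 'Charges' is added as a NaN column
by_cols = df.reindex(columns=cols)

print(by_rows)
print(by_cols)
```

Written to Excel with index=False, the first result would show the header row followed by blank rows, which matches the symptom described above.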
