I would like to check whether all the required column names are present in an Excel file using Python. For example:

Header1 Header2 Header3
Val1    Val4    Val6
Val2    val5    Val7

I want to know whether Header4 is present or not.

I use the following:

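import pandas as pd

# Raw string so the backslashes are not treated as escape sequences
path = r"C:\Req_file\excel_file"

xl = pd.ExcelFile(path)

# Note: df is overwritten on each pass, so only the last sheet is kept
for name in xl.sheet_names:
    df = pd.read_excel(xl, name)
my_cols = ["Header1", "Header2", "Header3", "Header4"]
print(df[my_cols])  # raises KeyError because Header4 is missing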

It generates a

KeyError: ['Header4'] not in index

I would like to know whether this can be done with an "if" statement. I want to show an error message in the frame, but I only get it in the terminal.

Thanks a lot in advance.

  • df.columns will list the names of the column headers: you can test if your column of interest is present: if "my_column" in df.columns:. Commented Oct 9, 2018 at 9:34
  • If you are trying to generate an error message if the header is missing, it is more pythonic to use try: over if:. blogs.msdn.microsoft.com/pythonengineering/2016/06/29/… Commented Oct 9, 2018 at 11:05
  • @Dan: absolutely true, with the caveat that if you have to do this for a dozen required columns, with optional intermediate code in between, there is no single point now where the existence is checked. Unless you perform a no-op try-except: try: df[required_columns]; except KeyError:. Potentially better is even to just let the KeyError bubble up to the user. Commented Oct 9, 2018 at 11:21
  • try: df[required_columns]; except KeyError: looks correct to me. I like the set solution you posted, but in this instance your try: code makes more sense to me. I think you should add it to your answer maybe. Commented Oct 9, 2018 at 11:30
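
The no-op try/except discussed in these comments could look roughly like the sketch below. The DataFrame is a hypothetical stand-in for one sheet of the workbook, and `message` is an illustrative name, not something from the question. The selection `df[required_columns]` is done only for its side effect: it raises KeyError when any listed column is absent.

```python
import pandas as pd

# Hypothetical data standing in for one sheet of the Excel file.
df = pd.DataFrame(
    {
        "Header1": ["Val1", "Val2"],
        "Header2": ["Val4", "val5"],
        "Header3": ["Val6", "Val7"],
    }
)
required_columns = ["Header1", "Header2", "Header3", "Header4"]

try:
    # The result is discarded; we only care whether the lookup succeeds.
    df[required_columns]
    message = "All required columns are present."
except KeyError as exc:
    # The exception text names the columns pandas could not find.
    message = f"Missing required columns: {exc}"

print(message)
```

Because the outcome ends up in a plain string, it can be handed to whatever widget displays errors instead of only being printed to the terminal.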

2 Answers

If you want to check that all required column headers are present, you can use sets, and use the columns attribute of a dataframe:

if set(required_columns) <= set(df.columns):
    print("all required columns are there")

If you need to find the missing required columns, use the set difference, with the required columns first (so that additional columns are ignored):

missing = set(required_columns) - set(df.columns)

and combine the two as follows:

missing = set(required_columns) - set(df.columns)
if missing:
    print("Missing required columns:", missing)
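
Since the question loops over several sheets and wants the message somewhere other than the terminal, here is a minimal sketch that wraps the same check in a helper and collects the result as text. The names `missing_columns`, `sheets`, and `error_text` are made up for illustration, and the dict of empty DataFrames is a stand-in for what reading each sheet would give. A list comprehension is used instead of a set difference only so that the missing names come back in the order they were required.

```python
import pandas as pd


def missing_columns(df, required_columns):
    """Return the required column names absent from df, in the order given."""
    present = set(df.columns)
    return [col for col in required_columns if col not in present]


# Hypothetical stand-in for the sheets of a workbook (sheet name -> DataFrame).
sheets = {
    "Sheet1": pd.DataFrame(columns=["Header1", "Header2", "Header3"]),
    "Sheet2": pd.DataFrame(columns=["Header1", "Header2", "Header3", "Header4"]),
}
required_columns = ["Header1", "Header2", "Header3", "Header4"]

messages = []
for sheet_name, df in sheets.items():
    missing = missing_columns(df, required_columns)
    if missing:
        messages.append(
            f"{sheet_name}: missing required columns: {', '.join(missing)}"
        )

# One string that a GUI label or message box could display.
error_text = "\n".join(messages)
print(error_text or "all required columns are there")
```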

3 Comments

Is there any way to print the column headers from my_columns that are not present in df.columns?
Probably print(set(my_columns) - set(df.columns)) will do.
Thank you all. I got what I need.

Like this:

In [5]: data=pd.DataFrame([["Abao","man"],["Tom","man"]],columns=["name","sex"])

In [6]: data
Out[6]: 
   name  sex
0  Abao  man
1   Tom  man

In [7]: data.columns 
Out[7]: Index(['name', 'sex'], dtype='object')

In [8]: "age" in data.columns
Out[8]: False

In [9]: "sex" in data.columns
Out[9]: True    
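
To test several names at once rather than one `in` check per column, the same idea can be sketched with pandas Index methods. The `required` list here is an assumption added for illustration; `data` is rebuilt so the snippet runs on its own.

```python
import pandas as pd

data = pd.DataFrame([["Abao", "man"], ["Tom", "man"]], columns=["name", "sex"])

# Hypothetical list of columns we expect to find.
required = pd.Index(["name", "sex", "age"])

# Boolean array: one flag per required name, True if it is a column of data.
present_mask = required.isin(data.columns)

# Index of the required names that are not columns of data.
missing = required.difference(data.columns)

print(present_mask.all())
print(list(missing))
```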
