Remove header row in Excel using pandas

Question

I have an Excel file with a merged header that I read into a DataFrame using pandas. It looks like this after pd.read_excel():

Unnamed: 0     Pair    Unnamed: 1      Type      ...  Unnamed: 23
cabinet_name   group     pair          caller_id ...  result
value1         value1    value1        value1    ...  value1
value2         value2    value2        value2    ...  value2

So it's as if I have two header rows: one is the row with Unnamed, and the other is my desired header row.

This is my desired output:

cabinet_name   group     pair          caller_id ...  result
value1         value1    value1        value1    ...  value1
value2         value2    value2        value2    ...  value2

I am trying to remove the row with Unnamed:

df.drop(df.index[[0]])

and I also tried header=None, as in pd.read_excel('file.xlsx', header=None).

But none of these returned my expected output. I searched for how to delete rows containing Unnamed, but everything I found was about deleting columns.

I also tried

df.drop(df.head(0))

but it gave me:

KeyError: "['Unnamed: 0' 'Pair' ... 'Unnamed: 23']"

What is the best way to do it?
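For reference: the Unnamed line is the frame's column labels, not a data row, so df.drop(df.index[[0]]) removes the first data row (the one holding cabinet_name, group, ...) instead; drop also returns a new DataFrame unless the result is reassigned. If the frame has already been loaded this way, one common fix is to promote that first row to the column labels. A minimal sketch, where the three-column frame is a hypothetical stand-in for the real data:

```python
import pandas as pd

# Hypothetical stand-in for what pd.read_excel() returned: the merged-header
# row became the column labels, and the real labels ended up in row 0.
df = pd.DataFrame(
    [
        ["cabinet_name", "group", "result"],
        ["value1", "value1", "value1"],
        ["value2", "value2", "value2"],
    ],
    columns=["Unnamed: 0", "Pair", "Unnamed: 23"],
)

# Use row 0 as the column labels...
df.columns = df.iloc[0]
# ...then drop that row and renumber the index from 0.
df = df.iloc[1:].reset_index(drop=True)
# iloc[0] is a Series named 0, so the columns inherit that name; clear it.
df.columns.name = None

print(df)
```

Reading the file with the right header in the first place (see the answers) avoids this step entirely.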

How about pd.read_excel(file, header=[1])?

– jezrael, Commented Oct 8, 2018 at 10:19
@jezrael It gives the same KeyError as df.drop(df.head(0)).

– Ricky Aguilar, Commented Oct 8, 2018 at 10:20
I think df.drop(df.head(0)) is not necessary.

– jezrael, Commented Oct 8, 2018 at 10:22

jezrael · Accepted Answer · 2018-10-08 10:52:32Z

6

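I believe you need to skip the first row with the skiprows=1 or header=1 parameter, and then remove all columns that contain only NaNs. For the file UF_AGT702-M.xlsx, this skips two rows and reads the Report sheet:

import pandas as pd
df = (pd.read_excel('UF_AGT702-M.xlsx', skiprows=2, sheet_name='Report')  # older pandas: sheetname=
        .dropna(how='all', axis=1))  # drop columns that are entirely NaN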
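The same approach can be tried without an Excel file. In this sketch, pd.read_csv on an in-memory string stands in for read_excel (read_csv accepts header and skiprows with the same meaning), and the sheet contents are hypothetical, shaped like the question: a merged banner row, the real label row, and one entirely empty column.

```python
import io

import pandas as pd

# Hypothetical sheet contents as CSV text: row 0 is the merged "banner"
# header, row 1 holds the real labels, and the last column is empty.
raw = (
    ",Pair,,Type,\n"
    "cabinet_name,group,pair,caller_id,\n"
    "value1,value1,value1,value1,\n"
    "value2,value2,value2,value2,\n"
)

# header=1 takes the labels from line 1 (0-indexed) and discards line 0;
# skiprows=1 drops line 0 and then uses the next line as the header.
by_header = pd.read_csv(io.StringIO(raw), header=1)
by_skiprows = pd.read_csv(io.StringIO(raw), skiprows=1)
assert by_header.equals(by_skiprows)  # both calls parse the same table here

# Drop columns that contain nothing but NaN (the empty trailing column).
df = by_header.dropna(how="all", axis=1)
print(df)
```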

edited Oct 8, 2018 at 10:52

answered Oct 8, 2018 at 10:22

jezrael


5 Comments

Ricky Aguilar Over a year ago

Neither worked. It's weird. I used print(df.head(0)) to view the headers, and I also used to_excel() to view the output as an Excel file.

jezrael Over a year ago

@RickyAguilar - Is the data confidential?

Ricky Aguilar Over a year ago

Nope, not confidential. The values change every day.

jezrael Over a year ago

@RickyAguilar - Super, is it possible to send me the file at the email address in my profile? It seems to be a data-related problem.

Ricky Aguilar Over a year ago

Sent. I think you will see what I am trying to do once you look at the file.

Miguel Rueda · Accepted Answer · 2019-11-13 10:35:00Z

3

Let's take, for instance, an Excel file that has extra information rows above the table header and below the data.

To exclude the header and footer information from the data file, you can use the header (or skiprows) parameter for the former and skipfooter for the latter. Here is an MWE (minimal working example):

import pandas as pd

energy = pd.read_excel('your_excel_file.xls', header=9, skipfooter=8)
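As a runnable sketch of the same idea, the hypothetical report below has two title lines above the table and two footnote lines after it. pd.read_csv on an in-memory string stands in for read_excel, so header=2 and skipfooter=2 fit this made-up layout, not the file in the answer.

```python
import io

import pandas as pd

# Hypothetical report: two title lines, the table, then two footer lines.
raw = (
    "Monthly energy report\n"
    "Source: example\n"
    "country,supply\n"
    "A,10\n"
    "B,20\n"
    "Footnote 1\n"
    "Footnote 2\n"
)

# header=2: line 2 (0-indexed) holds the column labels; lines above it are
# discarded. skipfooter=2: ignore the last two lines. read_csv supports
# skipfooter only with the Python engine, so it is requested explicitly.
energy = pd.read_csv(io.StringIO(raw), header=2, skipfooter=2, engine="python")
print(energy)
```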

header : int, list of int, default 0
    Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed, those row positions will be combined into a MultiIndex. Use None if there is no header.

skipfooter : int, default 0
    Rows at the end to skip (0-indexed).
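The list form of header fits the question's merged two-row header: read both rows into a MultiIndex, then drop the outer level to keep only the real labels. A minimal sketch, again with pd.read_csv and hypothetical contents standing in for the Excel sheet:

```python
import io

import pandas as pd

# Hypothetical sheet contents: a merged banner row above the real labels.
raw = (
    ",Pair,,Type\n"
    "cabinet_name,group,pair,caller_id\n"
    "value1,value1,value1,value1\n"
    "value2,value2,value2,value2\n"
)

# header=[0, 1] combines both rows into two-level (MultiIndex) columns;
# blank banner cells get placeholder names like 'Unnamed: 0_level_0'.
df = pd.read_csv(io.StringIO(raw), header=[0, 1])

# Keep only the inner level, i.e. the real labels from the second row.
df.columns = df.columns.droplevel(0)
print(df)
```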

Check out the latest read_excel documentation for further details.

answered Nov 13, 2019 at 10:35

Miguel Rueda


1 Comment

Murtaza Mohsin Over a year ago

Hi, this is helpful, but how would you remove columns if, say, your dataset starts at column B or C?
