I have an Excel file with a merged header that I read into a DataFrame using pandas. It looks like this after pd.read_excel():

Unnamed: 0     Pair    Unnamed: 1      Type      ...  Unnamed: 23
cabinet_name   group     pair          caller_id ...  result
value1         value1    value1        value1    ...  value1
value2         value2    value2        value2    ...  value2

So it's like I have two header rows. One is the row with Unnamed and the other is my desired header row.

This is my desired output:

cabinet_name   group     pair          caller_id ...  result
value1         value1    value1        value1    ...  value1
value2         value2    value2        value2    ...  value2

I am trying to remove the row with Unnamed:

df.drop(df.index[[0]])

and also using header=None in pd.read_excel('file.xlsx', header=None)

But nothing I tried returned my expected output. I searched for how to delete rows with Unnamed, but all I found was how to delete columns.

I also tried

df.drop(df.head(0))

but it raised:

KeyError: '[\'Unnamed: 0\' \'Pair\' ... \'Unnamed: 23\']'

What is the best way to do it?

3 Comments

  • How does pd.read_excel(file, header=[1]) work? Commented Oct 8, 2018 at 10:19
  • @jezrael the same KeyError as df.drop(df.head(0)) Commented Oct 8, 2018 at 10:20
  • I think df.drop(df.head(0)) is not necessary Commented Oct 8, 2018 at 10:22

2 Answers

6

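I believe you need to skip the first row with the parameter skiprows=1 or header=1, and then remove all columns that contain only NaNs:

# sheet_name replaces the older sheetname keyword; set skiprows to the number of rows above the real header
df = (pd.read_excel('UF_AGT702-M.xlsx', skiprows=2, sheet_name='Report')
        .dropna(how='all', axis=1))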
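
If the file has already been loaded and the real labels sit in the first data row, the header can also be repaired on the DataFrame itself. The following is a minimal sketch, not the method used above: the small frame is a hypothetical stand-in for the one shown in the question, and the names df and fixed are only illustrative. It also hints at why df.drop(df.head(0)) fails: df.head(0) is an empty frame whose labels are the column names, so drop() looks for those names in the row index and raises KeyError.

```python
import pandas as pd

# Hypothetical stand-in for the frame that pd.read_excel() returned in the
# question: the real labels ended up in the first data row.
df = pd.DataFrame(
    [
        ["cabinet_name", "group", "pair", "caller_id"],
        ["value1", "value1", "value1", "value1"],
        ["value2", "value2", "value2", "value2"],
    ],
    columns=["Unnamed: 0", "Pair", "Unnamed: 1", "Type"],
)

# Keep everything after the first row and renumber the index from 0.
fixed = df.iloc[1:].reset_index(drop=True)

# Use the values of the original first row as the new column labels.
fixed.columns = df.iloc[0].tolist()

print(fixed)
```

Reading the file with header=1 avoids this step entirely; the sketch is mainly useful when re-reading the file is not an option.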

5 Comments

Neither worked. It's weird. I used print(df.head(0)) to view the headers, and I also used to_excel() to view the output as an Excel file.
@RickyAguilar - Is the data confidential?
Nope, not confidential. The values change every day.
@RickyAguilar - Super, is it possible to send me the file at the email address in my profile? It seems to be a data-related problem.
Sent. I think you will get what I am trying to do once you see the file.
3

Take, for instance, the Excel file layout below.

[Image: screenshot of the Excel file layout]

To exclude the header and footer information from the data file, you could use the header/skiprows parameter for the former and skipfooter for the latter. Here is an MWE:

import pandas as pd

energy = pd.read_excel('your_excel_file.xls', header=9, skipfooter=8)

header : int, list of int, default 0. Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed, those row positions will be combined into a MultiIndex. Use None if there is no header.

skipfooter : int, default 0. Rows at the end to skip (0-indexed).

Check the latest read_excel documentation for further details.
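
To see how header and skipfooter interact without needing a workbook, here is a small sketch that swaps in pd.read_csv on an in-memory CSV, since read_csv accepts parameters with the same names. The sample text, the variable names raw and trimmed, and the row counts are made up for illustration; with read_csv, skipfooter requires engine="python".

```python
import io

import pandas as pd

# Hypothetical report: two title lines, the real header, two data rows,
# and two footer lines that should not end up in the DataFrame.
raw = io.StringIO(
    "Report title\n"
    "Generated daily\n"
    "cabinet_name,group,pair\n"
    "value1,value1,value1\n"
    "value2,value2,value2\n"
    "Total rows: 2\n"
    "End of report\n"
)

# header=2 uses the third line (0-indexed) as column labels and discards
# the lines above it; skipfooter=2 drops the last two lines.
trimmed = pd.read_csv(raw, header=2, skipfooter=2, engine="python")

print(trimmed)
```

For the Excel case, the same two arguments would be passed to pd.read_excel, with the numbers adjusted to the actual layout of the sheet.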

1 Comment

Hi, this is helpful, but how would you remove columns if let's say your dataset starts from column B or C?
