5

I have an Excel sheet like this:

[image: Excel sheet]

I want to read it with pandas read_excel, and I tried this:

df = pd.read_excel("test.xlsx", header=[0,1])

but it raises this error:

ParserError: Passed header=[0,1] are too many rows for this multi_index of columns

Any suggestions?

2
  • Are you using Merged Cells for Header 1 and Header 2? If yes, try to go without them. Commented May 22, 2018 at 16:51
  • I gotta say it was kind of a let-down after that title when I realized that this question had nothing to do with large black and white bears. Commented May 22, 2018 at 17:14

2 Answers

6

If you don't mind massaging the DataFrame after reading the Excel file, you can try the two approaches below:

>>> pd.read_excel("/tmp/sample.xlsx", usecols = "B:F", skiprows=[0])
  header1 Unnamed: 1 Unnamed: 2 header2 Unnamed: 4
0    col1       col2       col3    col4       col5
1       a          0          x       3          d
2       b          1          y       4          e
3       c          2          z       5          f

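In the output above, you'd still have to fix the first level of the MultiIndex yourself: header1 and header2 are merged cells, so only the first column of each merged range keeps its label and the rest come back as Unnamed: N.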
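One possible way to do that fix-up, as a minimal sketch: the frame below is typed in by hand to mirror the output above (so it runs without the workbook), and the names raw, top, second and fixed are only illustrative. It blanks the Unnamed: N placeholders, forward-fills the merged header labels, and promotes the first data row to the second header level.

```python
import pandas as pd

# Hand-built stand-in for the frame printed above (assumption: the real read
# returns exactly these labels and values).
raw = pd.DataFrame(
    [
        ["col1", "col2", "col3", "col4", "col5"],
        ["a", 0, "x", 3, "d"],
        ["b", 1, "y", 4, "e"],
        ["c", 2, "z", 5, "f"],
    ],
    columns=["header1", "Unnamed: 1", "Unnamed: 2", "header2", "Unnamed: 4"],
)

# Top level: replace the "Unnamed: N" placeholders with NaN, then forward-fill
# so every column under a merged cell carries that cell's label.
top = pd.Series(raw.columns)
top = top.mask(top.str.startswith("Unnamed")).ffill()

# Second level: the first data row holds the real column names.
second = raw.iloc[0]

# Drop that row from the data and attach the two-level header.
fixed = raw.iloc[1:].reset_index(drop=True)
fixed.columns = pd.MultiIndex.from_arrays([top.tolist(), second.tolist()])

print(fixed)
```

Because the labels are matched on the "Unnamed" prefix rather than on fixed positions, this should not depend on how many columns sit under each merged header.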

>>> pd.read_excel("/tmp/sample.xlsx", header=[0,1], usecols = "B:F", skiprows=[0])
        header1      header2
header1    col1 col2    col3 col4
a             0    x       3    d
b             1    y       4    e
c             2    z       5    f

This gets pretty close by skipping the empty row and parsing only the columns with data (B:F). Notice, though, that the columns got shifted: the first data column was consumed as the index.

Note: this is not a clean solution; I just wanted to share samples in a post rather than a comment.

-- Edit based on discussion with OP --

Based on the documentation for pandas read_excel, header=[1,2] creates a MultiIndex for your columns. It looks like the row labels of the DataFrame are then taken from whatever is populated in column A. Since there's nothing there, the index is filled with NaN, like so:

>>> pd.read_excel("/tmp/sample.xlsx", header=[1,2])
    header1           header2
       col1 col2 col3    col4 col5
NaN       a    0    x       3    d
NaN       b    1    y       4    e
NaN       c    2    z       5    f

Again, if you're okay with cleaning up the columns and the first column of the xlsx is always blank, you can drop it as shown below. Hopefully this is what you're looking for.

>>> pd.read_excel("/tmp/sample.xlsx", header=[1,2]).reset_index().drop(['index'], level=0, axis=1)
  header1           header2
     col1 col2 col3    col4 col5
0       a    0    x       3    d
1       b    1    y       4    e
2       c    2    z       5    f
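If the index holds nothing but NaN, reset_index(drop=True) should give the same result in one step. Here is a small sketch comparing the two spellings. The frame is built by hand as a stand-in for the parsed sheet, which is an assumption based on the output above, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the header=[1,2] result: two-level columns and an
# index that contains only NaN.
columns = pd.MultiIndex.from_tuples(
    [
        ("header1", "col1"), ("header1", "col2"), ("header1", "col3"),
        ("header2", "col4"), ("header2", "col5"),
    ]
)
parsed = pd.DataFrame(
    [["a", 0, "x", 3, "d"], ["b", 1, "y", 4, "e"], ["c", 2, "z", 5, "f"]],
    index=[np.nan] * 3,
    columns=columns,
)

# Long way: move the NaN index into a column named "index", then drop it.
long_way = parsed.reset_index().drop(["index"], level=0, axis=1)

# Short way: discard the old index directly.
short_way = parsed.reset_index(drop=True)

print(short_way)
```

Both leave a default 0..n-1 index with the two header levels intact.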

3 Comments

Thanks for your suggestion. As you say, it is pretty close, but I need the column names to be in the right place. By experimenting, I found that this works as expected: df = pd.read_excel("/tmp/sample.xlsx", header=[1,2]).reset_index(drop=True). I don't know exactly why it works with that header parameter.
I think this should do the job: pd.read_excel("/tmp/sample.xlsx", header=[1,2]).reset_index().drop(['index'], level=0, axis=1)
I've also modified the original post with my interpretation and understanding of the documentation for read_excel's header parameter. Hopefully others can chime in to clarify our understanding.
1

Here is the documentation on the header parameter:

Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed those row positions will be combined into a MultiIndex. Use None if there is no header.

I think the following should work:

df = pd.read_excel("test.xlsx", skiprows=2, usecols='B:F', header=0)

3 Comments

@OP this is a good solution if you're okay with dropping Header 1 and Header 2.
Thanks for your suggestion, but I need Header 1 and Header 2. And what if I don't know exactly how many columns there are? The number can change, so I can't use usecols='B:F'.
@AlexandraEspichán were you able to find a solution to this? I am looking for something similar.
