Pandas reading excel file with simple multicolumn index

Question

I have an Excel file which looks like this:

When I read this with pandas.read_excel, pandas returns a df which looks like this:

                                        1998 Unnamed: 1  1999 Unnamed: 3  \
Angélus                                   20        -35    16         au   
Angludet                                  17         au    16         vo   
Arnaud de Jacquemeau                      16         vo    16         vo   
Ausone                                    20        -40    18        -25   
Barde-Haut                                17         au    17         vo

Is there a way to tell pandas about the multicolumn so that the output is either

                                        1998       1998  1999       1999
Angélus                                   20        -35    16         au   
Angludet                                  17         au    16         vo   
Arnaud de Jacquemeau                      16         vo    16         vo   
Ausone                                    20        -40    18        -25   
Barde-Haut                                17         au    17         vo

or

                                               1998            1999
Angélus                                   20        -35    16         au   
Angludet                                  17         au    16         vo   
Arnaud de Jacquemeau                      16         vo    16         vo   
Ausone                                    20        -40    18        -25   
Barde-Haut                                17         au    17         vo

?

Thx Patrik

Stefan · Accepted Answer · 2016-06-22 16:00:46Z

You could try:

cols = df.columns.to_series().astype(str)  # labels such as 1998 may be ints
df.columns = cols.mask(cols.str.startswith('Unnamed')).ffill().tolist()
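To make the forward-fill idea easy to try without the original workbook, here is a minimal sketch. The small frame is a hypothetical stand-in for what read_excel returned in the question (two rows only), and it assumes every placeholder label starts with "Unnamed".

```python
import pandas as pd

# Hypothetical stand-in for the frame read_excel produced in the question.
df = pd.DataFrame(
    [[20, "-35", 16, "au"], [17, "au", 16, "vo"]],
    index=["Angélus", "Angludet"],
    columns=[1998, "Unnamed: 1", 1999, "Unnamed: 3"],
)

# Cast the labels to strings so the .str accessor works on every entry.
labels = pd.Series(df.columns.astype(str))
# Blank out the placeholder names, then carry the previous real label forward.
filled = labels.mask(labels.str.startswith("Unnamed")).ffill()
df.columns = filled.tolist()

print(df.columns.tolist())  # ['1998', '1998', '1999', '1999']
```

Note that the labels come out as strings here; if integer years are needed they would have to be converted back afterwards.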

Akarsh · Accepted Answer · 2016-06-22 19:24:06Z

You would need to build a new list of column names and then reassign it, as shown below:

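# Labels read from Excel may be ints (e.g. 1998); cast them so .find() works.
df.columns = df.columns.astype(str)
# Replace each "Unnamed: N" placeholder with the label immediately to its left.
new_columns = [df.columns[i-1] if df.columns[i].find("Unnamed") >= 0 else df.columns[i] for i in range(len(df.columns))]
df.columns = new_columns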

Or, once the column labels are strings, you could do it in a single line:

df.columns = [df.columns[i-1]  if df.columns[i].find("Unnamed") >= 0 else df.columns[i] for i in range(len(df.columns))]
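For the second layout asked about in the question (one year label spanning two columns), one possible approach is to turn the filled labels into a two-level column index. This is only a sketch: the sample frame is a hypothetical stand-in for the question's data, and the 0/1 sub-labels are an assumption, since the spreadsheet does not name the sub-columns.

```python
import pandas as pd

# Hypothetical stand-in for the frame read_excel produced in the question.
df = pd.DataFrame(
    [[20, "-35", 16, "au"], [17, "au", 16, "vo"]],
    index=["Angélus", "Angludet"],
    columns=[1998, "Unnamed: 1", 1999, "Unnamed: 3"],
)

labels = pd.Series(df.columns.astype(str))
# Top level: the year, carried forward over the "Unnamed" placeholders.
top = labels.mask(labels.str.startswith("Unnamed")).ffill()
# Second level (assumed labels): position of the column within its year.
sub = top.groupby(top).cumcount()
df.columns = pd.MultiIndex.from_arrays([top, sub])

# Selecting a year now returns both of its columns.
print(df["1998"])
```

With a two-level index, `df["1998"]` selects both columns for that year, and `df[("1998", 1)]` selects just the second one.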

edited Jun 22, 2016 at 19:24

answered Jun 22, 2016 at 17:41

2 Comments

Pat Over a year ago

That's a nice one. Thanks.

Akarsh Over a year ago

@Pat if this resolves your problem, please mark this as the answer.
