How to add a hierarchically-named column to a Pandas DataFrame

Question

I have an empty DataFrame:

import pandas as pd
df = pd.DataFrame()

I want to add a hierarchically-named column. I tried this:

df['foo', 'bar'] = [1,2,3]

But it gives a column whose name is a tuple:

   (foo, bar)
0           1
1           2
2           3

I want this:

  foo
  bar
0   1
1   2
2   3

Which I can get if I construct a brand new DataFrame this way:

pd.DataFrame([1,2,3], columns=pd.MultiIndex.from_tuples([('foo', 'bar')]))

How can I create such a layout when adding new columns to an existing DataFrame? The number of levels is always 2, and I know all the possible values for the first level in advance.

Is this a duplicate of stackoverflow.com/questions/17985159/… ?
– AER, Commented Nov 3, 2016 at 3:13
You're obviously a well regarded user though. I may be missing something. I'll pop it in answer form if it is a separate question to above.
– AER, Commented Nov 3, 2016 at 3:21
@AER: That question asks how to add an additional level onto existing columns. I want to add an additional column with its own levels. In other words, I know how to make the final result I want if I construct a DataFrame from scratch, but I am trying to figure out how to do it by building it up one column at a time (a common technique when using single-level column names).
– John Zwinck, Commented Nov 3, 2016 at 3:23

Community · Accepted Answer · 2020-06-20 09:12:55Z

If you want to build the MultiIndex DataFrame one column at a time, you can concatenate the single-column frames and drop the NaNs that this introduces, which leaves you with the desired MultiIndex DataFrame:

Demo:

df = pd.DataFrame()
df['foo', 'bar'] = [1,2,3]
df['foo', 'baz'] = [3,4,5]
df

Take one column at a time and build the corresponding headers:

# df.iloc[:, [i]] selects the i-th column by position as a one-column frame
pd.concat([df.iloc[:, [0]], df.iloc[:, [1]]]).apply(lambda x: x.dropna())

Because of the NaNs produced, the values are upcast to float dtype. They can be cast back to integers with DataFrame.astype(int).
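The concatenate, drop and recast sequence can be sketched end to end as follows. This is only an illustration under one assumption that differs from the demo: the two one-column frames (hypothetical names `a` and `b`) are built with explicit MultiIndex columns, so the result does not depend on how a given pandas version treats tuple-named columns.

```python
import pandas as pd

# Assumption: each one-column frame already carries two-level column labels.
a = pd.DataFrame([1, 2, 3], columns=pd.MultiIndex.from_tuples([('foo', 'bar')]))
b = pd.DataFrame([3, 4, 5], columns=pd.MultiIndex.from_tuples([('foo', 'baz')]))

# Stacking row-wise leaves NaN wherever a frame lacks the other's column,
# which forces both columns to a float dtype.
stacked = pd.concat([a, b])

# Dropping the NaNs column by column realigns the values on the index 0..2.
cleaned = stacked.apply(lambda col: col.dropna())

# Cast the float values back to integers.
restored = cleaned.astype(int)
print(restored)
```

The intermediate `cleaned` frame still holds floats; only the final `astype(int)` call restores the integer dtype.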

Note:

This assumes that the number of levels matches across the frames being concatenated.
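As a possible alternative that skips the concatenation step, the tuple-named columns from the demo can be converted in one assignment with pd.MultiIndex.from_tuples. This sketch is not part of the original answer. It assumes every column label is a tuple of the same length.

```python
import pandas as pd

# Build the frame one column at a time, as in the demo above.
df = pd.DataFrame()
df['foo', 'bar'] = [1, 2, 3]
df['foo', 'baz'] = [3, 4, 5]

# Reinterpret the column labels as a two-level MultiIndex.
# Assumption: all labels are tuples with the same number of elements.
df.columns = pd.MultiIndex.from_tuples(df.columns)

print(df.columns.nlevels)
print(df['foo'])
```

Because no NaNs are introduced, the integer dtype is preserved and no recasting is needed.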

answered Nov 3, 2016 at 11:51 by Nickil Maveli (edited Jun 20, 2020 at 9:12)

Grr · Accepted Answer · 2016-11-03 09:33:44Z

I'm not sure there is a way to do this without redefining the column index to be a MultiIndex. If I am not mistaken, the levels of a MultiIndex are themselves made up of Index objects. While you can have DataFrames with hierarchical indices that have no value for one or more of the levels, the index object itself must still be a MultiIndex. For example:

In [2]: df = pd.DataFrame({'foo': [1,2,3], 'bar': [4,5,6]})

In [3]: df
Out[3]:
   bar  foo
0    4    1
1    5    2
2    6    3

In [4]: df.columns
Out[4]: Index([u'bar', u'foo'], dtype='object')

In [5]: df.columns = pd.MultiIndex.from_tuples([('', 'foo'), ('foo','bar')])

In [6]: df.columns
Out[6]:
MultiIndex(levels=[[u'', u'foo'], [u'bar', u'foo']], 
           labels=[[0, 1], [1, 0]])

In [7]: df.columns.get_level_values(0)
Out[7]: Index([u'', u'foo'], dtype='object')

In [8]: df
Out[8]:
      foo
  foo bar
0   4   1
1   5   2
2   6   3

In [9]: df['bar', 'baz'] = [7,8,9]

In [10]: df
Out[10]:
      foo bar
  foo bar baz
0   4   1   7
1   5   2   8
2   6   3   9

So as you can see, once the MultiIndex is in place you can add columns as you thought, but unfortunately I am not aware of any way of coercing the DataFrame to adaptively adopt a MultiIndex.
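Applied to the question's case, that observation suggests creating the first column together with its MultiIndex and then adding the rest by tuple key. A minimal sketch, assuming the labels of the first column are known when the frame is created (the 'qux' label is hypothetical and only for illustration):

```python
import pandas as pd

# Create the frame with a MultiIndex already in place for the first column.
df = pd.DataFrame([1, 2, 3],
                  columns=pd.MultiIndex.from_tuples([('foo', 'bar')]))

# Later columns can now be added one at a time with a tuple key.
df['foo', 'baz'] = [4, 5, 6]
df['qux', 'bar'] = [7, 8, 9]  # 'qux' is a made-up first-level label

print(df)
```

Each assignment extends the existing MultiIndex instead of creating a flat, tuple-named column.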
