2

I have an empty DataFrame:

import pandas as pd
df = pd.DataFrame()

I want to add a hierarchically-named column. I tried this:

df['foo', 'bar'] = [1,2,3]

But it gives a column whose name is a tuple:

   (foo, bar)
0           1
1           2
2           3

I want this:

  foo
  bar
0   1
1   2
2   3

Which I can get if I construct a brand new DataFrame this way:

pd.DataFrame([1,2,3], columns=pd.MultiIndex.from_tuples([('foo', 'bar')]))

How can I create such a layout when adding new columns to an existing DataFrame? The number of levels is always 2...and I know all the possible values for the first level in advance.

3
  • Is this a duplicate of stackoverflow.com/questions/17985159/… ? Commented Nov 3, 2016 at 3:13
  • You're obviously a well regarded user though. I may be missing something. I'll pop it in answer form if it is a separate question to above. Commented Nov 3, 2016 at 3:21
  • @AER: That question asks how to add an additional level onto existing columns. I want to add an additional column with its own levels. In other words, I know how to make the final result I want if I construct a DataFrame from scratch, but I am trying to figure out how to do it by building it up one column at a time (a common technique when using single-level column names). Commented Nov 3, 2016 at 3:23

2 Answers

2

If you want to build the MultiIndex DataFrame one column at a time, you can concatenate the single-column frames and drop the NaNs that this introduces, which leaves you with the desired MultiIndex DataFrame:

Demo:

df = pd.DataFrame()
df['foo', 'bar'] = [1,2,3]
df['foo', 'baz'] = [3,4,5]
df

Take one column at a time and build the corresponding headers:

# .iloc selects by position; the column labels here are tuples, not integers
pd.concat([df.iloc[:, [0]], df.iloc[:, [1]]]).apply(lambda x: x.dropna())

Because of the NaNs produced, the values are upcast to float dtype; they can be cast back to integers with DataFrame.astype(int).

Note:

This assumes that the number of levels matches across the frames being concatenated.
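
To make the dtype round trip concrete, here is a minimal sketch of the concatenate-then-drop approach. It assumes each column starts out as its own single-column frame with a two-level column label, built the way the question builds one; the names `a`, `b`, `stacked`, `cleaned`, and `restored` are illustrative only and do not come from the answer.

```python
import pandas as pd

# Two single-column frames, each with a two-level column label.
a = pd.DataFrame([1, 2, 3], columns=pd.MultiIndex.from_tuples([('foo', 'bar')]))
b = pd.DataFrame([3, 4, 5], columns=pd.MultiIndex.from_tuples([('foo', 'baz')]))

# Stacking row-wise takes the union of the columns; each frame gets NaN
# in the column it lacks, so the integer values are upcast to float.
stacked = pd.concat([a, b])

# Drop the NaNs column by column; the surviving values realign on index 0..2.
cleaned = stacked.apply(lambda col: col.dropna())

# Cast back to integers now that no NaNs remain.
restored = cleaned.astype(int)

print(stacked.dtypes)
print(restored)
```

Printing `stacked.dtypes` before the cast is a quick way to confirm that the upcast really came from the NaNs rather than from the input data.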

0

I'm not sure there is a way to do this without redefining the column index as a MultiIndex. If I am not mistaken, the levels of the MultiIndex class are themselves Index objects. While you can have DataFrames with hierarchical indices that lack values for one or more of the levels, the index object itself must still be a MultiIndex. For example:

In [2]: df = pd.DataFrame({'foo': [1,2,3], 'bar': [4,5,6]})

In [3]: df
Out[3]:
   bar  foo
0    4    1
1    5    2
2    6    3

In [4]: df.columns
Out[4]: Index([u'bar', u'foo'], dtype='object')

In [5]: df.columns = pd.MultiIndex.from_tuples([('', 'foo'), ('foo','bar')])

In [6]: df.columns
Out[6]:
MultiIndex(levels=[[u'', u'foo'], [u'bar', u'foo']], 
           labels=[[0, 1], [1, 0]])

In [7]: df.columns.get_level_values(0)
Out[7]: Index([u'', u'foo'], dtype='object')

In [8]: df
Out[8]:
      foo
  foo bar
0   4   1
1   5   2
2   6   3

In [9]: df['bar', 'baz'] = [7,8,9]

In [10]: df
Out[10]:
      foo bar
  foo bar baz
0   4   1   7
1   5   2   8
2   6   3   9

So, as you can see, once the MultiIndex is in place you can add columns the way you intended. Unfortunately, I am not aware of any way to coerce the DataFrame into adopting a MultiIndex adaptively.
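
One possible way to put the MultiIndex in place from the start, offered as a sketch rather than something this answer confirms: give the empty DataFrame an empty two-level MultiIndex for its columns before adding anything. Building that empty index with `pd.MultiIndex.from_arrays` and two empty lists is an assumption of this sketch, as is the variable name `empty_cols`.

```python
import pandas as pd

# Start with columns that are already a (still empty) two-level MultiIndex.
empty_cols = pd.MultiIndex.from_arrays([[], []])
df = pd.DataFrame(columns=empty_cols)

# With a MultiIndex in place, a tuple key supplies one label per level.
df['foo', 'bar'] = [1, 2, 3]
df['foo', 'baz'] = [3, 4, 5]

print(type(df.columns).__name__)
print(df.columns.tolist())
```

The number of empty lists passed to `from_arrays` fixes the number of levels, which fits the question's constraint that there are always exactly two.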
