
I am trying to concatenate two DataFrames, df1 and df2. df1 has a MultiIndex, and df2 has fewer rows than df1 (one row per value in the first level of df1's index):

import pandas as pd
import numpy as np
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df1 = pd.DataFrame(np.random.randn(8), index=index)

df1
Out[15]: 
                     0
first second          
bar   one    -0.185560
      two    -2.358254
baz   one     1.130550
      two     1.441708
foo   one    -1.163076
      two     1.776814
qux   one    -0.811836
      two     0.389500

df2 = pd.DataFrame(data=[0,1,0,1],index=['bar','baz','foo', 'qux'],columns=['label'])

df2
Out[18]: 
     label
bar      0
baz      1
foo      0
qux      1

The desired result would be something like:

df3
Out[18]: 
                     0      label
first second          
bar   one    -0.185560          0
      two    -2.358254          0
baz   one     1.130550          1
      two     1.441708          1
foo   one    -1.163076          0
      two     1.776814          0
qux   one    -0.811836          1
      two     0.389500          1

2 Answers


Another method is to call reset_index on the second level. You can then add the column, which aligns on the first-level index values, and set the index back again:

In[52]:
df3 = df1.reset_index(level=1)
df3['label'] = df2['label']
df3 = df3.set_index([df3.index, 'second'])
df3

Out[52]: 
                     0  label
first second                 
bar   one     0.957417      0
      two    -0.466755      0
baz   one     1.064326      1
      two     1.036983      1
foo   one    -1.319737      0
      two     0.064465      0
qux   one    -0.237232      1
      two    -0.511889      1
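
As a self-contained sketch of the same steps, the snippet below swaps the random data for np.arange (an assumption made only so the result is reproducible) and restores the second level with set_index('second', append=True), which should give the same index as passing [df3.index, 'second']:

```python
import numpy as np
import pandas as pd

# Same shape as the question's data, but with deterministic values
# (np.arange is an assumption for reproducibility, not the original data).
index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
    names=['first', 'second'],
)
df1 = pd.DataFrame(np.arange(8.0), index=index)
df2 = pd.DataFrame({'label': [0, 1, 0, 1]},
                   index=['bar', 'baz', 'foo', 'qux'])

# Move 'second' into a column so only 'first' remains as the index.
df3 = df1.reset_index(level='second')

# Assignment aligns df2['label'] on the remaining 'first' index values.
df3['label'] = df2['label']

# Append 'second' back onto the index to rebuild the MultiIndex.
df3 = df3.set_index('second', append=True)
print(df3)
```

Using append=True avoids referring to df3.index explicitly, which may read more clearly when the frame is built in a method chain.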

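Alternatively, map the first index level through df2['label']. The trailing .values drops the mapped Series' own index, so the assignment is positional rather than aligned:

In [132]: df1['label'] = df1.index.get_level_values(0).to_series().map(df2['label']).values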

In [133]: df1
Out[133]:
                     0  label
first second
bar   one     0.143211      0
      two     1.133454      0
baz   one     1.298973      1
      two    -0.717844      1
foo   one    -0.663768      0
      two     0.687015      0
qux   one     0.412729      1
      two     0.366502      1

Or, more concisely, skip to_series and map the index level directly (thanks to @Dark for the hint):

df1['label'] = df1.index.get_level_values(0).map(df2['label'].get)
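
For reference, here is a minimal runnable sketch of the callable-based variant. It assumes deterministic np.arange values in place of the random data and works on a copy named result (a hypothetical name) so that df1 stays untouched:

```python
import numpy as np
import pandas as pd

# Same structure as the question's frames; values are deterministic
# (an assumption for reproducibility, not the original random data).
index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
    names=['first', 'second'],
)
df1 = pd.DataFrame(np.arange(8.0), index=index)
df2 = pd.DataFrame({'label': [0, 1, 0, 1]},
                   index=['bar', 'baz', 'foo', 'qux'])

# Flat Index holding the 'first' level value of every row in df1.
first_level = df1.index.get_level_values('first')

# Look each value up in df2['label']; Series.get is used as the callable.
labels = first_level.map(df2['label'].get)

# An Index is assigned positionally, so no index alignment is attempted.
result = df1.copy()
result['label'] = labels
print(result)
```

Note that Series.get returns None for a key that is missing from df2, so rows without a matching label would end up with a missing value instead of raising an error.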

4 Comments

Another neat alternative to to_series is to pass a callable, i.e. df1.index.get_level_values(0).map(df2['label'].get)
@Dark, that is interesting - thank you! I didn't know this way
I learnt it from pirSquared in one of the answers. And I'm Bharath, I just changed my name : )
@Dark (Bharath), yeah, nice to see you again :-)
