How to join a MultiIndex DataFrame with a single-index DataFrame?

Question

The single index of df1 (sngl in the example below) matches one level of the MultiIndex of df2 (mult in the example below), and both have the same columns. I want to copy all rows and columns of df1 into df2.

It is similar to this thread: copying a single-index DataFrame into a MultiIndex DataFrame

But that solution only works for one index value ('a' in that case). I want to do this for every index value of df1.

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: import itertools
In [4]: inner = ('a','b')
In [5]: outer = ((10,20), (1,2))
In [6]: cols = ('one','two','three','four')
In [7]: sngl = pd.DataFrame(np.random.randn(2,4), index=inner, columns=cols)
In [8]: index_tups = list(itertools.product(*(outer + (inner,))))
In [9]: index_mult = pd.MultiIndex.from_tuples(index_tups)
In [10]: mult = pd.DataFrame(index=index_mult, columns=cols)
In [11]: sngl
Out[11]: 
        one       two     three      four
a  2.946876 -0.751171  2.306766  0.323146
b  0.192558  0.928031  1.230475 -0.256739

In [12]: mult
Out[12]: 
        one  two three four
10 1 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
   2 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
20 1 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
   2 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN


In [13]: mult.ix[(10,1)] = sngl

In [14]: mult
Out[14]: 
        one  two three four
10 1 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
   2 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
20 1 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
   2 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN

The solution given by @Jeff is

nm = mult.reset_index().set_index('level_2')
nm.loc['a',sngl.columns] = sngl.loc['a'].values

         level_0  level_1        one        two     three        four
level_2                                                              
a             10        1  0.3738456 -0.2261926 -1.205177  0.08448757
b             10        1        NaN        NaN       NaN         NaN
a             10        2  0.3738456 -0.2261926 -1.205177  0.08448757
b             10        2        NaN        NaN       NaN         NaN
a             20        1  0.3738456 -0.2261926 -1.205177  0.08448757
b             20        1        NaN        NaN       NaN         NaN
a             20        2  0.3738456 -0.2261926 -1.205177  0.08448757
b             20        2        NaN        NaN       NaN         NaN

I can't do this:

nm.loc[:,sngl.columns] = sngl.loc[:].values

It will raise ValueError: "cannot copy sequence with size X to array axis with dimension Y"

I am currently using a loop, but that is not the pandas way.

DSM · Accepted Answer · 2015-09-23 19:39:44Z

This feels a little too manual, but in practice I might do something like this:

In [46]: mult[:] = sngl.loc[mult.index.get_level_values(2)].values

In [47]: mult
Out[47]: 
             one       two     three      four
10 1 a  1.175042  0.044014  1.341404 -0.223872
     b  0.216168 -0.748194 -0.546003 -0.501149
   2 a  1.175042  0.044014  1.341404 -0.223872
     b  0.216168 -0.748194 -0.546003 -0.501149
20 1 a  1.175042  0.044014  1.341404 -0.223872
     b  0.216168 -0.748194 -0.546003 -0.501149
   2 a  1.175042  0.044014  1.341404 -0.223872
     b  0.216168 -0.748194 -0.546003 -0.501149

That is, first select the elements we want to use to index:

In [64]: mult.index.get_level_values(2)
Out[64]: Index(['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b'], dtype='object')

Then use these to index into sngl:

In [65]: sngl.loc[mult.index.get_level_values(2)]
Out[65]: 
        one       two     three      four
a  1.175042  0.044014  1.341404 -0.223872
b  0.216168 -0.748194 -0.546003 -0.501149
a  1.175042  0.044014  1.341404 -0.223872
b  0.216168 -0.748194 -0.546003 -0.501149
a  1.175042  0.044014  1.341404 -0.223872
b  0.216168 -0.748194 -0.546003 -0.501149
a  1.175042  0.044014  1.341404 -0.223872
b  0.216168 -0.748194 -0.546003 -0.501149

and then we can use .values to throw away the indexing information and just get the raw array to fill with, so the assignment is done by position rather than by label alignment.

It's not very elegant, but it's straightforward.
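
For reference, the steps above can be put together as a self-contained sketch. A few details here are assumptions that are not in the original session: the random data is seeded so the result is reproducible, mult is created with dtype=float, and .to_numpy() is used in place of .values (both return the underlying array).

```python
import itertools

import numpy as np
import pandas as pd

inner = ("a", "b")
outer = ((10, 20), (1, 2))
cols = ["one", "two", "three", "four"]

# Seeded generator (an assumption; the original uses unseeded np.random.randn).
rng = np.random.default_rng(0)
sngl = pd.DataFrame(rng.standard_normal((2, 4)), index=list(inner), columns=cols)

# Same MultiIndex as in the question: (10|20, 1|2, a|b).
index_tups = list(itertools.product(*(outer + (inner,))))
index_mult = pd.MultiIndex.from_tuples(index_tups)
mult = pd.DataFrame(index=index_mult, columns=cols, dtype=float)

# One label per row of mult, taken from the innermost level: a, b, a, b, ...
inner_labels = mult.index.get_level_values(2)

# Repeat the rows of sngl in that order, then drop the index so the
# assignment is positional rather than label-aligned.
mult[:] = sngl.loc[inner_labels].to_numpy()

print(mult)
```

Every row of mult whose innermost label is 'a' now holds the 'a' row of sngl, and likewise for 'b'.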

1 Comment

Rex Over a year ago

Cool! What if an additional hierarchical index level is added to both sngl and mult? mult.index.get_level_values(2) only works for one level. For example: sng2=pd.concat([sngl,sngl],keys=['X','Y']), mult2=pd.concat([mult,mult],keys=['X','Y']). To keep this thread clear, I posted a new question: stackoverflow.com/questions/32748910/…
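
One possible way to handle the two-level case raised in this comment is sketched below. It is not taken from the linked question. It assumes the frames are built as in the comment, except that the second copy of sngl is shifted by 100 (an illustrative choice) so that the 'X' and 'Y' blocks can be told apart. The idea is to drop the levels of mult2 that sng2 does not have and use the remaining MultiIndex as lookup keys.

```python
import itertools

import numpy as np
import pandas as pd

inner = ("a", "b")
outer = ((10, 20), (1, 2))
cols = ["one", "two", "three", "four"]

rng = np.random.default_rng(0)
sngl = pd.DataFrame(rng.standard_normal((2, 4)), index=list(inner), columns=cols)
index_mult = pd.MultiIndex.from_tuples(list(itertools.product(*(outer + (inner,)))))
mult = pd.DataFrame(index=index_mult, columns=cols, dtype=float)

# Frames from the comment; "+ 100" is an assumption added so X and Y differ.
sng2 = pd.concat([sngl, sngl + 100], keys=["X", "Y"])
mult2 = pd.concat([mult, mult], keys=["X", "Y"])

# Keep only the levels that sng2 is indexed by: level 0 (X/Y) and level 3 (a/b).
keys = mult2.index.droplevel([1, 2])

# reindex repeats the rows of sng2 in the order given by keys;
# .to_numpy() then makes the assignment positional.
mult2[:] = sng2.reindex(keys).to_numpy()

print(mult2)
```

This relies on sng2 having a unique index, which holds for the frames built here.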
