Dataframe columns from Dataframe rows in Pandas

Question

I have two Pandas DataFrames. I would like to add the rows of one DataFrame as columns of the other. I've read through the "Merge, join, and concatenate" documentation, but can't work out how to do this in Pandas.

Here's how I've managed to do it by converting to NumPy arrays, but surely there is a smarter way to do this in Pandas.

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.normal(size=8).reshape(4,2),index=[1,2,3,4],columns=['a','b'])
df2 = pd.DataFrame(np.random.normal(size=8).reshape(2,4),index=['c','d'],columns=[5,6,7,8])
ar = np.concatenate((df1.values,df2.values.T),axis=1)
df = pd.DataFrame(ar,columns=['a','b','c','d'],index=[1,2,3,4])

unutbu · Accepted Answer · 2014-11-21 17:58:09Z

If df1.index has no duplicate values, then you could use df1.join:

In [283]: df1 = pd.DataFrame(np.random.normal(size=8).reshape(4,2),index=[1,2,3,4],columns=['a','b'])

In [284]: df2 = pd.DataFrame(np.random.normal(size=8).reshape(2,4),index=['c','d'],columns=[5,6,7,8])

In [285]: df1.join(df2.T.set_index(df1.index))
Out[285]: 
          a         b         c         d
1 -1.196281  0.222283  1.247750 -0.121309
2  1.188098  0.384871 -1.324419 -1.610255
3 -0.928642 -0.618491  0.171215 -1.545479
4 -0.832756 -0.491364  0.100428 -0.525689
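As a quick sanity check, the join result can be compared against the NumPy approach from the question. This is a minimal sketch: the seeded generator and the names `right`, `joined`, and `expected` are assumptions made for illustration, not part of the original session.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df1 = pd.DataFrame(rng.normal(size=(4, 2)), index=[1, 2, 3, 4], columns=["a", "b"])
df2 = pd.DataFrame(rng.normal(size=(2, 4)), index=["c", "d"], columns=[5, 6, 7, 8])

# Transpose df2 so its rows 'c' and 'd' become columns, then replace the
# transposed index (5..8) with df1's index (1..4) so the labels line up.
right = df2.T.set_index(df1.index)
joined = df1.join(right)

# The NumPy route from the question, used here only as a reference value.
expected = np.concatenate((df1.to_numpy(), df2.to_numpy().T), axis=1)

print(joined)
print(np.allclose(joined.to_numpy(), expected))
```

The `set_index(df1.index)` step matters: without it, the transposed frame keeps the labels 5 to 8, none of which match df1's index, so the joined columns would be all NaN.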

If df1 has duplicate entries in its index, then df1.join(...) may return more rows than desired. For example, if df1 has non-unique index [1,2,1,4] then:

In [4]: df1 = pd.DataFrame(np.random.normal(size=8).reshape(4,2),index=[1,2,1,4],columns=['a','b'])

In [5]: df2 = pd.DataFrame(np.random.normal(size=8).reshape(2,4),index=['c','d'],columns=[5,6,7,8])

In [8]: df1.join(df2.T.set_index(df1.index))
Out[8]: 
          a         b         c         d
1 -1.087152 -0.828800 -1.129768 -0.579428
1 -1.087152 -0.828800  0.320756  0.297736
1  0.198297  0.277456 -1.129768 -0.579428
1  0.198297  0.277456  0.320756  0.297736
2  1.529188  1.023568 -0.670853 -0.466754
4 -0.393748  0.976632  0.455129  1.230298

The 2 rows with index 1 in df1 are each joined to the 2 rows with index 1 in the transposed, re-indexed df2, resulting in 4 rows with index 1, which is probably not what you want.
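The duplicate-index case can be detected up front with `Index.is_unique`. The sketch below reproduces the row multiplication on a small frame; the variable names (`df1_dup`, `right_dup`, `joined_dup`) and the seed are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility
df1_dup = pd.DataFrame(rng.normal(size=(4, 2)), index=[1, 2, 1, 4], columns=["a", "b"])
df2 = pd.DataFrame(rng.normal(size=(2, 4)), index=["c", "d"], columns=[5, 6, 7, 8])

right_dup = df2.T.set_index(df1_dup.index)
joined_dup = df1_dup.join(right_dup)

# False here, because label 1 appears twice in the index.
print(df1_dup.index.is_unique)

# The join matches every left row labelled 1 with every right row labelled 1,
# so the result has more rows than either input.
print(len(df1_dup), len(joined_dup))
```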

So, if df1.index does contain duplicate values, use pd.concat to place the two DataFrames side by side, row for row:

In [7]: pd.concat([df1, df2.T.set_index(df1.index)], axis=1)
Out[7]: 
          a         b         c         d
1 -1.087152 -0.828800 -1.129768 -0.579428
2  1.529188  1.023568 -0.670853 -0.466754
1  0.198297  0.277456  0.320756  0.297736
4 -0.393748  0.976632  0.455129  1.230298
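The two cases can be wrapped in one small helper that picks the method based on whether the index is unique. This is only a sketch: `add_rows_as_columns` is a hypothetical name, not a pandas function, and the length check is an added assumption about how mismatched inputs should be handled.

```python
import numpy as np
import pandas as pd


def add_rows_as_columns(left, rows):
    """Append each row of `rows` to `left` as a new column (hypothetical helper)."""
    if rows.shape[1] != len(left):
        raise ValueError("rows must have one column per row of left")
    # Transpose, then relabel with left's index so both frames share the same labels.
    right = rows.T.set_index(left.index)
    if left.index.is_unique:
        return left.join(right)
    # With duplicate labels a join would multiply rows, so place the frames side by side.
    return pd.concat([left, right], axis=1)


rng = np.random.default_rng(2)  # arbitrary seed, only for reproducibility
df1 = pd.DataFrame(rng.normal(size=(4, 2)), index=[1, 2, 1, 4], columns=["a", "b"])
df2 = pd.DataFrame(rng.normal(size=(2, 4)), index=["c", "d"], columns=[5, 6, 7, 8])

result = add_rows_as_columns(df1, df2)
print(result)
```

The `pd.concat` branch relies on both frames carrying exactly the same index, which `set_index(left.index)` ensures.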

One reason you might still prefer df1.join: if you know df1.index has no duplicate values, it is faster than pd.concat (roughly twice as fast in the timings below):

In [13]: df1 = pd.DataFrame(np.random.normal(size=8000).reshape(-1,2), columns=['a','b'])

In [14]: df2 = pd.DataFrame(np.random.normal(size=8000).reshape(2,-1),index=['c','d'])

In [15]: %timeit df1.join(df2.T.set_index(df1.index))
1000 loops, best of 3: 600 µs per loop

In [16]: %timeit pd.concat([df1, df2.T.set_index(df1.index)], axis=1)
1000 loops, best of 3: 1.18 ms per loop
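The timings above come from a 2014 IPython session, so the gap may differ, or disappear, on current pandas versions. A minimal sketch for re-measuring outside IPython with the standard `timeit` module follows; the function names `with_join` and `with_concat` and the repeat count are assumptions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(3)  # arbitrary seed, only for reproducibility
df1 = pd.DataFrame(rng.normal(size=(4000, 2)), columns=["a", "b"])
df2 = pd.DataFrame(rng.normal(size=(2, 4000)), index=["c", "d"])


def with_join():
    return df1.join(df2.T.set_index(df1.index))


def with_concat():
    return pd.concat([df1, df2.T.set_index(df1.index)], axis=1)


n = 200  # number of calls per measurement; raise it for steadier numbers
t_join = timeit.timeit(with_join, number=n) / n
t_concat = timeit.timeit(with_concat, number=n) / n

print(f"join:   {t_join * 1e6:.0f} µs per call")
print(f"concat: {t_concat * 1e6:.0f} µs per call")
```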
