I have two Pandas DataFrames. I would like to add the rows of one DataFrame as columns in the other. I've read through the "Merge, join, and concatenate" documentation, but can't work out how to do this in Pandas.

Here's how I've managed to do it by converting to NumPy arrays, but surely there is a smarter way to do this in Pandas.

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.normal(size=8).reshape(4,2),index=[1,2,3,4],columns=['a','b'])
df2 = pd.DataFrame(np.random.normal(size=8).reshape(2,4),index=['c','d'],columns=[5,6,7,8])
ar = np.concatenate((df1.values,df2.values.T),axis=1)
df = pd.DataFrame(ar,columns=['a','b','c','d'],index=[1,2,3,4])

1 Answer

If df1.index has no duplicate values, then you could use df1.join:

In [283]: df1 = pd.DataFrame(np.random.normal(size=8).reshape(4,2),index=[1,2,3,4],columns=['a','b'])

In [284]: df2 = pd.DataFrame(np.random.normal(size=8).reshape(2,4),index=['c','d'],columns=[5,6,7,8])

In [285]: df1.join(df2.T.set_index(df1.index))
Out[285]: 
          a         b         c         d
1 -1.196281  0.222283  1.247750 -0.121309
2  1.188098  0.384871 -1.324419 -1.610255
3 -0.928642 -0.618491  0.171215 -1.545479
4 -0.832756 -0.491364  0.100428 -0.525689
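
To check that df1.join gives the same result as the NumPy approach in the question, here is a minimal sketch. It assumes fixed values from np.arange instead of random data so the result is reproducible; the names joined and expected are only illustrative.

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for the random frames in the question
df1 = pd.DataFrame(np.arange(8.0).reshape(4, 2),
                   index=[1, 2, 3, 4], columns=['a', 'b'])
df2 = pd.DataFrame(np.arange(8.0, 16.0).reshape(2, 4),
                   index=['c', 'd'], columns=[5, 6, 7, 8])

# Transpose df2 so its rows 'c' and 'd' become columns, then relabel
# the rows with df1's index so that the join lines up row for row
joined = df1.join(df2.T.set_index(df1.index))

# The NumPy-based construction from the question, for comparison
expected = pd.DataFrame(
    np.concatenate((df1.values, df2.values.T), axis=1),
    columns=['a', 'b', 'c', 'd'], index=[1, 2, 3, 4])

print(joined)
print(np.allclose(joined.values, expected.values))
```

The set_index(df1.index) step matters: without it, df2.T keeps the labels 5 to 8, which do not match any label in df1, so the joined columns would be filled with NaN.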

If df1 has duplicate entries in its index, then df1.join(...) may return more rows than desired. For example, if df1 has non-unique index [1,2,1,4] then:

In [4]: df1 = pd.DataFrame(np.random.normal(size=8).reshape(4,2),index=[1,2,1,4],columns=['a','b'])

In [5]: df2 = pd.DataFrame(np.random.normal(size=8).reshape(2,4),index=['c','d'],columns=[5,6,7,8])

In [8]: df1.join(df2.T.set_index(df1.index))
Out[8]: 
          a         b         c         d
1 -1.087152 -0.828800 -1.129768 -0.579428
1 -1.087152 -0.828800  0.320756  0.297736
1  0.198297  0.277456 -1.129768 -0.579428
1  0.198297  0.277456  0.320756  0.297736
2  1.529188  1.023568 -0.670853 -0.466754
4 -0.393748  0.976632  0.455129  1.230298

The 2 rows with index 1 in df1 are each joined to the 2 rows with index 1 in the transposed, re-indexed df2, resulting in 4 rows with index 1, which is probably not what you want.
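
One way to spot this situation before joining is to look at df1.index.is_unique and compare row counts afterwards. This is only a sketch with fixed values in place of random data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same shapes as above, but df1 has the duplicated label 1 in its index
df1 = pd.DataFrame(np.arange(8.0).reshape(4, 2),
                   index=[1, 2, 1, 4], columns=['a', 'b'])
df2 = pd.DataFrame(np.arange(8.0, 16.0).reshape(2, 4),
                   index=['c', 'd'], columns=[5, 6, 7, 8])

right = df2.T.set_index(df1.index)
joined = df1.join(right)

index_is_unique = df1.index.is_unique              # False
rows_before = len(df1)                             # 4
rows_after = len(joined)                           # 6
rows_labelled_1 = int((joined.index == 1).sum())   # 4 (2 left rows x 2 right rows)

print(index_is_unique, rows_before, rows_after, rows_labelled_1)
```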

So, if df1.index does contain duplicate values, use pd.concat instead. Because the two frames then have identical indexes, pd.concat simply places them side by side, row for row:

In [7]: pd.concat([df1, df2.T.set_index(df1.index)], axis=1)
Out[7]: 
          a         b         c         d
1 -1.087152 -0.828800 -1.129768 -0.579428
2  1.529188  1.023568 -0.670853 -0.466754
1  0.198297  0.277456  0.320756  0.297736
4 -0.393748  0.976632  0.455129  1.230298
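
The pd.concat approach can be wrapped in a small function. The sketch below is an illustration only: append_rows_as_columns is a hypothetical name, not a pandas function, and the length check is an added assumption about what the caller wants.

```python
import numpy as np
import pandas as pd

def append_rows_as_columns(left, right):
    """Hypothetical helper: add each row of `right` as a new column of `left`.

    Rows are matched by position, so duplicate labels in left.index are fine.
    """
    if right.shape[1] != len(left):
        raise ValueError("right must have one column per row of left")
    # Transpose, then reuse left's index so both frames carry identical labels
    transposed = right.T.set_index(left.index)
    return pd.concat([left, transposed], axis=1)

df1 = pd.DataFrame(np.arange(8.0).reshape(4, 2),
                   index=[1, 2, 1, 4], columns=['a', 'b'])
df2 = pd.DataFrame(np.arange(8.0, 16.0).reshape(2, 4),
                   index=['c', 'd'], columns=[5, 6, 7, 8])

result = append_rows_as_columns(df1, df2)
print(result)
```

The result keeps the original row order and the duplicated label 1, with the same number of rows as df1.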

One reason you might still prefer df1.join, when you know df1.index has no duplicate values, is speed. In the timing below it is roughly twice as fast as pd.concat, although the exact figures will depend on your pandas version and hardware:

In [13]: df1 = pd.DataFrame(np.random.normal(size=8000).reshape(-1,2), columns=['a','b'])

In [14]: df2 = pd.DataFrame(np.random.normal(size=8000).reshape(2,-1),index=['c','d'])

In [15]: %timeit df1.join(df2.T.set_index(df1.index))
1000 loops, best of 3: 600 µs per loop

In [16]: %timeit pd.concat([df1, df2.T.set_index(df1.index)], axis=1)
1000 loops, best of 3: 1.18 ms per loop
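
Outside IPython, a similar comparison can be made with the standard-library timeit module. This is a rough sketch, not a rigorous benchmark: the repeat count of 20 is arbitrary, and the relative speed on your machine may differ from the figures above.

```python
import timeit

import numpy as np
import pandas as pd

# Same sizes as the timing above: 4000 rows, default unique integer index
df1 = pd.DataFrame(np.random.normal(size=8000).reshape(-1, 2), columns=['a', 'b'])
df2 = pd.DataFrame(np.random.normal(size=8000).reshape(2, -1), index=['c', 'd'])

def with_join():
    return df1.join(df2.T.set_index(df1.index))

def with_concat():
    return pd.concat([df1, df2.T.set_index(df1.index)], axis=1)

# Total seconds for 20 calls of each approach
join_seconds = timeit.timeit(with_join, number=20)
concat_seconds = timeit.timeit(with_concat, number=20)

print(f"join:   {join_seconds / 20 * 1e3:.3f} ms per call")
print(f"concat: {concat_seconds / 20 * 1e3:.3f} ms per call")
```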