Comparing 2 columns of two Python Pandas dataframes and getting the common rows

Question

I have two DataFrames as follows:

DF1=
    A    B   C    D
0   AA   BA  KK   0
1   AD   BD  LL   0
2   AF   BF  MM   0

DF2=
    K    L
0   AA   BA
1   AD   BF
2   AF   BF

At the end what I want to get is:

DF1=
    A    B   C    D
0   AA   BA  KK   1
1   AD   BD  LL   0
2   AF   BF  MM   1

So, I want to compare the two dataframes: I want to find which rows of the first dataframe (columns A and B) also appear in the second dataframe (columns K and L), and assign 1 in column D of the first dataframe for those rows.

I can use a for loop, but it will be very slow for a large number of entries.

Any clue or suggestion will be appreciated.

Alexander · Accepted Answer · 2015-05-17 20:55:35Z

17

This is easier if you rename the columns of df2 so they match df1; then you can compare row-wise:

In [35]:

df2.columns = ['A', 'B']
df2
Out[35]:
    A   B
0  AA  BA
1  AD  BF
2  AF  BF
In [38]:

df1['D'] = (df1[['A', 'B']] == df2).all(axis=1).astype(int)
df1
Out[38]:
    A   B   C  D
0  AA  BA  KK  1
1  AD  BD  LL  0
2  AF  BF  MM  1
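
If df2 has to keep its original column names, the same comparison can be run on a renamed copy. The following is a minimal sketch that rebuilds the question's data; the name `renamed` is only illustrative. Note that `==` between two DataFrames compares rows at the same index label, so this assumes both frames have the same index and the rows are meant to be matched position by position.

```python
import pandas as pd

# Data rebuilt from the question
df1 = pd.DataFrame({'A': ['AA', 'AD', 'AF'],
                    'B': ['BA', 'BD', 'BF'],
                    'C': ['KK', 'LL', 'MM'],
                    'D': [0, 0, 0]})
df2 = pd.DataFrame({'K': ['AA', 'AD', 'AF'],
                    'L': ['BA', 'BF', 'BF']})

# rename() returns a new frame; df2 itself keeps columns K and L
renamed = df2.rename(columns={'K': 'A', 'L': 'B'})

# A row gets 1 only when both A and B equal the row with the same index label
df1['D'] = (df1[['A', 'B']] == renamed).all(axis=1).astype(int)
print(df1)
```

If the two frames differ in length or index, this comparison raises an error, and a membership test or a merge is probably the better fit.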

edited May 17, 2015 at 20:55

Alexander

answered May 17, 2015 at 19:19

EdChum

7 Comments

abarnert Over a year ago

And if you can't rename them, you can create a new DataFrame dynamically whose contents are a view on df2 but whose column names match df1's, and then it's just this code again.

EdChum Over a year ago

You could just assign the result of rename to another df and compare that too so df3 = df2.rename(columns={'K':'A', 'L':'B'})

abarnert Over a year ago

Or just use df2.rename(…) directly in the expression instead of storing it in a temporary name df3.

abarnert Over a year ago

Sure, just pointing out that your answer is still basically the right answer even if he doesn't make it easy for himself (and it's still not that hard to use), so hopefully he'll accept this answer even if he for some reason can't change the definition of df2.

EdChum Over a year ago

@Alexander this is what abarnert stated in his 2nd comment above, it depends on what the background on all this is

kalehmann · Accepted Answer · 2019-08-22 20:41:06Z

5

df1['ColumnName'].isin(df2['ColumnName']).value_counts()
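
The line above tests a single column. For the two-column case in the question, one possible extension is to treat each (A, B) pair as a key and test membership against the (K, L) pairs. This is a sketch under the assumption that row order does not matter and only membership counts; the names `keys1` and `keys2` are illustrative.

```python
import pandas as pd

# Data rebuilt from the question
df1 = pd.DataFrame({'A': ['AA', 'AD', 'AF'],
                    'B': ['BA', 'BD', 'BF'],
                    'C': ['KK', 'LL', 'MM'],
                    'D': [0, 0, 0]})
df2 = pd.DataFrame({'K': ['AA', 'AD', 'AF'],
                    'L': ['BA', 'BF', 'BF']})

# Each row of the selected columns becomes one tuple-like key
keys1 = pd.MultiIndex.from_frame(df1[['A', 'B']])
keys2 = pd.MultiIndex.from_frame(df2[['K', 'L']])

# isin() returns a boolean array in df1's row order
df1['D'] = keys1.isin(keys2).astype(int)
print(df1)
```

Unlike a position-by-position comparison, this marks a row of df1 whenever its (A, B) pair occurs anywhere in df2.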

edited Aug 22, 2019 at 20:41

kalehmann

answered Aug 22, 2019 at 20:10

Vipul Saxena

Mohammad Saifullah · Accepted Answer · 2015-05-19 02:20:36Z

2

This is how I solved it:

import pandas as pd

df1 = pd.DataFrame({"A":['AA','AD','AD'], "B":['BA','BD','BF']})
df2 = pd.DataFrame({"A":['AA','AD'], 'B':['BA','BF']})
# Build one combined key per row by concatenating columns A and B
df1['compressed']=df1.apply(lambda x:'%s%s' % (x['A'],x['B']),axis=1)
df2['compressed']=df2.apply(lambda x:'%s%s' % (x['A'],x['B']),axis=1)
# Flag rows of df1 whose combined key also occurs in df2
df1['Success'] = df1['compressed'].isin(df2['compressed']).astype(int)
print(df1)

    A   B     compressed   Success
0  AA  BA      AABA          1
1  AD  BD      ADBD          0
2  AD  BF      ADBF          1
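
One caveat with plain concatenation is that different pairs can produce the same string, for example ('A', 'BC') and ('AB', 'C'). A possible variation, sketched below, joins the columns with a separator and uses vectorized string addition instead of `apply`. The separator `'|'` and the names `SEP`, `key1` and `key2` are assumptions for illustration; the separator must not occur in the data.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['AA', 'AD', 'AD'], 'B': ['BA', 'BD', 'BF']})
df2 = pd.DataFrame({'A': ['AA', 'AD'], 'B': ['BA', 'BF']})

SEP = '|'  # assumed not to appear in any value of A or B

# Vectorized string concatenation builds one key per row
key1 = df1['A'] + SEP + df1['B']
key2 = df2['A'] + SEP + df2['B']

# 1 where the combined key of a df1 row also occurs in df2
df1['Success'] = key1.isin(key2).astype(int)
print(df1)
```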

answered May 19, 2015 at 2:20

Mohammad Saifullah

4 Comments

EdChum Over a year ago

How is your answer related to the original question?

Mohammad Saifullah Over a year ago

I have tried with different dataframe, it serves the purpose.

EdChum Over a year ago

But your desired output is not what the original question was about; you should update your question or post a new question.

FistOfFury Over a year ago

This is a confusing response and doesn't explain what is happening or why. Also, there is at least one other, better answer on this page.

Tomerikoo · Accepted Answer · 2021-12-09 08:46:51Z

2

DF1.merge(right=DF2, left_on=['A', 'B'], right_on=['K', 'L'], indicator=True, how='left')

gives:

    A   B   C  D    K    L     _merge
0  AA  BA  KK  0   AA   BA       both
1  AD  BD  LL  0  NaN  NaN  left_only
2  AF  BF  MM  0   AF   BF       both

So, as above, indicator does the job.
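
To turn the `_merge` column into the 0/1 flag the question asks for, something like the following sketch could be used. It rebuilds the question's data under the lowercase names `df1` and `df2`; dropping duplicate (K, L) pairs first is an added precaution so the left merge keeps exactly one row per row of df1.

```python
import pandas as pd

# Data rebuilt from the question
df1 = pd.DataFrame({'A': ['AA', 'AD', 'AF'],
                    'B': ['BA', 'BD', 'BF'],
                    'C': ['KK', 'LL', 'MM'],
                    'D': [0, 0, 0]})
df2 = pd.DataFrame({'K': ['AA', 'AD', 'AF'],
                    'L': ['BA', 'BF', 'BF']})

# Left merge keeps df1's row order; indicator=True adds the _merge column
merged = df1.merge(df2[['K', 'L']].drop_duplicates(),
                   left_on=['A', 'B'], right_on=['K', 'L'],
                   how='left', indicator=True)

# 'both' means the (A, B) pair was found in df2; assign by position
df1['D'] = (merged['_merge'] == 'both').astype(int).to_numpy()
print(df1)
```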

edited Dec 9, 2021 at 8:46

Tomerikoo

answered Mar 13, 2018 at 8:05

PiotrKu
