9

I have two DataFrames as follows:

DF1=
    A    B   C    D
0   AA   BA  KK   0
1   AD   BD  LL   0
2   AF   BF  MM   0

DF2=
    K    L
0   AA   BA
1   AD   BF
2   AF   BF

At the end what I want to get is:

DF1=
    A    B   C    D
0   AA   BA  KK   1
1   AD   BD  LL   0
2   AF   BF  MM   1

So I want to compare the two DataFrames: find which rows of the first DataFrame (columns A and B) also appear in the second DataFrame (columns K and L), and assign 1 to column D of the first DataFrame for those rows.

I can use a for loop, but it will be very slow for a large number of entries.

Any clue or suggestion will be appreciated.

4 Answers 4

17

This is easier if you rename the columns of df2 to match df1; then you can compare the two frames row-wise:

In [35]:

df2.columns = ['A', 'B']
df2
Out[35]:
    A   B
0  AA  BA
1  AD  BF
2  AF  BF
In [38]:

df1['D'] = (df1[['A', 'B']] == df2).all(axis=1).astype(int)
df1
Out[38]:
    A   B   C  D
0  AA  BA  KK  1
1  AD  BD  LL  0
2  AF  BF  MM  1
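
For anyone who wants to run this end to end without modifying df2, here is a minimal self-contained sketch. It rebuilds the question's data and uses df2.rename(...) inline, as suggested in the comments. It assumes both frames have the same number of rows and the same index, because == between two DataFrames only works on identically labeled frames and compares row i with row i.

```python
import pandas as pd

# Data copied from the question
df1 = pd.DataFrame({
    "A": ["AA", "AD", "AF"],
    "B": ["BA", "BD", "BF"],
    "C": ["KK", "LL", "MM"],
    "D": [0, 0, 0],
})
df2 = pd.DataFrame({"K": ["AA", "AD", "AF"], "L": ["BA", "BF", "BF"]})

# Rename on the fly so the column labels line up; df2 itself is left unchanged
renamed = df2.rename(columns={"K": "A", "L": "B"})

# Row i of df1 is compared with row i of df2; D is 1 only if both columns match
df1["D"] = (df1[["A", "B"]] == renamed).all(axis=1).astype(int)
print(df1)
```

Note that this is a positional comparison: a row of df1 that appears in df2 at a different position is not flagged.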

7 Comments

And if you can't rename them, you can create a new DataFrame dynamically whose contents are a view on df2 but whose column names match df1's, and then it's just this code again.
You could also assign the result of rename to another df and compare against that: df3 = df2.rename(columns={'K':'A', 'L':'B'})
Or just use df2.rename(…) directly in the expression instead of storing it in a temporary name df3.
Sure, just pointing out that your answer is still basically the right answer even if he doesn't make it easy for himself (and it's still not that hard to use), so hopefully he'll accept this answer even if he for some reason can't change the definition of df2.
@Alexander this is what abarnert stated in his 2nd comment above, it depends on what the background on all this is
5
df1['ColumnName'].isin(df2['ColumnName']).value_counts()
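
The one-liner above checks a single column. A possible extension to the two-column case in the question is to treat each (A, B) pair as one key and test membership with MultiIndex.isin. This is only a sketch built on the question's data; the variable names pairs and keys are made up for illustration. Unlike a row-by-row comparison, it does not depend on row order.

```python
import pandas as pd

# Data copied from the question
df1 = pd.DataFrame({
    "A": ["AA", "AD", "AF"],
    "B": ["BA", "BD", "BF"],
    "C": ["KK", "LL", "MM"],
    "D": [0, 0, 0],
})
df2 = pd.DataFrame({"K": ["AA", "AD", "AF"], "L": ["BA", "BF", "BF"]})

# All (K, L) pairs of df2 as a list of tuples
pairs = list(zip(df2["K"], df2["L"]))

# The (A, B) pairs of df1 as a MultiIndex, so that isin compares whole pairs
keys = pd.MultiIndex.from_frame(df1[["A", "B"]])

# True where the pair occurs anywhere in df2, regardless of position
df1["D"] = keys.isin(pairs).astype(int)
print(df1)
```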

Comments

2

This is how I solved it:

import pandas as pd

df1 = pd.DataFrame({"A":['AA','AD','AD'], "B":['BA','BD','BF']})
df2 = pd.DataFrame({"A":['AA','AD'], 'B':['BA','BF']})
# Concatenate the two key columns into a single string per row
df1['compressed']=df1.apply(lambda x:'%s%s' % (x['A'],x['B']),axis=1)
df2['compressed']=df2.apply(lambda x:'%s%s' % (x['A'],x['B']),axis=1)
# Flag the rows of df1 whose combined key also appears in df2
df1['Success'] = df1['compressed'].isin(df2['compressed']).astype(int)
print(df1)

    A   B     compressed   Success
0  AA  BA      AABA          1
1  AD  BD      ADBD          0
2  AD  BF      ADBF          1
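
One caveat with gluing the columns together: without a separator, different pairs can produce the same string. The sketch below uses made-up data (not from the question) to show such a collision, and assumes a separator character "|" that never occurs in the values themselves. It also uses vectorized string concatenation instead of apply.

```python
import pandas as pd

# Hypothetical data chosen to collide: "AB" + "C" == "A" + "BC"
left = pd.DataFrame({"A": ["AB", "AD"], "B": ["C", "BF"]})
right = pd.DataFrame({"A": ["A", "AD"], "B": ["BC", "BF"]})

# Plain concatenation: row 0 is reported as a match although the pairs differ
naive_match = (left["A"] + left["B"]).isin(right["A"] + right["B"])

# With a separator that is assumed absent from the data, the pairs stay distinct
SEP = "|"
safe_match = (left["A"] + SEP + left["B"]).isin(right["A"] + SEP + right["B"])

print(naive_match.tolist())
print(safe_match.tolist())
```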

4 Comments

How is your answer related to the original question?
I have tried with different dataframe, it serves the purpose.
But your desired output is not what the original question was about. You should update your question or post a new question.
This is a confusing response and doesn't explain what is happening or why. Also, there is at least one other, better, answer on this page.
2
DF1.merge(right=DF2, left_on=['A', 'B'], right_on=['K', 'L'], indicator=True, how='left')

gives:

    A   B   C  D    K    L     _merge
0  AA  BA  KK  0   AA   BA       both
1  AD  BD  LL  0  NaN  NaN  left_only
2  AF  BF  MM  0   AF   BF       both

As shown above, indicator=True adds a _merge column whose value is both for every row of DF1 that has a match in DF2.
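
To get from the _merge column to the D column the question asks for, something like the following should work. It is a sketch on the question's data; right_keys and merged are illustrative names. DF2 is de-duplicated first, on the assumption that repeated (K, L) pairs could otherwise make the left merge return more rows than DF1 has.

```python
import pandas as pd

# Data copied from the question
DF1 = pd.DataFrame({
    "A": ["AA", "AD", "AF"],
    "B": ["BA", "BD", "BF"],
    "C": ["KK", "LL", "MM"],
    "D": [0, 0, 0],
})
DF2 = pd.DataFrame({"K": ["AA", "AD", "AF"], "L": ["BA", "BF", "BF"]})

# Keep each (K, L) pair once so the merge yields exactly one row per DF1 row
right_keys = DF2[["K", "L"]].drop_duplicates()

# A left merge keeps every DF1 row in its original order
merged = DF1.merge(right_keys, left_on=["A", "B"], right_on=["K", "L"],
                   how="left", indicator=True)

# "both" means the (A, B) pair was found in DF2; assign by position
DF1["D"] = (merged["_merge"] == "both").astype(int).to_numpy()
print(DF1)
```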

Comments
