Index of matching rows in Pandas DataFrame [Python]

Question

I have two Pandas DataFrames (A and B), each with 2 columns but with different numbers of rows.
They used to be numpy 2D matrices and they both contain integer values.
Is there any way to retrieve the indices of matching rows between those two?

I've been trying isin() or query() or merge(), without success.

This is actually a follow-up to a previous question: I'm trying with pandas dataframes since the original matrices are rather huge.

The desired output, if possible, should be an array (or list) containing in the i-th position the row index in B for the i-th row of A. E.g., an output list of [1,5,4] means that the first row of A has been found in the first row of B, the second row of A has been found in the fifth row of B, and the third row of A has been found in the fourth row of B.

IIUC, you could do lhs.merge(rhs, how='outer', indicator=True). This adds a _merge column indicating whether each row is left_only, right_only or both.
– EdChum, Commented Jun 3, 2016 at 10:50
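
A minimal sketch of what the comment above suggests, on two small made-up frames (the data here is hypothetical, not the asker's). It only shows how the _merge column behaves. The merge does not carry the original row labels, so they would have to be turned into columns first (for instance with reset_index()) if the indices are needed.

```python
import pandas as pd

# Hypothetical frames, chosen only for illustration.
lhs = pd.DataFrame({"a": [1, 9, 4], "b": [9, 1, 0]})
rhs = pd.DataFrame({"a": [9, 4, 7], "b": [1, 0, 8]})

# With no `on` argument, merge joins on all shared columns ("a" and "b").
# how="outer" keeps rows from both sides; indicator=True adds a "_merge"
# column whose values are "left_only", "right_only" or "both".
merged = lhs.merge(rhs, how="outer", indicator=True)

# Keep only the rows that occur in both frames.
both = merged[merged["_merge"] == "both"]

print(merged)
print(both[["a", "b"]])
```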

MaxU - stand with Ukraine · Accepted Answer · 2016-06-03 11:06:02Z

2

I would do it this way:

In [199]: df1.reset_index().merge(df2.reset_index(), on=['a','b'])
Out[199]:
   index_x  a  b  index_y
0        1  9  1       17
1        3  4  0        4

or like this:

In [211]: pd.merge(df1.reset_index(), df2.reset_index(), on=['a','b'], suffixes=['_1','_2'])
Out[211]:
   index_1  a  b  index_2
0        1  9  1       17
1        3  4  0        4

data:

In [201]: df1
Out[201]:
   a  b
0  1  9
1  9  1
2  8  1
3  4  0
4  2  0
5  2  2
6  2  9
7  1  1
8  4  3
9  0  4

In [202]: df2
Out[202]:
    a  b
0   3  5
1   5  0
2   7  8
3   6  8
4   4  0
5   1  5
6   9  0
7   9  4
8   0  9
9   0  1
10  6  9
11  6  7
12  3  3
13  5  1
14  4  2
15  5  0
16  9  5
17  9  1
18  1  6
19  9  5
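
For anyone who wants to run this outside an IPython session, here is a self-contained sketch that rebuilds df1 and df2 from the printed data above and repeats the second merge. The names matches and pairs are introduced here for illustration only. One point to keep in mind: df2 contains duplicate rows (for example (5, 0) at labels 1 and 15), and an inner merge returns one output row per matching pair, so a row of df1 that matched a duplicated row of df2 would appear more than once.

```python
import pandas as pd

# Data copied from the printed frames in the answer.
df1 = pd.DataFrame({
    "a": [1, 9, 8, 4, 2, 2, 2, 1, 4, 0],
    "b": [9, 1, 1, 0, 0, 2, 9, 1, 3, 4],
})
df2 = pd.DataFrame({
    "a": [3, 5, 7, 6, 4, 1, 9, 9, 0, 0, 6, 6, 3, 5, 4, 5, 9, 9, 1, 9],
    "b": [5, 0, 8, 8, 0, 5, 0, 4, 9, 1, 9, 7, 3, 1, 2, 0, 5, 1, 6, 5],
})

# reset_index() turns each frame's row labels into an "index" column, so the
# labels survive the merge; the suffixes tell the two "index" columns apart.
matches = df1.reset_index().merge(
    df2.reset_index(), on=["a", "b"], suffixes=("_1", "_2")
)

# (label in df1, label in df2) for every matching pair of rows.
pairs = list(zip(matches["index_1"].tolist(), matches["index_2"].tolist()))
print(pairs)  # [(1, 17), (3, 4)]
```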

edited Jun 3, 2016 at 11:06

answered Jun 3, 2016 at 10:56

MaxU - stand with Ukraine


3 Comments

jezrael Over a year ago

how='inner' can be omitted, because it is the default.

jezrael Over a year ago

and maybe a nicer alternative is print (pd.merge(df1.reset_index(), df2.reset_index(), on=['a','b']))

AlessioX Over a year ago

Thank you very much indeed guys. Just changing from inner to right matched also the expected output format. Not your fault, I edited my question later.
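
A hedged sketch of how a non-inner merge can produce the list described in the question, with one entry per row of A. Whether 'left' or 'right' is the appropriate value of how depends on which frame is passed first; this sketch assumes A is the left frame and therefore uses how='left'. The data, the names B_unique, looked_up and positions, the -1 sentinel for "not found", and the choice to keep only the first occurrence of duplicated rows in B are all assumptions made for illustration. The result holds 0-based row labels of B, whereas the question's example counts rows from 1.

```python
import pandas as pd

# Hypothetical data: A holds the rows to look up, B is the frame searched.
A = pd.DataFrame({"a": [3, 9, 4, 7], "b": [5, 1, 0, 7]})
B = pd.DataFrame({"a": [3, 5, 4, 9], "b": [5, 0, 0, 1]})

# Turn B's row labels into a column and drop duplicated (a, b) rows, keeping
# the first occurrence, so each row of A yields exactly one output row.
B_unique = (
    B.reset_index()
     .drop_duplicates(subset=["a", "b"])
     .rename(columns={"index": "index_B"})
)

# how="left" keeps every row of A, in A's order.
looked_up = A.merge(B_unique, on=["a", "b"], how="left")

# Rows of A with no match get NaN in index_B; map those to -1 (an assumed
# sentinel) so the result can be a plain list of integers.
positions = looked_up["index_B"].fillna(-1).astype(int).tolist()
print(positions)  # [0, 3, 2, -1]
```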

ysearka · Answer · 2016-06-03 11:01:10Z

-1

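Without merging, you can use == and then check, for each row, whether any comparison is False. Note that == compares the two DataFrames element by element at the same labels, so they must have the same shape and index.

import pandas as pd

df1 = pd.DataFrame({'a':[0,1,2,3,4],'b':[0,1,2,3,4]})
df2 = pd.DataFrame({'a':[0,1,2,3,4],'b':[2,1,2,2,4]})
test = pd.DataFrame(index = df1.index,columns = ['test'])
equal = (df1 == df2)  # element-wise comparison, computed once
for row in df1.index:
    # .loc replaces .ix, which was removed in pandas 1.0
    if False in equal.loc[row].values:
        test.loc[row,'test'] = False
    else:
        test.loc[row,'test'] = True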

Out[1]:
    test
0   False
1   True
2   True
3   False
4   True
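
The same row-by-row check can be sketched without an explicit loop, using DataFrame.all(axis=1) on the comparison result. The names row_equal and matching_index are introduced here for illustration. As with the loop, this only compares rows that share the same label in both frames; it does not search for a row of df1 anywhere else in df2.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0, 1, 2, 3, 4], "b": [0, 1, 2, 3, 4]})
df2 = pd.DataFrame({"a": [0, 1, 2, 3, 4], "b": [2, 1, 2, 2, 4]})

# df1 == df2 yields a boolean frame; all(axis=1) is True for a row only when
# every column in that row compared equal.
row_equal = (df1 == df2).all(axis=1)

# Same layout as the one-column "test" frame built by the loop.
test = row_equal.to_frame("test")

# Labels of the rows that are identical in both frames.
matching_index = row_equal[row_equal].index.tolist()
print(matching_index)  # [1, 2, 4]
```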

answered Jun 3, 2016 at 11:01

ysearka
