1

I have two pandas DataFrames (A and B), each with 2 columns but a different number of rows.
They used to be NumPy 2D matrices, and both contain integer values.
Is there any way to retrieve the indices of the matching rows between the two?

I've tried isin(), query(), and merge(), without success.

This is actually a follow-up to a previous question: I'm trying with pandas dataframes since the original matrices are rather huge.

The desired output, if possible, should be an array (or list) containing in the i-th position the row index in B for the i-th row of A. E.g., an output list of [1,5,4] means that the first row of A was found in the first row of B, the second row of A was found in the fifth row of B, and the third row of A was found in the fourth row of B.

2
  • 1
    Can you provide an example with the desired output? Commented Jun 3, 2016 at 10:50
  • IIUC, you could do lhs.merge(rhs, how='outer', indicator=True). This adds a _merge column that indicates whether each row is left_only, right_only, or both. Commented Jun 3, 2016 at 10:50
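
A minimal sketch of the indicator approach suggested in the comment above. The frames lhs and rhs and their values are made up for illustration and do not come from the question; with no `on` argument, merge joins on all shared columns (here a and b).

```python
import pandas as pd

# Hypothetical sample data, not taken from the question
lhs = pd.DataFrame({'a': [1, 9, 4], 'b': [9, 1, 0]})
rhs = pd.DataFrame({'a': [9, 4, 7], 'b': [1, 0, 8]})

# indicator=True adds a categorical _merge column with the values
# 'left_only', 'right_only', or 'both'
merged = lhs.merge(rhs, how='outer', indicator=True)

# Split the result by where each row was found
both = merged[merged['_merge'] == 'both']
left_only = merged[merged['_merge'] == 'left_only']
right_only = merged[merged['_merge'] == 'right_only']

print(merged)
```

This tells you which rows match, but not their original positions; for that, the index has to be carried along as a column (for example with reset_index()).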

2 Answers

2

I would do it this way:

In [199]: df1.reset_index().merge(df2.reset_index(), on=['a','b'])
Out[199]:
   index_x  a  b  index_y
0        1  9  1       17
1        3  4  0        4

or like this:

In [211]: pd.merge(df1.reset_index(), df2.reset_index(), on=['a','b'], suffixes=['_1','_2'])
Out[211]:
   index_1  a  b  index_2
0        1  9  1       17
1        3  4  0        4

data:

In [201]: df1
Out[201]:
   a  b
0  1  9
1  9  1
2  8  1
3  4  0
4  2  0
5  2  2
6  2  9
7  1  1
8  4  3
9  0  4

In [202]: df2
Out[202]:
    a  b
0   3  5
1   5  0
2   7  8
3   6  8
4   4  0
5   1  5
6   9  0
7   9  4
8   0  9
9   0  1
10  6  9
11  6  7
12  3  3
13  5  1
14  4  2
15  5  0
16  9  5
17  9  1
18  1  6
19  9  5
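
If the goal is the list described in the question (for each row of the first frame, the index of the matching row in the second), a left merge may be closer to what is wanted. The following is only a sketch under some assumptions: the small frames are illustrative, duplicate rows in df2 are collapsed to their first occurrence, and -1 is an arbitrary sentinel chosen here for "no match". The indices are the 0-based pandas labels, not the 1-based positions used in the question's example.

```python
import pandas as pd

# Illustrative data (a small subset in the spirit of the frames above)
df1 = pd.DataFrame({'a': [1, 9, 8, 4], 'b': [9, 1, 1, 0]})
df2 = pd.DataFrame({'a': [3, 4, 9, 9], 'b': [5, 0, 5, 1]})

# Assumption: keep only the first occurrence of each (a, b) pair in df2,
# so that every row of df1 maps to at most one row of df2
right = df2.reset_index().drop_duplicates(subset=['a', 'b'])

# how='left' keeps every row of df1, in df1's order
matches = df1.reset_index().merge(
    right, on=['a', 'b'], how='left', suffixes=('_1', '_2')
)

# index_2 is NaN where no match exists; -1 is a sentinel chosen for this sketch
positions = matches['index_2'].fillna(-1).astype(int).tolist()
print(positions)  # [-1, 3, -1, 1]
```

Here the second row of df1 matches row 3 of df2 and the fourth row matches row 1; the other two rows have no match.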

3 Comments

inner can be omitted, because it is the default value of the how parameter.
A perhaps nicer alternative is print(pd.merge(df1.reset_index(), df2.reset_index(), on=['a','b']))
Thank you very much indeed, guys. Just changing from inner to right also matched the expected output format. Not your fault, I edited my question later.
-1

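Without merging, you can use == and then check whether any row contains False. Note that == compares rows at the same index, so both DataFrames must have identical row and column labels.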

import pandas as pd

df1 = pd.DataFrame({'a':[0,1,2,3,4],'b':[0,1,2,3,4]})
df2 = pd.DataFrame({'a':[0,1,2,3,4],'b':[2,1,2,2,4]})
test = pd.DataFrame(index = df1.index,columns = ['test'])
for row in df1.index:
    # A row matches only if no column comparison is False
    # (.ix was removed in pandas 1.0, so .loc is used instead)
    if False in (df1 == df2).loc[row].values:
        test.loc[row,'test'] = False
    else:
        test.loc[row,'test'] = True

Out[1]:
    test
0   False
1   True
2   True
3   False
4   True
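
The loop above can probably be avoided. As a sketch using the same data, reducing the element-wise comparison with all(axis=1) gives one boolean per row:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3, 4], 'b': [0, 1, 2, 3, 4]})
df2 = pd.DataFrame({'a': [0, 1, 2, 3, 4], 'b': [2, 1, 2, 2, 4]})

# (df1 == df2) is a boolean DataFrame; all(axis=1) is True only for
# rows where every column compared equal
test = (df1 == df2).all(axis=1).to_frame('test')
print(test)
```

As with the loop, this only compares rows sharing the same index label, so it does not find a row of one frame at a different position in the other.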
