I have two large 1-D NumPy arrays of roughly 400K elements each. For each element of array A, I need to check whether it exists in array B. I used np.in1d, but it is too slow. Is there a way to speed this up?

import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([3, 4, 7])
result = np.in1d(A, B, invert=True)  # invert=True marks elements of A that are NOT in B
result
>> array([ True,  True, False, False,  True,  True, False])
  • Please post your code, and a minimum running example. Commented Dec 17, 2018 at 6:05
  • Perhaps, a sample I/O? Commented Dec 17, 2018 at 6:13
  • @Dinari I have updated it Commented Dec 17, 2018 at 6:19
  • @user5173426 I have updated it Commented Dec 17, 2018 at 6:20

2 Answers

Try converting B into a structure better suited to lookups, such as a hash set or a sorted array.
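A minimal sketch of both ideas, using the arrays from the question. The variable names (b_set, b_sorted, pos) are illustrative, and the sorted variant assumes B is non-empty. Whether either one beats np.in1d depends on the data and the NumPy version, so it is worth timing on the real arrays.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([3, 4, 7])

# Hash-set variant: build a Python set once, then test each element of A.
b_set = set(B.tolist())
in_b_hash = np.fromiter((x in b_set for x in A.tolist()), dtype=bool, count=A.size)

# Sorted variant: sort and deduplicate B once, then binary-search every element of A.
b_sorted = np.unique(B)              # np.unique returns a sorted array
pos = np.searchsorted(b_sorted, A)   # insertion index of each element of A
pos[pos == b_sorted.size] = 0        # clamp out-of-range indices (assumes B is non-empty)
in_b_sorted = b_sorted[pos] == A     # a match means the element is present in B

# Same mask as np.in1d(A, B, invert=True): True where the element is NOT in B.
result = ~in_b_sorted
```

The sorted variant stays entirely inside NumPy, while the hash-set variant loops in Python, so the former is usually the more natural fit for large arrays.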

I prefer pandas for this task:

import pandas as pd

A, B = pd.DataFrame(A), pd.DataFrame(B)
A.merge(B, on=0, how="left", indicator=True)

>>>
   0     _merge
0  1  left_only
1  2  left_only
2  3       both
3  4       both
4  5  left_only
5  6  left_only
6  7       both
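To get the same boolean mask the question asks for, the _merge column can be compared against "left_only". This is a sketch rather than part of the original answer: the names df_a, df_b, merged and not_in_b are illustrative, and B is deduplicated first on the assumption that it may contain repeats, which would otherwise duplicate rows in a left merge.

```python
import numpy as np
import pandas as pd

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([3, 4, 7])

df_a = pd.DataFrame(A)
df_b = pd.DataFrame(np.unique(B))  # deduplicate so each row of A appears exactly once

# A left merge keeps the rows of A in their original order;
# indicator=True adds the _merge column shown above.
merged = df_a.merge(df_b, on=0, how="left", indicator=True)

# True where the element of A has no match in B.
not_in_b = (merged["_merge"] == "left_only").to_numpy()
```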

1 Comment

Unique solution. Thanks a lot.
