
This problem seems easy, but I cannot quite find a clean solution. I have two numpy arrays (A and B), and I want the indices of A whose elements are in B, and also the indices of A whose elements are not in B.

So, if

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])

Currently I am using

C = np.searchsorted(A,B)

which takes advantage of the fact that A is sorted and gives me [1, 3, 5], the indices in A of the elements of B. This is great, but how do I get D = [0,2,4,6], the indices of the elements of A that are not in B?

5 Answers


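searchsorted may give you the wrong answer if not every element of B is in A. You can use numpy.in1d instead (in NumPy 2.0 and later, in1d is deprecated in favor of numpy.isin, which does the same thing and is used below):

import numpy as np

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6,8])
mask = np.isin(A, B)  # True where an element of A is in B
print(np.where(mask)[0])   # indices of A whose elements are in B
print(np.where(~mask)[0])  # indices of A whose elements are not in B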

output is:

[1 3 5]
[0 2 4 6]
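To see the failure mode mentioned above, here is a small sketch, assuming (as in the question) that A is sorted and has no repeated values. np.searchsorted returns an insertion point for every element of B, including values that do not occur in A, so its result has to be checked against A before it can be trusted:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])  # assumed sorted, no duplicates
B = np.array([2, 4, 6, 8])           # 8 does not occur in A

# Insertion points: the missing 8 still gets an index, len(A) == 7,
# which is not even a valid position in A
C = np.searchsorted(A, B)
print(C)  # [1 3 5 7]

# Guard: clip positions into range, then keep only exact matches
pos = np.minimum(C, len(A) - 1)
C_valid = C[A[pos] == B]
print(C_valid)  # [1 3 5]
```

The guard costs one extra gather and comparison, so it keeps the searchsorted approach usable when B may contain values absent from A.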

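However, in1d() is sort-based, which can be slow for large datasets (since NumPy 1.24, in1d and isin also accept a kind parameter, and for integer input a table-based method may be chosen automatically). If your dataset is large, you can use pandas, which does a hash-based lookup: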

import pandas as pd
np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]
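The same pandas expression can yield both index sets. A minimal sketch, assuming pandas is installed: get_indexer returns the position of each element of A in the index built from B, or -1 when the element is not found, so the two sets are simply the non-negative and the negative positions. pd.unique is kept because get_indexer requires a uniquely valued index.

```python
import numpy as np
import pandas as pd

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6, 8])

# Position of each element of A in Index(unique(B)), or -1 if absent
pos = pd.Index(pd.unique(B)).get_indexer(A)
in_idx = np.flatnonzero(pos >= 0)   # indices of A whose elements are in B
out_idx = np.flatnonzero(pos < 0)   # indices of A whose elements are not in B
print(in_idx)   # [1 3 5]
print(out_idx)  # [0 2 4 6]
```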

Here is the time comparison:

A = np.random.randint(0, 1000, 10000)
B = np.random.randint(0, 1000, 10000)

%timeit np.where(np.in1d(A, B))[0]
%timeit np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]

output:

100 loops, best of 3: 2.09 ms per loop
1000 loops, best of 3: 594 µs per loop
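Outside IPython, a comparable measurement can be sketched with the standard timeit module. This version seeds the random generator, uses np.isin in place of np.in1d, and first checks that both methods return the same indices. The absolute numbers depend on the machine and on the NumPy and pandas versions (newer NumPy may pick a table-based method for integer input), so they should not be expected to match the figures above.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so runs are repeatable
A = rng.integers(0, 1000, 10000)
B = rng.integers(0, 1000, 10000)

def with_numpy():
    # np.isin is the current name for the np.in1d behavior
    return np.flatnonzero(np.isin(A, B))

def with_pandas():
    return np.flatnonzero(pd.Index(pd.unique(B)).get_indexer(A) >= 0)

# Timings are only comparable if both approaches agree
assert np.array_equal(with_numpy(), with_pandas())

for fn in (with_numpy, with_pandas):
    # best of 3 repeats, each running the function 100 times
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f"{fn.__name__}: {best * 1e6:.0f} µs per call")
```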

1 Comment

It's good to know about this efficient method because my datasets are very large. Thanks so much for this solution!
import numpy as np

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])
C = np.searchsorted(A, B)

D = np.delete(np.arange(len(A)), C)  # np.alen is no longer available in current NumPy; len(A) does the same job

D
#array([0, 2, 4, 6])
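np.delete returns a copy of the index range without the positions in C. An equivalent sketch that keeps a boolean mask around instead (handy if the values of A that are not in B are wanted as well) is shown below; like the answer above, it assumes A is sorted and that every element of B occurs in A.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6])
C = np.searchsorted(A, B)  # assumes A is sorted and every element of B is in A

# Mark the positions found in C, then read off the rest
in_B = np.zeros(A.size, dtype=bool)
in_B[C] = True
D = np.flatnonzero(~in_B)
print(D)          # [0 2 4 6]
print(A[~in_B])   # [1 3 5 7]
```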

2 Comments

Thanks! I also like the answer provided by alexhb using np.setdiff1d. I was hoping that there was a function that would give me the indices directly, but this works just fine.
There might be, @Dan, but I can't think of it. If you don't need C, use his solution, but mine will be twice as fast if you've already got C.
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])
b = np.array([2, 4, 6])
c = np.searchsorted(a, b)
d = np.searchsorted(a, np.setdiff1d(a, b))

d
#array([0, 2, 4, 6])
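One caveat worth checking if A can contain repeated values (the question's example does not): np.setdiff1d returns unique values, and searchsorted then finds only the first position of each, so later duplicates are silently dropped. A small sketch with made-up data, compared against a mask-based version that keeps every position:

```python
import numpy as np

a = np.array([1, 1, 2, 3])  # sorted, but with a repeated value
b = np.array([2])

# setdiff1d gives the unique values [1 3]; searchsorted finds the first 1 only
d_search = np.searchsorted(a, np.setdiff1d(a, b))
print(d_search)  # [0 3]  (index 1 is missed)

# isin with invert=True marks every position whose value is not in b
d_mask = np.flatnonzero(np.isin(a, b, invert=True))
print(d_mask)    # [0 1 3]
```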

2 Comments

Having to search twice slows this down a bit, better to use the already known C to get D. But, this is of course the better solution if C is not needed, so +1. (Welcome to Stack Overflow!)
Should the c line be deleted? It is not doing anything here.

The elements of A that are also in B:

set(A) & set(B)

The elements of A that are not in B:

set(A) - set(B)
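Sets give values, not positions. If the indices are needed and a pure-Python approach is acceptable, the set can still serve as a lookup table while enumerating A. A sketch (it loops in Python, so it will likely be slower than the vectorized NumPy answers on large arrays):

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6])

# Build the set once; each membership test is then O(1) on average
b_set = set(B.tolist())
C = [i for i, x in enumerate(A.tolist()) if x in b_set]
D = [i for i, x in enumerate(A.tolist()) if x not in b_set]
print(C)  # [1, 3, 5]
print(D)  # [0, 2, 4, 6]
```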

2 Comments

This does not answer the question (which asks for indexes, not elements). However, if you want to perform the above operation in numpy, do not convert the arrays to sets; use numpy operations instead. See intersect1d and setdiff1d (or possibly setxor1d).
Thank you, as I was looking for elements not indices and the question title is ambiguous. I appreciate the numpy operations as well.
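Following the suggestion above, np.intersect1d can return positions as well as values when called with return_indices=True; for repeated values it reports only the first occurrence in each array. A sketch that uses it for the "in B" indices and takes the complement with setdiff1d:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6, 8])

# Common values, plus where each first occurs in A and in B
common, idx_A, idx_B = np.intersect1d(A, B, return_indices=True)
print(common)  # [2 4 6]
print(idx_A)   # [1 3 5]

# Complement: every position of A that is not in idx_A
D = np.setdiff1d(np.arange(A.size), idx_A)
print(D)       # [0 2 4 6]
```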
import numpy as np

all_vals = np.arange(1000)  # `A` in the question
seen_vals = np.unique(np.random.randint(0, 1000, 100))  # `B` in the question
# Boolean mask: True where a value of all_vals is not in seen_vals
mask = np.isin(all_vals, seen_vals, invert=True)
# all_vals is np.arange, so these values are also their indices (`D` in the question)
unseen_vals = all_vals[mask]
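In the snippet above, all_vals is np.arange(1000), so values and positions coincide and all_vals[mask] doubles as the index array. For a general A the mask has to be converted to positions, for example with np.flatnonzero. A sketch with made-up values:

```python
import numpy as np

A = np.array([10, 20, 30, 40, 50])  # values differ from their positions
B = np.array([20, 40, 60])

in_B = np.isin(A, B)
C = np.flatnonzero(in_B)   # indices of A whose values are in B
D = np.flatnonzero(~in_B)  # indices of A whose values are not in B
print(C)  # [1 3]
print(D)  # [0 2 4]
```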
