I want to create a 'mask' index array for an array, based on whether the elements of that array are members of some set. What I want can be achieved as follows:

x = np.arange(20)
interesting_numbers = {1, 5, 7, 17, 18}
x_mask = np.array([xi in interesting_numbers for xi in x])

I'm wondering if there's a faster way to execute that last line. As it is, it builds a list in Python by repeatedly calling a __contains__ method, then converts that list to a numpy array.

I want something like x_mask = x[x in interesting_numbers] but that's not valid syntax.

  • Would x always be a range array? Commented Apr 19, 2017 at 13:54
  • No, not always. Commented Apr 19, 2017 at 14:03
  • Would x always be sorted? Commented Apr 19, 2017 at 14:05
  • No, x is an arbitrary array. I am looking for a practical solution rather than an algorithmic one. Commented Apr 19, 2017 at 14:07

2 Answers

You can use np.in1d:

np.in1d(x, list(interesting_numbers))
#array([False,  True, False, False, False,  True, False,  True, False,
#       False, False, False, False, False, False, False, False,  True,
#        True, False], dtype=bool)
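
As a side note for newer NumPy releases: np.isin (available since NumPy 1.13) performs the same membership test and is the spelling the NumPy documentation recommends for new code, while np.in1d is deprecated as of NumPy 2.0. A minimal sketch on the question's data follows; the variable names are taken from the question, and the set is converted to a list because NumPy would otherwise treat the set as a single object rather than a collection of values.

```python
import numpy as np

x = np.arange(20)
interesting_numbers = {1, 5, 7, 17, 18}

# Convert the set to a list first: passed directly, the set would be
# wrapped in a one-element object array and compared as a whole.
x_mask = np.isin(x, list(interesting_numbers))

# The boolean mask can then be used to select the matching elements.
selected = x[x_mask]
print(selected)
```

Unlike np.in1d, np.isin keeps the shape of x, so it should also work unchanged if x is multi-dimensional.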

Timing: np.in1d is roughly 35 times faster than the list comprehension when the array x is large (10,000 elements here):

x = np.arange(10000)
interesting_numbers = {1, 5, 7, 17, 18}

%timeit np.in1d(x, list(interesting_numbers))
# 10000 loops, best of 3: 41.1 µs per loop

%timeit x_mask = np.array([xi in interesting_numbers for xi in x])
# 1000 loops, best of 3: 1.44 ms per loop

1 Comment

Thanks! I'll try this. If there are no better answers, I'll mark this one as accepted.

Here's one approach with np.searchsorted -

def set_membership(x, interesting_numbers):
    b = np.sort(list(interesting_numbers))
    idx = np.searchsorted(b, x)
    idx[idx==b.size] = 0
    return b[idx] == x
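
To make the steps in set_membership easier to follow, here is a small walk-through on a hand-picked, unsorted x (the sample values are made up for illustration). It shows why the out-of-range indices are reset to 0: np.searchsorted returns b.size for values larger than every element of b, which would otherwise be an invalid index.

```python
import numpy as np

def set_membership(x, interesting_numbers):
    b = np.sort(list(interesting_numbers))
    idx = np.searchsorted(b, x)
    idx[idx == b.size] = 0
    return b[idx] == x

# Illustrative inputs: x is unsorted and contains a value (99) above max(b).
x = np.array([18, 3, 99, 1, 7])
interesting_numbers = {1, 5, 7, 17, 18}

b = np.sort(list(interesting_numbers))  # [1, 5, 7, 17, 18]
raw_idx = np.searchsorted(b, x)         # [4, 1, 5, 0, 2]; 5 == b.size is out of range

# Resetting the out-of-range index to 0 keeps b[idx] valid. The comparison
# still fails for 99, because b[0] is 1, so the result stays correct.
mask = set_membership(x, interesting_numbers)
print(mask)  # [ True False False  True  True]
```

Note that this sketch assumes interesting_numbers is non-empty; with an empty set, b has no elements and b[idx] raises an IndexError.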

Runtime test -

# Setup inputs with random numbers that are not necessarily sorted
In [353]: x = np.random.choice(100000, 10000, replace=0)

In [354]: interesting_numbers = set(np.random.choice(100000, 1000, replace=0))

In [355]: x_mask = np.array([xi in interesting_numbers for xi in x])

# Verify output with set_membership
In [356]: np.allclose(x_mask, set_membership(x, interesting_numbers))
Out[356]: True

# @Psidom's solution
In [357]: %timeit np.in1d(x, list(interesting_numbers))
1000 loops, best of 3: 1.04 ms per loop

In [358]: %timeit set_membership(x, interesting_numbers)
1000 loops, best of 3: 682 µs per loop

2 Comments

I'm surprised it is only so little better than the other solution.
@PaulPanzer Yeah, with searchsorted it seems that it's sorting and then looking for indices. That sorting is killing it.
