How do you Check if each Row of a Numpy Array is Contained in a Secondary Array?

Question

My question is similar to the linked question "testing whether a Numpy array contains a given row", but I need a non-trivial extension of the method offered there. The linked question asks how to check whether the rows of an array match a single given row; here I need to do that for many rows at once, and one does not obviously follow from the other.

Say I have an array:

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])

I want to know if each row of this array is in a secondary array given by:

check_array = np.array([[1, 2, 4], [1, 2, 1]])

Ideally this would look something like this:

is_in_check = array in check_array

where is_in_check would be:

is_in_check = np.array([True, False, False, True])

I realise for very small arrays it would be easier to use a list comprehension or something similar, but the process has to be performant with arrays on the order of 10⁶ rows.

I have seen that the usual way to check for a single row is:

is_in_check_single = (array == [1, 2, 1]).all(axis=1).any()

But ideally I'd like to generalise this over multiple rows so that the process is vectorized.
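One direct way to generalise the single-row check is to keep the comparison vectorized over the large array and loop in Python only over the few rows of check_array, OR-ing the per-row masks together. This is a sketch, not a benchmarked recommendation: with check_array holding about 5 rows, the loop runs only 5 times, and it never builds a 3-D intermediate. The function name rows_in_loop is hypothetical.

```python
import numpy as np

def rows_in_loop(array, check_array):
    """Boolean mask: True where a row of `array` equals any row of `check_array`."""
    mask = np.zeros(len(array), dtype=bool)
    for row in check_array:
        # Vectorized over every row of `array`; only the short check_array is looped over.
        mask |= (array == row).all(axis=1)
    return mask

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])
check_array = np.array([[1, 2, 4], [1, 2, 1]])
print(rows_in_loop(array, check_array))  # [ True False False  True]
```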

In practice, I would expect to see the following dimensions for each array:

array.shape = (1000000, 3)
check_array.shape = (5, 3)

Can you provide the dimensions you expect to see in practice, e.g. array.shape and check_array.shape? It would also help to know the number of unique values that can appear in the arrays (e.g. 1, 2, 3, 4, 5 -> 5 in this example).
– hilberts_drinking_problem, Commented May 19, 2021 at 11:27
Apologies, I think I made that confusing by using "indexes" instead of "rows" when describing how large the array might be. I've fixed that and given the expected shapes at the bottom. The algorithm is a distance-based symmetry-finding algorithm, so I would expect only 50-100 unique rows in a 1,000,000-row array.
– Connor, Commented May 19, 2021 at 11:38
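Since only 50-100 distinct rows are expected, one option (an assumption, not something from the thread) is to deduplicate first with np.unique(axis=0, return_inverse=True), test only the unique rows against check_array, and map the answers back. Note that np.unique sorts all 10⁶ rows, so whether this beats broadcasting directly against a 5-row check_array would need to be measured. The function name rows_in_via_unique is hypothetical.

```python
import numpy as np

def rows_in_via_unique(array, check_array):
    """Mask of rows of `array` found in `check_array`, testing only the distinct rows."""
    # Distinct rows of the large array; `inverse` maps each original row to its unique row.
    uniq, inverse = np.unique(array, axis=0, return_inverse=True)
    # Flatten defensively: the shape of `inverse` has varied across NumPy versions.
    inverse = inverse.reshape(-1)
    # Broadcast only the small (n_unique, 3) array against check_array.
    uniq_in = (uniq[:, None] == check_array).all(axis=2).any(axis=1)
    # Expand the per-unique-row answer back to one entry per original row.
    return uniq_in[inverse]

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])
check_array = np.array([[1, 2, 4], [1, 2, 1]])
print(rows_in_via_unique(array, check_array))  # [ True False False  True]
```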

Henry Ecker · Accepted Answer · 2021-05-19 11:28:13Z

7

Broadcasting is an option:

import numpy as np

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])

check_array = np.array([[1, 2, 4], [1, 2, 1]])

is_in_check = (check_array[:, None] == array).all(axis=2).any(axis=0)

Produces:

[ True False False  True]
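To see why the axes are chosen as they are, here is the same expression split into steps, with the shape of each intermediate. The names eq and pair_match are only illustrative.

```python
import numpy as np

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])
check_array = np.array([[1, 2, 4], [1, 2, 1]])

# (2, 1, 3) against (4, 3) broadcasts to (2, 4, 3): every check row vs every array row, element-wise.
eq = check_array[:, None] == array
# A (check row, array row) pair matches only if all three columns are equal; shape (2, 4).
pair_match = eq.all(axis=2)
# A row of `array` is "in" check_array if any check row matches it; shape (4,).
is_in_check = pair_match.any(axis=0)

print(eq.shape, pair_match.shape, is_in_check.shape)  # (2, 4, 3) (2, 4) (4,)
print(is_in_check)  # [ True False False  True]
```

For the shapes given in the question, eq would be a (5, 1000000, 3) boolean array, i.e. 15 million one-byte booleans, roughly 15 MB.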

Broadcasting the other way:

is_in_check = (check_array == array[:, None]).all(axis=2).any(axis=1)

Also produces:

[ True False False  True]
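If the entries are small non-negative integers (the example uses only 1 to 5, which is what the comment about the number of unique values is getting at), another option, not part of this answer, is to encode each row as a single integer with np.ravel_multi_index and then run a 1-D membership test with np.isin. This sketch assumes non-negative integer data and that the product of the per-column ranges fits in an intp; rows_in_encoded is a hypothetical name.

```python
import numpy as np

def rows_in_encoded(array, check_array):
    """Mask of rows of `array` found in `check_array`; assumes non-negative integer entries."""
    # One size per column, large enough for every value that appears in either array.
    dims = tuple(int(d) for d in np.maximum(array.max(axis=0), check_array.max(axis=0)) + 1)
    # Map each row (a, b, c) to a unique scalar key, like a mixed-radix number.
    keys = np.ravel_multi_index(array.T, dims)
    check_keys = np.ravel_multi_index(check_array.T, dims)
    # Membership test on the 1-D keys.
    return np.isin(keys, check_keys)

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])
check_array = np.array([[1, 2, 4], [1, 2, 1]])
print(rows_in_encoded(array, check_array))  # [ True False False  True]
```

The result should match the broadcasting version; np.ravel_multi_index raises a ValueError on negative entries, so data with negative values would need shifting first.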

