My question is similar to testing whether a NumPy array contains a given row, but I need a non-trivial extension of the method offered in the linked question. The linked question asks how to check whether each row in an array is the same as a single other row. The point of this question is to do that for numerous rows at once, and one does not obviously follow from the other.

Say I have an array:

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])

I want to know if each row of this array is in a secondary array given by:

check_array = np.array([[1, 2, 4], [1, 2, 1]])

Ideally this would look something like this:

is_in_check = array in check_array

Where is_in_check looks like this:

is_in_check = np.array([True, False, False, True])

I realise that for very small arrays it would be easier to use a list comprehension or something similar, but the process has to be performant with arrays on the order of 10^6 rows.

I have seen that for checking for a single row the correct method is:

is_in_check_single = any((array[:]==[1, 2, 1]).all(1))

But ideally I'd like to generalise this to multiple rows so that the whole check is vectorized.

In practice, I would expect to see the following dimensions for each array:

array.shape = (1000000, 3)
check_array.shape = (5, 3)
  • Can you provide the dimensions you expect to see in practice, e.g. array.shape and check_array.shape? It would also help to know the number of unique values that can appear in the arrays (e.g. 1, 2, 3, 4, 5 -> 5 in this example). Commented May 19, 2021 at 11:27
  • Apologies, I think I made that confusing by using "indexes" instead of "rows" when describing how long it might be. I've fixed that and given the expected shapes at the bottom. The algorithm is a symmetry-finding algorithm based on distance, so I would imagine that there would only be 50-100 unique rows in a 1,000,000-row array. Commented May 19, 2021 at 11:38

1 Answer

Broadcasting is an option:

import numpy as np

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])

check_array = np.array([[1, 2, 4], [1, 2, 1]])
is_in_check = (check_array[:, None] == array).all(axis=2).any(axis=0)

Produces:

[ True False False  True]
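
To see why this works, the one-liner above can be unpacked into its intermediate steps. This is only an illustrative breakdown of the same expression; the names expanded, elementwise and row_matches are introduced here for explanation and do not appear in the original code.

```python
import numpy as np

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])
check_array = np.array([[1, 2, 4], [1, 2, 1]])

# Insert a length-1 axis so every check row can be paired with every array row
expanded = check_array[:, None]        # shape (2, 1, 3)

# Broadcasting (2, 1, 3) against (4, 3) compares element by element
elementwise = expanded == array        # shape (2, 4, 3)

# A row matches only if all of its elements match
row_matches = elementwise.all(axis=2)  # shape (2, 4)

# A row of `array` is "in" check_array if it matches any check row
is_in_check = row_matches.any(axis=0)  # shape (4,)

print(is_in_check)  # [ True False False  True]
```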

Broadcasting the other way:

is_in_check = (check_array == array[:, None]).all(axis=2).any(axis=1)

Also produces:

[ True False False  True]
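
One thing to keep in mind with either broadcasting form is that it builds a temporary boolean array of shape (rows in check_array, rows in array, columns). For the shapes given in the question, (5, 1000000, 3), that is 15 million booleans, which is usually fine, but it grows linearly with the number of check rows. If that becomes a concern, a possible alternative is to loop over the few check rows and accumulate a mask, which generalises the single-row test from the question. The following is a minimal sketch, not a benchmarked replacement; the function name rows_in_check is hypothetical.

```python
import numpy as np

def rows_in_check(array, check_array):
    """Hypothetical helper: True where a row of `array` equals any row of `check_array`."""
    mask = np.zeros(len(array), dtype=bool)
    # The Python loop runs once per check row (5 times for the shapes in the question);
    # each comparison inside it is still fully vectorized over the rows of `array`.
    for row in check_array:
        mask |= (array == row).all(axis=1)
    return mask

array = np.array([[1, 2, 4], [3, 5, 1], [5, 5, 1], [1, 2, 1]])
check_array = np.array([[1, 2, 4], [1, 2, 1]])

print(rows_in_check(array, check_array))  # [ True False False  True]
```

Each iteration only allocates temporaries the size of array itself, so peak memory stays independent of the number of check rows. Which variant is faster likely depends on the array sizes and is worth timing on real data.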