5

For example, if I have:

import numpy as np
A = np.array([[2,3,4],[5,6,7]])

and I want to check whether the following list is the same as one of the rows of the array:

B = [2,3,4]

I tried

B in A #which returns True

But the following also returns True, when it should return False:

B = [2,2,2]
B in A

6 Answers

6

Try this generator expression. The built-in any() short-circuits, so it stops evaluating rows as soon as one matches.

any(np.array_equal(row, B) for row in A)

For now, np.array_equal doesn't implement internal short-circuiting. In a different question the performance impact of different ways of accomplishing this is discussed.

As @Dan mentions below, broadcasting is another valid way to solve this problem, and it's often (though not always) a better way. For some rough heuristics, here's how you might want to choose between the two approaches. As with any other micro-optimization, benchmark your results.

Generator Comprehension

  • Reduced memory footprint (not creating the array B==A)
  • Short-circuiting (if the first row of A is B, we don't have to look at the rest)
  • When rows are large (definition depends on your system, but could be ~100 - 100,000), broadcasting isn't noticeably faster.
  • Uses builtin language features. You have numpy installed anyway, but I'm partial to using the core language when there isn't a reason to do otherwise.

Broadcasting

  • Fastest way to solve an extremely broad range of problems using numpy. Using it here is good practice.
  • If we do have to search through every row in A (i.e. if more often than not we expect B to not be in A), broadcasting will almost always be faster (not always a lot faster necessarily, see next point)
  • When rows are smallish, the generator expression won't be able to vectorize the computations efficiently, so broadcasting will be substantially faster (unless of course you have enough rows that short-circuiting outweighs that concern).
  • In a broader context where you have more numpy code, the use of broadcasting here can help to have more consistent patterns in your code base. Coworkers and future you will appreciate not having a mix of coding styles and patterns.
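To make the comparison above concrete, here is a minimal sketch that wraps both approaches in functions and checks that they agree. The function names row_in_array_generator and row_in_array_broadcast are hypothetical, chosen only for illustration.

```python
import numpy as np

def row_in_array_generator(A, B):
    """Row-by-row check; any() stops at the first matching row."""
    return any(np.array_equal(row, B) for row in A)

def row_in_array_broadcast(A, B):
    """Vectorized check; B is compared against every row at once."""
    return bool((A == B).all(axis=1).any())

A = np.array([[2, 3, 4], [5, 6, 7]])
for B in ([2, 3, 4], [2, 2, 2], [5, 6, 7]):
    # Both approaches should give the same answer for every candidate.
    assert row_in_array_generator(A, B) == row_in_array_broadcast(A, B)
    print(B, row_in_array_broadcast(A, B))
```

Which one is faster depends on the shape of A and on how early a match tends to occur, so it is worth timing both (for example with timeit) on data that resembles your own.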

6 Comments

"it looks like B in A is being interpreted as np.isin(B, A).all()" I don't think so, try [1,3,1] for example. I think it is checking each column and returns true if the number is in that column of any row.
You're right that we don't really need a generator comprehension, but that depends on the anticipated workload. When there are largeish rows the performance difference is negligible other than the comprehension having reduced memory consumption, and the early stopping can allow this to be substantially faster if we would typically expect A to contain B in the sense OP expects.
"Doesn't require a conversion of B to an array " -- turns out the broadcasting is happy with lists as well as arrays. I don't know if it takes a performance hit or not though
You're right. I was getting a weird result with the == where instead of returning an array it was returning a boolean. I attributed it to broadcasting acting up, but it was a shape mismatch.
@Dan Last I checked it isn't short-circuited internally, and there were discussions about whether it was needed or not. Some other SO questions discussed using things like numba to speed up checks like this and allow short-circuiting.
3

You can do it by using broadcasting like this:

import numpy as np
A = np.array([[2,3,4],[5,6,7]])
B = np.array([2,3,4]) # Or [2,3,4], a list will work fine here too

(B==A).all(axis=1).any()
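It may help to look at the intermediate results of that expression. The sketch below uses the [2,2,2] case from the question; the variable names elementwise and row_matches are only for illustration. As far as I can tell, B in A behaves like (A == B).any(), which would explain why it returns True whenever any single element lines up.

```python
import numpy as np

A = np.array([[2, 3, 4], [5, 6, 7]])
B = np.array([2, 2, 2])

# B is broadcast against every row of A, giving a (2, 3) boolean array.
elementwise = (A == B)
# A row matches only if every element in that row is equal.
row_matches = elementwise.all(axis=1)

print(elementwise.tolist())     # [[True, False, False], [False, False, False]]
print(row_matches.tolist())     # [False, False]
print(bool(row_matches.any()))  # False: no full row equals B
print(bool(elementwise.any()))  # True: a single element matched
```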

Comments

1

Use the built-in any(). As soon as a matching row is found, it stops iterating and returns True.

import numpy as np

A = np.array([[2,3,4],[5,6,7]])
B = [3,2,4]

if any(np.array_equal(B, x) for x in A):
  print(f'{B} inside {A}')
else:
  print(f'{B} NOT inside {A}')
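One property of np.array_equal that is useful here: it returns False when the shapes differ, so a candidate of the wrong length is simply reported as absent. A small sketch, where short_B is a hypothetical input:

```python
import numpy as np

A = np.array([[2, 3, 4], [5, 6, 7]])
short_B = [2, 3]  # hypothetical candidate with the wrong length

# array_equal compares shapes first, so no row can match.
found = any(np.array_equal(short_B, row) for row in A)
print(found)  # False
```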

Comments

0

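You need to use .all() to compare all the elements of each row against the list.

import numpy as np

A = np.array([[2,3,4],[5,6,7]])
B = [2,3,4]

for i in A:
    # (i == B) is an element-wise comparison; .all() requires every element to match
    if (i==B).all():
        print("Yes, B is present in A")
        break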

EDIT: I added break to exit the loop as soon as the first occurrence is found. This matters for inputs such as A = np.array([[2,3,4],[2,3,4]]), where the message would otherwise print twice.

# print ("Yes, B is present in A")

Alternative solution using any:

any((i==B).all() for i in A)

# True

3 Comments

Suppose A was np.array([[2,3,4],[2,3,4]]). Then this will print Yes, B is present in A twice. Is there any way to make it print only once?
@Kevin: You can put a break as soon as it is present
@Kevin wrap in an np.any(). Also there is no need for a loop here, just use broadcasting
0
list((A[[i], :]==B).all() for i in range(A.shape[0])) 

[True, False]

This will tell you which rows of A are equal to B.
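If the row indices are more useful than a list of booleans, one option is to combine the broadcasting comparison with np.flatnonzero. This is a sketch, not part of the answer above; the three-row A is made up so that more than one row matches.

```python
import numpy as np

A = np.array([[2, 3, 4], [5, 6, 7], [2, 3, 4]])
B = [2, 3, 4]

# Boolean mask with one entry per row, then the indices where it is True.
matching_rows = np.flatnonzero((A == B).all(axis=1))
print(matching_rows.tolist())  # [0, 2]
```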

Comments

0

A straightforward option is to use any() over a generator that compares each row with array_equal.

from numpy import array_equal
import numpy as np

A = np.array([[2,3,4],[5,6,7]])
B = np.array([2,2,4]) 

in_A = lambda x, A : any((array_equal(a,x) for a in A))

print(in_A(B, A))
False

Comments
