I have a 3D array with shape (1000, 12, 30) and a list of 2D arrays of shape (12, 30). I want to check whether these 2D arrays exist in the 3D array. Is there a simple way to do this in Python? I tried the in keyword, but it doesn't work.

  • The solution here should apply to your problem stackoverflow.com/questions/7100242/…. Marking duplicate Commented May 3, 2018 at 3:02
  • Those questions are not the same. Commented May 3, 2018 at 3:03
  • The solutions apply to this case. Adjust the rolling window accordingly Commented May 3, 2018 at 3:05
  • It doesn't apply. Questions that are a duplicate should be marked as a duplicate, those are not the same questions. I don't understand why you would mark this as a duplicate. Commented May 3, 2018 at 3:06
  • I have retracted the flag though. Commented May 3, 2018 at 3:06

3 Answers

NumPy can do this with np.all:

import numpy as np
a = np.random.rand(3, 1, 2)
b = a[1][0]
np.all(np.all(a == b, 1), 1)
Out[612]: array([False,  True, False])

Solution from bnaecker

np.all(a == b, axis=(1, 2))

If you only want to check whether it exists or not:

np.any(np.all(a == b, axis=(1, 2)))
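
For the shapes in the question, the same idea can be wrapped in a small helper and applied to each array in the list. This is only a minimal sketch: the name contains_2d and the sample data are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def contains_2d(stack, candidate):
    """Return True if candidate equals one of the 2D slices stack[i]."""
    # (stack == candidate) broadcasts candidate along the first axis.
    # Reducing over axes 1 and 2 leaves one boolean per slice.
    return bool(np.any(np.all(stack == candidate, axis=(1, 2))))

rng = np.random.default_rng(0)
stack = rng.integers(0, 100, size=(1000, 12, 30))

present = stack[42].copy()
absent = stack[42].copy()
absent[0, 0] = -1  # -1 lies outside [0, 100), so no slice of stack can match

candidates = [present, absent]
results = [contains_2d(stack, c) for c in candidates]
print(results)
```

Each call returns a single boolean for the whole (12, 30) array, so the list comprehension gives one answer per candidate.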

7 Comments

Or better yet, np.all(a == b, axis=(1,2)).
@Wen I see! thanks for this. I'm still not sure how this would work if the depth is 30 like what I mentioned in the question?
@TeodoricoLevoff Check out NumPy's broadcasting rules. b in this case will be broadcast (replicated) along the first dimension to match a. Then the axis arguments to np.all reduce that along the last two dimensions, leaving a boolean array of shape (1000,) with True at indices i where a[i] == b.
@TeodoricoLevoff Also note that you might need np.isclose(a, b) in place of a == b if you're dealing with floating-point numbers (np.allclose() has no axis argument and returns a single boolean for the whole comparison).
@bnaecker I understand. But I want to return True only if the complete (12, 30) array exist in the (1000, 12, 30). I think the solution mentioned above checks each single value in the 30 lists and outputs a boolean for each?
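
To illustrate the two points raised in these comments, here is a hedged sketch using the shapes from the question. The random data and the variable names are assumptions. It shows that reducing over the last two axes leaves one boolean per (12, 30) slice, that np.any collapses those into a single answer, and that np.isclose can stand in for == when floating-point noise is expected.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((1000, 12, 30))

# A copy of one slice with tiny floating-point noise added.
b = a[7] + 1e-12

# One boolean per 2D slice, not one per element.
per_slice = np.all(a == b, axis=(1, 2))
print(per_slice.shape)

# Exact comparison misses the noisy copy ...
exact = bool(np.any(np.all(a == b, axis=(1, 2))))
# ... while a tolerance-based comparison (default rtol/atol) finds it.
close = bool(np.any(np.all(np.isclose(a, b), axis=(1, 2))))
print(exact, close)
```

The tolerances of np.isclose are left at their defaults here; whether those defaults suit a given dataset is a judgment call.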

Here is a fast method (previously used by @DanielF as well as @jaime and others, no doubt) that uses a trick to benefit from short-circuiting: view-cast template-sized blocks to single elements of dtype void. When comparing two such blocks numpy stops after the first difference, yielding a huge speed advantage.

>>> def in_(data, template):
...     dv = data.reshape(data.shape[0], -1).view(f'V{data.dtype.itemsize*np.prod(data.shape[1:])}').ravel()
...     tv = template.ravel().view(f'V{template.dtype.itemsize*template.size}').reshape(())
...     return (dv==tv).any()
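
The view trick relies on the data being C-contiguous and on both arrays sharing one dtype, since the comparison is done on raw bytes. The following is a sketch of a more defensive variant under those assumptions; the name in_void and the conversions are illustrative additions, not the answer's code. Because bytes are compared, 0.0 and -0.0 count as different for floating-point data.

```python
import numpy as np

def in_void(data, template):
    """Sketch of the void-view membership test with defensive conversions."""
    # Assumption: both arrays are brought to one dtype and made C-contiguous
    # before their bytes are reinterpreted.
    data = np.ascontiguousarray(data)
    template = np.ascontiguousarray(template, dtype=data.dtype)
    if template.shape != data.shape[1:]:
        return False

    block_bytes = data.dtype.itemsize * template.size
    void_dt = np.dtype((np.void, block_bytes))

    # Each slice data[i] becomes a single opaque element of block_bytes bytes.
    dv = data.reshape(data.shape[0], -1).view(void_dt).ravel()
    tv = template.reshape(1, -1).view(void_dt).ravel()
    return bool((dv == tv).any())

small = np.arange(24).reshape(4, 2, 3)
print(in_void(small, small[2]), in_void(small, small[2] + 1))
```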

Example:

>>> a = np.random.randint(0, 100, (1000, 12, 30))
>>> check = a[np.random.randint(0, 1000, (10,))]
>>> check += np.random.random(check.shape) < 0.001    
>>>
>>> [in_(a, c) for c in check]
[True, True, True, False, False, True, True, True, True, False]
# compare to other method
>>> (a==check[:, None]).all((-1,-2)).any(-1)
array([ True,  True,  True, False, False,  True,  True,  True,  True,
       False])

This gives the same result as the "direct" NumPy approach but is almost 20x faster:

>>> from timeit import timeit
>>> kwds = dict(globals=globals(), number=100)
>>> 
>>> timeit("(a==check[:, None]).all((-1,-2)).any(-1)", **kwds)
0.4793281531892717
>>> timeit("[in_(a, c) for c in check]", **kwds)
0.026218891143798828

4 Comments

I was hoping someone who was better at actual coding would eventually improve my old vview code. Once you have the void view, couldn't you just use np.in1d though?
@DanielF You are right, that should be even faster. Could you give me a pointer to your post so I can properly credit you?
@DanielF Strange, I tried with in1d or rather the new isin and it is 10x slower. Not sure what's going on here.
I've given answers with it a few times: here and here most recently. But the original idea came from @jaime here
Numpy

Given

import numpy as np

a = np.arange(12).reshape(3, 2, 2)
lst = [
    np.arange(4).reshape(2, 2),
    np.arange(4, 8).reshape(2, 2)
]

print(a, *lst, sep='\n{}\n'.format('-' * 20))

[[[ 0  1]
  [ 2  3]]

 [[ 4  5]
  [ 6  7]]

 [[ 8  9]
  [10 11]]]
--------------------
[[0 1]
 [2 3]]
--------------------
[[4 5]
 [6 7]]

Notice that lst is a list of arrays as per OP. I'll make that a 3d array b below.

Use broadcasting. Following the broadcasting rules, I want a to have shape (1, 3, 2, 2) and b to have shape (2, 1, 2, 2), so that every sub-array of b is compared with every sub-array of a.

b = np.array(lst)
x, *y = b.shape
c = np.equal(
    a.reshape(1, *a.shape),  # shape (1, 3, 2, 2)
    b.reshape(x, 1, *y)      # shape (2, 1, 2, 2)
)

I'll use all to produce a (2, 3) array of truth values and np.where to find which sub-arrays of b and a are actually equal. Here i indexes b (the first axis of c) and j indexes a (the second axis).

i, j = np.where(c.all((-2, -1)))

This is just a verification that we achieved what we were after. We are supposed to observe that for each paired i and j values, the sub-arrays are actually the same.

for t in zip(i, j):
    # t[0] comes from i (an index into b); t[1] comes from j (an index into a)
    print(a[t[1]], b[t[0]], sep='\n\n')
    print('------')

[[0 1]
 [2 3]]

[[0 1]
 [2 3]]
------
[[4 5]
 [6 7]]

[[4 5]
 [6 7]]
------
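
To get the yes/no answer the question asks for, the (templates, slices) truth table can be reduced once more along the slice axis. This is a minimal sketch that extends the example with a third array, chosen here as an assumption so that one result is negative.

```python
import numpy as np

a = np.arange(12).reshape(3, 2, 2)
lst = [
    np.arange(4).reshape(2, 2),      # equals a[0]
    np.arange(4, 8).reshape(2, 2),   # equals a[1]
    np.full((2, 2), -1),             # does not occur in a
]
b = np.array(lst)

# Shapes (3, 1, 2, 2) and (1, 3, 2, 2) broadcast to (3, 3, 2, 2).
c = b[:, None] == a[None]

# Reduce the 2D blocks first, then ask whether any slice of a matched.
found = c.all(axis=(-2, -1)).any(axis=1)
print(found)
```

The result holds one boolean per entry of lst: True, True, False for this data. Note that the intermediate array has len(lst) * len(a) blocks, which can be large for big inputs.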

in

However, to complete OP's thought on using in, convert everything to nested lists first:

a_ = a.tolist()
list(filter(lambda x: x.tolist() in a_, lst))

[array([[0, 1],
        [2, 3]]), array([[4, 5],
        [6, 7]])]
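
As for why in does not work on the arrays directly: applied to an ndarray, in is True as soon as any single element compares equal after broadcasting, so it does not test for a whole sub-array. The sketch below demonstrates that pitfall and an alternative built on np.array_equal. The helper name contains and the near_miss array are assumptions made for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 2, 2)
near_miss = np.array([[0, 99], [99, 99]])  # shares only the value 0 with a[0]

# On an ndarray, `in` is True if ANY element compares equal after broadcasting.
misleading = near_miss in a

def contains(stack, candidate):
    """True only if candidate matches one whole 2D slice of stack."""
    return any(np.array_equal(candidate, sub) for sub in stack)

print(misleading, contains(a, near_miss), contains(a, a[1]))
```

This loops in Python over the first axis, so it is simple rather than fast. The vectorized approaches above are likely the better choice for 1000 slices.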
