I have a 3D array with shape (1000, 12, 30) and a list of 2D arrays, each of shape (12, 30). I want to check whether each of these 2D arrays exists in the 3D array. Is there a simple way to do this in Python? I tried the in keyword, but it doesn't work.
The solution here should apply to your problem stackoverflow.com/questions/7100242/…. Marking duplicate – Xero Smith, May 3, 2018 at 3:02
Those questions are not the same. – Teodorico Levoff, May 3, 2018 at 3:03
The solutions apply to this case. Adjust the rolling window accordingly – Xero Smith, May 3, 2018 at 3:05
It doesn't apply. Questions that are a duplicate should be marked as a duplicate, those are not the same questions. I don't understand why you would mark this as a duplicate. – Teodorico Levoff, May 3, 2018 at 3:06
I have retracted the flag though. – Xero Smith, May 3, 2018 at 3:06
3 Answers
There is a way to do this in NumPy with np.all:
import numpy as np
a = np.random.rand(3, 1, 2)
b = a[1][0]
np.all(np.all(a == b, 1), 1)
Out[612]: array([False, True, False])
Solution from bnaecker
np.all(a == b, axis=(1, 2))
If you only want to check whether it exists or not:
np.any(np.all(a == b, axis=(1, 2)))
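For the case in the question (a list of 2D arrays, possibly floating point), the one-liner above can be wrapped in a small helper. This is only a minimal sketch: the function names `contains_exact` and `contains_close` and the tolerance values are illustrative assumptions, not part of the answer. The tolerance-based variant uses np.isclose, since np.allclose returns a single boolean and cannot reduce over selected axes.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((1000, 12, 30))

# Three candidates: an exact copy, a slightly perturbed copy, and unrelated data.
candidates = [a[3].copy(), a[500] + 1e-12, rng.random((12, 30))]

def contains_exact(a, b):
    """True if some a[i] equals b element for element (hypothetical helper)."""
    return bool(np.all(a == b, axis=(1, 2)).any())

def contains_close(a, b, rtol=1e-9, atol=0.0):
    """Like contains_exact, but compares within a tolerance (hypothetical helper)."""
    return bool(np.isclose(a, b, rtol=rtol, atol=atol).all(axis=(1, 2)).any())

exact = [contains_exact(a, c) for c in candidates]
close = [contains_close(a, c, atol=1e-9) for c in candidates]
print(exact)  # [True, False, False]
print(close)  # [True, True, False]
```

The perturbed copy is missed by the exact test and found by the tolerant one, which is the situation the floating-point caveat is about.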
7 Comments
np.all(a == b, axis=(1,2)). b in this case will be broadcast (replicated) along the first dimension to match a. Then the axis arguments to np.all reduce that along the last two dimensions, leaving a boolean array of shape (1000,) with True at indices i where a[i] == b.
np.allclose() rather than np.all() if you're dealing with floating point numbers.
Here is a fast method (previously used by @DanielF as well as @jaime and others, no doubt) that uses a trick to benefit from short-circuiting: view-cast template-sized blocks to single elements of dtype void. When comparing two such blocks, numpy stops after the first difference, yielding a huge speed advantage.
>>> def in_(data, template):
... dv = data.reshape(data.shape[0], -1).view(f'V{data.dtype.itemsize*np.prod(data.shape[1:])}').ravel()
... tv = template.ravel().view(f'V{template.dtype.itemsize*template.size}').reshape(())
... return (dv==tv).any()
Example:
>>> a = np.random.randint(0, 100, (1000, 12, 30))
>>> check = a[np.random.randint(0, 1000, (10,))]
>>> check += np.random.random(check.shape) < 0.001
>>>
>>> [in_(a, c) for c in check]
[True, True, True, False, False, True, True, True, True, False]
# compare to other method
>>> (a==check[:, None]).all((-1,-2)).any(-1)
array([ True, True, True, False, False, True, True, True, True,
False])
This gives the same result as the "direct" numpy approach, but is almost 20x faster:
>>> from timeit import timeit
>>> kwds = dict(globals=globals(), number=100)
>>>
>>> timeit("(a==check[:, None]).all((-1,-2)).any(-1)", **kwds)
0.4793281531892717
>>> timeit("[in_(a, c) for c in check]", **kwds)
0.026218891143798828
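The void-view trick above can also be written as a plain script with a few guards. This is a sketch under stated assumptions rather than the answerer's code: the name `contains_block` is hypothetical, the helper insists on matching dtypes and shapes, and it copies non-contiguous input so the view is valid. Because the comparison is byte-wise, it is best suited to integer data; for floats, values such as 0.0 and -0.0 have different bytes and would not match.

```python
import numpy as np

def contains_block(data, template):
    """Return True if `template` equals some data[i], compared byte for byte.

    Hypothetical helper illustrating the void-view technique.
    """
    if template.dtype != data.dtype:
        raise TypeError("data and template must share a dtype")
    if template.shape != data.shape[1:]:
        raise ValueError("template must have shape data.shape[1:]")

    # The view below needs each block to be contiguous in memory.
    data = np.ascontiguousarray(data)
    template = np.ascontiguousarray(template)

    # One void element spans the bytes of a whole template-sized block.
    nbytes = data.dtype.itemsize * template.size
    void = np.dtype((np.void, nbytes))

    dv = data.reshape(data.shape[0], -1).view(void).ravel()  # shape (n,)
    tv = template.reshape(1, -1).view(void).ravel()          # shape (1,)
    return bool((dv == tv).any())

a = np.arange(24).reshape(4, 2, 3)
changed = a[2].copy()
changed[0, 0] += 1  # no block of `a` looks like this

print(contains_block(a, a[2]))     # True
print(contains_block(a, changed))  # False
```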
4 Comments
vview code. Once you have the void view couldn't you just use np.in1d though?
in1d or rather the new isin and it is 10x slower. Not sure what's going on here.
Numpy
Given
a = np.arange(12).reshape(3, 2, 2)
lst = [
np.arange(4).reshape(2, 2),
np.arange(4, 8).reshape(2, 2)
]
print(a, *lst, sep='\n{}\n'.format('-' * 20))
[[[ 0 1]
[ 2 3]]
[[ 4 5]
[ 6 7]]
[[ 8 9]
[10 11]]]
--------------------
[[0 1]
[2 3]]
--------------------
[[4 5]
[6 7]]
Notice that lst is a list of arrays as per OP. I'll make that a 3d array b below.
Use broadcasting. To make the broadcasting rules compare every sub-array of b against every sub-array of a, I want the dimensions of a as (1, 3, 2, 2) and b as (2, 1, 2, 2).
b = np.array(lst)
x, *y = b.shape
c = np.equal(
    a.reshape(1, *a.shape),
    b.reshape(x, 1, *y)
)
I'll use all over the last two axes to produce a (2, 3) array of truth values (rows index b, columns index a) and np.where to find out which pairs of a and b sub-arrays are actually equal.
i, j = np.where(c.all((-2, -1)))
This is just a verification that we achieved what we were after. Since the first axis of c corresponds to b and the second to a, i indexes b and j indexes a; for each paired i and j, the two sub-arrays should be the same.
for bi, ai in zip(i, j):
    print(a[ai], b[bi], sep='\n\n')
    print('------')
[[0 1]
[2 3]]
[[0 1]
[2 3]]
------
[[4 5]
[6 7]]
[[4 5]
[6 7]]
------
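The broadcasting steps above can be collected into one function. The following is a minimal sketch, assuming all templates share the shape of a's sub-arrays; the name `match_subarrays` and the extra all-zeros template are illustrative additions, not part of the answer.

```python
import numpy as np

def match_subarrays(a, templates):
    """Compare every template against every a[k] (hypothetical helper).

    Returns a boolean array saying which templates occur in `a`, and a
    list of (template_index, a_index) pairs for the matches.
    """
    b = np.asarray(templates)
    # b[:, None] has shape (m, 1, r, c); a[None] has shape (1, n, r, c).
    eq = (b[:, None] == a[None]).all(axis=(-2, -1))  # shape (m, n)
    found = eq.any(axis=1)
    b_idx, a_idx = np.nonzero(eq)
    return found, list(zip(b_idx.tolist(), a_idx.tolist()))

a = np.arange(12).reshape(3, 2, 2)
lst = [
    np.arange(4).reshape(2, 2),
    np.arange(4, 8).reshape(2, 2),
    np.zeros((2, 2), dtype=int),  # not present in a
]

found, pairs = match_subarrays(a, lst)
print(found)  # [ True  True False]
print(pairs)  # [(0, 0), (1, 1)]
```

Note that the intermediate comparison array has m * n * r * c elements, so for many templates against a large array it may be worth looping over the templates instead.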
in
However, to complete OP's thought on using in, convert the arrays to nested lists first:
a_ = a.tolist()
list(filter(lambda x: x.tolist() in a_, lst))
[array([[0, 1],
[2, 3]]), array([[4, 5],
[6, 7]])]
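The conversion to lists matters because in applied directly to NumPy arrays does not test sub-array membership. As far as I understand NumPy's behaviour, `b in a` is evaluated as `(a == b).any()`, so a single matching element is enough to return True. A small sketch (the arrays `present` and `absent` are made up for illustration):

```python
import numpy as np

a = np.arange(12).reshape(3, 2, 2)
present = np.arange(4).reshape(2, 2)   # identical to a[0]
absent = np.array([[0, 9], [9, 9]])    # shares only the element 0 with a[0]

# Direct `in` on arrays: true as soon as any single element matches.
print(absent in a)                 # True (misleading)

# Nested lists compare whole sub-lists, which is the intended test.
a_list = a.tolist()
print(absent.tolist() in a_list)   # False
print(present.tolist() in a_list)  # True
```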