NumPy check if 2D array is subset of 2D array [duplicate]

Question

I would like to check if the array b is a subset of the array a. By subset I mean I would like to check if all the elements of b are found in a.

Here is the code I have:

import numpy as np
a = np.array([[1,7,9],[8,3,12],[101,-74,0.5]])
b = np.array([[1,9],[8,12],[101,0.5]])
print(a)
print(b)

Here is the output

Array a

[[   1.     7.     9. ]
 [   8.     3.    12. ]
 [ 101.   -74.     0.5]]

Array b

[[   1.     9. ]
 [   8.    12. ]
 [ 101.     0.5]]

Is there a way to check if b is a subset of a?

EDIT: Additional Information:

As per the comments below, I should clarify: I need to know whether array b is a subset of array a. If even one element of b is missing from a, I want to be able to detect that. I do not need to know where the missing element is, only that something is missing. If additional information about the missing element can be provided, that would be a bonus, but it is not a hard requirement. Apologies for not clearing this up earlier.

My reasoning in phrasing the question as a subset is that if one array is a subset of the other array then this would imply to me that all the values of the subset array are present in the larger array.

I think you need to elaborate on "I would like to check if all the elements of b are found in a", as we are dealing with 2D arrays here. Think of the various situations that might negate your definition of "subset", and of the other situations that must follow. All elements along the respective rows of a and b? Along the same column only in b?
– Divakar, Commented May 16, 2016 at 19:29
Sorry, I should have explained this. Check if all elements along respective columns of b are subsets of those in a. This is what I am after.
– edesz, Commented May 16, 2016 at 19:54
So the desired output in this case would be a bool array with three values of true, right? One for each row, which indeed have columns which are subsets.
– Eelco Hoogendoorn, Commented May 16, 2016 at 20:55
How are you defining subset here? Are you looking for the existence of a pair of boolean masks such that (a[m1,m2] == b).all(), i.e. some subset of the rows and columns?
– Eric, Commented May 17, 2016 at 1:39

Bi Rico · Answer · 2016-05-16 19:44:00Z

5

I think you want numpy.in1d, something like this:

import numpy as np
a = np.array([[1,7,9],[8,3,12],[101,-74,0.5]])
b = np.array([[1,9],[8,12],[101,0.5]])

np.in1d(b.ravel(), a.ravel()).all()
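
The question also asks, as a bonus, which values are missing. The following is only a sketch of how that might be done along the same lines. It assumes np.isin, which was added in NumPy 1.13 and which, as far as I know, is the recommended replacement for np.in1d in recent releases. The array c and the helper missing_values are hypothetical names introduced for illustration; they are not part of the original answer.

```python
import numpy as np

a = np.array([[1, 7, 9], [8, 3, 12], [101, -74, 0.5]])
b = np.array([[1, 9], [8, 12], [101, 0.5]])
# Hypothetical array containing one value (42.0) that does not occur in a
c = np.array([[1, 9], [8, 12], [101, 42.0]])


def missing_values(sub, full):
    """Return the unique values of `sub` that occur nowhere in `full`."""
    mask = np.isin(sub, full)  # boolean array with the same shape as `sub`
    return np.unique(sub[~mask])


# Same test as np.in1d(...).all(), without the explicit ravel()
b_is_subset = bool(np.isin(b, a).all())
c_is_subset = bool(np.isin(c, a).all())
missing_in_c = missing_values(c, a)

print(b_is_subset, c_is_subset, missing_in_c)
```

Like the one-liner above, this compares values with exact equality and ignores their position in the array, so it may not be suitable for floats that are only approximately equal.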

answered May 16, 2016 at 19:44

Bi Rico


1 Comment

edesz Over a year ago

Thanks a lot. Although I accepted the earlier answer, this works as well. Simple one-liner.

B. M. · Answer · 2016-05-16 21:41:38Z

3

If you want to compare columns, one way is to group them first:

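import numpy as np

a = np.array([[1,7,9],[8,3,12],[101,-74,0.5]])
b = np.array([[1,9],[8,12],[101,0.5]])
c = np.array([[1,9],[8,12],[101,-74.]])

def bycols(arr):
    # transpose and copy so that each column is contiguous in memory
    tr = arr.T.copy()
    # view each column as one opaque item of tr.strides[0] bytes
    col_dtype = np.dtype((np.void, tr.strides[0]))
    return tr.view(col_dtype).squeeze()

A, B, C = [bycols(x) for x in (a, b, c)]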

Then A,B,C are just arrays of bytes representing columns:

In [5]: [x.shape for x in (A,B,C)]
Out[5]: [(3,), (2,), (2,)]

You can now test membership with np.in1d:

In [6]: np.in1d(C,A)
Out[6]: array([ True, False], dtype=bool)

In [7]: np.in1d(B,A)
Out[7]: array([ True,  True], dtype=bool)

But:

In [8]: np.in1d(c,a)
Out[8]: array([ True,  True,  True,  True,  True,  True], dtype=bool)

since np.in1d operates on flattened arrays.
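
As an alternative sketch of the same column-wise test, the columns can be compared directly with broadcasting instead of a byte view. This is not the method used above, only an illustration; it assumes both arrays have the same number of rows, and the helper name columns_in is hypothetical.

```python
import numpy as np

a = np.array([[1, 7, 9], [8, 3, 12], [101, -74, 0.5]])
b = np.array([[1, 9], [8, 12], [101, 0.5]])
c = np.array([[1, 9], [8, 12], [101, -74.0]])


def columns_in(sub, full):
    """For each column of `sub`, report whether an identical column exists in `full`."""
    # Shape (n_sub_cols, n_full_cols, n_rows): every column of `sub`
    # compared element-wise against every column of `full`
    eq = sub.T[:, None, :] == full.T[None, :, :]
    # A column matches if all its entries agree with some column of `full`
    return eq.all(axis=2).any(axis=1)


b_cols = columns_in(b, a)
c_cols = columns_in(c, a)

print(b_cols, c_cols)
```

The intermediate boolean array grows with the product of the two column counts and the row count, so this is probably only practical for moderately sized inputs.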

edited May 16, 2016 at 21:41

answered May 16, 2016 at 21:29

B. M.


Tonechas · Accepted Answer · 2016-05-16 22:18:49Z

2

This should work:

set(np.unique(b)).issubset(set(np.unique(a)))

EDIT: The code above returns True or False rather than a column vector of booleans. From @Eelco Hoogendoorn's comment to your question, I understand that you are actually interested in checking whether a row of b is a subset of the corresponding row of a, right? Assuming that this is the correct problem description, the following one-liner should work:

np.array([[set(bi).issubset(set(ai))] for ai, bi in zip(map(tuple, a), map(tuple, b))])

The code above is simple, readable, and does not require third-party dependencies. It is admittedly a quick and dirty solution since, as @Bi Rico correctly pointed out, such an approach can be pretty inefficient. If you need to handle large arrays you should stick to a vectorized algorithm.
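
One possible vectorized form of the row-wise check is sketched below using plain NumPy broadcasting. It assumes a and b have the same number of rows; the helper rows_subset and the array d are hypothetical names added for illustration.

```python
import numpy as np

a = np.array([[1, 7, 9], [8, 3, 12], [101, -74, 0.5]])
b = np.array([[1, 9], [8, 12], [101, 0.5]])
# Hypothetical array: 7 occurs in a, but not in the last row of a
d = np.array([[1, 9], [8, 12], [101, 7]])


def rows_subset(sub, full):
    """For each row i, report whether every value of sub[i] occurs in full[i]."""
    # Shape (n_rows, n_sub_cols, n_full_cols): each value of a row of `sub`
    # compared against each value of the corresponding row of `full`
    eq = sub[:, :, None] == full[:, None, :]
    # A value is found if it matches any entry of the row; a row passes
    # only if all of its values are found
    return eq.any(axis=2).all(axis=1)


b_rows = rows_subset(b, a)
d_rows = rows_subset(d, a)

print(b_rows, d_rows)
```

The array d shows how the two definitions differ: every value of d occurs somewhere in a, so the flattened test returns True, while the row-wise test flags the last row.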

edited May 16, 2016 at 22:18

answered May 16, 2016 at 19:35

Tonechas


4 Comments

edesz Over a year ago

Thanks. This works and it answers my question.

Bi Rico Over a year ago

It's worth noting that for larger arrays, switching between numpy arrays and built in types like sets and lists can be pretty expensive in processing time and memory usage.

Eelco Hoogendoorn Over a year ago

If I read your question correctly, this does not answer it; it regards all elements in the array at once, and does not act per-column.

edesz Over a year ago

Thank you for the feedback. Either option will work for me. I am just looking to check for the presence of False, which would indicate that one value (any value) is different. If this is the case, then I know that one array has a problem with it. If it is possible to have a comparison between rows, as you have done in your edit, then that is also useful but not necessary. Your initial answer works just fine because it identifies False or True, and this is exactly what I was looking for. Thanks for the added solution.

Eelco Hoogendoorn · Answer · 2016-05-17 05:35:48Z

1

If I read your question correctly (test for each corresponding row in a and b, if the row in b is a subset of the row in a), this should do it efficiently and correctly:

import numpy as np
import numpy_indexed as npi
rowsa = np.indices(a.shape)[0]
rowsb = np.indices(b.shape)[0]
# test for each value-rowidx pair in b if it is contained in a
c = npi.contains((a.flatten(), rowsa.flatten()), (b.flatten(), rowsb.flatten()))
# check that all elements on a row are contained
row_is_subset = c.reshape(b.shape).all(axis=1)

You need to install the numpy_indexed package (disclaimer: I am its author)

edited May 17, 2016 at 5:35

answered May 16, 2016 at 21:08

Eelco Hoogendoorn
