Remove entire sub array from multi-dimensional array if any element in array is duplicate

Question

I have a multi-dimensional array in Python where an integer may appear in more than one vector (row) of the array. For example:

array = [[1,2,3,4],
         [2,9,12,4],
         [5,6,7,8],
         [6,8,12,13]]

I would like to completely remove the vectors that contain any element that has appeared previously. In this case, vector [2,9,12,4] and vector [6,8,12,13] should be removed because they have an element (2 and 6 respectively) that has appeared in a previous vector within that array. Note that [6,8,12,13] contains two elements that have appeared previously, so the code should be able to handle such cases as well.

The resulting array should end up being:

array = [[1,2,3,4],
         [5,6,7,8]]

I thought I could achieve this with np.unique(array, axis=0), but that did not work, and I couldn't find another function that handles this particular kind of uniqueness.

Any thoughts are appreciated.

Typically we recommend splitting this into two tasks. First identify which elements need to be removed, and then do the removal. Since the removal (np.delete) actually returns a new array, it may be better to think in terms of which rows you want to keep.
– hpaulj, Commented Oct 30, 2020 at 0:51
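The two-step idea in this comment can be sketched as follows. This is only an illustration, not code from the thread: the names `keep` and `result` are made up here, and the sketch assumes that a row is dropped when it shares a value with any earlier row, including earlier rows that were themselves dropped.

```python
import numpy as np

array = np.array([[1, 2, 3, 4],
                  [2, 9, 12, 4],
                  [5, 6, 7, 8],
                  [6, 8, 12, 13]])

# Step 1: decide which rows to keep. A row is kept only if none of its
# values occurs anywhere in the rows above it (array[:i]).
keep = np.array([not np.isin(array[i], array[:i]).any()
                 for i in range(len(array))])

# Step 2: select the kept rows with the boolean mask (returns a new array).
result = array[keep]
print(keep)
print(result)
```

Building a mask first keeps the "which rows" decision separate from the selection, so the mask can be inspected or reused before anything is removed.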

mathfux · Accepted Answer · 2020-10-30 01:23:34Z

1

You can work with an array that pairs each number with the index of the row it came from (first column: row index, second column: number), sorted by number. It looks like this:

number_info = array([[ 0,  1],
                     [ 0,  2],
                     [ 1,  2],
                     [ 0,  3],
                     [ 0,  4],
                     [ 1,  4],
                     [ 2,  5],
                     [ 2,  6],
                     [ 3,  6],
                     [ 2,  7],
                     [ 2,  8],
                     [ 3,  8],
                     [ 1,  9],
                     [ 1, 12],
                     [ 3, 12],
                     [ 3, 13]])

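In this table, rows remove_idx = [2, 5, 8, 11, 14] (0-based) each repeat the number in the row above but come from a later row of the original array. They point to rows remove_rows = [1, 1, 3, 3, 3] of the original array, which are the rows that need to be removed. Now, the code: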

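import numpy as np

array = np.asarray(array)  # .shape and .ravel() need an ndarray, not a nested list
# Row index of every element, in flattened (row-major) order
flat_idx = np.repeat(np.arange(array.shape[0]), array.shape[1])
number_info = np.transpose([flat_idx, array.ravel()])
# Sort by number; a stable sort keeps equal numbers in ascending row order
number_info = number_info[np.argsort(number_info[:,1], kind='stable')]
# Same number as the entry above, but from a later row: a repeat
remove_idx = np.where((np.diff(number_info[:,1])==0) &
                      (np.diff(number_info[:,0])>0))[0] + 1
remove_rows = number_info[remove_idx, 0]
output = np.delete(array, remove_rows, axis=0)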

Output:

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
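To check the intermediate indices quoted in this answer, the same steps can be wrapped in a small function. This is a sketch rather than the answerer's exact code: the function name `drop_rows_with_repeats` is hypothetical, and it assumes a stable sort (`kind="stable"`) so that equal numbers stay in ascending row order.

```python
import numpy as np

def drop_rows_with_repeats(array):
    """Hypothetical helper: drop rows that share a value with an earlier row."""
    array = np.asarray(array)
    n_rows, n_cols = array.shape
    row_of = np.repeat(np.arange(n_rows), n_cols)   # row index of each element
    values = array.ravel()
    order = np.argsort(values, kind="stable")       # ties keep ascending row order
    sorted_vals, sorted_rows = values[order], row_of[order]
    # A repeat: same value as the previous entry, but from a later row.
    is_repeat = (np.diff(sorted_vals) == 0) & (np.diff(sorted_rows) > 0)
    remove_idx = np.where(is_repeat)[0] + 1         # positions in the sorted table
    remove_rows = np.unique(sorted_rows[remove_idx])
    return np.delete(array, remove_rows, axis=0), remove_idx, remove_rows

array = [[1, 2, 3, 4], [2, 9, 12, 4], [5, 6, 7, 8], [6, 8, 12, 13]]
output, remove_idx, remove_rows = drop_rows_with_repeats(array)
print(remove_idx)    # positions of repeated numbers in the sorted table
print(remove_rows)   # distinct rows of the original array to drop
print(output)
```

`np.unique` is applied to the row indices only to make the list of rows to drop easier to read; `np.delete` accepts repeated indices as well.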


Leo Over a year ago

Thanks! I think I can use some derivative of this to accomplish my exact task, as my actual array is quite a bit different and this exact implementation isn't working completely for me.

Samwise · Accepted Answer · 2020-10-29 23:55:53Z

0

Here's a quick way to do it with a list comprehension and set intersections:

>>> array = [[1,2,3,4],
...          [2,9,12,4],
...          [5,6,7,8],
...          [6,8,12,13]]
>>> [v for i, v in enumerate(array) if not any(set(a) & set(v) for a in array[:i])]
[[1, 2, 3, 4], [5, 6, 7, 8]]
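The comprehension above compares each vector against every earlier vector, including ones that are themselves dropped. A single-pass variant with a running set makes that choice explicit. This is a hedged sketch: the function name `drop_rows_with_seen_values` and the `count_dropped` flag are hypothetical, introduced only to show the two possible readings of "appeared previously".

```python
def drop_rows_with_seen_values(rows, count_dropped=True):
    """Keep rows that share no value with earlier rows.

    count_dropped=True  : values from dropped rows still count as seen
                          (same behaviour as the comprehension above).
    count_dropped=False : only values from kept rows count as seen.
    """
    seen = set()
    kept = []
    for row in rows:
        values = set(row)
        clash = not seen.isdisjoint(values)
        if not clash:
            kept.append(row)
        if not clash or count_dropped:
            seen |= values
    return kept

array = [[1, 2, 3, 4], [2, 9, 12, 4], [5, 6, 7, 8], [6, 8, 12, 13]]
print(drop_rows_with_seen_values(array))

# The two readings differ when a dropped row introduces a new value:
chain = [[1, 2], [2, 3], [3, 4]]
print(drop_rows_with_seen_values(chain, count_dropped=True))
print(drop_rows_with_seen_values(chain, count_dropped=False))
```

For the array in the question both readings give the same result; they only diverge on inputs like `chain`, where the value 3 first appears in a row that is itself dropped.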
