I have a multi-dimensional array in Python where an integer may be repeated across the vectors in the array. For example:

array = [[1,2,3,4],
         [2,9,12,4],
         [5,6,7,8],
         [6,8,12,13]]

I would like to completely remove the vectors that contain any element that has appeared previously. In this case, vectors [2,9,12,4] and [6,8,12,13] should be removed because each has an element (2 and 6, respectively) that has appeared in a previous vector within the array. Note that [6,8,12,13] contains two elements that have appeared previously, so the code should handle such cases as well.

The resulting array should end up being:

array = [[1,2,3,4],
         [5,6,7,8]]

I thought I could achieve this with np.unique(array, axis=0), but I couldn't find a function that handles this particular kind of uniqueness.

Any thoughts are appreciated.

  • Typically we recommend splitting this into two tasks: first identify which elements need to be removed, then do the removal. Since the removal (np.delete) actually returns a new array, it may be better to think in terms of which rows you want to keep. Commented Oct 30, 2020 at 0:51
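
One way to read the suggestion above is to build a boolean mask of rows to keep first, and only then select with it. The following is a minimal sketch under that reading, not a definitive implementation. The names keep and result are illustrative, and it assumes the data is a 2-D NumPy array of integers.

```python
import numpy as np

array = np.array([[1, 2, 3, 4],
                  [2, 9, 12, 4],
                  [5, 6, 7, 8],
                  [6, 8, 12, 13]])

# Step 1: decide which rows to keep. The first row is always kept.
keep = np.ones(len(array), dtype=bool)
for i in range(1, len(array)):
    # np.isin tests each element of row i against all earlier rows
    keep[i] = not np.isin(array[i], array[:i]).any()

# Step 2: select the kept rows with the boolean mask (returns a new array)
result = array[keep]
print(result.tolist())  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

The loop compares against every earlier row, including rows that are themselves dropped. Whether that is the intended rule is an assumption here.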

2 Answers

You can work with an array of the sorted numbers (second column) paired with the indices of the rows they come from (first column), which looks like this:

number_info = array([[ 0,  1],
                     [ 0,  2],
                     [ 1,  2],
                     [ 0,  3],
                     [ 0,  4],
                     [ 1,  4],
                     [ 2,  5],
                     [ 2,  6],
                     [ 3,  6],
                     [ 2,  7],
                     [ 2,  8],
                     [ 3,  8],
                     [ 1,  9],
                     [ 1, 12],
                     [ 3, 12],
                     [ 3, 13]])

Each entry that repeats the number on the line above it, but comes from a later row, marks a duplicate. So rows remove_idx = [2, 5, 8, 11, 14] of this array need to be removed, and they point to rows remove_rows = [1, 1, 3, 3, 3] of the original array. Now, the code:

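import numpy as np

array = np.asarray(array)  # the code below needs a 2-D NumPy array, not a list
# Pair every number with the index of the row it comes from
flat_idx = np.repeat(np.arange(array.shape[0]), array.shape[1])
number_info = np.transpose([flat_idx, array.ravel()])
# A stable sort keeps equal numbers ordered by row index
number_info = number_info[np.argsort(number_info[:,1], kind='stable')]
# A number equal to its predecessor but from a later row is a repeat
remove_idx = np.where((np.diff(number_info[:,1])==0) &
                      (np.diff(number_info[:,0])>0))[0] + 1
remove_rows = number_info[remove_idx, 0]
output = np.delete(array, remove_rows, axis=0)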

Output:

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

1 Comment

Thanks! I think I can use some derivative of this to accomplish my exact task, as my actual array is quite a bit different and this exact implementation isn't working completely for me.

Here's a quick way to do it with a list comprehension and set intersections:

>>> array = [[1,2,3,4],
...          [2,9,12,4],
...          [5,6,7,8],
...          [6,8,12,13]]
>>> [v for i, v in enumerate(array) if not any(set(a) & set(v) for a in array[:i])]
[[1, 2, 3, 4], [5, 6, 7, 8]]
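
The comprehension above rebuilds sets for every earlier row, so its cost grows quadratically with the number of rows. A possible single-pass variant keeps one running set of everything seen so far. This is only a sketch: the function name drop_rows_with_seen_elements is hypothetical, and it assumes, as the comprehension does, that elements of dropped rows still count as having appeared.

```python
def drop_rows_with_seen_elements(rows):
    """Keep only rows that share no element with any earlier row."""
    seen = set()
    kept = []
    for row in rows:
        if seen.isdisjoint(row):
            kept.append(row)
        # Assumption: elements count as "seen" even when their row is dropped
        seen.update(row)
    return kept


array = [[1, 2, 3, 4],
         [2, 9, 12, 4],
         [5, 6, 7, 8],
         [6, 8, 12, 13]]
print(drop_rows_with_seen_elements(array))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

If elements of dropped rows should not count, moving seen.update(row) inside the if branch would change the rule accordingly.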
