I have this 2D numpy array here:

arr = np.array([[1,2],
                [2,2],
                [3,2],
                [4,2],
                [5,3]])

I would like to delete all duplicates corresponding to the previous index at index 1 and get an output like so:

np.array([[1,2],
          [5,3]])

However, when I try my code it errors. Here is my code:

for x in range(0, len(arr)):
    if arr[x][1] == arr[x-1][1]:
        arr = np.delete(arr, x, 0)

>>> IndexError: index 3 is out of bounds for axis 0 with size 2

1 Answer

Rather than trying to delete from the array, you can use np.unique with return_index=True to find the index of the first occurrence of each unique value in the second column, and then use those indices to pull out the matching rows:

import numpy as np   

arr = np.array([[1,2],
                [2,2],
                [3,2],
                [4,2],
                [5,3]])

u, i = np.unique(arr[:,1], return_index=True)

arr[i]
# array([[1, 2],
#        [5, 3]])
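
One thing to be aware of: np.unique returns the unique values sorted, so the indices come back ordered by value rather than by row position. For the array above the two orders happen to coincide. A minimal sketch, assuming you want to keep the original row order, is to sort the indices before indexing (arr2 is a made-up input, not from the question):

```python
import numpy as np

# Hypothetical input whose second column is NOT in ascending order.
arr2 = np.array([[1, 3],
                 [2, 3],
                 [3, 2]])

# np.unique sorts the unique values, so `i` is ordered by value, not position.
u, i = np.unique(arr2[:, 1], return_index=True)

by_value = arr2[i]              # rows ordered by second-column value
by_position = arr2[np.sort(i)]  # same rows, restored to original row order
```

Here by_value starts with the row [3, 2], while by_position starts with [1, 3], matching the order in which the rows appear in arr2.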

2 Comments

Note that this doesn't follow the "delete all duplicates corresponding to the previous index" rule: if there were another group of 1s after the 2s, it would be deleted completely (which may be wanted, or not...)
That's a fair point @mozway, I certainly didn't read it that way, but on re-reading, it's a reasonable interpretation.
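
To make the difference between the two readings concrete, here is a rough sketch, not taken from the answer itself. It assumes the stricter reading means "drop a row only when its second column equals that of the row directly before it". The array arr3 and the names keep, consecutive and first_seen are illustrative only.

```python
import numpy as np

# Hypothetical input: the value 2 appears in two separate groups.
arr3 = np.array([[1, 2],
                 [2, 2],
                 [3, 1],
                 [4, 1],
                 [5, 2]])

col = arr3[:, 1]

# Reading 1: compare each row with its predecessor only.
# The first row has no predecessor, so it is always kept.
keep = np.concatenate(([True], col[1:] != col[:-1]))
consecutive = arr3[keep]

# Reading 2: keep the first occurrence of each value (the np.unique approach).
# np.sort puts the indices back into original row order.
_, i = np.unique(col, return_index=True)
first_seen = arr3[np.sort(i)]
```

With this input, consecutive keeps the rows [1, 2], [3, 1] and [5, 2], whereas first_seen keeps only [1, 2] and [3, 1], because the later group of 2s is treated as a duplicate of the first. Building a mask also avoids the IndexError from the question, since nothing is deleted while iterating.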
