I have a 3D numpy array like this:

>>> a
array([[[0, 1, 2],
        [0, 1, 2],
        [6, 7, 8]],
       [[6, 7, 8],
        [0, 1, 2],
        [6, 7, 8]],
       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]])

I want to remove only those 2D blocks that contain duplicate rows within themselves. For instance, the output should look like this:

>>> remove_row_duplicates(a)
array([[[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]])

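This is the code that I am using:

import numpy as np

delindices = np.empty(0, dtype=int)

for i in range(len(a)):
    # Round first so nearly equal floats compare equal, then find the unique rows of block i
    _, indices = np.unique(np.around(a[i], decimals=10), axis=0, return_index=True)

    # Fewer unique rows than rows means block i contains a duplicate
    if len(indices) < len(a[i]):
        delindices = np.append(delindices, i)

a = np.delete(a, delindices, 0)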

This works perfectly, but the problem is that my array now has a shape like (1000000, 7, 3). The for loop is pretty slow in Python, so this takes a lot of time. Also, my original array contains floating-point numbers. Does anyone have a better solution, or can anyone help me vectorize this function?

Comment: It is done. Thanks for the suggestion. (Commented Jul 14, 2018 at 9:07)

2 Answers

Sort along the rows of each 2D block, i.e. along axis=1, then compare each row with the next one to find successive matching rows, and finally check along the same axis=1 whether any such match exists -

b = np.sort(a,axis=1)
out = a[~((b[:,1:] == b[:,:-1]).all(-1)).any(1)]

Sample run with explanation

Input array :

In [51]: a
Out[51]: 
array([[[0, 1, 2],
        [0, 1, 2],
        [6, 7, 8]],

       [[6, 7, 8],
        [0, 1, 2],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]])

Code steps :

# Sort along axis=1, i.e rows in each 2D block
In [52]: b = np.sort(a,axis=1)

In [53]: b
Out[53]: 
array([[[0, 1, 2],
        [0, 1, 2],
        [6, 7, 8]],

       [[0, 1, 2],
        [6, 7, 8],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]])

In [54]: (b[:,1:] == b[:,:-1]).all(-1) # Look for successive matching rows
Out[54]: 
array([[ True, False],
       [False,  True],
       [False, False]])

# Look for any match along each row of that mask, which indicates the
# presence of duplicate rows within each 2D block of the original 3D array
In [55]: ((b[:,1:] == b[:,:-1]).all(-1)).any(1)
Out[55]: array([ True,  True, False])

# Invert those as we need to remove those cases
# Finally index with boolean indexing and get the output
In [57]: a[~((b[:,1:] == b[:,:-1]).all(-1)).any(1)]
Out[57]: 
array([[[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]])
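
One thing worth checking before relying on the two-liner: np.sort(a, axis=1) sorts each column of a block independently, so the rows of b are not necessarily rows of a. For a block such as [[0, 5], [1, 4], [0, 4]] the column-wise sort gives [[0, 4], [0, 4], [1, 5]], which looks like it contains a duplicate even though the original rows are all distinct. The sketch below keeps the same sort-then-compare-neighbours idea but reorders whole rows lexicographically with np.lexsort. The function name and the decimals parameter are illustrative assumptions, not part of the original answer; decimals mirrors the np.around call in the question.

```python
import numpy as np

def remove_blocks_with_duplicate_rows(a, decimals=None):
    """Drop every 2D block of `a` (shape (n, r, c)) that contains a repeated row.

    `decimals` is a hypothetical convenience parameter: when given, rows are
    compared after rounding, as the question does with np.around.
    """
    a = np.asarray(a)
    key = a if decimals is None else np.around(a, decimals=decimals)

    # np.lexsort uses the LAST key as the primary sort key, so pass the
    # columns in reverse order. Each key has shape (n, r) and the sort
    # runs along the last axis, i.e. over the r rows of each block.
    keys = tuple(key[..., j] for j in reversed(range(key.shape[-1])))
    order = np.lexsort(keys, axis=-1)                      # shape (n, r)

    # Reorder whole rows so the values of a row stay together.
    b = np.take_along_axis(key, order[:, :, None], axis=1)

    # Equal rows are now adjacent, so compare each row with the next one.
    has_dup = (b[:, 1:] == b[:, :-1]).all(-1).any(1)
    return a[~has_dup]


# Block whose rows are distinct but whose columns, sorted separately, collide.
tricky = np.array([[[0, 5], [1, 4], [0, 4]]])
print(remove_blocks_with_duplicate_rows(tricky).shape)
```

Rounding with a fixed number of decimals is only a rough tolerance: two values that sit on either side of a rounding boundary can still compare unequal.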

4 Comments

Got it . Thanks
There is a small problem with your algorithm. It will not work if the two identical rows in one 2D block are not next to each other. My case is a bit more general: they may or may not be next to each other.
@Malik We are arranging them next to each other with the sort step at the very beginning. Does that clarify your doubt(s)? Look at : b = np.sort(a,axis=1).
aha, yup I see it now.

You can probably do this easily using broadcasting, but since you're dealing with arrays of more than two dimensions it won't be as optimized as you expect, and in some cases it can even be very slow. Instead, you can use the following approach, inspired by Jaime's answer:

In [28]: u = np.unique(arr.view(np.dtype((np.void, arr.dtype.itemsize*arr.shape[1])))).view(arr.dtype).reshape(-1, arr.shape[1])

In [29]: inds = np.where((arr == u).all(2).sum(0) == u.shape[1])

In [30]: arr[inds]
Out[30]: 
array([[[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]])
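
For comparison, the broadcasting approach mentioned at the start of this answer could look roughly like the sketch below. It compares every pair of rows inside each block, which needs an (n, r, r, c) boolean intermediate, so it works through the array in chunks to keep memory bounded. The names has_duplicate_rows and remove_row_duplicates_chunked and the chunk parameter are assumptions made for illustration; this is not the code shown in this answer.

```python
import numpy as np

def has_duplicate_rows(a):
    """Boolean mask of shape (n,): True where block a[i] contains a repeated row."""
    r = a.shape[1]
    # eq[i, j, k] is True when rows j and k of block i are identical.
    eq = (a[:, :, None, :] == a[:, None, :, :]).all(-1)    # shape (n, r, r)
    # Keep only pairs with j < k so a row is never compared with itself.
    upper = np.triu(np.ones((r, r), dtype=bool), k=1)
    return (eq & upper).any(axis=(1, 2))

def remove_row_duplicates_chunked(a, chunk=100_000):
    """Drop blocks with repeated rows; `chunk` (hypothetical knob) bounds memory use."""
    a = np.asarray(a)
    keep = np.ones(len(a), dtype=bool)
    for start in range(0, len(a), chunk):
        stop = start + chunk
        keep[start:stop] = ~has_duplicate_rows(a[start:stop])
    return a[keep]


a = np.array([[[0, 1, 2], [0, 1, 2], [6, 7, 8]],
              [[6, 7, 8], [0, 1, 2], [6, 7, 8]],
              [[0, 1, 2], [3, 4, 5], [6, 7, 8]]])
print(remove_row_duplicates_chunked(a))
```

This uses exact equality, so for floating-point data you would probably round first (as the question does with np.around) or swap == for a tolerance-based comparison.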

5 Comments

What if a[1] = a[0]; a[2] = a[0]?
@Divakar I think it'll depend on whether those equal blocks contain unique rows or not. If they do, this code will return both of them, and here we need to know what the OP's expected output is.
The OP's working code would return an empty array, which also matches the question's requirement that we remove duplicates.
@Divakar In that case u will be the answer.
Why would it be u? It's just the unique rows globally, not specific to each 2D block, which is what the OP wants.
