
I have found other methods, such as this, to remove duplicate elements from an array. My requirement is slightly different. If I start with:

array([[1, 2, 3],
       [2, 3, 4],
       [1, 2, 3],
       [3, 2, 1],
       [3, 4, 5]])

I would like to end up with:

array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

That's what I would ultimately like to end up with, but there is an extra requirement. I would also like to store either an array of indices to discard, or to keep (a la numpy.take).

I am using NumPy 1.8.1.
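For concreteness, the two index arrays described above would relate to the array like this (a minimal sketch; the particular index values are just what this example produces):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4],
              [1, 2, 3],
              [3, 2, 1],
              [3, 4, 5]])

keep = np.array([1, 3, 4])     # indices of rows that occur exactly once
discard = np.array([0, 2])     # indices of the duplicated rows

# Either selection yields the desired result
result_take = np.take(a, keep, axis=0)
result_delete = np.delete(a, discard, axis=0)
```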

  • You can count how many time each row appears using methods suggested, for example, here and here. I think that's what your problem here reduces to. Commented Dec 6, 2015 at 21:51
  • @ajcr I can't use return_counts so #1 is out for me. Unfortunately #2 seems to require sorted array, and I need to preserve the order. Commented Dec 6, 2015 at 22:35
  • @codedog Were either of the answers helpful? If not, could you let us know what else you're looking for? Commented Dec 8, 2015 at 6:10

4 Answers


We want to find rows which are not duplicated in your array, while preserving the order.

I use this solution to combine each row of a into a single element, so that we can find the unique rows using np.unique(b, return_index=True, return_inverse=True). I then modified this function to compute the counts of the unique rows from the index and inverse. From there, I can select all unique rows which have counts == 1.

a = np.array([[1, 2, 3],
       [2, 3, 4],
       [1, 2, 3],
       [3, 2, 1],
       [3, 4, 5]])

#use a flexible data type, np.void, to combine the columns of `a`
#size of np.void is the number of bytes for an element in `a` multiplied by number of columns
b = a.view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
_, index, inv = np.unique(b, return_index = True, return_inverse = True)

def return_counts(index, inv):
    count = np.zeros(len(index), dtype=int)  # the np.int alias was later removed in NumPy 1.24
    np.add.at(count, inv, 1)
    return count

counts = return_counts(index, inv)

#for the first occurrence of each duplicated row instead, use counts[i] > 1
index_keep = [j for i, j in enumerate(index) if counts[i] == 1]

>>> a[index_keep]
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

#if you don't need the indices and just want the array returned while preserving the order
a_unique = np.vstack([a[idx] for i, idx in enumerate(index) if counts[i] == 1])
>>> a_unique
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

For NumPy >= 1.9, return_counts gives the counts directly:

b = a.view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
_, index, counts = np.unique(b, return_index = True, return_counts = True)

index_keep = [j for i, j in enumerate(index) if counts[i] == 1]
>>> a[index_keep]
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])
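As a side note, the counting helper above can also be written with np.bincount, which works on NumPy 1.8 as well; a minimal sketch (not part of the original answer), with a final sort so the kept rows come out in their original order:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4],
              [1, 2, 3],
              [3, 2, 1],
              [3, 4, 5]])

# Same void-view trick as above: treat each row as a single element
b = a.view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
_, index, inv = np.unique(b, return_index=True, return_inverse=True)

counts = np.bincount(inv.ravel())          # occurrences of each unique row
index_keep = np.sort(index[counts == 1])   # original indices, original order
a_unique = a[index_keep]
```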

3 Comments

except the request was to exclude [1,2,3] since it occurs more than once
@Dan Patterson Thanks for pointing it out, I have edited my solution.
Yesterday I found out we have implemented this in a C extension. I have not tested this solution explicitly but it looks very similar to what has been implemented here. That's why I accepted it as a solution. Thanks.

You can proceed as follows:

# Assuming your array is a
uniq, uniq_idx, counts = np.unique(a, axis=0, return_index=True, return_counts=True)

# to return the array you want
new_arr = uniq[counts == 1]

# The indices of non-unique rows
a_idx = np.arange(a.shape[0]) # the indices of array a
nuniq_idx = a_idx[np.in1d(a_idx, uniq_idx[counts==1], invert=True)] 

You get:

#new_arr
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

# nuniq_idx
array([0, 2])
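One caveat worth noting: the axis keyword of np.unique requires NumPy >= 1.13, and np.unique returns its rows in sorted order, so uniq[counts == 1] only happens to match the original order for this example. A sketch that guarantees the original row order by indexing back into a:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4],
              [1, 2, 3],
              [3, 2, 1],
              [3, 4, 5]])

uniq, uniq_idx, counts = np.unique(a, axis=0,
                                   return_index=True, return_counts=True)

# First-occurrence indices of the rows that appear exactly once,
# sorted so the rows come back in their original order.
keep_idx = np.sort(uniq_idx[counts == 1])
new_arr = a[keep_idx]
```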



If you want to delete every instance of any row that occurs more than once, you can iterate through the array, collect the indices of rows that also appear elsewhere, and finally delete them:

import numpy

# The array to check:
array = numpy.array([[1, 2, 3],
                     [2, 3, 4],
                     [1, 2, 3],
                     [3, 2, 1],
                     [3, 4, 5]])

# List that contains the indices of duplicates (which should be deleted)
deleteIndices = []

for i in range(len(array)):            # Loop through entire array
    indices = list(range(len(array)))  # All indices in array
    del indices[i]                     # All indices except the i'th row currently being checked

    for j in indices:  # Loop through every other row in array
        if (array[i] == array[j]).all():  # Check if row i equals row j
            deleteIndices.append(j)       # If so, mark j for deletion

# Sort deleteIndices in ascending order:
deleteIndices.sort()

# Delete duplicates
array = numpy.delete(array,deleteIndices,axis=0)

This outputs:

>>> array
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

>>> deleteIndices
[0, 2]

That way you both delete the duplicates and get a list of indices to discard.
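The nested loop above is O(n²); the same result can be had in a single pass by counting rows with a plain collections.Counter. A minimal sketch (not from the original answer) that also works on NumPy 1.8:

```python
from collections import Counter

import numpy

array = numpy.array([[1, 2, 3],
                     [2, 3, 4],
                     [1, 2, 3],
                     [3, 2, 1],
                     [3, 4, 5]])

# Count each row as a hashable tuple, then split the indices by count
row_counts = Counter(map(tuple, array))
keepIndices = [i for i, row in enumerate(array) if row_counts[tuple(row)] == 1]
deleteIndices = [i for i, row in enumerate(array) if row_counts[tuple(row)] > 1]

result = array[keepIndices]
```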



The numpy_indexed package (disclaimer: I am its author) can be used to solve such problems in a vectorized manner:

import numpy as np
import numpy_indexed as npi

# arr is the example array from the question
index = npi.as_index(arr)
keep = index.count == 1
discard = np.invert(keep)
print(index.unique[keep])

