deleting rows of a numpy array based on uniqueness of a value

Question

Let's say I have a two-dimensional array like this:

numpy.array(
    [[0,1,1.2,3],
    [1,5,3.2,4],
    [3,4,2.8,4], 
    [2,6,2.3,5]])

I want to build a new array by dropping whole rows so that each value in the last column appears only once, choosing which row to keep by the value in the third column. For example, in this case I would like to keep only one of the rows with 4 in the last column, namely the one with the smaller value in the third column, giving a result like this:

array([[0,1,1.2,3],
       [3,4,2.8,4],
       [2,6,2.3,5]])

thus eliminating the row [1,5,3.2,4].

What would be the best way to do it?

llimllib · Accepted Answer · 2009-01-23 00:55:24Z

1

My numpy is way out of practice, but this should work:

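import numpy

# n is the array from the question
n = numpy.array(
    [[0,1,1.2,3],
    [1,5,3.2,4],
    [3,4,2.8,4],
    [2,6,2.3,5]])

# keepers maps each last-column value to a tuple (row index, row[2])
# describing the row currently kept for that value
keepers = {}
# deletions collects the indexes of the rows to drop
deletions = []
for i, row in enumerate(n):
    key = row[3]
    if key not in keepers:
        keepers[key] = (i, row[2])
    else:
        if row[2] > keepers[key][1]:
            # the row already kept has the smaller third-column value
            deletions.append(i)
        else:
            # this row wins: drop the previous keeper and keep this one
            deletions.append(keepers[key][0])
            keepers[key] = (i, row[2])
o = numpy.delete(n, deletions, axis=0)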

I've greatly simplified it from my declarative solution, which was getting quite unwieldy. Hopefully this is easier to follow; all we do is maintain a dictionary of values that we want to keep and a list of indexes we want to delete.

edited Jan 23, 2009 at 0:55

answered Jan 22, 2009 at 19:41


2 Comments

jfs Over a year ago

Add at the end your version with itertools.groupby(). It is interesting.

llimllib Over a year ago

I'll be a bit more precise: the groupby version is wrong in an algorithmic way. In order to work, it would need the array to be sorted first, which is something I really want to avoid in order to keep the runtime down to O(n), which this solution should be.
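
For illustration only, here is a rough sketch of what a sort-then-groupby variant might look like. This is not llimllib's original version (which isn't shown in the thread), just one way to write the idea from the comments; the names `rows`, `kept` and `result` are made up for this example. Because itertools.groupby() only merges adjacent equal keys, the rows have to be sorted by the last column first, which is where the O(n log n) cost comes from.

```python
import itertools

import numpy as np

# Sample array from the question.
a = np.array([[0, 1, 1.2, 3],
              [1, 5, 3.2, 4],
              [3, 4, 2.8, 4],
              [2, 6, 2.3, 5]])

# groupby only groups *adjacent* rows with equal keys,
# so sort by the last column before grouping.
rows = sorted(a.tolist(), key=lambda r: r[3])

# Within each group, keep the row with the smallest third-column value.
kept = [min(group, key=lambda r: r[2])
        for _, group in itertools.groupby(rows, key=lambda r: r[3])]

result = np.array(kept)
print(result)
```

Note that the output is ordered by the last column rather than by original row position, unlike the numpy.delete() approach in the answer.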

divenex · Accepted Answer · 2014-12-10 14:55:05Z

1

This can be achieved efficiently in NumPy by combining lexsort and unique as follows:

import numpy as np

a = np.array([[0, 1, 1.2, 3], 
              [1, 5, 3.2, 4],
              [3, 4, 2.8, 4], 
              [2, 6, 2.3, 5]])

# Sort by last column and 3rd column when values are equal
j = np.lexsort(a.T)

# Find first occurrence (=smallest 3rd column) of unique values in last column
k = np.unique(a[j, -1], return_index=True)[1]

print(a[j[k]])

This returns the desired result:

[[ 0.   1.   1.2  3. ]
 [ 3.   4.   2.8  4. ]
 [ 2.   6.   2.3  5. ]]
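
To make the two steps easier to follow, here is a small variant that prints the intermediate index arrays. It is only a sketch: it passes the two relevant columns to lexsort explicitly instead of `a.T` (lexsort treats the last key as the primary one), and the extra `np.sort` step and the names `keep` and `result` are additions for this example, assuming you want the surviving rows in their original order.

```python
import numpy as np

a = np.array([[0, 1, 1.2, 3],
              [1, 5, 3.2, 4],
              [3, 4, 2.8, 4],
              [2, 6, 2.3, 5]])

# Primary key: last column (given last); tie-breaker: third column.
j = np.lexsort((a[:, 2], a[:, -1]))
print(j)     # [0 2 1 3]

# Positions, within the sorted order, of the first row for each
# distinct last-column value (i.e. the smallest third-column value).
k = np.unique(a[j, -1], return_index=True)[1]
print(k)     # [0 1 3]

# Map back to row indices of `a`; sorting them restores the original row order.
keep = np.sort(j[k])
print(keep)  # [0 2 3]

result = a[keep]
print(result)
```

For this input the rows already come out in original order, so `np.sort` changes nothing here; it only matters when the last column is not already increasing.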

edited Dec 10, 2014 at 14:55

answered Dec 10, 2014 at 14:16

