Let's say I have a numpy array of the form

    x = np.array([[2, 5],
                  [3, 4],
                  [1, 3],
                  [2, 5],
                  [4, 5],
                  [1, 3],
                  [1, 4],
                  [3, 4]])
What I would like to get from this is an array containing only the rows that are NOT duplicated anywhere in x, i.e., from this example I expect

    array([[4, 5],
           [1, 4]])
I'm looking for a method which is reasonably fast and scales well. The only other way I can think to do this is the following (sketched in code after the list):
- First find the set of unique rows in x, as a new array y.
- Create a new array z which has those individual elements of y removed from x, so that z is a list of the duplicated rows in x.
- Do a set difference between x and z.
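In code, that plan would look roughly like this (a sketch using Python sets of row-tuples, since NumPy rows aren't hashable; I suspect the Python-level loop is exactly what makes it slow):

    # Rough sketch of the three steps above.
    row_list = [tuple(row) for row in x]

    # Step 1: y, the set of unique rows in x.
    y = set(row_list)

    # Step 2: z, the rows left over after removing one copy of each unique
    # row from x, i.e. the rows that are duplicated somewhere in x.
    seen, z = set(), set()
    for row in row_list:
        if row in seen:
            z.add(row)
        seen.add(row)

    # Step 3: the set difference y - z (the unique rows of x minus the
    # duplicated ones) leaves exactly the non-duplicated rows.
    print(np.array(sorted(y - z)))
    # [[1 4]
    #  [4 5]]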
This seems horribly inefficient though. Anyone have a better way?
In case it is important: each of my rows is guaranteed to be sorted smallest to largest, so a row will never be [5, 2] or [3, 1].
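(That guarantee means two rows are equal exactly when their tuples are equal, with no per-row sorting needed, so even a plain collections.Counter over row-tuples works and preserves the original row order, albeit at Python speed:)

    from collections import Counter

    # Rows are pre-sorted, so equal rows hash to identical tuples directly.
    occurrences = Counter(tuple(row) for row in x)

    # Keep the rows whose tuple occurs exactly once, in original order.
    print(np.array([row for row in x if occurrences[tuple(row)] == 1]))
    # [[4 5]
    #  [1 4]]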