
I have three arrays u, v, a, e.g.

u = np.array([1.0,2.0,2.0,3.0,4.0])
v = np.array([10.0,21.0,18.0,30.0,40.0])
a = np.array([100.0,210.0,220.0,300.0,400.0])

If two elements in u are the same, delete the one with the higher v value (and its corresponding a element). For the above example, the result should be

u_new = np.array([1.0,2.0,3.0,4.0])
v_new = np.array([10.0,18.0,30.0,40.0])
a_new = np.array([100.0,220.0,300.0,400.0])

def remove_duplicates(u, v, a):
    # np.unique returns the sorted unique values and the index of the first
    # occurrence of each; this assumes u is already sorted, so the duplicates
    # of each value sit in one contiguous block.
    u_new, indices = np.unique(u, return_index=True)
    v_new = np.zeros(len(u_new), dtype=np.float64)
    a_new = np.zeros(len(u_new), dtype=np.float64)
    for i in range(len(indices)):
        j1 = indices[i]
        # End of the current block: start of the next block, or the end of
        # the array for the last block (j1 + 1 would miss trailing duplicates).
        if i < len(indices) - 1:
            j2 = indices[i + 1]
        else:
            j2 = len(u)
        # Keep the smallest v in the block and its matching a.
        v_new[i] = np.amin(v[j1:j2])
        k = np.argmin(v[j1:j2]) + j1
        a_new[i] = a[k]

    return u_new, v_new, a_new

The above code has a problem with floating-point numbers, because two floats are rarely exactly equal. So I had to change it to a very 'stupid' way:

def remove_duplicates(u, v, a):
    # Copy the inputs so they are not modified in place.
    u_new = u.copy()
    v_new = v.copy()
    a_new = a.copy()
    cnt = 0
    for i in range(len(u)):
        if cnt < 1:
            # Always keep the first element.
            u_new[cnt] = u[i]
            v_new[cnt] = v[i]
            a_new[cnt] = a[i]
            cnt += 1
        else:
            if abs(u[i] - u_new[cnt - 1]) > 1e-5:
                # New distinct u value: append it.
                u_new[cnt] = u[i]
                v_new[cnt] = v[i]
                a_new[cnt] = a[i]
                cnt += 1
            else:
                print("Two points with the same x coordinate found, ignoring index", i)
                # Duplicate u value: keep the entry with the smaller v.
                if v_new[cnt - 1] > v[i]:
                    v_new[cnt - 1] = v[i]
                    a_new[cnt - 1] = a[i]

    return u_new[:cnt], v_new[:cnt], a_new[:cnt]

How can I program it in a Pythonic way?

  • Constructing a new array by looping over the first two arrays seems most feasible to me. I think no in-place operation is preferable. Commented Dec 6, 2016 at 2:08
  • Thank you for your comment. I want more Python-like code to do this, as I think looping over the array is time-consuming. Commented Dec 6, 2016 at 4:59
  • Are the arrays always 1D and sorted as per your example? Commented Dec 6, 2016 at 7:09
  • Yes, they are always 1D arrays. Commented Dec 7, 2016 at 1:40

3 Answers


This should work, with a threshold value to clean up your floats:

def remove_duplicates(u, v, a, d=1e-5):
    s = np.argsort(u)                   # sort order of u (stable for ties)
    ud = abs(u[s][1:] - u[s][:-1]) < d  # adjacent pairs within tolerance d
    vd = v[s][1:] < v[s][:-1]           # pairs where the second v is smaller
    # For each near-equal pair, drop the first element if its v is larger,
    # otherwise drop the second.
    drop = np.union1d(s[:-1][ud & vd], s[1:][ud & ~vd])
    return np.delete(u, drop), np.delete(v, drop), np.delete(a, drop)
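As a quick check, applying the function to the sample arrays from the question reproduces the expected result from the question:

```python
import numpy as np

def remove_duplicates(u, v, a, d=1e-5):
    s = np.argsort(u)                   # sort order of u (stable for ties)
    ud = abs(u[s][1:] - u[s][:-1]) < d  # adjacent pairs within tolerance d
    vd = v[s][1:] < v[s][:-1]           # pairs where the second v is smaller
    # For each near-equal pair, drop the element with the larger v.
    drop = np.union1d(s[:-1][ud & vd], s[1:][ud & ~vd])
    return np.delete(u, drop), np.delete(v, drop), np.delete(a, drop)

u = np.array([1.0, 2.0, 2.0, 3.0, 4.0])
v = np.array([10.0, 21.0, 18.0, 30.0, 40.0])
a = np.array([100.0, 210.0, 220.0, 300.0, 400.0])

u_new, v_new, a_new = remove_duplicates(u, v, a)
# u_new == [1, 2, 3, 4], v_new == [10, 18, 30, 40], a_new == [100, 220, 300, 400]
```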



You can use the zip, sorted and groupby functions:

from itertools import groupby
u1, v1 = zip(*[next(g) for k, g in groupby(sorted(zip(u, v)), key = lambda x: x[0])])
# note: next takes the first element (smallest v value) from each group,
# since sorted() orders each group by v

u1
# (1.0, 2.0, 3.0, 4.0)

v1
# (10.0, 18.0, 30.0, 40.0)
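This drops the third array a; the same idea extends to three-tuples (a sketch, zipping all three arrays so that next(g) still picks the smallest-v tuple of each group):

```python
import numpy as np
from itertools import groupby

u = np.array([1.0, 2.0, 2.0, 3.0, 4.0])
v = np.array([10.0, 21.0, 18.0, 30.0, 40.0])
a = np.array([100.0, 210.0, 220.0, 300.0, 400.0])

# sorted() orders the tuples by u first, then v, so within each u-group the
# first tuple has the smallest v; next(g) takes exactly that one.
u1, v1, a1 = zip(*[next(g) for _, g in
                   groupby(sorted(zip(u, v, a)), key=lambda t: t[0])])
# u1 == (1, 2, 3, 4), v1 == (10, 18, 30, 40), a1 == (100, 220, 300, 400)
```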

4 Comments

very beautiful solution. Thank you a lot.
When the numpy array elements are floats, the method above will have problems, because comparing two floats is not exact without a given tolerance. How can I give a tolerance and make it work? Thank you.
I think you can round the key to the precision you wanted. u1, v1 = zip(*[next(g) for k, g in groupby(sorted(zip(u, v)), key = lambda x: round(x[0], 10))]) for instance.
Rounding the float to some precision is an alternative method for this. Thank you.

Approach #1 : Here's an approach for floating-point numbers that splits the data into groups of tolerable (by the given tolerance value) proximity -

tol = 1e-5 # Set tolerance for floating-point number match
# Split v into groups wherever consecutive u values differ by more than tol
# (assumes u is sorted, as per the question's comments).
A = np.split( v, np.flatnonzero(np.diff(u) > tol)+1)
lens = np.array(list(map(len,A)))
# Position of the max v within each group, converted to global indices
idx = np.array([np.argmax(i) for i in A])
idx[1:] += lens[:-1].cumsum()
# Mask out the max-v element of every group that has duplicates
m = ~np.in1d(np.arange(a.size), idx[lens>1])
u_new, v_new, a_new = u[m], v[m], a[m]

Sample input, output -

In [143]: u=np.array([1.0,2.0,2.00000001,3.0,3.9999998, 4.0, 4.00000001])
     ...: v=np.array([10.0,21.0,18.0,30.0,36.0, 40.0, 38.0])
     ...: a=np.array([100.0,210.0,220.0,300.0,77.0, 400.0, 67.00])
     ...: 

In [144]: u_new
Out[144]: array([ 1.        ,  2.00000001,  3.        ,  3.9999998 ,  4.00000001])

In [145]: v_new
Out[145]: array([ 10.,  18.,  30.,  36.,  38.])

In [146]: a_new
Out[146]: array([ 100.,  220.,  300.,   77.,   67.])

Approach #2 : Here's another approach that avoids splitting and as such should be more efficient -

# Group label for each element: consecutive u values within tol share a label
u_idx = np.append(False, np.diff(u) > tol).cumsum()
# Index of the last element of each group
max_idx = (np.append(np.unique(u_idx, return_index=1)[1], u_idx.size)-1)[1:]
# Sort primarily by group, secondarily by v, so each group's last element
# in sorted order is the one with the largest v
sidx = (v.max()*u_idx + v).argsort()
# Drop the max-v element of every group with more than one member
m = ~np.in1d(np.arange(a.size), sidx[max_idx][np.bincount(u_idx)>1])
u_new, v_new, a_new = u[m], v[m], a[m]
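As a sanity check, here is Approach #2 run end-to-end on the question's original sample arrays with tol = 1e-5:

```python
import numpy as np

tol = 1e-5
u = np.array([1.0, 2.0, 2.0, 3.0, 4.0])
v = np.array([10.0, 21.0, 18.0, 30.0, 40.0])
a = np.array([100.0, 210.0, 220.0, 300.0, 400.0])

# Group label for each element: consecutive u values within tol share a label.
u_idx = np.append(False, np.diff(u) > tol).cumsum()
# Index of the last element of each group.
max_idx = (np.append(np.unique(u_idx, return_index=1)[1], u_idx.size) - 1)[1:]
# Sort by group first, then by v, so each group's last sorted element
# is the one with the largest v.
sidx = (v.max() * u_idx + v).argsort()
# Remove the max-v element of every group with more than one member.
m = ~np.in1d(np.arange(a.size), sidx[max_idx][np.bincount(u_idx) > 1])
u_new, v_new, a_new = u[m], v[m], a[m]
# u_new == [1, 2, 3, 4], v_new == [10, 18, 30, 40], a_new == [100, 220, 300, 400]
```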

1 Comment

Thank you for your two approaches. I believe both of them will work well, though I am not sure I understand them thoroughly.
