python location of elements in one numpy array with location of equal elements in another array

Question

I need not just the values, but the locations of elements in one numpy array that also appear in a second numpy array, and I need the locations in that second array too.

Here's an example of the best I've been able to do:

>>> import numpy as np
>>> a=np.arange(0.,15.)
>>> a
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
        11.,  12.,  13.,  14.])
>>> b=np.arange(4.,8.,.5)
>>> b
array([ 4. ,  4.5,  5. ,  5.5,  6. ,  6.5,  7. ,  7.5])
>>> [ (i,j) for (i,alem) in enumerate(a) for (j,blem) in enumerate(b) if alem==blem]
[(4, 0), (5, 2), (6, 4), (7, 6)]

Anybody have anything faster, numpy specific, or more "pythonic"?

Paul Panzer · Accepted Answer · 2017-02-24 16:29:41Z

3

Here is an O((n+k) log(n+k)) solution using np.unique (the naive algorithm is O(nk)):

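# Unique values of a and b together; inv maps each input element to its slot in uniq.
uniq, inv = np.unique(np.r_[a, b], return_inverse=True)
# idx_map[s] is the index in a of uniq[s], or -1 if a does not contain that value.
# (Named idx_map rather than map to avoid shadowing the builtin.)
idx_map = -np.ones((len(uniq),), dtype=int)
idx_map[inv[:len(a)]] = np.arange(len(a))
# For each element of b: its index in a, or -1 if there is no match.
bina = idx_map[inv[len(a):]]
inds_in_b = np.where(bina != -1)[0]
elements, inds_in_a = b[inds_in_b], bina[inds_in_b]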
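
To see the np.unique mapping at work, here is a minimal, self-contained sketch run on the question's sample data. The function name match_via_unique and the variable names are illustrative assumptions, not part of the answer above, and np.concatenate stands in for np.r_.

```python
import numpy as np

def match_via_unique(a, b):
    """Return (inds_in_a, inds_in_b) for values present in both 1-D arrays."""
    # Unique values of a and b together; inv gives each input element's slot in uniq.
    uniq, inv = np.unique(np.concatenate([a, b]), return_inverse=True)
    # slot_to_a[s] is an index in a holding uniq[s], or -1 if a lacks that value.
    slot_to_a = np.full(len(uniq), -1, dtype=int)
    slot_to_a[inv[:len(a)]] = np.arange(len(a))
    # Look up every element of b through its slot.
    b_in_a = slot_to_a[inv[len(a):]]
    inds_in_b = np.flatnonzero(b_in_a != -1)
    return b_in_a[inds_in_b], inds_in_b

a = np.arange(0., 15.)
b = np.arange(4., 8., .5)
inds_in_a, inds_in_b = match_via_unique(a, b)
pairs = list(zip(inds_in_a.tolist(), inds_in_b.tolist()))
print(pairs)  # [(4, 0), (5, 2), (6, 4), (7, 6)]
```

The pairs match the list comprehension from the question. If a contains repeated values, only one of the duplicate positions is reported for each match.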

Or you could simply sort a, for O((n+k) log k), taking k = len(a) and n = len(b):

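inds = np.argsort(a)
aso = a[inds]
# Searching aso[:-1] caps the result at len(a) - 1, so aso[bina] stays in bounds.
bina = np.searchsorted(aso[:-1], b)
inds_in_b = np.where(b == aso[bina])[0]
# Map positions in the sorted copy back to positions in the original a.
elements, inds_in_a = b[inds_in_b], inds[bina[inds_in_b]]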
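
As a sketch of how the sort-based variant behaves when a is not already sorted, the example below wraps it in a function and runs it on a deliberately shuffled array. The name match_via_sort and the sample values for a are assumptions made for illustration.

```python
import numpy as np

def match_via_sort(a, b):
    """Return (inds_in_a, inds_in_b) for values of b found in non-empty 1-D a."""
    order = np.argsort(a)          # order[i] = original index of the i-th smallest value
    a_sorted = a[order]
    # Dropping the last element caps the insertion point at len(a) - 1,
    # so a_sorted[pos] below can never index out of bounds.
    pos = np.searchsorted(a_sorted[:-1], b)
    inds_in_b = np.flatnonzero(b == a_sorted[pos])
    # Translate positions in the sorted copy back to positions in the original a.
    inds_in_a = order[pos[inds_in_b]]
    return inds_in_a, inds_in_b

a = np.array([7., 0., 5., 12., 4., 6.])   # deliberately unsorted
b = np.arange(4., 8., .5)
inds_in_a, inds_in_b = match_via_sort(a, b)
print(inds_in_a, inds_in_b)               # [4 2 5 0] [0 2 4 6]
```

Here a[4], a[2], a[5] and a[0] are 4, 5, 6 and 7, which line up with b[0], b[2], b[4] and b[6]. The sketch assumes a has at least one element.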

edited Feb 24, 2017 at 16:29

answered Feb 24, 2017 at 15:49

Paul Panzer

Divakar · Accepted Answer · 2017-02-25 11:24:12Z

3

For a sorted array a, here's another approach using np.searchsorted, called once with its optional side argument set to 'left' and once with it set to 'right':

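lidx = np.searchsorted(a,b,'left')
ridx = np.searchsorted(a,b,'right')
mask = lidx != ridx  # the two insertion points differ only where b's value occurs in a
out = lidx[mask], np.flatnonzero(mask)  # (indices in a, indices in b)
# For zipped output: list(zip(lidx[mask], np.flatnonzero(mask)))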
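
A runnable sketch of the left/right comparison follows. It also handles an unsorted a by passing np.searchsorted's sorter argument, which is an extension of the snippet above rather than something the answer proposes; the function name match_via_two_searches is likewise an assumption.

```python
import numpy as np

def match_via_two_searches(a, b):
    """Return (inds_in_a, inds_in_b); a may be unsorted thanks to `sorter`."""
    order = np.argsort(a)
    # With sorter=order, the returned positions refer to a[order], not to a itself.
    lidx = np.searchsorted(a, b, side='left', sorter=order)
    ridx = np.searchsorted(a, b, side='right', sorter=order)
    # The insertion points differ exactly where b's value is present in a.
    mask = lidx != ridx
    return order[lidx[mask]], np.flatnonzero(mask)

a = np.arange(0., 15.)
b = np.arange(4., 8., .5)
inds_in_a, inds_in_b = match_via_two_searches(a, b)
print(list(zip(inds_in_a.tolist(), inds_in_b.tolist())))
# [(4, 0), (5, 2), (6, 4), (7, 6)]
```

When a is known to be sorted, the argsort and the sorter argument can be dropped, which gives back the shorter form above.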

Runtime test

Approaches -

def searchsorted_where(a,b):  # @Paul Panzer's soln
    inds = np.argsort(a)
    aso = a[inds]
    bina = np.searchsorted(aso[:-1], b)
    inds_in_b = np.where(b == aso[bina])[0]
    return b[inds_in_b], inds_in_b

def in1d_masking(a,b):  # @Psidom's soln
    logic = np.in1d(b, a)
    return b[logic], np.where(logic)[0]

def searchsorted_twice(a,b): # Proposed in this post
    lidx = np.searchsorted(a,b,'left')
    ridx = np.searchsorted(a,b,'right')
    mask = lidx != ridx
    return lidx[mask], np.flatnonzero(mask)

Timings -

Case #1 (sample data from the question, scaled up):

In [2]: a=np.arange(0.,15000.)
   ...: b=np.arange(4.,15000.,0.5)
   ...: 

In [3]: %timeit searchsorted_where(a,b)
   ...: %timeit in1d_masking(a,b)
   ...: %timeit searchsorted_twice(a,b)
   ...: 
1000 loops, best of 3: 721 µs per loop
1000 loops, best of 3: 1.76 ms per loop
1000 loops, best of 3: 1.28 ms per loop

Case #2 (same as case #1, but with comparatively fewer elements in b than in a):

In [4]: a=np.arange(0.,15000.)
   ...: b=np.arange(4.,15000.,5)
   ...: 

In [5]: %timeit searchsorted_where(a,b)
   ...: %timeit in1d_masking(a,b)
   ...: %timeit searchsorted_twice(a,b)
   ...: 
10000 loops, best of 3: 77.4 µs per loop
1000 loops, best of 3: 428 µs per loop
10000 loops, best of 3: 128 µs per loop

Case #3 (far fewer elements still in b):

In [6]: a=np.arange(0.,15000.)
   ...: b=np.arange(4.,15000.,10)
   ...: 

In [7]: %timeit searchsorted_where(a,b)
   ...: %timeit in1d_masking(a,b)
   ...: %timeit searchsorted_twice(a,b)
   ...: 
10000 loops, best of 3: 42.8 µs per loop
1000 loops, best of 3: 392 µs per loop
10000 loops, best of 3: 71.9 µs per loop
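
For readers without IPython's %timeit, a comparison along these lines can be sketched with the standard timeit module. This is an assumption-laden approximation of the benchmark, not the code that produced the numbers above: it assumes a is already sorted (so no argsort is included, as suggested in the comments), uses np.isin as the newer spelling of np.in1d, and the function names are illustrative. No timings are quoted here because they depend on the machine and NumPy version.

```python
import timeit
import numpy as np

def searchsorted_where(a, b):
    # Assumes a is sorted, so the argsort step is skipped.
    pos = np.searchsorted(a[:-1], b)
    inds_in_b = np.flatnonzero(b == a[pos])
    return pos[inds_in_b], inds_in_b

def isin_masking(a, b):
    # Membership test only: yields positions in b, not in a.
    return np.flatnonzero(np.isin(b, a))

def searchsorted_twice(a, b):
    lidx = np.searchsorted(a, b, side='left')
    ridx = np.searchsorted(a, b, side='right')
    mask = lidx != ridx
    return lidx[mask], np.flatnonzero(mask)

a = np.arange(0., 15000.)
b = np.arange(4., 15000., 0.5)

for fn in (searchsorted_where, isin_masking, searchsorted_twice):
    # Best of 3 repeats of 20 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(a, b), number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {best * 1e6:.1f} us per call")
```

Checking that all three functions agree on the matched positions before timing them is a cheap way to make sure the comparison is like for like.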

edited Feb 25, 2017 at 11:24

answered Feb 24, 2017 at 17:25

Divakar

7 Comments

Paul Panzer Over a year ago

What happens if you cut the first two lines from mine? They are there in case a is not sorted, so under your test conditions they should go I think ;-)

bob.sacamento Over a year ago

@divakar Dang! Didn't mean to assign homework! :) Thanks very much for a very helpful and thoughtful answer. Thanks to everyone for your answers, BTW.

Paul Panzer Over a year ago

@Divakar I checked myself. Unsurprisingly, in the last condition my function spends 3/4 of its time argsorting arange(15000.). In the middle condition it is still almost 2/3. So, I must protest your methodology in the strongest possible terms ;-)

Divakar Over a year ago

@PaulPanzer Sorry, didn't look into the details, my apologies, updated! :)

Divakar Over a year ago

@bob.sacamento Updated the timings and seems like Paul's solution is the fastest one in all conditions. So, you might want to reconsider that accept thing :)

akuiper · Accepted Answer · 2017-02-24 15:53:35Z

1

You can use numpy.in1d to find which elements of b are also in a; logical indexing then gives the matching elements, and numpy.where gives their indices in b:

logic = np.in1d(b, a)
list(zip(b[logic], np.where(logic)[0]))
# [(4.0, 0), (5.0, 2), (6.0, 4), (7.0, 6)]

b[logic], np.where(logic)[0]
# (array([ 4.,  5.,  6.,  7.]), array([0, 2, 4, 6]))
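
Since the question asks for the locations in both arrays, one possible complement is np.intersect1d with return_indices=True, which is a different technique from the in1d masking above and requires NumPy 1.15 or later. The sketch below is offered as an illustration on the question's sample data; it also shows np.isin, the newer spelling of np.in1d.

```python
import numpy as np

a = np.arange(0., 15.)
b = np.arange(4., 8., .5)

# Common values plus the index of each one in a and in b.
common, inds_in_a, inds_in_b = np.intersect1d(a, b, return_indices=True)
print(common)     # [4. 5. 6. 7.]
print(inds_in_a)  # [4 5 6 7]
print(inds_in_b)  # [0 2 4 6]

# Equivalent membership mask to np.in1d(b, a), using np.isin.
logic = np.isin(b, a)
print(np.flatnonzero(logic))  # [0 2 4 6]
```

np.intersect1d returns the common values sorted and, when a value is repeated in an input, reports the index of its first occurrence.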

edited Feb 24, 2017 at 15:53

answered Feb 24, 2017 at 15:31

akuiper
