1

I need not just the values, but the locations of elements in one numpy array that also appear in a second numpy array, and I need the locations in that second array too.

Here's an example of the best I've been able to do:

>>> a=np.arange(0.,15.)
>>> a
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
        11.,  12.,  13.,  14.])
>>> b=np.arange(4.,8.,.5)
>>> b
array([ 4. ,  4.5,  5. ,  5.5,  6. ,  6.5,  7. ,  7.5])
>>> [ (i,j) for (i,alem) in enumerate(a) for (j,blem) in enumerate(b) if alem==blem]
[(4, 0), (5, 2), (6, 4), (7, 6)]

Anybody have anything faster, numpy specific, or more "pythonic"?

0

3 Answers 3

3

Here is an O((n+k)log(n+k)) (the naive algorithm is O(nk)) solution with np.unique

uniq, inv = np.unique(np.r_[a, b], return_inverse=True)
map = -np.ones((len(uniq),), dtype=int)
map[inv[:len(a)]] = np.arange(len(a))
bina = map[inv[len(a):]]
inds_in_b = np.where(bina != -1)[0]
elements, inds_in_a = b[inds_in_b], bina[inds_in_b]

or you could simply sort a for O((n+k)log(k))

inds = np.argsort(a)
aso = a[inds]
bina = np.searchsorted(aso[:-1], b)
inds_in_b = np.where(b == aso[bina])[0]
elements, inds_in_a = b[inds_in_b], inds[bina[inds_in_b]]
Sign up to request clarification or add additional context in comments.

Comments

3

For sorted array a, here's another approach with np.searchsorted making use of its optional argument - side set as left and right -

lidx = np.searchsorted(a,b,'left')
ridx = np.searchsorted(a,b,'right')
mask = lidx != ridx
out = lidx[mask], np.flatnonzero(mask)
       # for zipped o/p : zip(lidx[mask], np.flatnonzero(mask))

Runtime test

Approaches -

def searchsorted_where(a,b):  # @Paul Panzer's soln
    inds = np.argsort(a)
    aso = a[inds]
    bina = np.searchsorted(aso[:-1], b)
    inds_in_b = np.where(b == aso[bina])[0]
    return b[inds_in_b], inds_in_b

def in1d_masking(a,b):  # @Psidom's soln
    logic = np.in1d(b, a)    
    return b[logic], np.where(logic)[0]

def searchsorted_twice(a,b): # Proposed in this post
    lidx = np.searchsorted(a,b,'left')
    ridx = np.searchsorted(a,b,'right')
    mask = lidx != ridx
    return lidx[mask], np.flatnonzero(mask)

Timings -

Case #1 (Using sample data from question and scaling it up) :

In [2]: a=np.arange(0.,15000.)
   ...: b=np.arange(4.,15000.,0.5)
   ...: 

In [3]: %timeit searchsorted_where(a,b)
   ...: %timeit in1d_masking(a,b)
   ...: %timeit searchsorted_twice(a,b)
   ...: 
1000 loops, best of 3: 721 µs per loop
1000 loops, best of 3: 1.76 ms per loop
1000 loops, best of 3: 1.28 ms per loop

Case #2 (Same as case #1 with no. of elems in b comparatively lesser than in a) :

In [4]: a=np.arange(0.,15000.)
   ...: b=np.arange(4.,15000.,5)
   ...: 

In [5]: %timeit searchsorted_where(a,b)
   ...: %timeit in1d_masking(a,b)
   ...: %timeit searchsorted_twice(a,b)
   ...: 
10000 loops, best of 3: 77.4 µs per loop
1000 loops, best of 3: 428 µs per loop
10000 loops, best of 3: 128 µs per loop

Case #3 (and comparatively much lesser elems in b) :

In [6]: a=np.arange(0.,15000.)
   ...: b=np.arange(4.,15000.,10)
   ...: 

In [7]: %timeit searchsorted_where(a,b)
   ...: %timeit in1d_masking(a,b)
   ...: %timeit searchsorted_twice(a,b)
   ...: 
10000 loops, best of 3: 42.8 µs per loop
1000 loops, best of 3: 392 µs per loop
10000 loops, best of 3: 71.9 µs per loop

7 Comments

What happens if you cut the first two lines from mine? They are there in case a is not sorted, so under your test conditions they should go I think ;-)
@divakar Dang! Didn't mean to assign homework! :) Thanks very much for a very helpful and thoughtful answer. Thanks to everyone for your answers, BTW.
Divakar I checked myself. Unsurprisingly, in the last condition my function spends 3/4 of its time argsorting arange(15000.). In the middle condition it is still almost 2/3. So, I must protest your methodology in the strongest possible terms ;-)
@PaulPanzer Sorry, didn't look into the details, my apologies, updated! :)
@bob.sacamento Updated the timings and seems like Paul's solution is the fastest one in all conditions. So, you might want to reconsider that accept thing :)
|
1

You can use numpy.in1d to find out the elements of b also in a, logical indexing and numpy.where can get the elements and index correspondingly:

logic = np.in1d(b, a)    
list(zip(b[logic], np.where(logic)[0]))
# [(4.0, 0), (5.0, 2), (6.0, 4), (7.0, 6)]

b[logic], np.where(logic)[0]
# (array([ 4.,  5.,  6.,  7.]), array([0, 2, 4, 6]))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.