I want to find the frequency of the elements of a given one-dimensional NumPy array (arr1) in another one-dimensional NumPy array (arr2). arr1 contains no repeated elements, and every element of arr1 also occurs in arr2.

Consider this example:

import numpy as np

arr1 = np.array([1, 2, 6])
arr2 = np.array([2, 3, 6, 1, 2, 1, 2, 0, 2, 0])

At present, I am using the following:

freq = np.zeros(len(arr1), dtype=int)

for i in range(len(arr1)):
    # positions in arr2 that equal the i-th element of arr1
    mark = np.where(arr2 == arr1[i])
    freq[i] = len(mark[0])

print(freq)
>>[2 4 1]

This method gives the correct answer, but I would like to know whether there is a more efficient way to do it.

1 Answer

Here's a vectorized solution based on np.searchsorted:

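# insertion position of each arr2 value within the sorted arr1
idx = np.searchsorted(arr1, arr2)
# values larger than every element of arr1 get position len(arr1); reset them to a valid index
idx[idx == len(arr1)] = 0
# True only where the arr2 value actually occurs in arr1
mask = arr1[idx] == arr2
# occurrences per element of arr1 (minlength keeps one entry per element)
out = np.bincount(idx[mask], minlength=len(arr1))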

This assumes arr1 is sorted. If it is not, there are two options:

  1. Sort arr1 as a pre-processing step. Since arr1 is a subset of the unique elements of arr2, it should be comparatively small, so sorting it is inexpensive.

  2. Use the sorter argument of searchsorted to compute idx:

    sidx = arr1.argsort()
    pos = np.searchsorted(arr1, arr2, sorter=sidx)
    idx = sidx[np.minimum(pos, len(arr1) - 1)]  # clamp: values above arr1's maximum would index past sidx
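
Putting the steps above together, a minimal sketch that works whether or not arr1 is sorted might look like this. The function name count_in is made up for illustration, and passing minlength to np.bincount is an extra safeguard that is not in the original snippet: it keeps one output entry per element of arr1 even if a trailing element never occurs in arr2.

```python
import numpy as np

def count_in(arr1, arr2):
    """Count how often each element of arr1 occurs in arr2.

    Hypothetical helper: arr1 is assumed to be non-empty and to hold
    unique values, but it does not need to be sorted.
    """
    arr1 = np.asarray(arr1)
    arr2 = np.asarray(arr2)
    sidx = arr1.argsort()                            # order that sorts arr1
    pos = np.searchsorted(arr1, arr2, sorter=sidx)   # positions in the sorted view
    pos[pos == len(arr1)] = 0                        # reset "past the end" positions
    idx = sidx[pos]                                  # map back to positions in arr1
    mask = arr1[idx] == arr2                         # keep exact matches only
    return np.bincount(idx[mask], minlength=len(arr1))

arr1 = np.array([6, 1, 2])                           # deliberately unsorted
arr2 = np.array([2, 3, 6, 1, 2, 1, 2, 0, 2, 0])
print(count_in(arr1, arr2))                          # [1 2 4], in the order of arr1
```

The counts come back in the original order of arr1, so no re-ordering is needed afterwards.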

5 Comments

Where arr1 is assumed to be sorted.
Thanks. :) arr1 is assumed to be sorted.
I get an error (IndexError: index 3 is out of bounds for axis 1 with size 3) when I change from arr1 = np.array([1,2,6]) to arr1 = np.array([1,2,3]). I am wondering if I am missing something.
@SiddharthSatpathy Needed an edit there. Should be fixed now.
Thanks, Divakar. Your help is much appreciated. :)
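
The IndexError described in the comments is consistent with how np.searchsorted behaves: for any value larger than every element of arr1 it returns len(arr1), which is not a valid index into arr1. The sketch below reproduces that with the arrays from the comment and shows that resetting those positions first avoids it. That this is the exact edit made to the answer is an assumption.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([2, 3, 6, 1, 2, 1, 2, 0, 2, 0])

idx = np.searchsorted(arr1, arr2)
# 6 is larger than every element of arr1, so its insertion position is len(arr1)
print(idx.max() == len(arr1))    # True

try:
    arr1[idx]                    # position 3 does not exist in an array of size 3
    raised = False
except IndexError:
    raised = True
print(raised)                    # True

idx[idx == len(arr1)] = 0        # reset out-of-range positions; the mask discards them
mask = arr1[idx] == arr2
out = np.bincount(idx[mask], minlength=len(arr1))
print(out)                       # [2 4 1]
```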
