8

I use Python with numpy.

I have a numpy array may_a:

may_a = numpy.array([False, True, False, True, True, False, True, False, True, True, False])

I have a numpy array may_b:

may_b = numpy.array([False,True,True,False])

I need to find array may_b in array may_a.

In the output I need to get indexes of occurrences.

out_index=[2,7]

Can someone please suggest, how do I get out_index?

4
  • Did you mean out_index=[2,6] ? Commented Feb 15, 2013 at 7:48
  • 1
    @Konfle Dolex, out_index=[2,7] Commented Feb 15, 2013 at 7:50
  • @Olga Ah. I misread your question. Commented Feb 15, 2013 at 7:51
  • @Robin, no, it is search of the optimum decision Commented Feb 15, 2013 at 7:52

5 Answers 5

5

EDIT The following code does allow to perform a convolution based check of equality. It maps True to 1 and False to -1. It also reverses b, which is needed for it to work properly:

def search(a, b) :
    return np.where(np.round(fftconvolve(a * 2 - 1, (b * 2 - 1)[::-1],
                                         mode='valid') - len(b)) == 0)[0]

I have checked that it gives the same output as the as_strided method for a variety of random inputs, which it does. I have also timed both approached, and convolution only starts paying off with largish search tokens of around 256 items.


It seems like a little overkill, but with boolean data you can use (abuse?) convolution:

In [8]: np.where(np.convolve(may_a, may_b.astype(int),
   ...:                      mode='valid') == may_b.sum())[0]
Out[8]: array([2, 7])

For larger datasets it may be faster to go with scipy.signal.fftconvolve:

In [13]: np.where(scipy.signal.fftconvolve(may_a, may_b,
   ....:                                   mode='valid') == may_b.sum())[0]
Out[13]: array([2, 7])

You have to be careful though, because the output now is floating point, and rounding may spoil the equality check:

In [14]: scipy.signal.fftconvolve(may_a, may_b, mode='valid')
Out[14]: array([ 1.,  1.,  2.,  1.,  1.,  1.,  1.,  2.])

So you may be better off with something along the lines of:

In [15]: np.where(np.round(scipy.signal.fftconvolve(may_a, may_b, mode='valid') -
   ....:                   may_b.sum()) == 0)[0]
Out[15]: array([2, 7])
Sign up to request clarification or add additional context in comments.

5 Comments

With This convolution you'll match anything that is [*, True, True, *] where * is a wildcard.
@BiRico Oops, you are absolutely right! There may be a chance to salvage the method, by mapping the True's and the False's to some integer value, possibly +1 and -1.
@Jaime >>> may_a = np.array([True,True,True,True]) >>> out_ind = np.where(np.convolve(may_a, may_b.astype(int),mode='valid') == may_b.sum())[0] >>> out_ind -> array([0]) it is incorrect(
@Olga Yes, that's what BiRico was saying. But the method in my edit at the top of the answer works fine: out_ind = np.where(np.convolve(may_a * 2 - 1, (may_b * 2 - 1)[::-1], mode='valid') == len(may_b))
@Jaime What if array is 2-dimension? may_a= np.array([[0,1,2],[2,3,1],[3,4,5],[3,3,3]]); may_b = np.array([[3,3,3],[2,3,1]]); this method failed
5

A much cooler approach, which may not perform a well, but which works for any dtype, is to use as_strided:

In [2]: from numpy.lib.stride_tricks import as_strided

In [3]: may_a = numpy.array([False, True, False, True, True, False,
   ...:                      True, False, True, True, False])

In [4]: may_b = numpy.array([False,True,True,False])

In [5]: a = len(may_a)

In [6]: b = len(may_b)

In [7]: a_view = as_strided(may_a, shape=(a - b + 1, b),
   ...:                     strides=(may_a.dtype.itemsize,) * 2)

In [8]: a_view
Out[8]: 
array([[False,  True, False,  True],
       [ True, False,  True,  True],
       [False,  True,  True, False],
       [ True,  True, False,  True],
       [ True, False,  True, False],
       [False,  True, False,  True],
       [ True, False,  True,  True],
       [False,  True,  True, False]], dtype=bool)

In [9]: numpy.where(numpy.all(a_view == may_b, axis=1))[0]
Out[9]: array([2, 7])

You have to be careful though, because even though a_view is a view of may_a's data, when comparing it with may_b a temporary array of (a - b + 1) * b is created, which may be a problem with large as and bs.

1 Comment

Maybe you appreciate pointing out small things... Not using .itemsize but .strides[0] is a bit less prone to failure, in case the array was sliced previously.
3

This looks very similar to a string search problem. If you want to avoid implementing one these string search algorithms, you could abuse pythons built in string search, which is very fast, by doing something like:

# I've added [True, True, True] at the end.
may_a = numpy.array([False, True, False, True, True, False, True, False, True, True, False, True, True, True])
may_b = numpy.array([False,True,True,False])

may_a_str = may_a.tostring()
may_b_str = may_b.tostring()

idx = may_a_str.find(may_b_str)
out_index = []
while idx >= 0:
    out_index.append(idx)
    idx = may_a_str.find(may_b_str, idx+1)

This should work fine for boolean arrays. If you want to use this approach for another array type, you'll need to make sure the strides of the two arrays match and divide out_index by that stride.

You could also use the regular expression module instead of the loop to do the string search.

Comments

2

This should also work with other that boolean data:

In [1]: import numpy as np

In [2]: a = np.array([False, True, False, True, True, False, True, False, True, True, False])

In [3]: b = np.array([False,True,True,False])

In [4]: def get_indices(a, b):
   ...:     window = len(b)
   ...:     shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
   ...:     strides = a.strides + (a.strides[-1],)
   ...:     w = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
   ...:     return np.where(np.all(np.equal(w,b),1) == True)[0]

In [5]: get_indices(a,b)
Out[5]: array([2, 7])

2 Comments

I changed an array a. >>> a=np.array([False,False]) >>> b=np.array([False,True,True,False]) >>> get_indices(a,b) >>> Out: ValueError: negative dimensions are not allowed
@Olga -- Yes, shape will be (-1, 4), you can add if len(a) < len(b): return np.array([]) to prevent that as in that case b cannot possibly be a subarray of a.
1

I am not sure whether numpy provide a function for that. If it does not, here is a solution:

import numpy

def searchListIndexs(array, target):
    ret = []
    iLimit = len(array)-len(target)+1
    jLimit = len(target)
    for i in range(iLimit):
        for j in range(jLimit):
            if array[i+j] != target[j]:
                break
        else:
            ret.append(i)
    return ret


may_a = numpy.array([False, True, False, True, True, False, True, False, True, True, False])
may_b = numpy.array([False,True,True,False])
out_index = searchListIndexs(may_a, may_b)
print out_index #If you are using Python 3, then use print(out_index) instead.

2 Comments

Yep. :( This is a limitation of this approach.
BTW, I guess that these is no faster algorithm than this. I guess it is necessary to iterate through the entire array because no sorting is possible in this case.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.