Return the indexes of a sub-array in an array

Question

I use Python with numpy.

I have a numpy array may_a:

may_a = numpy.array([False, True, False, True, True, False, True, False, True, True, False])

I have a numpy array may_b:

may_b = numpy.array([False,True,True,False])

I need to find array may_b in array may_a.

In the output I need to get indexes of occurrences.

out_index=[2,7]

Can someone please suggest, how do I get out_index?

Did you mean out_index=[2,6] ?

Konfle Dolex
– Konfle Dolex

2013-02-15 07:48:41 +00:00
Commented Feb 15, 2013 at 7:48 — Konfle Dolex
– Konfle Dolex, Commented Feb 15, 2013 at 7:48
@Konfle Dolex, out_index=[2,7]

Olga
– Olga

2013-02-15 07:50:35 +00:00
Commented Feb 15, 2013 at 7:50 — Olga
– Olga, Commented Feb 15, 2013 at 7:50
@Olga Ah. I misread your question.

Konfle Dolex
– Konfle Dolex

2013-02-15 07:51:27 +00:00
Commented Feb 15, 2013 at 7:51 — Konfle Dolex
– Konfle Dolex, Commented Feb 15, 2013 at 7:51
@Robin, no, it is search of the optimum decision

Olga
– Olga

2013-02-15 07:52:00 +00:00
Commented Feb 15, 2013 at 7:52 — Olga
– Olga, Commented Feb 15, 2013 at 7:52

Jaime · Accepted Answer · 2013-02-15 18:57:49Z

5

EDIT The following code does allow to perform a convolution based check of equality. It maps True to 1 and False to -1. It also reverses b, which is needed for it to work properly:

def search(a, b) :
    return np.where(np.round(fftconvolve(a * 2 - 1, (b * 2 - 1)[::-1],
                                         mode='valid') - len(b)) == 0)[0]

I have checked that it gives the same output as the as_strided method for a variety of random inputs, which it does. I have also timed both approached, and convolution only starts paying off with largish search tokens of around 256 items.

It seems like a little overkill, but with boolean data you can use (abuse?) convolution:

In [8]: np.where(np.convolve(may_a, may_b.astype(int),
   ...:                      mode='valid') == may_b.sum())[0]
Out[8]: array([2, 7])

For larger datasets it may be faster to go with scipy.signal.fftconvolve:

In [13]: np.where(scipy.signal.fftconvolve(may_a, may_b,
   ....:                                   mode='valid') == may_b.sum())[0]
Out[13]: array([2, 7])

You have to be careful though, because the output now is floating point, and rounding may spoil the equality check:

In [14]: scipy.signal.fftconvolve(may_a, may_b, mode='valid')
Out[14]: array([ 1.,  1.,  2.,  1.,  1.,  1.,  1.,  2.])

So you may be better off with something along the lines of:

In [15]: np.where(np.round(scipy.signal.fftconvolve(may_a, may_b, mode='valid') -
   ....:                   may_b.sum()) == 0)[0]
Out[15]: array([2, 7])

edited Feb 15, 2013 at 18:57

answered Feb 15, 2013 at 8:16

Jaime

67.7k19 gold badges128 silver badges164 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Bi Rico Over a year ago

With This convolution you'll match anything that is [*, True, True, *] where * is a wildcard.

Jaime Over a year ago

@BiRico Oops, you are absolutely right! There may be a chance to salvage the method, by mapping the True's and the False's to some integer value, possibly +1 and -1.

Olga Over a year ago

@Jaime

>>> may_a = np.array([True,True,True,True]) >>> out_ind = np.where(np.convolve(may_a, may_b.astype(int),mode='valid') == may_b.sum())[0] >>> out_ind              -> array([0])

it is incorrect(

Jaime Over a year ago

@Olga Yes, that's what BiRico was saying. But the method in my edit at the top of the answer works fine: out_ind = np.where(np.convolve(may_a * 2 - 1, (may_b * 2 - 1)[::-1], mode='valid') == len(may_b))

machen Over a year ago

@Jaime What if array is 2-dimension? may_a= np.array([[0,1,2],[2,3,1],[3,4,5],[3,3,3]]); may_b = np.array([[3,3,3],[2,3,1]]); this method failed

Jaime · Accepted Answer · 2013-02-15 08:53:10Z

5

A much cooler approach, which may not perform a well, but which works for any dtype, is to use as_strided:

In [2]: from numpy.lib.stride_tricks import as_strided

In [3]: may_a = numpy.array([False, True, False, True, True, False,
   ...:                      True, False, True, True, False])

In [4]: may_b = numpy.array([False,True,True,False])

In [5]: a = len(may_a)

In [6]: b = len(may_b)

In [7]: a_view = as_strided(may_a, shape=(a - b + 1, b),
   ...:                     strides=(may_a.dtype.itemsize,) * 2)

In [8]: a_view
Out[8]: 
array([[False,  True, False,  True],
       [ True, False,  True,  True],
       [False,  True,  True, False],
       [ True,  True, False,  True],
       [ True, False,  True, False],
       [False,  True, False,  True],
       [ True, False,  True,  True],
       [False,  True,  True, False]], dtype=bool)

In [9]: numpy.where(numpy.all(a_view == may_b, axis=1))[0]
Out[9]: array([2, 7])

You have to be careful though, because even though a_view is a view of may_a's data, when comparing it with may_b a temporary array of (a - b + 1) * b is created, which may be a problem with large as and bs.

answered Feb 15, 2013 at 8:53

Jaime

67.7k19 gold badges128 silver badges164 bronze badges

1 Comment

seberg Over a year ago

Maybe you appreciate pointing out small things... Not using .itemsize but .strides[0] is a bit less prone to failure, in case the array was sliced previously.

Bi Rico · Accepted Answer · 2013-02-16 03:07:12Z

This looks very similar to a string search problem. If you want to avoid implementing one these string search algorithms, you could abuse pythons built in string search, which is very fast, by doing something like:

# I've added [True, True, True] at the end.
may_a = numpy.array([False, True, False, True, True, False, True, False, True, True, False, True, True, True])
may_b = numpy.array([False,True,True,False])

may_a_str = may_a.tostring()
may_b_str = may_b.tostring()

idx = may_a_str.find(may_b_str)
out_index = []
while idx >= 0:
    out_index.append(idx)
    idx = may_a_str.find(may_b_str, idx+1)

This should work fine for boolean arrays. If you want to use this approach for another array type, you'll need to make sure the strides of the two arrays match and divide out_index by that stride.

You could also use the regular expression module instead of the loop to do the string search.

root · Accepted Answer · 2013-02-15 08:57:31Z

2

This should also work with other that boolean data:

In [1]: import numpy as np

In [2]: a = np.array([False, True, False, True, True, False, True, False, True, True, False])

In [3]: b = np.array([False,True,True,False])

In [4]: def get_indices(a, b):
   ...:     window = len(b)
   ...:     shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
   ...:     strides = a.strides + (a.strides[-1],)
   ...:     w = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
   ...:     return np.where(np.all(np.equal(w,b),1) == True)[0]

In [5]: get_indices(a,b)
Out[5]: array([2, 7])

edited Feb 15, 2013 at 8:57

answered Feb 15, 2013 at 8:44

root

81.1k25 gold badges111 silver badges120 bronze badges

2 Comments

Olga Over a year ago

I changed an array a. >>> a=np.array([False,False]) >>> b=np.array([False,True,True,False]) >>> get_indices(a,b) >>> Out: ValueError: negative dimensions are not allowed

root Over a year ago

@Olga -- Yes, shape will be (-1, 4), you can add if len(a) < len(b): return np.array([]) to prevent that as in that case b cannot possibly be a subarray of a.

Konfle Dolex · Accepted Answer · 2013-02-15 08:07:09Z

1

I am not sure whether numpy provide a function for that. If it does not, here is a solution:

import numpy

def searchListIndexs(array, target):
    ret = []
    iLimit = len(array)-len(target)+1
    jLimit = len(target)
    for i in range(iLimit):
        for j in range(jLimit):
            if array[i+j] != target[j]:
                break
        else:
            ret.append(i)
    return ret


may_a = numpy.array([False, True, False, True, True, False, True, False, True, True, False])
may_b = numpy.array([False,True,True,False])
out_index = searchListIndexs(may_a, may_b)
print out_index #If you are using Python 3, then use print(out_index) instead.

answered Feb 15, 2013 at 8:07

Konfle Dolex

4432 silver badges7 bronze badges

2 Comments

Konfle Dolex Over a year ago

Yep. :( This is a limitation of this approach.

Konfle Dolex Over a year ago

BTW, I guess that these is no faster algorithm than this. I guess it is necessary to iterate through the entire array because no sorting is possible in this case.

Collectives™ on Stack Overflow

Return the indexes of a sub-array in an array

5 Answers 5

5 Comments

1 Comment

Comments

2 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

5 Comments

1 Comment

Comments

2 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related