Numpy: check if 1-d array is sub-array of another

Question

given two general numpy 1-d arrays (no guarantees about values whatsoever), I need to check if one is a sub-array of the other.

It's short and easy to do by casting to strings, but probably not the most efficient:

import numpy as np

def is_sub_arr(a1, a2):
    return str(a2).strip('[]') in str(a1).strip('[]')

arr1 = np.array([9, 1, 3, 2, 7, 2, 7, 2, 8, 5])
arr2 = np.array([3, 2, 7, 2])
arr3 = np.array([1,3,7])

print(is_sub_arr(arr1,arr2))  # True
print(is_sub_arr(arr1,arr3))  # False

is there an efficient builtin/native numpy way to do this?

note: using arr2 in arr1 will of course not work, because it's True only if the 1-d array arr2 is a an element in a 2-d array — Adam.Er8
– Adam.Er8, Commented Jul 12, 2019 at 9:31
@Chris: yours returns True on set([4, 5, 7]).issubset([7, 6, 4, 5, 9, 8]) — Andy L.
– Andy L., Commented Jul 12, 2019 at 9:44
While reviewing suggested edits, please pay attention on copied content for tags. You approved many of them but they should have been rejected — Vega
– Vega, Commented Mar 30, 2022 at 13:31

Stef · Accepted Answer · 2019-07-12 12:56:11Z

4

Although there is already an accepted answer, I'd like to throw in my quick and dirty solution:

memoryview(arr2).tobytes() in memoryview(arr1).tobytes()

This of course only works if the arrays use contiguous memory, so the full function is:

def is_sub_arr_mem(a, sub):
    if a.flags['FORC'] and sub.flags['FORC']: 
        return memoryview(sub).tobytes() in memoryview(a).tobytes()
    return None

This is way faster than the short and easy string conversion in the OP and also faster than the (non-numba accelerated) stride and the cut solutions for different array sizes:

Original data (n1 = 10, n2 = 4)
str:     0.142 ms
stride:  0.034 ms
cut:     0.014 ms
mem:     0.003 ms

n1 = 1000, n2 = 4
str:     3.072 ms
stride:  0.100 ms
cut:     0.017 ms
mem:     0.008 ms

n1 = 1000, n2 = 500
str:     4.543 ms
stride:  0.339 ms
cut:     0.018 ms
mem:     0.009 ms

(n1 and n2 are the sizes of arr1 and arr2, respectively)

edited Jul 12, 2019 at 12:56

answered Jul 12, 2019 at 12:23

Stef

30.9k3 gold badges34 silver badges60 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Daniel F Over a year ago

Another similar method would be to use .view(dtype = np.void) tacked on to the strided answer, which would be nearly as efficient. Basically with any sort of memory view you can avoid the big boolean array of @jdehesa 's strided answer and cut down the solution time significantly without resorting to numba

javidcf · Accepted Answer · 2019-07-12 09:56:04Z

2

EDIT: You can also make things a lot faster (like 1000x) with less memory using Numba:

import numpy as np
import numba as nb

def is_sub_arr_np(a1, a2):
    l1, = a1.shape
    s1, = a1.strides
    l2, = a2.shape
    a1_win = np.lib.stride_tricks.as_strided(a1, (l1 - l2 + 1, l2), (s1, s1))
    return np.any(np.all(a1_win == a2, axis=1))

@nb.jit(parallel=True)
def is_sub_arr_nb(a1, a2):
    for i in nb.prange(len(a1) - len(a2) + 1):
        for j in range(len(a2)):
            if a1[i + j] != a2[j]:
                break
        else:
            return True
    return False

# Test
np.random.seed(0)
arr1 = np.random.randint(100, size=100_000)
arr2 = np.random.randint(100, size=1_000)
print(is_sub_arr_np(arr1, arr2))
# False
print(is_sub_arr_nb(arr1, arr2))
# False

# Now enforce a match at the end
arr1[-len(arr2):] = arr2
print(is_sub_arr_np(arr1, arr2))
# True
print(is_sub_arr_nb(arr1, arr2))
# True

# Timing
%timeit is_sub_arr_np(arr1, arr2)
# 99.4 ms ± 567 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit is_sub_arr_nb(arr1, arr2)
# 124 µs ± 863 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Not sure this is the most efficient answer but it is one possible solution:

import numpy as np

def is_sub_arr(a1, a2):
    l1, = a1.shape
    s1, = a1.strides
    l2, = a2.shape
    a1_win = np.lib.stride_tricks.as_strided(a1, (l1 - l2 + 1, l2), (s1, s1))
    return np.any(np.all(a1_win == a2, axis=1))

arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr2 = np.array([4, 5, 6])
arr3 = np.array([4, 5, 7])
print(is_sub_arr(arr1, arr2))
# True
print(is_sub_arr(arr1, arr3))
# False

edited Jul 12, 2019 at 9:56

answered Jul 12, 2019 at 9:39

javidcf

59.9k7 gold badges87 silver badges134 bronze badges

4 Comments

Adam.Er8 Over a year ago

I went down that road, the problem is that it creates a pretty massive object (e.g. for arr1 of length 1000 and arr2 of length 500, we'll get a ~(500,500) array (25k elements).

javidcf Over a year ago

@Adam.Er8 Yes, it's not really a "smart" solution. I edited the answer to compare with Numba, which can be extremely efficient.

Adam.Er8 Over a year ago

Wow, the Numba one is seriously fast! I've never worked with it... gonna do some reading :) Thanks!

Divakar Over a year ago

Yup, seems like numba is the go-to-one for performance oriented ones.

neko · Accepted Answer · 2019-07-12 10:05:37Z

1

You can try something like this:

import numpy as np

arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 20, 67, -1])
arr2 = np.array([4, 5, 6])
arr3 = np.array([4, 5, 7])


def is_sub_arr(original, sub):
    first_match = np.where(original == sub[0])
    if len(first_match) == 0:
        print("no matches")
        return

    else:
        for match in first_match[0]:

            cut_original = original[match:match + len(sub)]
            if np.all(cut_original == sub):
                print("Got a match at index ", match)
            else:
                print("no match")
        return


is_sub_arr(arr1, arr2)
is_sub_arr(arr1, arr3)

First, we check if the first element of the subarray is in the original array and get its index with np.where. Then, for every match, we cut the original array starting at that index and ending at that index plus the length of the sub, and check if this matches the subarray itself.

answered Jul 12, 2019 at 10:05

neko

3891 silver badge5 bronze badges

1 Comment

Adam.Er8 Over a year ago

awesome approach! quickly getting the potential index for matches and then using exact array matches by length, love this :)

Collectives™ on Stack Overflow

Numpy: check if 1-d array is sub-array of another

3 Answers 3

1 Comment

4 Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

4 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related