4

I'm trying to continuously find a byte array (byte[]) within a byte array and I found a code that only finds the first occurrence.

This is where I found the code: Find an array (byte[]) inside another array?

Question: How can I continuously find a byte array with this code below?

        public int SearchBytes(byte[] haystack, byte[] needle)
    {
        int len = needle.Length;
        int limit = haystack.Length - len;
        for (int i = 0; i <= limit; i++)
        {
            int k = 0;
            for (; k < len; k++)
            {
                if (needle[k] != haystack[i + k]) break;
            }
            if (k == len) return i;
        }
        return -1;
    }
4
  • Well, what have you tried? If you want to return a List<int> of matches, can't you just change the return i to matches.Add(i) having declared a List<int> matches = new List<int>() to start with? Commented Jan 21, 2016 at 13:06
  • By continuously find you mean to find all occurences of a byte sequence withing another byte array? Commented Jan 21, 2016 at 13:06
  • @Gnqz Yes that's what I mean. Commented Jan 21, 2016 at 13:13
  • 1
    Have a look at the algorithm in this thread, with the corrected answer. It uses the Boyer-Moore-Horspool algorithm, which is very fast. Commented Jan 21, 2016 at 13:14

2 Answers 2

2

You can change the method to accept a start index like this:

public int SearchBytes(byte[] haystack, byte[] needle, int start_index)
{
    int len = needle.Length;
    int limit = haystack.Length - len;
    for (int i = start_index; i <= limit; i++)
    {
        int k = 0;
        for (; k < len; k++)
        {
            if (needle[k] != haystack[i + k]) break;
        }
        if (k == len) return i;
    }
    return -1;
}

The difference is simply that this method accepts a start_index and starts the search at this specific index.

Now, you can use it like this:

byte[] haystack = new byte[] { 1, 2, 3, 4, 5, 1, 2, 3 };

byte[] needle = new byte[] {1,2,3};

int index = 0;

while (true)
{
    index = SearchBytes(haystack, needle, index);

    if (index == -1)
        break;

    Console.WriteLine("Found at " + index);

    index += needle.Length;
}

This loop starts on index 0, then it uses the result of the previous search to set a new index to start the next search.

It adds needle.Length to the index so that we start searching immediately after the end of the previously found result.

UPDATE:

Here is how this code can be used to create a method that returns the indexes as an array:

public int[] SearchBytesMultiple(byte[] haystack, byte[] needle)
{
    int index = 0;

    List<int> results = new List<int>();

    while (true)
    {
        index = SearchBytes(haystack, needle, index);

        if (index == -1)
            break;

        results.Add(index);

        index += needle.Length;
    }

    return results.ToArray();
}

And it can be used like this:

byte[] haystack = new byte[] { 1, 2, 3, 4, 5, 1, 2, 3 };

byte[] needle = new byte[] {1,2,3};

int[] indexes = SearchBytesMultiple(haystack, needle);
Sign up to request clarification or add additional context in comments.

7 Comments

I tried your answer and its throwing an exception: Index was outside the bounds of the array.
which line throws the exception? Did you take the method code as is? There are two pieces of changes in it.
This is the exception: if (needle[k] != haystack[i + k]) break; I took the 'int index = 0' and placed it outside the void (which is a button) and i got an exception, otherwise if i left it as is, it won't find another occurrence.
Why did you move the int index = 0? Is it now an instance variable? How does the code look now?
It works now, I had to delete the while (true). I'm not sure why that fixed it. Also I had to move the index because it was resetting it to 0 every time you click the button. Here is the fixed version: byte[] haystack = new byte[] { 1, 2, 1, 1, 5, 1, 1, 1 }; byte[] needle = new byte[] { 1 }; index = SearchBytes(haystack, needle, index); Console.WriteLine("Found at " + index); index += needle.Length;
|
0

As an alternative, you could consider using the Boyer-Moore algorithm, which is extremely performant if the size of needle[] or haystack[] is large.

However, I wouldn't recommend this for very short needle[] or haystack[] because the overhead of setting up the offset tables will be higher than just doing a simple linear search.

Here's an implementation that I converted from the Java one on the Wiki page that I linked:

using System;
using System.Collections.Generic;

namespace ConsoleApplication1
{
    public sealed class BoyerMoore
    {
        readonly byte[] needle;
        readonly int[]  charTable;
        readonly int[]  offsetTable;

        public BoyerMoore(byte[] needle)
        {
            this.needle      = needle;
            this.charTable   = makeByteTable(needle);
            this.offsetTable = makeOffsetTable(needle);
        }

        public IEnumerable<int> Search(byte[] haystack)
        {
            if (needle.Length == 0)
                yield break;

            for (int i = needle.Length - 1; i < haystack.Length;)
            {
                int j;

                for (j = needle.Length - 1; needle[j] == haystack[i]; --i, --j)
                {
                    if (j != 0)
                        continue;

                    yield return i;
                    i += needle.Length - 1;
                    break;
                }

                i += Math.Max(offsetTable[needle.Length - 1 - j], charTable[haystack[i]]);
            }
        }

        static int[] makeByteTable(byte[] needle)
        {
            const int ALPHABET_SIZE = 256;
            int[] table = new int[ALPHABET_SIZE];

            for (int i = 0; i < table.Length; ++i)
                table[i] = needle.Length;

            for (int i = 0; i < needle.Length - 1; ++i)
                table[needle[i]] = needle.Length - 1 - i;

            return table;
        }

        static int[] makeOffsetTable(byte[] needle)
        {
            int[] table = new int[needle.Length];
            int lastPrefixPosition = needle.Length;

            for (int i = needle.Length - 1; i >= 0; --i)
            {
                if (isPrefix(needle, i + 1))
                    lastPrefixPosition = i + 1;

                table[needle.Length - 1 - i] = lastPrefixPosition - i + needle.Length - 1;
            }

            for (int i = 0; i < needle.Length - 1; ++i)
            {
                int slen = suffixLength(needle, i);
                table[slen] = needle.Length - 1 - i + slen;
            }

            return table;
        }

        static bool isPrefix(byte[] needle, int p)
        {
            for (int i = p, j = 0; i < needle.Length; ++i, ++j)
                if (needle[i] != needle[j])
                    return false;

            return true;
        }

        static int suffixLength(byte[] needle, int p)
        {
            int len = 0;

            for (int i = p, j = needle.Length - 1; i >= 0 && needle[i] == needle[j]; --i, --j)
                ++len;

            return len;
        }
    }
}

Here's some test code:

byte[] haystack = new byte[1000];
byte[] needle   = {1, 2, 3, 4, 5, 6, 7, 8, 9};

for (int i = 100; i <= 900; i += 100)
    Array.Copy(needle, 0, haystack, i, needle.Length);

var searcher = new BoyerMoore(needle);

foreach (int index in searcher.Search(haystack))
    Console.WriteLine(index);

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.