12

What is the simplest way to find a byte[] inside another byte[]? i have a feeling i could do it with linq but i dont know how.

Note: I did a search with the [c#] and didnt find anything, i am surprised.

2
  • I think we need more information. Are you trying to find a subsequence of bytes within a byte array? Could you give an example? Commented Feb 1, 2011 at 4:56
  • 3
    See, for example, the Knuth-Morris-Pratt algorithm. Commented Feb 1, 2011 at 4:58

6 Answers 6

30

Here's a faster version of Ergwun's excellent answer:

static int SearchBytes( byte[] haystack, byte[] needle ) {
    var len = needle.Length;
    var limit = haystack.Length - len;
    for( var i = 0;  i <= limit;  i++ ) {
        var k = 0;
        for( ;  k < len;  k++ ) {
            if( needle[k] != haystack[i+k] ) break;
        }
        if( k == len ) return i;
    }
    return -1;
}

In a brief test with an 11MB haystack and 9 byte needle, this was about three times faster.

The optimizations are:

  • No function call each time through the outer loop.
  • Needle length and search limit are cached.
  • Redundant length test at the beginning of match() is removed.

Of course for long byte arrays you'd want to use something like a Boyer-Moore search, but for many purposes a simple algorithm like this is good enough, and it has the virtue of being short and easy to understand and verify.

Sign up to request clarification or add additional context in comments.

4 Comments

Why would the decision of whether to use BM or not would depend on the size of the input rather than the alphabet set ?
Good point! I was just speaking in general terms: a slow but simple and easy to understand algorithm can be good enough if either your data is small (by whatever measure) or if it really isn't that important how long it takes to run. For example, I had a case where I had to do a similar byte array search as part of a build process. With a simple algorithm like the one above, the search took about a second. Obviously this would be a deal-breaker in many situations, but this was part of a release build that took several minutes and was run only occasionally. So simple and slow was plenty good!
How is this faster? Is it not doing the same naive search?
@Goku Yes, it is the same algorithm, but a faster implementation because of the three optimizations I listed in the answer.
9

Here's a simple (naive?) way to do it:

static int search(byte[] haystack, byte[] needle)
{
    for (int i = 0; i <= haystack.Length - needle.Length; i++)
    {
        if (match(haystack, needle, i))
        {
            return i;
        }
    }
    return -1;
}

static bool match(byte[] haystack, byte[] needle, int start)
{
    if (needle.Length + start > haystack.Length)
    {
        return false;
    }
    else
    {
        for (int i = 0; i < needle.Length; i++)
        {
            if (needle[i] != haystack[i + start])
            {
                return false;
            }
        }
        return true;
    }
}

8 Comments

Perfect, just as i needed. To bad i cant do this with linq or something built in. Did you just write this now? or copy/pasted it from somewhere?
Note that depending on the input, this is potentially very slow.
@acidzombie - Just wrote it. @Jason - yeah can be slow, but simple.
@jason: Why? I dont see anything 'slow' about it?
@acidzombie24: It's really easy to come up with examples where it's ridiculously slow. You can make it repeatedly start a long search through the match portion of the algorithm, and then barely fail, and then have to start over all again.
|
0

Try this one with using lambda expressions:

private bool CheckPatternInArray(byte[] array, byte[] pattern)
{
    int fidx = 0;
    int result = Array.FindIndex(array, 0, array.Length, (byte b) =>
            {
                fidx = (b == pattern[fidx]) ? fidx + 1 : 0;
                return (fidx == pattern.Length);
            });
    return (result >= pattern.Length - 1);
}

If you are after the fastest one, check solutions here.

Comments

0

Appeciating this is an old question, but since it's still absent from LINQ despite being a common scenario, I've added a LINQ extension method below based on Michael's answer. Written in the spirit of a string/byte[] IndexOf.

It also explicitly handles an empty needle set, whereas the previous solution was returning a match (index 0), this is now returned as missing (index -1).

public static class LinqExtensions
{
    public static int IndexOf(this IEnumerable<byte> haystack, IEnumerable<byte> needle)
    {
        var needleArray = needle as byte[] ?? needle.ToArray();
        var haystackArray = haystack as byte[] ?? haystack.ToArray();

        var needleLength = needleArray.Length;
        var haystackLengthLimit = haystackArray.Length - needleLength;

        if (needleLength > 0)
        {
            for (var i = 0; i <= haystackLengthLimit; i++)
            {
                var j = 0;
                for (; j < needleLength; j++)
                {
                    if (needleArray[j] != haystackArray[i + j])
                        break;
                }

                if (j == needleLength)
                    return i;
            }
        }

        return -1;
    }
}

Plus some tests to show it in action..

    [Test]
    [TestCase(new byte[] { 1, 2, 3}, new byte[] {1, 3}, -1)]
    [TestCase(new byte[] { 1, 2, 3}, new byte[] {}, -1)]
    [TestCase(new byte[] { 1, 2, 3}, new byte[] {1}, 0)]
    [TestCase(new byte[] { 1, 2, 3}, new byte[] {2, 3}, 1)]
    [TestCase(new byte[] { 1, 2, 3, 20, 30, 40}, new byte[] {20, 30, 40}, 3)]
    [TestCase(new byte[] { 1, 2}, new byte[] {1, 2, 3}, -1)]
    [TestCase(new byte[] { }, new byte[] {1, 2, 3}, -1)]
    [TestCase(new byte[] { }, new byte[] {}, -1)]
    public void TestIndexOf(byte[] haystack, byte[] needle, int expectedIndex)
    {
        Assert.That(haystack.IndexOf(needle), Is.EqualTo(expectedIndex));
    }

Comments

0
byte[] any = { 0xff, 0x14, 0x1f, 0x13, 0x12, 0x2f, 0x3f, 0x4f, 0x5f, 0x6f, 0x11, 0x22, 0x23 };
byte[] pattern = { 0x4f, 0x5f, 0x6f };
string anyHexString = BitConverter.ToString(any).Replace("-", "");
string patternHexString = BitConverter.ToString(pattern).Replace("-", "");
int findIndex = anyHexString.IndexOf(patternHexString) / 2;
Console.WriteLine(findIndex);

If you don't care about performance, you can use this method, which is almost the most concise and clear

Convert the byte array to hexString and lookup the

Comments

-2

you probably could have figured this yourself but sometimes I like to do the simple thing.

bool found = false;
int i = 0;
for(; i < byteArray.Length || found; i++)
{
  if(byteArray[i] == lookingFor)
  {
    found = true;
  }
}

2 Comments

I think you misunderstood the question. Think of the question as finding a word in a string, but the word is a byte[] and the string is another byte[].
yeah i read it as byte in a byte array. my bad. if you have ascii, you could use ASCIIEncoding.ASCII.GetString to make a string from your byte[]

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.