0

I'm searching a way to check if an array contains all the elements of another array. This is the situation: I have two bytes arrays Bytes(): one contains the bytes of a file, and another contains the bytes to compare.

For example, if the file contains these bytes: 4D 5A 90 00 03 and the string to compare is 00 03, I want the function to return true. Else it will obviously return false. So, all bytes in the string to compare must be present in the file too.

I've already searched on the web for this. Tried the old good Contains() function, but for arrays it works only to compare a single byte. You know, one byte only is too little to identify a file!

If possible, I'd like to do this as fast as possible.

I'm working in VB.NET WinForms, VS 2013, .NET 4.5.1

Thanks in advance,

FWhite

EDIT:

Now I have a List(Of Bytes()) like this:

00 25 85 69
00 41 52
00 78 96 32

These are three Bytes() arrays. How do I check if my file bytes array contains all of these values (the file must contains 00 25 85 69, 00 41 52 and 00 78 96 32? I've tried with this code, but it doesn't work:

Dim BytesToCompare As List(Of Byte()) = StringToByteArray(S.Split(":")(3))
    For Each B As Byte() In BytesToCompare 
        If FileBytes.All(Function(c) B.Contains(c)) Then
            'Contains
            TempResults.Add("T")
        Else
            TempResults.Add("F")
        End If
    Next
If CountResults(TempResults) Then
    Return S
    Exit For
End If

The code in CountResults is this:

Public Function CountResults(Input As List(Of String)) As Boolean
    Dim TrueCount As Integer = 0
    Dim FalseCount As Integer = 0
    Dim TotalCount As Integer = Input.Count
    For Each S In Input
        If S = "T" Then
            TrueCount = TrueCount + 1
        ElseIf S = "F" Then
            FalseCount = FalseCount + 1
        End If
    Next
    If TrueCount = TotalCount Then
        Return True
    ElseIf FalseCount > TrueCount Then
        Return False
    End If
End Function

Tell me if you didn't understand and I'll try to better explain.

Thank you,

FWhite

2 Answers 2

1

I was thinking that maybe something other than the brute-force method would work and discovered the Boyer-Moore search algorithm. Shamelessly translating the C and Java code found at Boyer–Moore string search algorithm into VB.NET, I arrived at

Public Class BoyerMooreSearch

    ' from C and Java code at http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

    Private Shared Function SuffixLength(needle As Byte(), p As Integer) As Integer
        Dim len As Integer = 0
        Dim j = needle.Length - 1
        Dim i = 0
        While i >= 0 AndAlso needle(i) = needle(j)
            i -= 1
            j -= 1
            len += 1
        End While

        Return len

    End Function

    Private Shared Function GetOffsetTable(needle As Byte()) As Integer()
        Dim table(needle.Length - 1) As Integer
        Dim lastPrefixPosition = needle.Length
        For i = needle.Length - 1 To 0 Step -1
            If Isprefix(needle, i + 1) Then
                lastPrefixPosition = i + 1
            End If
            table(needle.Length - 1 - i) = lastPrefixPosition - i + needle.Length - 1
        Next
        For i = 0 To needle.Length - 2
            Dim slen = SuffixLength(needle, i)
            table(slen) = needle.Length - 1 - i + slen
        Next

        Return table

    End Function

    Private Shared Function Isprefix(needle As Byte(), p As Integer) As Boolean
        Dim j = 0
        For i = p To needle.Length - 1
            If needle(i) <> needle(j) Then
                Return False
            End If
            j += 1
        Next

        Return True

    End Function

    Private Shared Function GetCharTable(needle As Byte()) As Integer()
        Const ALPHABET_SIZE As Integer = 256
        Dim table(ALPHABET_SIZE - 1) As Integer
        For i = 0 To table.Length - 1
            table(i) = needle.Length
        Next
        For i = 0 To needle.Length - 2
            table(needle(i)) = needle.Length - 1 - i
        Next

        Return table

    End Function

    Shared Function IndexOf(haystack As Byte(), needle As Byte()) As Integer
        If needle.Length = 0 Then
            Return 0
        End If

        Dim charTable = GetCharTable(needle)
        Dim offsetTable = GetOffsetTable(needle)

        Dim i = needle.Length - 1
        While i < haystack.Length
            Dim j = needle.Length - 1
            While j >= 0 AndAlso haystack(i) = needle(j)
                i -= 1
                j -= 1
            End While
            If j < 0 Then
                Return i + 1
            End If

            i += Math.Max(offsetTable(needle.Length - 1 - j), charTable(haystack(i)))

        End While

        Return -1

    End Function

End Class

And to test it (suspecting that the LINQ code presented by @OneFineDay would demolish it for performance):

Imports System.IO
Imports System.Text

Module Module1

    Dim bytesToCheck As List(Of Byte())
    Dim rand As New Random

    Function GetTestByteArrays() As List(Of Byte())
        Dim testBytes As New List(Of Byte())
        ' N.B. adjust the numbers used in CreateTestFile according to the quantity (e.g. 10) of testData used
        For i = 1 To 10
            testBytes.Add(Encoding.ASCII.GetBytes("ABCDEFgfdhgf" & i.ToString() & "sdfgjdfjFGH"))
        Next
        Return testBytes
    End Function

    Sub CreateTestFile(f As String)
        ' Make a 4MB file of test data

        ' write a load of bytes which are not going to be in the
        ' judiciously chosen data to search for...
        Using fs As New FileStream(f, FileMode.Create, FileAccess.Write)
            For i = 0 To 2 ^ 22 - 1
                fs.WriteByte(CByte(rand.Next(128, 256)))
            Next
        End Using

        ' ... and put the known data into the test data
        Using fs As New FileStream(f, FileMode.Open)
            For i = 0 To bytesToCheck.Count - 1
                fs.Position = CLng(i * 2 ^ 18)
                fs.Write(bytesToCheck(i), 0, bytesToCheck(i).Length)
            Next
        End Using

    End Sub

    Sub Main()

        ' the byte sequences to be searched for
        bytesToCheckFor = GetTestByteArrays()

        ' Make a test file so that the data can be inspected
        Dim testFileName As String = "C:\temp\testbytes.dat"
        CreateTestFile(testFileName)

        Dim fileData = File.ReadAllBytes(testFileName)

        Dim sw As New Stopwatch
        Dim containsP As Boolean = True

        sw.Reset()
        sw.Start()
        For i = 0 To bytesToCheckFor.Count - 1
            If BoyerMooreSearch.IndexOf(fileData, bytesToCheckFor(i)) = -1 Then
                containsP = False
                Exit For
            End If
        Next

        sw.Stop()

        Console.WriteLine("Boyer-Moore: {0} in {1}", containsP, sw.ElapsedTicks)

        sw.Reset()
        sw.Start()
        Dim temp As New List(Of Byte)
        Array.ForEach(bytesToCheckFor.ToArray, Sub(byteArray) Array.ForEach(byteArray, Sub(_byte) temp.Add(_byte)))
        Dim result = fileData.All(Function(_byte) temp.Contains(_byte))
        sw.Stop()

        Console.WriteLine("LINQ: {0} in {1}", result, sw.ElapsedTicks)

        Console.ReadLine()

    End Sub

End Module

Now, I know that the byte sequences to match are in the test file (I confirmed that by using a hex editor to search for them) and, assuming (oh dear!) I am using the other method correctly, the latter doesn't work whereas mine does:

Boyer-Moore: True in 23913
LINQ: False in 3224

I did also test the first code example by OneFineDay for searching for small vs large patterns to match, and for less than seven or eight bytes that code was faster than Boyer-Moore. So, if you would care to test it for the size of data you're searching in and the size of the patterns you're looking for, Boyer-Moore might be a better fit to your "If possible, I'd like to do this as fast as possible."

EDIT

Further to the OP's uncertainty as to whether or not my suggested method works, here is a test with very small sample data:

Sub Test()
    bytesToCheckFor = New List(Of Byte())
    bytesToCheckFor.Add({0, 1}) ' the data to search for
    bytesToCheckFor.Add({1, 2})
    bytesToCheckFor.Add({0, 2})

    Dim fileData As Byte() = {0, 1, 2} ' the file data
    ' METHOD 1: Boyer-Moore
    Dim containsP As Boolean = True

    For i = 0 To bytesToCheckFor.Count - 1
        If BoyerMooreSearch.IndexOf(fileData, bytesToCheckFor(i)) = -1 Then
            containsP = False
            Exit For
        End If
    Next

    Console.WriteLine("Boyer-Moore: {0}", containsP)

    ' METHOD 2: LINQ
    Dim temp As New List(Of Byte)
    Array.ForEach(bytesToCheckFor.ToArray, Sub(byteArray) Array.ForEach(byteArray, Sub(_byte) temp.Add(_byte)))
    Dim result = fileData.All(Function(_byte) temp.Contains(_byte))

    Console.WriteLine("LINQ: {0}", result)

    Console.ReadLine()

End Sub

Outputs:

Boyer-Moore: False
LINQ: True

Also, I renamed the variables in my original Main() method to hopefully make them more meaningful.

Sign up to request clarification or add additional context in comments.

3 Comments

Thank you, Andrew! A great and detailed answer. I'll give it a try next days. Thank you again FWhite
Hi @Andrew! Since the code proposed by OneFineDay was a bit slow, I decided to try yours without good results. Your code searches for each single byte in the file. This is not correct: the file must contains an entire section of bytes. An example: we have a file with three bytes: 00 01 02. We have a string to compare, which is 00 02. Your code returns True because, actually, the file contains 00 and 02. The code should return False, because the bytes are not in row: in fact, in the middle, there is the 01 which invalidates the search pattern. Thanks again, FWhite
@PWhite I just tested my code with example file data of 00 01 02 and data to search of 00 02. Perhaps you have confused my version with the other version as mine works and the other doesn't, or I didn't give the variables meaningful enough names. I'll edit my answer to show how I tested it with your sample data.
1

You can use the All function to check for that. It returns a Boolean.

Dim orgByteArray() As Byte = {CByte(1), CByte(2), CByte(3)}
Dim testByteArray() As Byte = {CByte(1), CByte(2)}
Dim result = orgByteArray.All(Function(b) testByteArray.Contains(b))
'output for this case returns False

For comparing a List(Of Byte()) to a Byte() where the Byte() is the complte list of all sub arrays in the List(Of byte()).

Dim filebytes() As Byte = {CByte(1), CByte(2), CByte(3), CByte(3), CByte(4), CByte(5), CByte(6), CByte(7), CByte(8)}
Dim bytesToCheck As New List(Of Byte())
bytesToCheck.Add(New Byte() {CByte(1), CByte(2), CByte(3)})
bytesToCheck.Add(New Byte() {CByte(3), CByte(4), CByte(5)})
bytesToCheck.Add(New Byte() {CByte(6), CByte(7), CByte(8)})
Dim temp As New List(Of Byte)
Array.ForEach(bytesToCheck.ToArray, Sub(byteArray) Array.ForEach(byteArray, Sub(_byte) temp.Add(_byte)))
Dim result = filebytes.All(Function(_byte) temp.Contains(_byte))
'output = True

14 Comments

OneFineDay, what if I have a List(Of Bytes()), a list that contains some bytes arrays. How can I check if the file contains all bytes arrays contained in the main list of bytes array? Thank you so much FWhite
Drill down I used two All predicates, first says use All arrays then the second All checks the array as it checks all arrays.
Hi and thanks for your fast answer :) I'm going to update my post to better describe my problem, because I can't in 500 characters...
Should I write "FileBytes" instead of "byteList" (like the old example? Because it tells me an error: "All is not a member of Byte)
My example shows the syntax, use the names you want.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.