1

For an excercisement i had to look for a certain byte pattern within an byte array, easy enough but i wonder if the code can be simplified or even be optimized:

package anti_virus;

import java.nio.file.Files;
import java.nio.file.Paths;

public class Main {

    public static void main(String[] args) throws Exception {
        byte[] virus = Files.readAllBytes(Paths.get("C:/Users/Nick/Desktop/Uni/infected.com"));

        byte[] payload = new byte[]{0x56, 0x69, 0x72, 0x75, 0x73, (byte)0xB4, 0x40, (byte) 0xBB, 0x01,
                0x00, (byte) 0xB9, 0x05, 0x00, (byte) 0xBA, 0x0, 0x0, (byte) 0xCD, 0x21};

        // payload[14] and payload[14] have varying values

        for (int i = 0; i < virus.length; i++) {
            if ((virus[i] == payload[0]) && (virus[i+1] == payload[1]) && (virus[i+2] == payload[2]) &&
                (virus[i+3] == payload[3]) && (virus[i+4] == payload[4]) && (virus[i+5] == payload[5]) &&
                (virus[i+6] == payload[6]) && (virus[i+7] == payload[7]) && (virus[i+8] == payload[8]) &&
                (virus[i+9] == payload[9]) && (virus[i+10] == payload[10]) && (virus[i+11] == payload[11]) &&
                (virus[i+12] == payload[12]) && (virus[i+13] == payload[13]) && (virus[i+16] == payload[16]) &&
                (virus[i+17] == payload[17])) {
                  System.out.println("This file is probably a Virus!");
                  return;
            }
        }

        System.out.println("This file is no Virus.");
    }
}

4 Answers 4

3

Yes, it can be simplified/optimized:

  • You could use the KMP algorithm (first 14 bytes). This algorithm runs in O(payload.length + virus.length) for arbitrary payload instead of O(payload.length * virus.length). (Your code is working more efficiently than O(payload.length * virus.length) for only one reason: 0x56 occurs only as the first element of your array)
  • Even if you choose to keep your algorithm, I'd use a loop to make the code shorter & more readable. I'd also fix the source of ArrayIndexOutOfBoundsException in your loop (you may access indices i, ..., i+13, i+16, i+17 of the virus array and your loop condition allows i to get as large as virus.length-1).
Sign up to request clarification or add additional context in comments.

Comments

1

Your code is quite good, It gives a reasonable 21 ms on a non-virus 6 MB file. But I found it better to either make some pre-loop for the first 14 bytes. Moreover, you have to take care at the ending bytes.

begin = System.currentTimeMillis();
for (i = 0; i < virus.length-payload.length; i++) {
    for (j = 0; j < 14; j++) {
        // payload[14] and payload[15] have varying values
        if (virus[i+j] != payload[j]) {
            bFound = false;
            break;
        }
    }
    if ((bFound) && (virus[i+16] == payload[16]) && (virus[i+17] == payload[17])) {
        end = System.currentTimeMillis();
        System.out.println("time : "+(end-begin)+" ms");
        System.out.println("This file is probably a Virus!");
        return;
    }
}
end = System.currentTimeMillis();
System.out.println("time : "+(end-begin)+" ms");
System.out.println("This file is not a Virus.");

This first optim give a a reasonnable 14 ms (-33% of CPU).

Another optimization if you can afford reading your file as integer, is to make wide comparison (4 bytes) at a time. You should pad also the payload to a multiple of 4.

begin = System.currentTimeMillis();
for (i = 0; i < virusInt.length-payloadInt.length; i++) {
    if ((virusInt[i] == payloadInt[0]) && 
        (virusInt[i+1] == payloadInt[1]) && 
        (virusInt[i+2] == payloadInt[2]) &&
        ((virusInt[i+3]&0xFFFF0000) == payloadInt[3]) && 
        ((virusInt[i+4]&0xFFFF0000) == payloadInt[4])) {
           end = System.currentTimeMillis();
           System.out.println("time : "+(end-begin)+" ms");
           System.out.println("This file is probably a Virus!");
           return;
       }
}
end = System.currentTimeMillis();
System.out.println("time : "+(end-begin)+" ms");
System.out.println("This file is not a Virus.");

This give me an even more reasonable 2 ms (-90% of CPU). Of course, I do not count the time to transform into array of int as I suppose you load as array of int and your payload is an array of int too. I have not tried with long (which are 64 bits in JAVA) but it could be a little bit faster.

Comments

0

(Here I assume that virus is the virus signature, and payload any data. I might be wrong seeing your code.)

One has to walk the payload array for paylöadIndex in [0, payload.length - virus.length] (!) and at every step check the virus array again, in a for-loop, with a virusIndex.

Problem solution strategy. Think how you would do it on paper. You would shift the virus array over the payload array.

Comments

0

Something like this would check for the signature anywhere within the array, it has not been thoroughly tested though

public static void main(String[] args) throws Exception {
    byte[] virus = FileUtil.readBytes(new File("c:/x.txt"));
    byte[] payload = "def".getBytes();

    for (int i = 0; i < virus.length; i++) {
        if ((i + payload.length) <= virus.length) {
            boolean found = true;
            for (int j = 0; j < payload.length; j++) {
                if (virus[i + j] != payload[j]) {
                    found = false;
                    break;
                }
            }

            if (found) {
                System.out.println("This file is probably a Virus!");
                return;
            }
        } else {
            break;
        }
    }

    System.out.println("This file is no Virus.");
}

1 Comment

The OP is skipping the indices 14 and 15

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.