
OK, so obviously I can check whether a byte array contains only one value, but I don't know if this is the fastest way to do it. The problem is that sometimes I get a byte array containing only FF (255) values; when that happens I need to ignore it in the code that follows. What I've done so far is this:

var onlyFF = true;
foreach(var value in programCode)
{
    if (value != 0xFF)
    {
        onlyFF = false;
        break;
    }
}

But is this the fastest way? I'm going to have to check a tremendous number of arrays (all arrays are quite small though, about 350 bytes).

So is this the fastest way or are there better ways for doing this?

  • Be sure not to fall into premature optimisation. Your code is fine: it does exactly what it's supposed to do, and it should be considered fine until proven non-performant. Commented May 27, 2016 at 9:47
  • When the arrays aren't sorted in any way, you will have no choice but to check every byte. So what you wrote is the way to go... if you only want to check each array once, you can pad the array to a multiple of 8, combine the bytes into a ulong and XOR the values, checking only the final result, but I doubt that this would be any faster. Commented May 27, 2016 at 9:48
  • Btw. if you don't distinguish between an empty array and one containing only 0xff, your check is a one-liner: var notOnlyFF = programCode.Any(value => value != 0xff) (meaning "does the array contain anything other than 0xff?"). Do some performance tests to compare both :) Commented May 27, 2016 at 9:56
  • Hand-optimized code that pays attention to memory organization looks like this. Adding such code to your program, well, you need to want that pretty badly. Commented May 27, 2016 at 10:52
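The LINQ suggestion from the comments can be written out as follows (a minimal sketch; `GetNextBuffer` is a made-up stand-in for wherever the 350-byte arrays come from):

```csharp
using System.Linq;

// Hypothetical source of buffers, for illustration only.
byte[] programCode = GetNextBuffer();

// All() returns true for an empty array as well, so an empty
// buffer would also count as "only 0xFF" here.
bool onlyFF = programCode.All(value => value == 0xFF);

// Equivalent formulation using Any():
bool onlyFF2 = !programCode.Any(value => value != 0xFF);

if (!onlyFF)
{
    // process the buffer as usual
}
```

LINQ adds delegate-call overhead per byte, so this is a readability improvement rather than a performance one; the plain foreach loop from the question will generally be at least as fast.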

3 Answers


There certainly are faster ways to perform the specific check you are doing. The real question, as some of the comments have already indicated, is whether it is truly necessary. Whether it is worthwhile to optimize the check depends on a couple of questions you will have to ask yourself first.

  1. What is your performance criteria?

    How many arrays per second should you be able to process? If the answer is 1000 or fewer, then I would definitely not bother trying to optimize the code. If the answer is millions of arrays per second, then you might want to consider doing some performance testing of your code.

  2. What type of data do you expect?

    If 99% of the buffers you process are valid (not all 0xFF bytes), then your loop will most likely exit within the first few checks in most cases. Does it make sense to optimize your algorithm for a worst-case scenario that only applies to 1% of your workload?

  3. What risks do you introduce to your code by changing the comparison method, and do the benefits outweigh those risks?

A common optimization technique, which Adwaenyth mentioned, can be applied to your situation: treat the byte array as an array of ulong and use the XOR bit operator to compare 8 bytes at a time. To use this method efficiently without copying buffers, you have to resort to unsafe code. The following example shows a quick and dirty implementation (note that I have not tested this code, so please don't use it without proper testing first):

    public static bool IsBufferValidUnSafeXOR(byte[] buffer)
    {
        bool isValid = false;

        int byteLength = buffer.Length;
        int ulongCount = byteLength >> 3;   // number of full 8-byte words, same as byteLength / 8
        ulong toggleMask = 0xFFFFFFFFFFFFFFFF;

        unsafe
        {
            fixed (byte* pByteBuffer = buffer)
            {
                ulong* pBuffer = (ulong*)pByteBuffer;
                int index = 0;

                while (index < ulongCount)
                {
                    // XOR with all-ones yields 0 only if all 8 bytes are 0xFF
                    if ((pBuffer[index] ^ toggleMask) != 0)
                    {
                        isValid = true;
                        break;
                    }

                    index++;
                }
            }
        }

        // Check the remaining 0-7 bytes that did not fill a full ulong
        if (!isValid)
        {
            int index = ulongCount << 3;    // first byte not yet checked

            while (index < byteLength)
            {
                if (buffer[index] != 0xFF)
                {
                    isValid = true;
                    break;
                }

                index++;
            }
        }

        return isValid;
    }

I ran a couple of performance comparisons between your current non-optimized method and the optimized method, executing each in a loop that checked the validity of 1.5 million buffers. For the first test only 5% of the buffers checked were invalid; for the second, 33%; for the third, 50%; and for the fourth, 100%. The table below shows how the two methods compared:

---------------------------------------------------------------------------------------------
| Nr | Total Nr.        | Valid Buffer  | Invalid Buffer    | Std Method    | XOR Unsafe    |
|    | Buffers Checked  | Count         | Count             | Execution Time| Execution Time|
---------------------------------------------------------------------------------------------
| 1  | 1,500,000        | 1,425,000     | 75,000            | 183 ms        | 124 ms        |
---------------------------------------------------------------------------------------------
| 2  | 1,500,000        | 1,000,000     | 500,000           | 566 ms        | 226 ms        |
---------------------------------------------------------------------------------------------
| 3  | 1,500,000        | 750,000       | 750,000           | 800 ms        | 259 ms        |
---------------------------------------------------------------------------------------------
| 4  | 1,500,000        | 0             | 1,500,000         | 1574 ms       | 431 ms        |
---------------------------------------------------------------------------------------------

From the table above we can see that while the unsafe (XOR) method is always faster, the speed difference is insignificant when only 5% of the buffers checked were invalid, while the biggest performance improvement is obtained when 100% of the buffers are invalid. Which brings us back to the original question: is it truly worthwhile to optimize the code?
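Measurements of this kind can be reproduced with a simple Stopwatch harness along these lines (a sketch only; the buffer generation and the 5% invalid ratio are illustrative, not the exact code used for the table):

```csharp
using System;
using System.Diagnostics;

const int BufferCount = 1_500_000;
const int BufferSize = 350;
double invalidRatio = 0.05;  // 5% all-0xFF buffers, as in test 1

var rng = new Random(42);
var buffers = new byte[BufferCount][];
for (int i = 0; i < BufferCount; i++)
{
    var buf = new byte[BufferSize];
    if (rng.NextDouble() < invalidRatio)
    {
        for (int j = 0; j < BufferSize; j++) buf[j] = 0xFF;  // invalid buffer
    }
    else
    {
        rng.NextBytes(buf);  // valid buffer (practically never all 0xFF)
    }
    buffers[i] = buf;
}

var sw = Stopwatch.StartNew();
int invalidCount = 0;
foreach (var buf in buffers)
{
    if (!IsBufferValidUnSafeXOR(buf))  // returns false for all-0xFF buffers
        invalidCount++;
}
sw.Stop();

Console.WriteLine($"{invalidCount} invalid buffers in {sw.ElapsedMilliseconds} ms");
```

Run the loop once beforehand to warm up the JIT, and build in Release mode, or the numbers will be dominated by compilation and debug overhead.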


1 Comment

Thank you for the effort you put in, this should be appreciated more.

A fairly easy way to make this faster is to obtain a ulong* to the array and compare 8 byte chunks at a time with 0xFFFFFFFFFFFFFFFFUL. You will likely need to handle misalignment at the start and end of the array by comparing byte-wise there.

You can then unroll the loop maybe 4 times to reduce loop overhead to almost nothing. It will be hard (but possible) to do it faster than that.
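A sketch of that approach, unrolled four times (hedged: for brevity this assumes the runtime aligns the array suitably and handles only the word-sized part in the unrolled loop; leftover bytes at the end are checked byte-wise as described above):

```csharp
// Returns true if every byte in the buffer is 0xFF.
// Compares 8 bytes at a time, unrolled 4x (32 bytes per main-loop iteration).
public static unsafe bool IsAllFF(byte[] buffer)
{
    const ulong AllOnes = 0xFFFFFFFFFFFFFFFFUL;

    fixed (byte* pBytes = buffer)
    {
        ulong* p = (ulong*)pBytes;
        int words = buffer.Length / 8;
        int i = 0;

        // Unrolled main loop: ANDing four words gives AllOnes
        // only if every bit in all 32 bytes is set.
        for (; i + 4 <= words; i += 4)
        {
            if ((p[i] & p[i + 1] & p[i + 2] & p[i + 3]) != AllOnes)
                return false;
        }

        for (; i < words; i++)            // leftover full words
            if (p[i] != AllOnes)
                return false;

        for (int b = words * 8; b < buffer.Length; b++)  // leftover tail bytes
            if (buffer[b] != 0xFF)
                return false;
    }

    return true;
}
```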

Another fairly easy option is to write this in C and P/Invoke it. C compilers have sophisticated ways of making this fast; the .NET JIT does not. Although I'm surprised that neither GCC nor LLVM applies any particular tricks here.

Playing with different code patterns, LLVM produces the following optimization:

if ((array[i + 0] & array[i + 1] & array[i + 2] & array[i + 3]) == 0xFF)
    return true;

(Note the parentheses: in C, == binds tighter than &, so they are required here.) That saves a lot of instructions and branches.
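A minimal sketch of the P/Invoke route; the library name `nativecheck` and the function `is_all_ff` are made up for illustration:

```csharp
using System.Runtime.InteropServices;

static class NativeMethods
{
    // C side, compiled into nativecheck.dll / libnativecheck.so:
    //
    //   int is_all_ff(const unsigned char *buf, int len)
    //   {
    //       for (int i = 0; i < len; i++)
    //           if (buf[i] != 0xFF) return 0;
    //       return 1;
    //   }
    [DllImport("nativecheck", CallingConvention = CallingConvention.Cdecl)]
    public static extern int is_all_ff(byte[] buffer, int length);
}

// Usage:
// bool onlyFF = NativeMethods.is_all_ff(programCode, programCode.Length) != 0;
```

Bear in mind that each P/Invoke call has fixed marshalling overhead, so for 350-byte arrays the transition cost can easily eat the gain unless many buffers are checked per native call.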



To me this sounds like a parallelizable problem. If you have hundreds of these arrays, each with hundreds of bytes inside, I would think about using the GPU.

You can use CUDA (which only works on Nvidia cards) or OpenCL (which works on all cards) to solve this task.

For C# there is a good OpenCL library called Cloo which is easy to use.

1 Comment

Btw: doing it on the GPU can be faster, but only if you have enough work to batch, because the calls to and from the GPU are somewhat time-expensive. Loading the arrays onto the GPU is also expensive, but I assume you want to do more than just this check in the end.
