Is there a faster way than BitConverter.ToInt32 to convert a byte array to an int value?
-
Sure, just ask Google :D What programming language are you using? BitConverter seems to be .NET. My guess would be that reading the byte array might take more time than converting it; just add some extra information to the question. – CodingBarfield, Dec 1, 2010 at 15:48
-
Why? Is it a performance bottleneck? – Oded, Dec 1, 2010 at 15:49
-
Sorry for my incomplete explanation. I'm using C#. I use BitConverter.ToInt32 to convert a byte array to an int, and it takes almost 200ms. Is there any faster way? – jacky, Dec 1, 2010 at 16:09
-
Stopwatch sw = new Stopwatch(); sw.Start(); Dictionary<int, int> sd = new Dictionary<int, int>(); Console.WriteLine(sw.GetSplitTimeInMicroseconds()); int unitid = BitConverter.ToInt32(units, 7); // units is a byte array Console.WriteLine(sw.GetSplitTimeInMicroseconds()); // output: 113.220279110632, 383.590305626821 – jacky, Dec 1, 2010 at 16:25
-
@jacky - timing over 1 iteration is not a good measure. Do it 5 million times and divide the answer by 5M. – Marc Gravell, Dec 1, 2010 at 16:42
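Marc's advice above can be sketched as follows. This is an illustrative harness, not code from the thread; the `units` layout mimics the earlier comment (an int stored at offset 7), and the timings you get will of course vary by machine:

```csharp
using System;
using System.Diagnostics;

// An int (0x04030201 = 67305985 on little-endian hardware) stored at offset 7,
// mirroring the BitConverter.ToInt32(units, 7) call from the comment above.
byte[] units = new byte[16];
units[7] = 1; units[8] = 2; units[9] = 3; units[10] = 4;

const int N = 5_000_000;
int unitid = 0;
var sw = Stopwatch.StartNew();
for (int i = 0; i < N; i++)
    unitid = BitConverter.ToInt32(units, 7);
sw.Stop();

// Amortise the total over N iterations to get a per-call cost in nanoseconds.
Console.WriteLine($"{sw.Elapsed.TotalMilliseconds * 1_000_000 / N:F2} ns/call, result {unitid}");
```

Timing a single call, as in the original comment, mostly measures Stopwatch overhead and JIT warm-up rather than the conversion itself.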
6 Answers
I actually tried several different ways to convert four bytes to an int:
- `BitConverter.ToInt32(new byte[] { w, x, y, z }, 0)`
- `BitConverter.ToUInt32(new byte[] { w, x, y, z }, 0)`
- `b = new byte[] { w, x, y, z }; BitConverter.ToInt32(b, 0)`
- `b = new byte[] { 1, 2, 3, 4, 5, 6, 7, w, x, y, z }; BitConverter.ToInt32(b, 7)`
- `w | (x << 8) | (y << 16) | (z << 24)`
- `b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24)`
I ran 10^9 iterations of each one in a Release (x86) build, not under a debugger, on a 2.5 GHz Core i7 laptop. Here are my results (note that the methods that don't use BitConverter are substantially faster):
test1: 00:00:15.5287282 67305985
test2: 00:00:15.1334457 67305985
test3: 00:00:08.0648586 67305985
test4: 00:00:11.2307059 67305985
test5: 00:00:02.0219417 67305985
test6: 00:00:01.6275684 67305985
Some conclusions you can draw:
- test1 shows that on my laptop it's hard to make the conversion go slower than 15ns, which I hate to say should be fast enough for anyone. (Do you need to call it more than 60M times per second?)
- test2 shows that using `uint` instead of `int` saves a small amount of time. I'm not sure why, but I think it's small enough to be experimental error.
- test3 shows that the overhead of creating a new byte array (7ns) is nearly as much as calling the function, but is still faster than making a new array out of the old array.
- test4 shows that making unaligned array accesses from `ToInt32` adds overhead (3ns).
- test5 shows that pulling the 4 bytes from local variables and combining them yourself is several times faster than calling `ToInt32`.
- test6 shows that it's actually slightly faster to pull the 4 bytes from an array than from function arguments! I suspect this is due to CPU pipelining or cache effects.
The fastest, test6, took only twice as long to run as an empty loop (not shown). In other words, it took less than 1ns to perform each conversion. Good luck getting any useful calculation to go faster than that!
Here's my test program:
using System;
namespace BitConverterTest
{
class Program
{
const int iters = 1000000000;
static void Main(string[] args)
{
test1(1, 2, 3, 4);
test2(1, 2, 3, 4);
test3(1, 2, 3, 4);
test4(1, 2, 3, 4);
test5(1, 2, 3, 4);
test6(1, 2, 3, 4);
}
static void test1(byte w, byte x, byte y, byte z)
{
int res = 0;
var timer = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < iters; i++)
res = BitConverter.ToInt32(new byte[] { w, x, y, z }, 0);
Console.WriteLine("test1: " + timer.Elapsed + " " + res);
}
static void test2(byte w, byte x, byte y, byte z)
{
uint res = 0;
var timer = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < iters; i++)
res = BitConverter.ToUInt32(new byte[] { w, x, y, z }, 0);
Console.WriteLine("test2: " + timer.Elapsed + " " + res);
}
static void test3(byte w, byte x, byte y, byte z)
{
int res = 0;
var timer = System.Diagnostics.Stopwatch.StartNew();
var b = new byte[] { w, x, y, z };
for (int i = 0; i < iters; i++)
res = BitConverter.ToInt32(b, 0);
Console.WriteLine("test3: " + timer.Elapsed + " " + res);
}
static void test4(byte w, byte x, byte y, byte z)
{
int res = 0;
var timer = System.Diagnostics.Stopwatch.StartNew();
var b = new byte[] { 1, 2, 3, 4, 5, 6, 7, w, x, y, z };
for (int i = 0; i < iters; i++)
res = BitConverter.ToInt32(b, 7);
Console.WriteLine("test4: " + timer.Elapsed + " " + res);
}
static void test5(byte w, byte x, byte y, byte z)
{
int res = 0;
var timer = System.Diagnostics.Stopwatch.StartNew();
var b = new byte[] { w, x, y, z };
for (int i = 0; i < iters; i++)
res = w | (x << 8) | (y << 16) | (z << 24);
Console.WriteLine("test5: " + timer.Elapsed + " " + res);
}
static void test6(byte w, byte x, byte y, byte z)
{
int res = 0;
var timer = System.Diagnostics.Stopwatch.StartNew();
var b = new byte[] { w, x, y, z };
for (int i = 0; i < iters; i++)
res = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
Console.WriteLine("test6: " + timer.Elapsed + " " + res);
}
}
}
6 Comments
- …a large `byte[]` before the timing, big enough that the timed loop is a small number of passes over it, is likely to show a more realistic value by blowing out the data caches (this will lead tests 5 & 6 to lose much of their advantage).
- …`long` is big enough) and the times go up considerably. This removes the common-expression extraction from the loop in your code.

If I remember correctly, that implementation uses unsafe code (treating a `byte*` as an `int*`), so it will be hard to beat, but the other alternative is shifting.
However, from lots of work in this area, this is so unlikely to be a genuine bottleneck as to be irrelevant. I/O is the main issue, typically.
GetBytes(int), however, is more expensive (in high volume) due to array / heap allocation.
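A minimal sketch of the two points above: the shift-based alternative to `BitConverter.ToInt32`, and writing an int into an existing buffer to sidestep the array allocation that makes `GetBytes(int)` expensive in high volume. The helper names are illustrative, not from the thread:

```csharp
using System;

// Little-endian read via shifting: no allocation, no library call.
int ReadInt32(byte[] buf, int offset) =>
    buf[offset] | (buf[offset + 1] << 8) | (buf[offset + 2] << 16) | (buf[offset + 3] << 24);

// Write an int into an existing buffer, avoiding BitConverter.GetBytes' heap allocation.
void WriteInt32(byte[] buf, int offset, int value)
{
    buf[offset]     = (byte)value;
    buf[offset + 1] = (byte)(value >> 8);
    buf[offset + 2] = (byte)(value >> 16);
    buf[offset + 3] = (byte)(value >> 24);
}

var buf = new byte[8];
WriteInt32(buf, 2, 67305985);           // 0x04030201
Console.WriteLine(ReadInt32(buf, 2));   // prints 67305985
```

Unlike `BitConverter`, the shift form fixes the byte order to little-endian regardless of the host architecture, which may or may not be what you want for wire formats.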
Comments
Followup to Gabe's performance tests:
Changes:
- Eliminate tests 1 and 2, because the inline array creation turned them into tests of the GC (as can be seen from the Gen 0 GC performance counter).
- Eliminate test 4 (non-aligned array) to keep things simpler.
- Add tests 7 and 8 which do conversions from a large array (256 MB) via BitConverter and bit fiddling respectively.
- Add a running total to the tests to try to avoid common sub-expression elimination, which clearly led to the low times in Gabe's tests 5 and 6.
Results:
32-bit option:
test3: 00:00:06.9230577
test5: 00:00:03.8349386
test6: 00:00:03.8238272
test7: 00:00:07.3898489
test8: 00:00:04.6807391

64-bit option:

test3: 00:00:05.8794322
test5: 00:00:00.4384600
test6: 00:00:00.4069573
test7: 00:00:06.2279365
test8: 00:00:03.5472486
Analysis
- Still getting common sub-expression elimination in tests 5 and 6 on 64-bit.
- For this, 64-bit is a win. But such a micro-benchmark shouldn't be followed when choosing where to optimise an application.
- It looks like about a 50% improvement when converting 256 MB of random data into ints. As the test does it 16 times, that's less than 0.2s; unlikely to make a real difference outside a very narrow subset of applications, and then you have the additional maintenance cost of ensuring that someone doesn't break the code over the application lifetime.
- I wonder how much of the `BitConverter` overhead is the parameter checks it does?
- Test 6 is only a little faster than test 5. Clearly array bounds checks are being eliminated.
The Code
using System;
namespace BitConverterTest {
class Program {
const int iters = 1024*1024*1024;
const int arrayLen = iters/4;
static byte[] array = new byte[arrayLen];
static void Main(string[] args) {
//test1(1, 2, 3, 4);
//test2(1, 2, 3, 4);
test3(1, 2, 3, 4);
//test4(1, 2, 3, 4);
test5(1, 2, 3, 4);
test6(1, 2, 3, 4);
// Fill array with good PRNG data
var rng = new System.Security.Cryptography.RNGCryptoServiceProvider();
rng.GetBytes(array);
test7();
test8();
}
// BitConverter with aligned input
static void test3(byte w, byte x, byte y, byte z) {
int res = 0;
var timer = System.Diagnostics.Stopwatch.StartNew();
var b = new byte[] { w, x, y, z };
for (int i = 0; i < iters; i++)
res = BitConverter.ToInt32(b, 0);
Console.WriteLine("test3: " + timer.Elapsed + " " + res);
}
// Inline bitfiddling with separate variables.
static void test5(byte w, byte x, byte y, byte z) {
long res = 0;
var timer = System.Diagnostics.Stopwatch.StartNew();
var b = new byte[] { w, x, y, z };
for (int i = 0; i < iters; i++) {
int a = w | (x << 8) | (y << 16) | (z << 24);
res += a;
}
Console.WriteLine("test5: " + timer.Elapsed + " " + res);
}
// Inline bitfiddling with array elements.
static void test6(byte w, byte x, byte y, byte z) {
long res = 0;
var timer = System.Diagnostics.Stopwatch.StartNew();
var b = new byte[] { w, x, y, z };
for (int i = 0; i < iters; i++) {
int a = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
res += a;
}
Console.WriteLine("test6: " + timer.Elapsed + " " + res);
}
// BitConvert from large array...
static void test7() {
var its = iters/arrayLen * 4; // *4 to remove arrayLen/4 factor.
var timer = System.Diagnostics.Stopwatch.StartNew();
long res = 0;
for (var outer = 0; outer < its; outer++) {
for (var pos = 0; pos < arrayLen; pos += 4) {
var x = BitConverter.ToInt32(array, pos);
res += x;
}
}
Console.WriteLine("test7: " + timer.Elapsed + " " + res);
}
// Bitfiddle from large array...
static void test8() {
var its = iters/arrayLen * 4;
var timer = System.Diagnostics.Stopwatch.StartNew();
long res = 0;
for (var outer = 0; outer < its; outer++) {
for (var pos = 0; pos < arrayLen; pos += 4) {
int x = array[pos] | (array[pos+1] << 8) | (array[pos+2] << 16) | (array[pos+3] << 24);
res += x;
}
}
Console.WriteLine("test8: " + timer.Elapsed + " " + res);
}
}
}
Based on a quick review of the implementation of BitConverter.ToInt32 in .NET Reflector, I would say "No".
It optimises for the case where the array is aligned and directly casts the bytes, otherwise it performs a bitwise merge.
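Conceptually, the fast path amounts to reinterpreting four bytes as a single 32-bit value, and the slow path merges them byte by byte. Here's a safe sketch of both ideas using `MemoryMarshal`; this is an illustration, not the actual framework source:

```csharp
using System;
using System.Runtime.InteropServices;

// Fast path: directly reinterpret 4 bytes as an int (native byte order).
int ToInt32Direct(byte[] value, int startIndex) =>
    MemoryMarshal.Read<int>(value.AsSpan(startIndex, 4));

// Slow path: merge byte by byte (explicitly little-endian here).
int ToInt32Merge(byte[] value, int startIndex) =>
    value[startIndex]
    | (value[startIndex + 1] << 8)
    | (value[startIndex + 2] << 16)
    | (value[startIndex + 3] << 24);

var bytes = new byte[] { 1, 2, 3, 4 };
Console.WriteLine(ToInt32Direct(bytes, 0)); // 67305985 on little-endian hardware
Console.WriteLine(ToInt32Merge(bytes, 0));  // 67305985 always
```

Note the subtle difference: the direct reinterpretation follows the host's byte order, while the shift-based merge pins the byte order in code, so the two only agree on little-endian machines.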
3 Comments
I summarized all of the above, added a Span variant, and used a benchmark framework.
using System;
using System.Runtime.InteropServices;   // MemoryMarshal
using BenchmarkDotNet.Attributes;       // [Benchmark]; assuming BenchmarkDotNet is the framework used
public class ByteArrayToIntBench
{
private readonly byte[] _array = new byte[4 * 10_000];
public ByteArrayToIntBench()
{
Random r = new Random();
for (int i = 0; i < _array.Length; i++)
{
_array[i] = (byte)r.Next(byte.MinValue, byte.MaxValue + 1); // Next's upper bound is exclusive
}
}
[Benchmark]
public double Bitconverter()
{
double res = 0;
for (int i = 0; i < _array.Length; i += 4)
{
res += BitConverter.ToInt32(_array, i);
}
return res;
}
[Benchmark]
public unsafe double Unsafe()
{
double res = 0;
for (int i = 0; i < _array.Length; i += 4)
{
fixed (byte* pData = &_array[i])
{
res += *(int*)pData;
}
}
return res;
}
[Benchmark]
public double Shift()
{
double res = 0;
for (int i = 0; i < _array.Length; i += 4)
{
res += _array[i] | (_array[i + 1] << 8) | (_array[i + 2] << 16) | (_array[i + 3] << 24);
}
return res;
}
[Benchmark]
public double Span()
{
double res = 0;
for (int i = 0; i < _array.Length; i += 4)
{
res += MemoryMarshal.Cast<byte, int>(_array.AsSpan(i, 4))[0];
}
return res;
}
}
Comments
I have also fiddled with similar issues.
In my case it was converting to single-precision floats when the data is stored as double-precision byte[]s, or converting between the double representation and the byte[] representation, etc. The best approach is not to go through too many API layers if you want the best performance on large sets of data, and to embed as much information into the algorithm as possible without making it too brittle or incomprehensible.
So, to further follow up from Richard's tests, I add another test below (test9) which is the way I've gone in my own work and answers his point 4 in his Analysis section:
Use unsafe memory pointer access to achieve the most performant result. This comes naturally if you use C++, but not necessarily C#. It is similar to what BitConverter is doing under the hood, but without the parameter and safety checks (as, of course, we know what we are doing... ;)
Results:
32-bit option:
test3: 00:00:06.2373138
test5: 00:00:03.1193338
test6: 00:00:03.1609287
test7: 00:00:07.7328020
test8: 00:00:06.4192130
test9: 00:00:03.9590307

64-bit option:

test3: 00:00:06.2209098
test5: 00:00:00.5563930
test6: 00:00:01.5486780
test7: 00:00:08.4858474
test8: 00:00:05.4991740
test9: 00:00:02.2928944
Here the same code, including the new test9:
using System;
namespace BitConverterTest
{
class Program
{
const int iters = 1024 * 1024 * 1024;
const int arrayLen = iters / 4;
static byte[] array = new byte[arrayLen];
static void Main(string[] args)
{
//test1(1, 2, 3, 4);
//test2(1, 2, 3, 4);
test3(1, 2, 3, 4);
//test4(1, 2, 3, 4);
test5(1, 2, 3, 4);
test6(1, 2, 3, 4);
// Fill array with good PRNG data
var rng = new System.Security.Cryptography.RNGCryptoServiceProvider();
rng.GetBytes(array);
test7();
test8();
test9();
}
// BitConverter with aligned input
static void test3(byte w, byte x, byte y, byte z)
{
int res = 0;
var timer = System.Diagnostics.Stopwatch.StartNew();
var b = new byte[] { w, x, y, z };
for (int i = 0; i < iters; i++)
res = BitConverter.ToInt32(b, 0);
Console.WriteLine("test3: " + timer.Elapsed + " " + res);
}
// Inline bitfiddling with separate variables.
static void test5(byte w, byte x, byte y, byte z)
{
long res = 0;
var timer = System.Diagnostics.Stopwatch.StartNew();
var b = new byte[] { w, x, y, z };
for (int i = 0; i < iters; i++)
{
int a = w | (x << 8) | (y << 16) | (z << 24);
res += a;
}
Console.WriteLine("test5: " + timer.Elapsed + " " + res);
}
// Inline bitfiddling with array elements.
static void test6(byte w, byte x, byte y, byte z)
{
long res = 0;
var timer = System.Diagnostics.Stopwatch.StartNew();
var b = new byte[] { w, x, y, z };
for (int i = 0; i < iters; i++)
{
int a = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
res += a;
}
Console.WriteLine("test6: " + timer.Elapsed + " " + res);
}
// BitConvert from large array...
static void test7()
{
var its = iters / arrayLen * 4; // *4 to remove arrayLen/4 factor.
var timer = System.Diagnostics.Stopwatch.StartNew();
long res = 0;
for (var outer = 0; outer < its; outer++)
{
for (var pos = 0; pos < arrayLen; pos += 4)
{
var x = BitConverter.ToInt32(array, pos);
res += x;
}
}
Console.WriteLine("test7: " + timer.Elapsed + " " + res);
}
// Bitfiddle from large array...
static void test8()
{
var its = iters / arrayLen * 4;
var timer = System.Diagnostics.Stopwatch.StartNew();
long res = 0;
for (var outer = 0; outer < its; outer++)
{
for (var pos = 0; pos < arrayLen; pos += 4)
{
int x = array[pos] | (array[pos + 1] << 8) | (array[pos + 2] << 16) | (array[pos + 3] << 24);
res += x;
}
}
Console.WriteLine("test8: " + timer.Elapsed + " " + res);
}
// unsafe memory operations from large array...
// (essentially the internals of BitConverter without param checks, etc.)
static unsafe void test9()
{
var its = iters / arrayLen * 4;
var timer = System.Diagnostics.Stopwatch.StartNew();
long res = 0;
int value = 0;
for (var outer = 0; outer < its; outer++)
{
for (var pos = 0; pos < arrayLen; pos += 4)
{
fixed (byte* numPtr = &array[pos])
{
value = *(int*)numPtr;
}
int x = value; // value is already an int; no further reinterpretation needed
res += x;
}
}
Console.WriteLine("test9: " + timer.Elapsed + " " + res);
}
}
}
