Task

I have a huge file (≈ 20 GB) containing integers and want to read them in C#.

Simple approach

Reading the file into memory (into a byte array) is quite fast (using an SSD, the whole file fits into memory). But when I read those bytes with a BinaryReader (via a MemoryStream), the ReadInt32 calls take significantly longer than reading the file into memory. I expected disk I/O to be the bottleneck, but it's the conversion!

Idea and question

Is there a way to cast the whole byte array directly into an int array, without having to convert the values one by one with ReadInt32?

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

class Program
{
    static int size = 256 * 1024 * 1024;
    static string filename = @"E:\testfile";

    static void Main(string[] args)
    {
        Write(filename, size);
        int[] result = Read(filename, size);
        Console.WriteLine(result.Length);
    }

    // Writes `size` sequential Int32 values (0, 1, 2, ...) to the file.
    static void Write(string filename, int size)
    {
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        BinaryWriter bw = new BinaryWriter(new FileStream(filename, FileMode.Create), Encoding.UTF8);
        for (int i = 0; i < size; i++)
        {
            bw.Write(i);
        }
        bw.Close();
        stopwatch.Stop();
        Console.WriteLine(String.Format("File written in {0}ms", stopwatch.ElapsedMilliseconds));
    }

    // Reads the whole file into memory in one go, then converts the bytes
    // to ints one value at a time via BinaryReader.ReadInt32.
    static int[] Read(string filename, int size)
    {
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        byte[] buffer = File.ReadAllBytes(filename);
        BinaryReader br = new BinaryReader(new MemoryStream(buffer), Encoding.UTF8);
        stopwatch.Stop();
        Console.WriteLine(String.Format("File read into memory in {0}ms", stopwatch.ElapsedMilliseconds));
        stopwatch.Restart();

        int[] result = new int[size];

        for (int i = 0; i < size; i++)
        {
            result[i] = br.ReadInt32();
        }
        br.Close();
        stopwatch.Stop();
        Console.WriteLine(String.Format("Byte array converted to int array in {0}ms", stopwatch.ElapsedMilliseconds));

        return result;
    }
}
  • File written in 5499ms
  • File read into memory in 455ms
  • Byte array converted to int array in 3382ms
  • You'll have to perform the conversion eventually. Can you just read the array into memory and use BitConverter to get the values from the array as needed? Commented Nov 2, 2014 at 14:36
  • Possible duplicate of stackoverflow.com/questions/3206391/…. Commented Nov 2, 2014 at 14:37
  • @PatrickHofman: Seems he already knows how to read the file into memory. Commented Nov 2, 2014 at 14:43
  • Golly, 20GB might be a lot to read in one go. Do you need all of it in one sitting? Otherwise my first thought would have been memory-mapped files, but then that's unmanaged code by default. Commented Nov 2, 2014 at 14:48
  • People are a bit confused. Why don't you show us the code you have so far? Commented Nov 2, 2014 at 14:50
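
As a sketch of the BitConverter suggestion from the first comment (hedged: this decodes values on demand instead of bulk-converting, and assumes the platform's little-endian byte order, which is what BinaryWriter wrote above):

byte[] buffer = File.ReadAllBytes(filename);

// Decode the i-th Int32 lazily, only when it is actually needed,
// instead of materializing the whole int[] up front.
int i = 42;
int value = BitConverter.ToInt32(buffer, i * sizeof(int));
Console.WriteLine(value); // prints 42 for the test file written above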

1 Answer

You could allocate a temporary byte[] buffer of a convenient size and use the Buffer.BlockCopy method to copy the bytes into the int[] array incrementally.

BinaryReader reader = ...;
int[] hugeIntArray = ...;

const int TempBufferSize = 4 * 1024 * 1024;
for (int offset = 0; offset < hugeIntArray.Length * sizeof(int); offset += TempBufferSize)
{
    // ReadBytes may return fewer bytes on the final chunk.
    byte[] tempBuffer = reader.ReadBytes(TempBufferSize);
    Buffer.BlockCopy(tempBuffer, 0, hugeIntArray, offset, tempBuffer.Length);
}

Where offset is the starting position in the destination hugeIntArray for the current iteration. Note that Buffer.BlockCopy measures offsets and lengths in bytes, not in array elements, which is why the loop above advances offset by TempBufferSize bytes.
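
Since the question's Read method already loads the whole file with File.ReadAllBytes, the chunked loop can be collapsed into a single bulk copy. A minimal sketch of that variant, assuming the whole file fits in memory:

byte[] buffer = File.ReadAllBytes(filename);
int[] result = new int[buffer.Length / sizeof(int)];

// One bulk copy replaces millions of individual ReadInt32 calls.
// Again: source offset, destination offset and count are all in bytes.
Buffer.BlockCopy(buffer, 0, result, 0, buffer.Length);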

3 Comments

ReadBytes is likely to suffer the same fate, though I'm not certain of that.
I read the whole file into memory at first with ReadAllBytes.
This is significantly faster: File read into memory in 439ms, byte array converted to int array in 105ms
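
For readers on newer runtimes (.NET Core 2.1+ or the System.Memory package; not available when this answer was written), here is a hedged sketch of avoiding the copy entirely by reinterpreting the buffer in place with MemoryMarshal.Cast:

using System;
using System.IO;
using System.Runtime.InteropServices;

byte[] buffer = File.ReadAllBytes(filename);

// Reinterpret the raw bytes as ints without copying anything.
// Assumes the data is in native (little-endian) byte order.
Span<int> ints = MemoryMarshal.Cast<byte, int>(buffer);
Console.WriteLine(ints[42]); // prints 42 for the test file above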
