
I have the following code fragment that reads a binary file and validates it:

 FileStream f = File.OpenRead("File.bin");
 MemoryStream memStream = new MemoryStream();
 memStream.SetLength(f.Length);
 f.Read(memStream.GetBuffer(), 0, (int)f.Length);
 f.Seek(0, SeekOrigin.Begin);
 var r = new BinaryReader(f);
 Single prevVal=0;
 do
 {
    r.ReadUInt32();
    var val = r.ReadSingle();
    if (prevVal!=0) {
       var diff = Math.Abs(val - prevVal) / prevVal;
       if (diff > 0.25)
          Console.WriteLine("Bad!");
    }
    prevVal = val;
 }
 while (f.Position < f.Length);

It unfortunately works very slowly, and I am looking to improve this. In C++, I would simply read the file into a byte array and then recast that array as an array of structures:

struct S{
   int a;
   float b;
};

How would I do this in C#?

Comment: Remove the MemoryStream and create the BinaryReader directly from the FileStream (edit: as you already do). What is the MemoryStream used for in your code, then? With this code you are reading the file twice! Commented Jan 20, 2020 at 14:39

4 Answers


Define a struct (possibly a readonly struct) with explicit layout ([StructLayout(LayoutKind.Explicit)]) that precisely matches the struct in your C++ code, then do one of the following:

  1. open the file as a memory-mapped file and get the pointer to the data; use either unsafe code on the raw pointer, or use Unsafe.AsRef<YourStruct> on the data and Unsafe.Add<> to iterate
  2. open the file as a memory-mapped file and get the pointer to the data; create a custom memory (a Memory<T> of your struct type) over the pointer, and iterate over its span
  3. read the file into a byte[]; create a Span<byte> over the byte[], then use MemoryMarshal.Cast<,> to create a Span<YourType>, and iterate over that
  4. read the file into a byte[]; use fixed to pin the array and get a byte* pointer; use unsafe code to walk the pointer
  5. something involving "pipelines": a Pipe that is the buffer, maybe using StreamConnection on a FileStream to fill the pipe, and a worker loop that dequeues from the pipe; complication: the buffers can be discontiguous and may split at inconvenient places; this is solvable, but subtle code is required whenever the first span isn't at least 8 bytes

(or some combination of those concepts)
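As an illustration of option 4, here is a minimal sketch that pins a byte[] with fixed and walks it as records. The names Record, PointerWalk and CountJumps are hypothetical (not from the question), the field layout is assumed to match the 8-byte C++ struct, and the project must be compiled with unsafe code enabled (AllowUnsafeBlocks).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical record type mirroring the C++ struct from the question.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Record
{
    public uint A;
    public float B;
}

public static class PointerWalk
{
    // Counts records whose value differs from the previous one by more than 25%.
    public static unsafe int CountJumps(byte[] bytes)
    {
        int count = bytes.Length / sizeof(Record); // trailing partial record is ignored
        int jumps = 0;
        float prev = 0;

        fixed (byte* p = bytes)          // pin the array so the GC cannot move it
        {
            Record* rec = (Record*)p;    // reinterpret the raw bytes as records
            for (int i = 0; i < count; i++)
            {
                float val = rec[i].B;
                if (prev != 0 && Math.Abs(val - prev) / prev > 0.25f)
                    jumps++;
                prev = val;
            }
        }
        return jumps;
    }
}
```

A caller would pass the result of File.ReadAllBytes, for example `PointerWalk.CountJumps(File.ReadAllBytes("File.bin"))`. The comparison logic is copied from the question; only the way the bytes are accessed differs.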

Any of those should work much like your C++ version. The 4th is simple, but for very large data you will probably prefer memory-mapped files.
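For the memory-mapped route (options 1 and 2), a rough sketch might look like the following. It maps the file read-only, acquires the view's base pointer, and wraps it in a ReadOnlySpan. Record, MappedScan and CountJumps are hypothetical names, the record layout is an assumption taken from the question, and unsafe code must be enabled. Because a span is limited to int.MaxValue elements, this version only covers files up to roughly 16 GB of 8-byte records; larger files would need to be processed in several views.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

// Hypothetical record type mirroring the C++ struct from the question.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Record
{
    public uint A;
    public float B;
}

public static class MappedScan
{
    public static unsafe int CountJumps(string path)
    {
        long fileLength = new FileInfo(path).Length;
        if (fileLength == 0) return 0; // an empty file cannot be mapped

        using (var mmf = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read))
        {
            byte* basePtr = null;
            try
            {
                view.SafeMemoryMappedViewHandle.AcquirePointer(ref basePtr);

                // The view can be rounded up to a page boundary, so the record
                // count is derived from the file length, not from view.Capacity.
                int count = checked((int)(fileLength / sizeof(Record)));
                var records = new ReadOnlySpan<Record>(basePtr + view.PointerOffset, count);

                int jumps = 0;
                float prev = 0;
                foreach (Record r in records)
                {
                    if (prev != 0 && Math.Abs(r.B - prev) / prev > 0.25f)
                        jumps++;
                    prev = r.B;
                }
                return jumps;
            }
            finally
            {
                if (basePtr != null)
                    view.SafeMemoryMappedViewHandle.ReleasePointer();
            }
        }
    }
}
```

The span must not be used after ReleasePointer or after the accessor is disposed, which is why the whole scan stays inside the try block.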


Thank you everyone for very helpful comments and answers. Given this input, this is my preferred solution:

      using System;
      using System.IO;
      using System.Runtime.InteropServices;

      class Program
      {
         // Pack = 1 rules out padding, so each record is exactly 8 bytes.
         [StructLayout(LayoutKind.Sequential, Pack = 1)]
         struct Data
         {
            public UInt32 dummy;
            public Single val;
         }

         static void Main(string[] args)
         {
            byte[] byteArray = File.ReadAllBytes("File.bin");
            // Reinterpret the bytes as Data records without copying them.
            ReadOnlySpan<Data> dataArray = MemoryMarshal.Cast<byte, Data>(new ReadOnlySpan<byte>(byteArray));
            Single prevVal = 0;
            foreach (var v in dataArray) {
               if (prevVal != 0) {
                  var diff = Math.Abs(v.val - prevVal) / prevVal;
                  if (diff > 0.25)
                     Console.WriteLine("Bad!");
               }
               prevVal = v.val;
            }
         }
      }

It indeed works much faster than the original implementation.
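One detail worth knowing about this approach: MemoryMarshal.Cast silently ignores trailing bytes that do not fill a whole element, so a truncated file goes unnoticed. The sketch below adds an explicit length check before casting. RecordView and AsRecords are hypothetical names, and Unsafe.SizeOf<T> is assumed to be available (it is built into .NET Core 3.0 and later, and otherwise comes from the System.Runtime.CompilerServices.Unsafe package).

```csharp
using System;
using System.IO;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Data
{
    public uint dummy;
    public float val;
}

public static class RecordView
{
    // Returns a view of the bytes as Data records, rejecting partial records.
    public static ReadOnlySpan<Data> AsRecords(byte[] bytes)
    {
        int recordSize = Unsafe.SizeOf<Data>(); // 8 for this layout
        if (bytes.Length % recordSize != 0)
            throw new InvalidDataException(
                $"Length {bytes.Length} is not a multiple of the record size {recordSize}.");

        return MemoryMarshal.Cast<byte, Data>(bytes.AsSpan());
    }
}
```

Whether a partial record should be an error or merely a warning depends on the file format; throwing is just one possible choice.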


This is what we use (compatible with older versions of C#):

public static T[] FastRead<T>(FileStream fs, int count) where T: struct
{
    int sizeOfT = Marshal.SizeOf(typeof(T));

    long bytesRemaining  = fs.Length - fs.Position;
    long wantedBytes     = (long)count * sizeOfT; // widen first to avoid int overflow
    long bytesAvailable  = Math.Min(bytesRemaining, wantedBytes);
    long availableValues = bytesAvailable / sizeOfT;
    long bytesToRead     = (availableValues * sizeOfT);

    if ((bytesRemaining < wantedBytes) && ((bytesRemaining - bytesToRead) > 0))
    {
        Debug.WriteLine("Requested data exceeds available data and partial data remains in the file.");
    }

    T[] result = new T[availableValues];

    GCHandle gcHandle = GCHandle.Alloc(result, GCHandleType.Pinned);

    try
    {
        uint bytesRead;

        if (!ReadFile(fs.SafeFileHandle, gcHandle.AddrOfPinnedObject(), (uint)bytesToRead, out bytesRead, IntPtr.Zero))
        {
            throw new IOException("Unable to read file.", new Win32Exception(Marshal.GetLastWin32Error()));
        }

        Debug.Assert(bytesRead == bytesToRead);
    }

    finally
    {
        gcHandle.Free();
    }

    GC.KeepAlive(fs);

    return result;
}

[System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Interoperability", "CA1415:DeclarePInvokesCorrectly")]
[DllImport("kernel32.dll", SetLastError=true)]
[return: MarshalAs(UnmanagedType.Bool)]

private static extern bool ReadFile
(
    SafeFileHandle       hFile,
    IntPtr               lpBuffer,
    uint                 nNumberOfBytesToRead,
    out uint             lpNumberOfBytesRead,
    IntPtr               lpOverlapped
);

NOTE: This only works for structs that contain only blittable types, of course. You must also apply [StructLayout] (either LayoutKind.Sequential with an explicit Pack, or LayoutKind.Explicit with [FieldOffset] on each field) to ensure that the struct layout is identical to the binary format of the data in the file.

For recent versions of C#, you can use Span as mentioned by Marc in the other answer!
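A Span-based equivalent of FastRead could look roughly like this. It is a sketch, not a drop-in replacement: SpanReader and ReadValues are hypothetical names, it assumes a seekable stream and a runtime where Stream.Read(Span<byte>) exists (.NET Core 2.1 or later), and it uses Unsafe.SizeOf<T> rather than Marshal.SizeOf because the bytes are reinterpreted directly and not marshalled.

```csharp
using System;
using System.IO;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class SpanReader
{
    // Reads up to `count` values of T from the stream's current position.
    // Fewer values are returned if the stream does not hold that many.
    public static T[] ReadValues<T>(Stream stream, int count) where T : unmanaged
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));

        int sizeOfT = Unsafe.SizeOf<T>();
        long bytesRemaining = stream.Length - stream.Position;
        long available = Math.Min(count, bytesRemaining / sizeOfT);

        var result = new T[available];

        // View the T[] as raw bytes and fill it in place: no pinning, no P/Invoke.
        Span<byte> target = MemoryMarshal.AsBytes(result.AsSpan());
        while (!target.IsEmpty)
        {
            int n = stream.Read(target);   // may return fewer bytes than requested
            if (n == 0) throw new EndOfStreamException();
            target = target.Slice(n);
        }
        return result;
    }
}
```

The same layout caveat applies here: T has to match the on-disk record format byte for byte.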

4 Comments

this'll work, but it feels like "the hard way" compared to just MemoryMappedFile and some unsafe goo (even before the joys and elegance of Span<>); also, unrelated but with recent C#: where T : unmanaged and sizeof(T) - or with older C#: Unsafe.SizeOf<T>() (unless you're actually using Marshal copy rules, as opposed to raw type-whacking)
@MarcGravell I would imagine that avoiding unsafe would be desirable, otherwise it will spread to everything that calls the unsafe method! Nothing in our entire codebase uses unsafe for that reason.
yes, I can't argue with that; that's why I love span so much :)
@MarcGravell Indeed, Span is great if you can use it! (Incidentally I just checked the history for that code, and it seems I wrote it back in 2007... ;)

You are actually not using the MemoryStream at all currently. Your BinaryReader accesses the file directly. To have the BinaryReader use the MemoryStream instead:

Replace

f.Seek(0, SeekOrigin.Begin);
var r = new BinaryReader(f);

...

while (f.Position < f.Length);

with

memStream.Seek(0, SeekOrigin.Begin);
var r = new BinaryReader(memStream);

...

while (r.BaseStream.Position < r.BaseStream.Length);
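Put together, the in-memory variant might look like the sketch below. It deviates from the question in three labeled ways: File.ReadAllBytes replaces the manual SetLength/Read pair (a single Read call is not guaranteed to fill the buffer), the do/while loop becomes a while loop that stops when fewer than 8 bytes remain, and bad values are counted instead of printed. BufferedValidation and CountBadValues are hypothetical names.

```csharp
using System;
using System.IO;

public static class BufferedValidation
{
    public static int CountBadValues(string path)
    {
        int bad = 0;
        float prevVal = 0;

        // The whole file is read once; BinaryReader then works purely in memory.
        using (var memStream = new MemoryStream(File.ReadAllBytes(path), writable: false))
        using (var r = new BinaryReader(memStream))
        {
            // Each record is one uint followed by one float: 8 bytes.
            while (memStream.Length - memStream.Position >= 8)
            {
                r.ReadUInt32();
                float val = r.ReadSingle();
                if (prevVal != 0 && Math.Abs(val - prevVal) / prevVal > 0.25)
                    bad++;
                prevVal = val;
            }
        }
        return bad;
    }
}
```

This keeps the BinaryReader code path, so it avoids the double read but will likely still be slower than the span-based approaches in the other answers.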
