
I am trying to hash a file by reading 1024 bytes at a time from a FileStream in a loop and passing each chunk to the TransformBlock method. I want to understand the mechanics of hashing multiple byte arrays into one hash, which would let me hash not only files but also folders. I used this Stack Overflow question: Hashing multiple byte[]'s together into a single hash with C#? and this MSDN example: http://msdn.microsoft.com/en-us/library/system.security.cryptography.hashalgorithm.transformblock.aspx

Here is the code I have now:

public static byte[] createFileMD5(string path){
    MD5 md5 = MD5.Create();
    FileStream fs = File.OpenRead(path);
    byte[] buf = new byte[1024];
    byte[] newbuf = new byte[1024];

    int num; int newnum;

    num = fs.Read(buf,0,buf.Length);
    while ((newnum = fs.Read(newbuf, 0, newbuf.Length))>0)
    {
        md5.TransformBlock(buf, 0, buf.Length, buf, 0);
        num = newnum;
        buf = newbuf;
    }

    md5.TransformFinalBlock(buf, 0, num);

    return md5.Hash;
}

Unfortunately, the hash it calculates doesn't match the one I calculated using fciv.

Just to be sure, here is the hex-conversion routine I use on the returned byte array:

    public static string byteArrayToString(byte[] ba)
    {
        StringBuilder hex = new StringBuilder(ba.Length * 2);
        foreach (byte b in ba)
            hex.AppendFormat("{0:x2}", b);
        return hex.ToString();
    }

1 Answer


The length you pass to TransformBlock is wrong for the last block (unless the file size is a multiple of the buffer size). You need to pass the actual number of bytes read from the file:

md5.TransformBlock(buf, 0, newnum, buf, 0);

Also, I'm not sure why you use newbuf... the original buffer is used only for the first block, then you use newbuf for all subsequent blocks. There is no reason to use a second buffer here. For reference, here's the code I use to compute the hash of a file:

            using (var stream = File.OpenRead(path))
            {
                var md5 = MD5.Create();
                var buffer = new byte[8192];
                int read;
                while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    md5.TransformBlock(buffer, 0, read, buffer, 0);
                }
                md5.TransformFinalBlock(buffer, 0, 0);

                ...
            }
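As a minimal sketch, the loop above can be wrapped into a self-contained method that returns the digest. The class and method names (`FileHasher`, `ComputeFileMd5`) are hypothetical and not part of the original answer, and reading `md5.Hash` after the final block is an assumption about what follows the elided part:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileHasher
{
    // Hypothetical helper: hashes a file chunk by chunk and returns the raw MD5 digest.
    public static byte[] ComputeFileMd5(string path)
    {
        using (var stream = File.OpenRead(path))
        using (var md5 = MD5.Create())
        {
            var buffer = new byte[8192];
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Feed only the bytes actually read in this pass.
                md5.TransformBlock(buffer, 0, read, buffer, 0);
            }

            // An empty final block finalizes the hash once the stream is exhausted.
            md5.TransformFinalBlock(buffer, 0, 0);

            // Assumption: the digest is read from Hash after finalization.
            return md5.Hash;
        }
    }
}
```

The result should match what `MD5.Create().ComputeHash(...)` returns for the same bytes, which is a quick way to check the loop before comparing against an external tool.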

3 Comments

I thought that I had to use TransformBlock on every block EXCEPT the last one, and use TransformFinalBlock on the last one. It was unclear in the other stackoverflow question: // For each block: md5.TransformBlock(block, 0, block.Length, block, 0); // For last block: md5.TransformFinalBlock(block, 0, block.Length);
@black, actually I'm not sure the way I do it would work with all hash algorithms... I know it works for MD5 and SHA1, but perhaps other algorithms require that all blocks passed to TransformBlock have the same size.
@black: Generally you do TransformBlock on every block except the last one. Then you call TransformFinalBlock on the last block. However, in a stream of an unknown length, you may not know if the last block has been processed until it is too late. For this reason a properly implemented algorithm is expected to finalise the hash when TransformFinalBlock is called with an empty array as in the answer. In an ideal case, however, TransformFinalBlock is expected to be called with the last block of data from the stream. PS: I tested both ways with MD5 & SHA256 and can confirm it works.
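The two finalization styles discussed in these comments can be compared with a small sketch. The class and method names (`FinalBlockDemo`, `HashWithEmptyFinal`, `HashWithDataInFinal`) are made up for illustration, and the code assumes `data` is non-null and `chunkSize` is positive:

```csharp
using System;
using System.Security.Cryptography;

public static class FinalBlockDemo
{
    // Style A: every chunk goes through TransformBlock, then an empty final block.
    public static byte[] HashWithEmptyFinal(HashAlgorithm algo, byte[] data, int chunkSize)
    {
        algo.Initialize();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            algo.TransformBlock(data, offset, count, data, offset);
        }
        algo.TransformFinalBlock(data, 0, 0);
        return algo.Hash;
    }

    // Style B: all chunks but the last go through TransformBlock,
    // and the last chunk is passed to TransformFinalBlock.
    public static byte[] HashWithDataInFinal(HashAlgorithm algo, byte[] data, int chunkSize)
    {
        algo.Initialize();
        int offset = 0;
        while (data.Length - offset > chunkSize)
        {
            algo.TransformBlock(data, offset, chunkSize, data, offset);
            offset += chunkSize;
        }
        algo.TransformFinalBlock(data, offset, data.Length - offset);
        return algo.Hash;
    }
}
```

Calling both methods with the same input and comparing the results against `ComputeHash` on the whole array is one way to repeat the check described in the last comment for a given algorithm.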
