
Hello, I am trying to rewrite a file by replacing bytes, but it takes too much time on large files. For example, on a 700 MB file this code ran for about 6 minutes. Please help me make it run in less than 1 minute.

static private void _12_56(string fileName)
{
    // Read the whole file into memory, swap the bytes, write it all back.
    byte[] byteArray = File.ReadAllBytes(fileName);
    // In every 6-byte group, swap byte 0 with byte 4 and byte 1 with byte 5.
    for (int i = 0; i + 6 <= byteArray.Length; i += 6)
    {
        Swap(ref byteArray[i], ref byteArray[i + 4]);
        Swap(ref byteArray[i + 1], ref byteArray[i + 5]);
    }
    File.WriteAllBytes(fileName, byteArray);
}
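
(Swap isn't shown in the post; judging by the comments below, it's presumably the usual temp-variable swap:)

static void Swap(ref byte a, ref byte b)
{
    byte temp = a;
    a = b;
    b = temp;
}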
  • It's probably slow because you're reading the whole file into memory. I don't know what Swap does; is it necessary to hold the whole file, or can you just read chunks of 1 MB and work on one at a time? It would also be a good idea to use the Visual Studio profiler to see exactly what is slow about it. Commented May 24, 2018 at 18:25
  • You can check this question/answer, it's a good reply! stackoverflow.com/questions/955911/… Commented May 24, 2018 at 18:28
  • @JimW Swap just swaps two bytes using a temp variable. For me it's important to replace the 1st byte with the 4th and the 2nd with the 5th in each group of 6 bytes. Commented May 24, 2018 at 18:29
  • You can read and write byte by byte, but I'm not sure it would be faster. Commented May 24, 2018 at 18:31

2 Answers


Read the file in chunks of a size that is divisible by 6. Swap the necessary bytes in each chunk and write each chunk to another file before reading the next chunk.
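
For example, a minimal synchronous sketch of that idea (the temp-file name and chunk size are illustrative assumptions, and Swap is the helper from the question):

// The chunk size must be a multiple of 6 so no 6-byte group straddles a chunk boundary.
const int chunkSize = 6 * 10000;
byte[] buffer = new byte[chunkSize];

using (var source = File.OpenRead(fileName))
using (var target = File.Create(fileName + ".tmp"))
{
    int bytesRead;
    while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Swap only complete 6-byte groups; trailing bytes are copied unchanged.
        for (int i = 0; i + 6 <= bytesRead; i += 6)
        {
            Swap(ref buffer[i], ref buffer[i + 4]);
            Swap(ref buffer[i + 1], ref buffer[i + 5]);
        }
        target.Write(buffer, 0, bytesRead);
    }
}
// Swap the rewritten copy into place, replacing the original.
File.Replace(fileName + ".tmp", fileName, null);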

You can also try to perform the read of the next chunk in parallel with the write of the previous chunk:

using (var source = new FileStream(@"c:\temp\test.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (var target = new FileStream(@"c:\temp\test.txt", FileMode.Open, FileAccess.Write, FileShare.ReadWrite))
    {
        await RewriteFile(source, target);
    }
}


private async Task RewriteFile(FileStream source, FileStream target)
{
    // We're reading bufferSize bytes from the source-stream inside one half of the buffer
    // while the writeTask is writing the other half of the buffer to the target-stream.

    // define how many chunks of 6 bytes you want to read per read operation
    int chunksPerBuffer = 1;
    int bufferSize = 6 * chunksPerBuffer;

    // declare a byte array that contains both the bytes that are read
    // and the bytes that are being written in parallel.
    byte[] buffer = new byte[bufferSize * 2];
    // curoff is the start-position of the bytes we're working with in the 
    // buffer
    int curoff = 0;

    Task writeTask = Task.CompletedTask;
    int len;

    // Read the desired number of bytes from the file into the buffer.
    // The first read places the bytes in the first half of the buffer,
    // the next read places them in the second half, and subsequent
    // reads keep alternating between the two halves.
    while ((len = await source.ReadAsync(buffer, curoff, bufferSize).ConfigureAwait(false)) != 0)
    {
        // Swap the bytes in the half of the buffer that was just filled:
        // in every 6-byte group, the 1st byte is exchanged with the 4th
        // and the 2nd with the 5th. Using len as the bound avoids
        // touching stale bytes on a short final read.
        for (int i = curoff; i + 6 <= curoff + len; i += 6)
        {
            Swap(ref buffer[i], ref buffer[i + 4]);
            Swap(ref buffer[i + 1], ref buffer[i + 5]);
        }

        // wait until the previous write-task completed.
        await writeTask.ConfigureAwait(false);
        // Start writing the bytes that have just been processed.
        // Do not await the task here, so that the next bytes 
        // can be read in parallel.
        writeTask = target.WriteAsync(buffer, curoff, len);

        // Move the pointer to the beginning of the other half
        // of the buffer.
        curoff ^= bufferSize;

    }

    // Make sure that the last write also finishes before closing
    // the target stream.
    await writeTask.ConfigureAwait(false);
}

The code above reads the file, swaps the bytes, and rewrites them to the same file, overlapping each write with the next read.
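
Note that with chunksPerBuffer = 1 as written, each read is only 6 bytes, which defeats the purpose of buffering; in practice you would pick a much larger value (e.g. 100000, for roughly 600 KB per half-buffer) so that each I/O call moves a meaningful amount of data.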


15 Comments

As an addendum, I would consider not worrying about dividing into 6 specifically. Pick a suitable static chunk size and asynchronously read as many chunks as necessary. That way if your file size or performance requirements change, you aren't left 'locked in'.
Pretty sure referencing the task and running it multiple times like this will not work. You need a new call to ReadAsync every iteration.
No, it doesn't. It always reads the same bufferSize bytes from the beginning of the file, and never terminates if the file is larger than the buffer.
"to another file" might not be acceptable, for security/capacity/transactional reasons.
@FrederikGheysels I'm sorry - I read your code three times, and missed that you reassign readTask. Why not just call ReadAsync in the loop? Really threw me...

As the other answer says, you have to read the file in chunks.

Since you are rewriting the same file, it's easiest to use the same stream for reading and writing.

using(var file = File.Open(path, FileMode.Open, FileAccess.ReadWrite)) {        
    // Read buffer. Size must be divisible by 6
    var buffer = new byte[6*1000]; 

    // Keep track of how much we've read in each iteration
    var bytesRead = 0;      

    // Fill the buffer and store the number of bytes read in 'bytesRead'.
    // Stop looping once we read fewer than 6 bytes
    // (at end of file, Read returns 0).
    while ((bytesRead = file.Read(buffer, 0, buffer.Length)) >= 6)
    {   
        // Swap the complete 6-byte groups in the current buffer;
        // a trailing partial group, if any, is left untouched.
        for (int i = 0; i + 6 <= bytesRead; i += 6)
        {
            Swap(ref buffer[i], ref buffer[i + 4]);
            Swap(ref buffer[i + 1], ref buffer[i + 5]);
        }

        // Step back in the file, to where we filled the buffer from
        file.Position -= bytesRead;
        // Overwrite with the swapped bytes
        file.Write(buffer, 0, bytesRead);
    }
}
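
To reproduce the OP's timing, you could wrap the block above in a method (SwapBytesInPlace below is a hypothetical name, not from the answer) and measure it with a Stopwatch:

var sw = System.Diagnostics.Stopwatch.StartNew();
SwapBytesInPlace(path); // hypothetical wrapper around the using block above
sw.Stop();
Console.WriteLine($"Rewrote file in {sw.Elapsed.TotalSeconds:F1} s");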

7 Comments

I like this answer better, but you could also open 2 FileStreams (1 for reading, 1 for writing) to the same file; see the sketch after these comments. Your approach might be wasting some of the lower-level buffering.
@gnud just wondering, why is it more efficient to chunk the file?
Thanks, it runs in about 30 seconds on a 700 MB file.
@johnny5 The way I think about this comes from the days of spinning disks. With spinning disks, you might issue way too many seeks (moving the read/write head across the disk) if you do many small read/write operations. That physical cost is not the same with SSDs. Still, there will be a system call for every read/write if they're not buffered. I'm sure there's "invisible" buffering happening at the OS level and at the disk level, and it's possible there won't be a major difference. Hard to test, though, exactly because of those caches.
@HenkHolterman Would be interesting to test. Would also be simple to do. Just add another stream, read from one, write to the other, don't change the Position. Again, it's really hard to test this stuff because of disk caches. It's easy to test with a warm cache, not with a cold one.
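
For completeness, a rough sketch of that two-stream variant (the path variable and buffer size are illustrative; the FileShare flags are what make opening the same file twice possible):

using (var reader = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var writer = new FileStream(path, FileMode.Open, FileAccess.Write, FileShare.ReadWrite))
{
    var buffer = new byte[6 * 1000];
    int bytesRead;
    while ((bytesRead = reader.Read(buffer, 0, buffer.Length)) >= 6)
    {
        for (int i = 0; i + 6 <= bytesRead; i += 6)
        {
            Swap(ref buffer[i], ref buffer[i + 4]);
            Swap(ref buffer[i + 1], ref buffer[i + 5]);
        }
        // The writer trails the reader, so swapped bytes never overwrite
        // data that hasn't been read yet.
        writer.Write(buffer, 0, bytesRead);
    }
}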
