4

I get a text file from a mainframe, and sometimes there are 0x0D bytes injected into the middle of the text lines.

The previous programmer created a method using the FileStream class. This method works fine but takes around 30 minutes to go through the entire file.

My thought was to pass the text lines that are needed (about 25 lines) to a method to decrease the processing time.

I've been working with the MemoryStream class but am having an issue where it does not find the 0x0D control code.

Here is the current FileStream method:

private void ReplaceFileStream(string strInputFile)
{
    FileStream fileStream = new FileStream(strInputFile, FileMode.Open, FileAccess.ReadWrite);
    byte filebyte;

    while (fileStream.Position < fileStream.Length)
    {
        filebyte = (byte)fileStream.ReadByte();
        if (filebyte == 0x0D)
        {
            filebyte = 0x20;
            fileStream.Position = fileStream.Position - 1;
            fileStream.WriteByte(filebyte);
        }
    }
    fileStream.Close();
}

and here is the MemoryStream method:

private void ReplaceMemoryStream(string strInputLine)
{
    byte[] byteArray = Encoding.ASCII.GetBytes(strInputLine);
    MemoryStream fileStream = new MemoryStream(byteArray);

    byte filebyte;

    while (fileStream.Position < fileStream.Length)
    {
        filebyte = (byte)fileStream.ReadByte();
        if (filebyte == 0x0D)
        {
            filebyte = 0x20;
            fileStream.Position = fileStream.Position - 1;
            fileStream.WriteByte(filebyte);
        }
    }
    fileStream.Close();
}

I have not used the MemoryStream class before, so I am not that familiar with it. Any tips or ideas?

3
  • 3
    Doesn't find it, or doesn't write your changes? You're basically modifying a byte array in memory. Your code does not write the changes back to disk. Indeed, the MemoryStream is completely superfluous as your code stands - you may as well have just iterated over the byte array and modified that. Then use File.WriteAllBytes to save it back to disk. Commented Sep 9, 2011 at 15:36
  • 1
    I'm curious why you don't just do something like strInputLine.Replace("\x000D", "") and then write the line. Maybe I am missing something? Or use whatever the corresponding character is in place of the hex; you can escape and replace control characters. Commented Sep 9, 2011 at 15:48
  • Kent: it never finds it. Justin & Austin: good teamwork. Rig: I thought I had tried everything but don't remember trying that. I was using "\r", thinking it would be the same, but it never found it. Commented Sep 9, 2011 at 20:00
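
The string-based replacement suggested in the comments can be sketched as below. This is only an illustration: the class and method names (LineCleaner, ReplaceCarriageReturns) are hypothetical, and it substitutes a space to match the original FileStream method rather than removing the character. Note that '\r' and '\x000D' are the same character in C#. If the lines are obtained with StreamReader.ReadLine (an assumption, since the question does not show how they are read), a lone carriage return is treated as a line terminator and will never appear inside the returned string, which may be why a search for it finds nothing.

```csharp
using System;

static class LineCleaner
{
    // Hypothetical helper: replaces every carriage return (U+000D)
    // in a single line of text with a space (U+0020).
    public static string ReplaceCarriageReturns(string line)
    {
        // '\r' and '\x000D' denote the same character.
        // Strings are immutable, so Replace returns a new string.
        return line.Replace('\r', ' ');
    }
}
```

Because Replace returns a new string, the caller has to use the return value; the input string itself is never modified.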

2 Answers

3

I don't know the size of your files, but if they are small enough that you can load the whole thing in memory at once, then you could do something like this:

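private void ReplaceFileStream(string strInputFile)
{
    // Read the whole file into memory in a single call.
    byte[] fileBytes = File.ReadAllBytes(strInputFile);
    bool modified = false;
    for (int i = 0; i < fileBytes.Length; ++i)
    {
        // Replace each carriage return (0x0D) with a space (0x20).
        if (fileBytes[i] == 0x0D)
        {
            fileBytes[i] = 0x20;
            modified = true;
        }
    }

    // Only rewrite the file if something actually changed.
    if (modified)
    {
        File.WriteAllBytes(strInputFile, fileBytes);
    }
}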

If you can't read the whole file in at once, then you should switch to a buffered reading setup. Here is an example that reads from the file, writes to a temp file, and at the end copies the temp file over the original file. This should yield better performance than reading the file one byte at a time:

private void ReplaceFileStream(string strInputFile)
{
    string tempFile = Path.GetTempFileName();
    try
    {
        using (FileStream input = new FileStream(strInputFile,
            FileMode.Open, FileAccess.Read))
        using (FileStream output = new FileStream(tempFile,
            FileMode.Create, FileAccess.Write))
        {
            byte[] buffer = new byte[4096];
            int bytesRead = input.Read(buffer, 0, buffer.Length);
            while (bytesRead > 0)
            {
                // Only the first bytesRead bytes of the buffer are valid.
                for (int i = 0; i < bytesRead; ++i)
                {
                    if (buffer[i] == 0x0D)
                    {
                        buffer[i] = 0x20;
                    }
                }

                output.Write(buffer, 0, bytesRead);
                bytesRead = input.Read(buffer, 0, buffer.Length);
            }
            output.Flush();
        }

        // The third argument allows overwriting the existing original file.
        File.Copy(tempFile, strInputFile, true);
    }
    finally
    {
        if (File.Exists(tempFile))
        {
            File.Delete(tempFile);
        }
    }
}

5 Comments

I was able to use the first code snippet loading the file into memory. This has decreased the processing time from 47 minutes to 5 seconds. I actually ran the test several times as I could not believe there was that huge of a time difference.
There is a massive difference when you read from a file in a buffered fashion rather than 1 byte at a time. With the first snippet you are reading the whole file as 1 chunk, so it really should be much faster. Just keep in mind that if the files are very large you would want to use the second snippet as you could run into issues loading the whole file into memory.
The files are always between 4 and 5 MB and have been for several years. At what size should I be concerned about using different chunks?
@HaySeed - I can't give you a definitive answer as I don't know if you are compiling this as a 32 or 64 bit app and what other things are running in your app to affect the memory usage at the point this code runs. If the files are 4 - 5 MB, then when you run the ReadAllBytes line you will have a byte array that is 4 - 5 MB in memory. If you have a file that is 1 GB, then you have to have at least 1 GB of available memory to load the whole thing in memory.
@HaySeed a quick look on SO turned up this article: since the array class uses an int for its length counter, you can't have an array larger than Int32.MaxValue, so you can't read a file larger than that in one chunk.
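
One way to act on the size discussion above is to check the file length before deciding whether to load it in one piece. The following is a minimal sketch, not a recommendation: the names FileSizeCheck and FitsInMemory are hypothetical, and the 64 MB cutoff is an arbitrary assumption chosen for illustration that would need tuning for the actual application and its memory budget.

```csharp
using System;
using System.IO;

static class FileSizeCheck
{
    // Assumed cutoff for illustration only; not a documented limit.
    private const long InMemoryLimitBytes = 64L * 1024 * 1024;

    // Returns true if the file is small enough (by the assumed cutoff)
    // to be read with File.ReadAllBytes instead of a buffered loop.
    public static bool FitsInMemory(string path)
    {
        long length = new FileInfo(path).Length;
        return length <= InMemoryLimitBytes;
    }
}
```

With files of 4 to 5 MB, as described in the comments, this check would select the in-memory approach; the buffered temp-file approach would only be needed for files above the chosen cutoff.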
2

If your replacement code does not find the 0x0D in the stream but the previous FileStream method does, I think it could be because of the Encoding you are using to get the bytes. You could try some other encoding types.

Otherwise your code seems to be fine. I would put a using block around the MemoryStream to be sure it gets closed and disposed, something like this:

using(var fileStream = new MemoryStream(byteArray))
{

  byte filebyte;

 // your while loop...

}

Looking at your code, I am not 100% sure the changes you make to the memory stream will be persisted. Actually, I think that if you do not save it after the changes, they will be lost. I could be wrong about this, but you should test and see; if it does not save, you should use a StreamWriter to save it after the changes.
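
To make the persistence question concrete, here is a small sketch of the MemoryStream loop that returns the modified text to the caller. The class name MemoryStreamDemo is hypothetical, and the method returns a string instead of void, which differs from the question's signature. A MemoryStream constructed over a byte array writes into that same array, so the replacement does happen in memory, but nothing reaches the original string or the disk unless the bytes are converted back and written out. Encoding.ASCII is kept from the question; it turns any non-ASCII character into '?', so it is only suitable if the mainframe data is plain ASCII.

```csharp
using System;
using System.IO;
using System.Text;

static class MemoryStreamDemo
{
    // Replaces 0x0D with 0x20 via a MemoryStream and returns the result.
    public static string ReplaceCarriageReturns(string line)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(line);
        using (var stream = new MemoryStream(bytes))
        {
            int value;
            // ReadByte returns -1 at the end of the stream.
            while ((value = stream.ReadByte()) != -1)
            {
                if (value == 0x0D)
                {
                    // Step back over the byte just read and overwrite it.
                    stream.Position -= 1;
                    stream.WriteByte(0x20);
                }
            }
        }
        // The writes went into 'bytes'; the input string is unchanged,
        // so the modified text has to be rebuilt and returned.
        return Encoding.ASCII.GetString(bytes);
    }
}
```

The caller would then write the returned string (or the byte array) back to the file itself; as the first comment on the question notes, iterating over the byte array directly achieves the same thing without the stream.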

2 Comments

I am 100% sure they will not be persisted anywhere.
Me too, but never say never unless checked, and I had no time to debug such a snippet myself just now ;-)
