
We have a requirement to transform a string containing a date in dd/mm/yyyy format to ddmmyyyy format. (In case you want to know why I am storing dates in a string: my software processes bulk transaction files, which use a line-based textual file format defined by a bank.)

And I am currently doing this:

string oldFormat = "01/01/2014";
string newFormat = oldFormat.Replace("/", "");

Sure enough, this converts "01/01/2014" to "01012014". But my question is, does the replace happen in one step, or does it create an intermediate string (e.g.: "0101/2014" or "01/012014")?


Here's the reason why I am asking this:

I am processing transaction files ranging in size from a few kilobytes to hundreds of megabytes. So far I have not had a performance/memory problem, because I am still testing with very small files. But when it comes to megabytes, I am not sure if I will have problems with these additional strings. I suspect that would be the case because strings are immutable. With millions of records, this additional memory consumption will build up considerably.

I am already using StringBuilders for output file creation. And I also know that the discarded strings will be garbage collected (at some point before the end of time). I was wondering if there is a better, more efficient way of replacing all occurrences of a specific character/substring in a string that does not create an additional string.

  • You should try using Regex.Replace and compare performance. I once had to remove unnecessary NewLine characters from a file of size ~1MB, and regex made a lot of difference (measured in minutes...). I also had to do a conditional replace and some other text operations, though, so I recommend testing it in this exact case. Commented Oct 10, 2014 at 12:06
  • I think it allocates only one string for one entire Replace, not one string for each replaced occurrence. Commented Oct 10, 2014 at 12:08
  • String.ReplaceInternal is a method implemented externally. I don't think we can know what is going on under the hood. Commented Oct 10, 2014 at 12:12

4 Answers

7

Sure enough, this converts "01/01/2014" to "01012014". But my question is, does the replace happen in one step, or does it create an intermediate string (e.g.: "0101/2014" or "01/012014")?

No, it doesn't create intermediate strings for each replacement. But it does create a new string, because, as you already know, strings are immutable.

Why?

There is no reason to create a new string on each replacement: it's very simple to avoid, and avoiding it gives a huge performance boost.

If you are very interested, referencesource.microsoft.com and the SSCLI 2.0 source code demonstrate this (see the question "How to see code of method which marked as MethodImplOptions.InternalCall"):

FCIMPL3(Object*, COMString::ReplaceString, StringObject* thisRefUNSAFE,
        StringObject* oldValueUNSAFE, StringObject* newValueUNSAFE)
{
    // unnecessary code omitted
    while (((index = COMStringBuffer::LocalIndexOfString(thisBuffer, oldBuffer,
            thisLength, oldLength, index)) > -1) && (index <= endIndex - oldLength))
    {
        replaceIndex[replaceCount++] = index;
        index += oldLength;
    }

    if (replaceCount != 0)
    {
        // Calculate the new length of the string and ensure that we have
        // sufficient room.
        INT64 retValBuffLength = thisLength -
            ((oldLength - newLength) * (INT64)replaceCount);

        gc.retValString = COMString::NewString((INT32)retValBuffLength);
        // unnecessary code omitted
    }
}

As you can see, retValBuffLength is computed from replaceCount, the total number of matches found up front, so the result string is allocated exactly once. The real implementation may be a bit different in .NET 4.0 (SSCLI 4.0 has not been released), but I assure you it's not doing anything silly :-).
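The same two-phase idea can be sketched in C#. This is only an illustration, not the CLR implementation: the class and method names (ReplaceSketch, ReplaceOnce) are hypothetical, and the sketch itself uses a temporary list and char array that the native code does not need. What it shows is that the final length is known before any copying starts, so no partially replaced string ever exists.

```csharp
using System;
using System.Collections.Generic;

public static class ReplaceSketch
{
    // Hypothetical helper: find all matches first, then build the result once.
    public static string ReplaceOnce(string source, string oldValue, string newValue)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (string.IsNullOrEmpty(oldValue)) throw new ArgumentException("oldValue must be non-empty.");
        newValue = newValue ?? string.Empty;

        // Phase 1: record the start index of every match.
        var matches = new List<int>();
        int index = 0;
        while ((index = source.IndexOf(oldValue, index, StringComparison.Ordinal)) >= 0)
        {
            matches.Add(index);
            index += oldValue.Length;
        }
        if (matches.Count == 0) return source;

        // Phase 2: the final length is known, so allocate a single buffer.
        int newLength = source.Length + (newValue.Length - oldValue.Length) * matches.Count;
        var buffer = new char[newLength];
        int srcPos = 0, dstPos = 0;
        foreach (int match in matches)
        {
            int count = match - srcPos;
            source.CopyTo(srcPos, buffer, dstPos, count);      // text before the match
            dstPos += count;
            newValue.CopyTo(0, buffer, dstPos, newValue.Length); // the replacement
            dstPos += newValue.Length;
            srcPos = match + oldValue.Length;
        }
        source.CopyTo(srcPos, buffer, dstPos, source.Length - srcPos); // remaining tail
        return new string(buffer);
    }
}
```

Calling ReplaceSketch.ReplaceOnce("01/01/2014", "/", "") goes from the original string to the final result without producing "0101/2014" along the way.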

I was wondering if there is a better, more efficient way of replacing all occurrences of a specific character/substring in a string that does not create an additional string.

Yes: a reusable StringBuilder with a capacity of ~2000 characters, which avoids further memory allocation. This is only true if the replacement lengths are equal, and it can get you a nice performance gain if you're in a tight loop.
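A minimal sketch of the reusable-builder idea, under some assumptions: the names (ReusableBuilderSketch, StripSlashes) are hypothetical, the 2000-character capacity is taken from the suggestion above, and every '/' in a line is removed, which is a simplification of the real record layout. ReadLine still allocates one string per input line; the sketch only avoids creating a second string for the result.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReusableBuilderSketch
{
    // Hypothetical helper: reuses one StringBuilder and one char buffer for all lines.
    public static void StripSlashes(TextReader input, TextWriter output)
    {
        var builder = new StringBuilder(2000);
        var buffer = new char[2000];
        string line;
        while ((line = input.ReadLine()) != null)
        {
            builder.Clear();           // keeps the existing capacity
            builder.Append(line);
            builder.Replace("/", "");  // modifies the builder's own storage

            // Copy the characters out without calling ToString().
            if (builder.Length > buffer.Length) buffer = new char[builder.Length];
            builder.CopyTo(0, buffer, 0, builder.Length);
            output.Write(buffer, 0, builder.Length);
            output.WriteLine();
        }
    }
}
```

Whether this actually beats String.Replace depends on line lengths and the runtime, so it is worth measuring before adopting it.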

Before writing anything, run benchmarks with big files, and see if the performance is enough for you. If performance is enough - don't do anything.
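A rough way to take such a measurement is sketched below. The names (ReplaceBenchmarkSketch, Measure) are hypothetical, and a loop over one sample string is only a stand-in for processing a real file; timings and collection counts will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;

public static class ReplaceBenchmarkSketch
{
    // Hypothetical helper: times repeated String.Replace calls and reports
    // how many generation-0 garbage collections happened meanwhile.
    public static TimeSpan Measure(int iterations, out int gen0Collections, out long totalLength)
    {
        string sample = "01/01/2014";
        int collectionsBefore = GC.CollectionCount(0);
        var watch = Stopwatch.StartNew();

        totalLength = 0;
        for (int i = 0; i < iterations; i++)
        {
            // Use the result so the call cannot be optimized away.
            totalLength += sample.Replace("/", "").Length;
        }

        watch.Stop();
        gen0Collections = GC.CollectionCount(0) - collectionsBefore;
        return watch.Elapsed;
    }
}
```

Running Measure with a few million iterations gives a first impression of whether the extra strings matter for your workload.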


3 Comments

@Alovchin, yes, I discovered it myself a few hours ago. It's only 2.0, but it definitely gives you a nice idea of what's going on :-)
@ChrisEelmaa How did you find out that the String.ReplaceInternal method calls this code in CLI 2.0?
@SonerGönül: edited my post & added clarifications. As of right now, the only chance to see String.ReplaceInternal would be to disassemble your mscorlib.dll. SSCLI 2.0 is good enough to argue about this, though. grepWin is your friend ;)
6

Well, I'm not a .NET development team member (unfortunately), but I'll try to answer your question.

Microsoft has a great site of .NET Reference Source code, and according to it, String.Replace calls an external method that does the job. I wouldn't argue about how it is implemented, but there's a small comment to this method that may answer your question:

// This method contains the same functionality as StringBuilder Replace. The only difference is that
// a new String has to be allocated since Strings are immutable

Now, if we follow the StringBuilder.Replace implementation, we'll see what it actually does inside.

A little more on string objects:

Although String is immutable in .NET, this is not some kind of limitation, it's a contract. String is actually a reference type, and what it includes is the length of the actual string + the buffer of characters. You can actually get an unsafe pointer to this buffer and change it "on the fly", but I wouldn't recommend doing this.

Now, the StringBuilder class also holds a character array, and when you pass the string to its constructor it actually copies the string's buffer to its own (see Reference Source). What it doesn't have, though, is the contract of immutability, so when you modify a string using StringBuilder you are actually working with the char array. Note that when you call ToString() on a StringBuilder, it creates a new "immutable" string and copies its buffer there.

So, if you need a fast and memory-efficient way to make changes in a string, StringBuilder is definitely your choice, especially given that Microsoft explicitly recommends using StringBuilder if you "perform repeated modifications to a string".

4 Comments

The contract for String.Replace does not require that the implementation avoid the creation of unnecessary intermediate String objects, but it is unlikely that such an implementation would be used when it is so easily avoided.
So I have almost the same answer as you and I answer before you... you get an up vote and I get a down vote..... what gives??
@kjbartel: in what way is your answer even similar to this? You say that it always creates a new string. But OP asked if it creates a new string for every occurrence of the string that should be replaced, not once per Replace call. This tries to find a source where it is documented how String.Replace is actually implemented. The comment suggests that only one string is created.
@SamHarwell I wouldn't argue about the actual implementation because it well might be implemented in native code, but it definitely doesn't create new intermediate strings. Actually Microsoft itself recommends to use StringBuilder if you "perform repeated modifications to a string".
0

I haven't found any sources, but I strongly doubt that the implementation creates a new string for each occurrence. I would implement it with a StringBuilder internally as well. So String.Replace is absolutely fine if you want to run a single replacement on a huge string. But if you have to run many replacements, you should consider using StringBuilder.Replace, because every call to String.Replace creates a new string.

So you can use StringBuilder.Replace since you're already using a StringBuilder.
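One way this could look when the output is already being assembled in a StringBuilder: append the date as it is, then run Replace only over the part that was just appended. The names (OutputBuilderSketch, AppendDateWithoutSlashes) are hypothetical; the overload StringBuilder.Replace(string, string, int, int) limits the replacement to a range, so slashes elsewhere in the output are left alone.

```csharp
using System;
using System.Text;

public static class OutputBuilderSketch
{
    // Hypothetical helper: appends a dd/mm/yyyy date to the output builder
    // and strips the slashes in place, without creating a new string.
    public static void AppendDateWithoutSlashes(StringBuilder output, string date)
    {
        int start = output.Length;                  // where the date begins
        output.Append(date);
        output.Replace("/", "", start, date.Length); // only touch the appended range
    }
}
```

For example, a builder that already holds "A/B;" contains "A/B;01012014" after appending "01/01/2014" this way.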

2 Comments

Thanks. Well, it turns out my question is an XY problem, and you have given a nice tip to solve X (efficient replacing). But I would also like to know the answer for Y (whether replacing multiple occurrences creates multiple strings).
@Krumia: I haven't found any sources, but I strongly doubt that the implementation creates a new string for each occurrence. I would implement it with a StringBuilder internally as well. So String.Replace is absolutely fine if you want to run a single replacement on a huge string. But if you have to run many replacements, you should consider using StringBuilder.Replace, because every call to String.Replace creates a new string (I'll add this comment to my answer).
0

There is no string method for that. You are on your own. But you can try something like this:

string oldFormat = "dd/mm/yyyy";

// Split on '/' and join the parts without a separator.
string[] dt = oldFormat.Split('/');
string newFormat = string.Format("{0}{1}{2}", dt[0], dt[1], dt[2]);

or

// Same idea, building the result in a StringBuilder.
StringBuilder sb = new StringBuilder(dt[0]);
sb.AppendFormat("{0}{1}", dt[1], dt[2]);
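Note that Split allocates an array plus one string per part. Since the layout is fixed (dd/mm/yyyy), a variation is to copy the digits by position straight into the output builder. This is a sketch under that assumption; the names (FixedLayoutSketch, AppendCompactDate) are hypothetical, and it relies on the StringBuilder.Append(string, int, int) overload, which appends a substring without creating it as a separate string.

```csharp
using System;
using System.Text;

public static class FixedLayoutSketch
{
    // Hypothetical helper: appends dd, mm and yyyy from a dd/mm/yyyy string
    // by position, skipping the two slashes.
    public static StringBuilder AppendCompactDate(StringBuilder output, string date)
    {
        if (date == null || date.Length != 10 || date[2] != '/' || date[5] != '/')
            throw new FormatException("Expected a date in dd/mm/yyyy format.");

        return output
            .Append(date, 0, 2)   // dd
            .Append(date, 3, 2)   // mm
            .Append(date, 6, 4);  // yyyy
    }
}
```

The format check is deliberately strict, so a malformed record fails loudly instead of producing a shifted date.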
