Basically what I would like to do is run multiple (15-25) regex replaces on a single string with the best possible memory management.
Overview:
Streams a text-only file (sometimes HTML) via FTP, appending to a StringBuilder to build one very large string. The file size ranges from 300KB to 30MB.
The regular expressions are semi-complex, but they need to match across multiple lines of the file (identifying sections of a book, for example), so arbitrarily breaking the string, or running the replace on every download loop, is out of the question.
A sample replace:
Regex re = new Regex("<A.*?>Table of Contents</A>", RegexOptions.IgnoreCase);
source = re.Replace(source, "");
With each run of a replace the memory skyrockets. I know this is because strings are immutable in C# and Replace must build a modified copy; even calling GC.Collect() doesn't help enough for a 30MB file.
Any advice on a better approach, or on a way to perform multiple regex replaces in roughly constant memory (hold at most two copies, so 60MB in memory, perform the search, then discard one copy to drop back to 30MB)?
Update:
There does not appear to be a simple answer, but for future readers: I ended up using a combination of all the answers below to get it to an acceptable state:
If possible, split the string into chunks; see manojlds's answer for a way to do that as the file is being read, looking for suitable end points.
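As a rough sketch of that idea (the CHAPTER heading used as a split point and the 1MB chunk threshold are hypothetical assumptions; any marker that never appears inside a multi-line match would do):

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class ChunkedCleaner
{
    // Hypothetical boundary: a chapter heading marks a safe place to split,
    // so multi-line patterns never straddle a chunk edge.
    static readonly Regex Boundary = new Regex(@"^CHAPTER\s+\d+", RegexOptions.IgnoreCase);
    static readonly Regex Cleanup  = new Regex("<A[^>]*>Table of Contents</A>", RegexOptions.IgnoreCase);
    const int ChunkThreshold = 1 << 20; // flush once roughly 1MB is buffered

    public static void Process(TextReader reader, TextWriter writer)
    {
        var buffer = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Only flush at a boundary line once the buffer is large enough.
            if (buffer.Length > ChunkThreshold && Boundary.IsMatch(line))
            {
                writer.Write(Cleanup.Replace(buffer.ToString(), ""));
                buffer.Clear();
            }
            buffer.AppendLine(line);
        }
        writer.Write(Cleanup.Replace(buffer.ToString(), ""));
    }
}
```

This keeps peak memory proportional to the chunk size rather than the whole file, since each ToString/Replace copy is only ever chunk-sized.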
If you can't split as it streams, at least split it later if possible; see ChrisWue's answer for some external tools that may help with this process by piping through files.
Optimize the regexes: avoid greedy operators and limit the work the engine has to do as much as possible; see Sylverdrag's answer.
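A minimal sketch of that optimization applied to the sample replace from the question: a negated character class gives the engine a hard stopping point instead of the character-by-character advance that a lazy `.*?` forces, and RegexOptions.Compiled trades one-time startup cost for faster matching over a 30MB string.

```csharp
using System.Text.RegularExpressions;

public static class RegexTuning
{
    // "[^>]*" can only ever stop at '>', so it fails fast and backtracks
    // far less than the lazy "<A.*?>" in the original sample.
    static readonly Regex TocLink = new Regex(
        "<A[^>]*>Table of Contents</A>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string StripTocLinks(string source)
    {
        return TocLink.Replace(source, "");
    }
}
```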
Combine the regexes where possible; this cuts down the number of replace passes when the regexes don't depend on each other (useful in this case for cleaning bad input); see Brian Reichle's answer for a code sample.
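A minimal sketch of combining independent patterns with alternation (the comment and blank-line patterns below are hypothetical examples of "bad input" to strip; they are not from the question):

```csharp
using System.Text.RegularExpressions;

public static class CombinedCleanup
{
    // Joining independent patterns with "|" means one pass over the 30MB
    // string produces one new copy, instead of one copy per pattern.
    static readonly Regex Junk = new Regex(
        "<A[^>]*>Table of Contents</A>" + "|" +
        "<!--.*?-->" + "|" +            // hypothetical: HTML comments
        @"(?:\r?\n){3,}",               // hypothetical: runs of blank lines
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);

    public static string Clean(string source)
    {
        return Junk.Replace(source, "");
    }
}
```

This only works directly when every pattern gets the same replacement; when the replacements differ, the Replace overload that takes a MatchEvaluator can pick the right replacement per match while still making a single pass.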
Thank you all!