When should I use Regex over string operations and vice versa only regarding performance?
4 Answers
It depends
Although string manipulation will usually be somewhat faster, the actual performance heavily depends on a number of factors, including:
- How many times you parse the regex
- How cleverly you write your string code
- Whether the regex is precompiled
As the regex gets more complicated, it will take much more effort and complexity to write equivalent string manipulation code that performs well.
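As a rough sketch of the first and third factors above, assuming .NET's System.Text.RegularExpressions: a pattern can be constructed once and reused, optionally with RegexOptions.Compiled, instead of being re-created on every call. The class and method names are hypothetical and only serve the illustration.

```csharp
using System.Text.RegularExpressions;

public static class TagCounter   // hypothetical name, for illustration only
{
    // Constructed once; RegexOptions.Compiled trades a slower construction
    // for faster matching on later calls.
    private static readonly Regex Tag = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Constructs a new Regex (and parses the pattern) on every call.
    public static int CountPerCall(string input)
    {
        return new Regex("<[^>]*>").Matches(input).Count;
    }

    // Reuses the single instance constructed above.
    public static int CountReused(string input)
    {
        return Tag.Matches(input).Count;
    }
}
```

Both methods return the same count; only the amount of setup work per call differs, so any gain from the reused version has to be measured for the workload at hand.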
1 Comment
When (.*)(,) sees a comma, should that be considered part of the first group or the second group? It could be either; if one parse fails later on, the RegEx engine has to backtrack and try again with the other parse. On the other hand, ([^,]*)(,) would require the comma to be part of the second group without looking ahead. See this answer for example: stackoverflow.com/a/2670853/1078199
String operations will always be faster than regular expression operations. Unless, of course, you write the string operations in an inefficient way.
A regular expression first has to be parsed, and code has to be generated that carries out the match using the same kind of low-level string operations. At best, the generated matcher does what optimal hand-written string manipulation would do.
Regular expressions are not used because they do anything faster than plain string operations; they are used because they can express very complicated operations in little code, with reasonably small overhead.
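To make the earlier comment about (.*)(,) versus ([^,]*)(,) concrete, here is a small sketch, assuming .NET's Regex class. The helper names are hypothetical. Note that the two patterns also capture different text, so they are not drop-in replacements for each other.

```csharp
using System.Text.RegularExpressions;

public static class CommaPatterns   // hypothetical name, for illustration only
{
    // Greedy: .* first consumes the whole input, then the engine backtracks
    // until a comma can be matched by the second group.
    public static string GreedyPrefix(string input)
    {
        return Regex.Match(input, "(.*)(,)").Groups[1].Value;
    }

    // Negated class: [^,]* cannot consume a comma, so the first comma
    // can only belong to the second group.
    public static string NegatedPrefix(string input)
    {
        return Regex.Match(input, "([^,]*)(,)").Groups[1].Value;
    }
}
```

For the input "a,b,c", GreedyPrefix returns "a,b" (everything up to the last comma), while NegatedPrefix returns "a".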
I've done some benchmarks with two functions called FunctionOne (string operations) and FunctionTwo (Regex). They should both get all matches between '<' and '>'.
benchmark #1:
- times called: 1'000'000
- input: 80 characters
- duration (string operations, FunctionOne): 1.12 sec
- duration (regex operations, FunctionTwo): 1.88 sec
benchmark #2:
- times called: 1'000'000
- input: 2000 characters
- duration (string operations): 27.69 sec
- duration (regex operations): 41.436 sec
Conclusion: String operations will almost always beat regular expressions if they are programmed efficiently. But the more complex the task gets, the harder it becomes for hand-written string code to keep up, not only in performance but also in maintainability.
Code FunctionOne
// Requires: using System.Collections.Generic; using System.Text;
private void FunctionOne(string input) {
    var matches = new List<string>();
    var match = new StringBuilder();
    bool recording = false;
    foreach (char c in input) {
        if (c == '<') {
            // Opening bracket: start capturing from the next character.
            recording = true;
            continue;
        }
        if (c == '>' && recording) {
            // Closing bracket: store the captured text (without the brackets).
            matches.Add(match.ToString());
            match.Clear();
            recording = false;
            continue;
        }
        if (recording) {
            match.Append(c);
        }
    }
}
Code FunctionTwo
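// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
// The Regex is constructed once and reused across calls.
Regex regx = new Regex("<.*?>");
private void FunctionTwo(string input) {
    Match m = regx.Match(input);
    var results = new List<string>();
    while (m.Success) {
        // Unlike FunctionOne, m.Value includes the surrounding '<' and '>'.
        results.Add(m.Value);
        m = m.NextMatch();
    }
}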
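The answer does not show how the timings were taken. One plausible way to measure them, sketched here under the assumption that System.Diagnostics.Stopwatch is used, is a small helper like the following. The Bench class is hypothetical and not part of the original benchmark.

```csharp
using System;
using System.Diagnostics;

public static class Bench   // hypothetical helper, not from the original answer
{
    // Runs the action `iterations` times and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action, int iterations)
    {
        action();  // one warm-up call so JIT compilation is not part of the timing
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

A call such as Bench.Time(() => FunctionOne(input), 1000000) would then give a figure comparable to the durations listed above, although absolute numbers depend on the machine and runtime.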
13 Comments
RegexOptions.Compiled, it will become even faster.
'<'.[^>]* would probably be faster.
I did some profiling in C# a while back, comparing the following:
1) LINQ to Objects.
2) Lambda Expressions.
3) Traditional iterative method.
All 3 methods were tested both with and without Regular Expressions. The conclusion in my test case was clear: Regular Expressions were quite a bit slower than the non-Regex versions in all 3 cases when searching for strings in a large amount of text.
You can read the details on my blog: http://www.midniteblog.com/?p=72