39

Considering only performance, when should I use Regex rather than string operations, and vice versa?

1
  • 6
    Simple question with a simple answer. If you want to optimise performance, then choose the option that performs better. Commented May 19, 2013 at 19:43

4 Answers

41

It depends

Although string manipulation will usually be somewhat faster, the actual performance heavily depends on a number of factors, including:

  • How many times you parse the regex
  • How cleverly you write your string code
  • Whether the regex is precompiled

As the regex gets more complicated, it will take much more effort and complexity to write equivalent string manipulation code that performs well.
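As a minimal sketch of the "parse once" and "precompiled" factors listed above: the class and method names below are hypothetical and the pattern is only an illustration. It contrasts a single shared `Regex` built with `RegexOptions.Compiled` against constructing a new `Regex` on every call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexReuse
{
    // Built once and reused. RegexOptions.Compiled trades a higher one-time
    // construction cost for faster matching afterwards.
    private static readonly Regex Digits = new Regex("[0-9]+", RegexOptions.Compiled);

    // Hypothetical helper: uses the shared instance on every call.
    public static bool HasDigitsShared(string input)
    {
        return Digits.IsMatch(input);
    }

    // Hypothetical helper: constructs a new Regex object on every call,
    // which is the repeated work the shared instance avoids.
    public static bool HasDigitsRebuilt(string input)
    {
        return new Regex("[0-9]+").IsMatch(input);
    }
}
```

Both helpers return the same results; any difference in cost only shows up when they are called many times, so it is worth measuring on your own input.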


1 Comment

Regular expressions are slow when they require backtracking to parse. For example, if the regular expression (.*)(,) sees a comma, should that be considered part of the first group or second group? It could be either; if one parse fails later on, the RegEx engine has to backtrack and try again with the other parse. On the other hand ([^,]*)(,) would require the comma to be part of the second group without looking ahead. See this answer for example: stackoverflow.com/a/2670853/1078199
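The two patterns from this comment can be compared directly. This is only a sketch (the class and method names are made up), and it shows what each pattern captures rather than how long it takes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommaCapture
{
    // Greedy: .* first consumes the whole input, then gives characters back
    // one at a time until the following "," can match.
    public static string Greedy(string input)
    {
        return Regex.Match(input, "(.*)(,)").Groups[1].Value;
    }

    // Negated class: [^,]* stops at the first comma, so nothing has to be
    // given back before "," matches.
    public static string Negated(string input)
    {
        return Regex.Match(input, "([^,]*)(,)").Groups[1].Value;
    }
}
```

For the input "a,b,c", `Greedy` captures "a,b" (everything up to the last comma) while `Negated` captures "a". The patterns are therefore not interchangeable; the negated class is the right choice when the first comma is the intended delimiter.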
35

String operations will always be faster than regular expression operations. Unless, of course, you write the string operations in an inefficient way.

A regular expression has to be parsed, and code has to be generated that performs the match using the same kind of low-level string operations. At best, the generated code does what an optimal hand-written string routine would do.

Regular expressions are used not because they do anything faster than plain string operations, but because they can express very complicated operations in little code, with reasonably small overhead.
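To illustrate the "little code" point, here is a hedged sketch that checks whether a string has the shape NNNN-NN-NN, once with a regex and once by hand. The task, class, and method names are assumptions chosen for illustration; they do not come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateShape
{
    // One line of pattern; \z anchors at the very end of the string.
    private static readonly Regex Pattern =
        new Regex(@"^[0-9]{4}-[0-9]{2}-[0-9]{2}\z");

    public static bool WithRegex(string s)
    {
        return Pattern.IsMatch(s);
    }

    // The same check written with plain character comparisons.
    public static bool WithStringOps(string s)
    {
        if (s.Length != 10) return false;
        for (int i = 0; i < s.Length; i++)
        {
            if (i == 4 || i == 7)
            {
                if (s[i] != '-') return false;   // dash positions
            }
            else if (s[i] < '0' || s[i] > '9')
            {
                return false;                    // digit positions
            }
        }
        return true;
    }
}
```

The hand-written version is still manageable here, but each additional rule makes it grow, whereas the pattern typically grows by only a few characters.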


17

I've done some benchmarks with two functions called FunctionOne (string operations) and FunctionTwo (Regex). They should both get all matches between '<' and '>'.

benchmark #1:

  • times called: 1'000'000
  • input: 80 characters
  • duration (string operations // FunctionOne): 1.12 sec
  • duration (regex operations // FunctionTwo): 1.88 sec

benchmark #2:

  • times called: 1'000'000
  • input: 2000 characters
  • duration (string operations): 27.69 sec
  • duration (regex operations): 41.436 sec

Conclusion: String operations will almost always beat regular expressions if they are programmed efficiently. But the more complex the task gets, the harder it becomes for string operations to keep up, not only in performance but also in maintainability.

Code FunctionOne

private void FunctionOne(string input) {
    var matches = new List<string>();
    var match = new StringBuilder();
    bool startRecording = false;
    foreach (char c in input) {
        // '<' opens a match; the bracket itself is not recorded.
        if (c == '<') {
            startRecording = true;
            continue;
        }

        // '>' closes the match, but only if a '<' was seen first.
        if (c == '>' && startRecording) {
            matches.Add(match.ToString());
            match = new StringBuilder();
            startRecording = false;
        }

        if (startRecording) {
            match.Append(c);
        }
    }
}

Code FunctionTwo

Regex regx = new Regex("<.*?>");
private void FunctionTwo(string input) {
    Match m = regx.Match(input);
    var results = new List<string>();
    while (m.Success) {
        results.Add(m.Value);
        m = m.NextMatch();
    }
}
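The answer does not show how the durations above were measured. One plausible way to time such functions is a small `Stopwatch` harness like the sketch below; `MiniBench.Time` is a hypothetical helper, not the author's code.

```csharp
using System;
using System.Diagnostics;

public static class MiniBench
{
    // Hypothetical helper: runs `action` once as a warm-up, then `iterations`
    // more times under a Stopwatch, and returns the elapsed time.
    public static TimeSpan Time(Action action, int iterations)
    {
        action(); // warm-up so JIT compilation is not part of the measurement

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A call such as `MiniBench.Time(() => FunctionOne(input), 1000000)` would then give a figure comparable to the ones listed, assuming a release build and the same input for both functions.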

13 Comments

The actual answer is that it heavily depends what you're doing, how, and how often
Your regex benchmark is very wrong; you're re-compiling the regex every time. If you reuse a single instance, it will become much faster. If you pass RegexOptions.Compiled, it will become even faster.
Ok Thanks SLaks, I will post my new results here.
Also, C# has character literals: '<'.
Also, [^>]* would probably be faster.
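The suggestions in these comments (one shared `Regex` instance, `RegexOptions.Compiled`, and `[^>]*` instead of `.*?`) could be combined roughly as follows. This is a sketch with made-up names, and whether it is actually faster would have to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagMatcher
{
    // Shared, compiled instance; [^>]* runs straight to the next '>' instead
    // of expanding a lazy .*? one character at a time.
    private static readonly Regex Tag = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Returns every "<...>" substring, brackets included.
    public static List<string> FindAll(string input)
    {
        var results = new List<string>();
        for (Match m = Tag.Match(input); m.Success; m = m.NextMatch())
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

One behavioural difference to keep in mind: `[^>]` also matches line breaks, while `.` does not by default, so the two patterns can differ on multi-line input.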
2

I did some profiling in C# a while back, comparing the following:

1) LINQ to Objects.

2) Lambda expressions.

3) Traditional iterative method.

All three methods were tested both with and without regular expressions. In my test case the result was clear: regular expressions were quite a bit slower than the non-regex versions in all three cases when searching for strings in a large amount of text.
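The blog's test code is not reproduced here. As an assumption about what such a comparison might look like, the sketch below counts lines containing a fixed word using a LINQ query, once with `string.Contains` and once with a `Regex`; the search term and all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineSearch
{
    // Shared instance so the pattern is not rebuilt for every line.
    private static readonly Regex Needle = new Regex("error");

    // Non-regex variant: plain substring test per line.
    public static int CountWithContains(IEnumerable<string> lines)
    {
        return lines.Count(line => line.Contains("error"));
    }

    // Regex variant: same result for a literal pattern like this one.
    public static int CountWithRegex(IEnumerable<string> lines)
    {
        return lines.Count(line => Needle.IsMatch(line));
    }
}
```

Both methods return the same count for a literal search term, so timing them over the same large input isolates the cost of the regex engine.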

You can read the details on my blog: http://www.midniteblog.com/?p=72
