10

I read a cool article on how to avoid creating slow regular expressions. Generally speaking it looks like the longer and more explicit and regex is the faster it will complete. A greedy regex can be exponentially slower.

I thought I would test this out by measuring the time it takes to complete a more complex/explicit statement with a less complex/greedy statement. For the most part everything seems to hold true, but I have one greedy statement that clocked in slower. Here are two examples:

import re
from timeit import timeit

# This works as expected, the explicit is faster than the greedy.
# http_x_real_ip explicit 
print(timeit(setup="import re", stmt='''r = re.search(r'(\d{1,3}\.\d{1,3}.\d{1,3}.\d{1,3})', '192.168.1.1 999.999.999.999')''', number=1000000))
1.159849308001867

# http_x_real_ip greedy
print(timeit(setup="import re", stmt='''r = re.search(r'((?:\d{1,3}\.){3}\d{1,3})', '192.168.1.1 999.999.999.999')''', number=1000000))
1.7421739230003368

# This does not work as expected, greedy is faster.
# time_local explicit
print(timeit(setup="import re", stmt='''r = re.search(r'(\d{1,2}/\w{3}/[2][0]\d{2}:\d{2}:\d{2}:\d{2}\s[+][0]{4})', "[23/Jun/2015:11:10:57 +0000]")''', number=1000000))
1.248802040994633

# time_local greedy
print(timeit(setup="import re", stmt='''r = re.search(r'\[(.*)\]', "[23/Jun/2015:11:10:57 +0000]")''', number=1000000))
1.0256699790043058

Is the local_time explict regex just poorly written?

5
  • the time_local greedy is not equivalent. It just matches anything between []. Commented Jul 1, 2015 at 14:49
  • Maybe the time_local greedy performs faster than expected? Commented Jul 1, 2015 at 14:58
  • time_local greedy is performing faster than expected...perhaps if the string being parsed was considerably longer, such as the full log entry, it would take longer because of additional backtracking? I'll have to give it a try. Commented Jul 1, 2015 at 15:02
  • 4
    Your last pattern is faster because the regex engine has few tokens to take in account and because there's only one backtrack step to make the pattern succeed. Commented Jul 1, 2015 at 15:04
  • Note that the first pattern is false since you forget to escape the dots. Commented Jul 1, 2015 at 15:46

2 Answers 2

2

The more a regular expression has to backtrack, the slower it is.

This might not hold for very small input data. However, who would care about the performance on small data? :D


This topic is well covered in this article:

Also there are interesting contributions in this question:

Sign up to request clarification or add additional context in comments.

Comments

2

You also are not using the re.compile feature of Python regexes, meaning your search time also includes time for the re module to compile the regex at each iteration.

>>> print(timeit(setup="import re", stmt='''r = re.search(r'(\d{1,3}\.\d{1,3}.\d{1,3}.\d{1,3})', '192.168.1.1 999.999.999.999')''', number=1000000))
0.73820400238
>>> print(timeit(setup="import re; regex = re.compile(r'(\d{1,3}\.\d{1,3}.\d{1,3}.\d{1,3})')", stmt='''r = regex.search('192.168.1.1 999.999.999.999')''', number=1000000))
0.271140813828
>>> print(timeit(setup="import re; regex = re.compile(r'((?:\d{1,3}\.){3}\d{1,3})')", stmt='''r = regex.search('192.168.1.1 999.999.999.999')''', number=1000000))
0.31952214241
>>> print(timeit(setup="import re; regex = re.compile(r'(\d{1,2}/\w{3}/[2][0]\d{2}:\d{2}:\d{2}:\d{2}\s[+][0]{4})')", stmt='''r = regex.search("[23/Jun/2015:11:10:57 +0000]")''', number=1000000))
0.371844053268
>>> 

The difference between the greedy and non-greedy regex here is actually much closer to expected when you pre-compile. The rest of the explanation goes to backtracking.

We can see that your tests speed up almost by a factor of 3 if you pre-compile your regexes for a large number of iterations.

This answer is meant to complement @mescalinum's answer, but for a large number of regexes, you should really be compiling the regexes ahead of time for a fair comparison.

1 Comment

Awesome! Thank you for sharing, re.compile looks incredibly beneficial.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.