I have a slow Python regular expression. If I simply remove the last line of the regex, the speed increases by two orders of magnitude! Here's my reproducing example:
import re
import timeit
mystr = "14923KMLI MLI2010010206581258 0.109 b M M M 0 09 60+ "
basere = r"""
(?P<wban>[0-9]{5})
(?P<faaid>[0-9A-Z]{4})\s
(?P<id3>[0-9A-Z]{3})
(?P<tstamp>[0-9]{16})\s+
\[?\s*((?P<vis1_coef>\-?\d+\.\d*)|(?P<vis1_coef_miss>M))\s*\]?\s*
\[?(?P<vis1_nd>[0-9A-Za-z\?\$/ ])\]?\s+
((?P<vis2_coef>\d+\.\d*)|(?P<vis2_coef_miss>[M ]))\s+(?P<vis2_nd>[A-Za-z\?\$ ])\s+
...............\s+
\[?\s*((?P<drct>\d+)|(?P<drct_miss>M))\s+
((?P<sknt>\d+)|(?P<sknt_miss>M))\s+
((?P<gust_drct>\d+)\+?|(?P<gust_drct_miss>M))\s*\]?\s+
"""
additional = r"""
\[?((?P<gust_sknt>\d+)R?L?F*\d*\+?|(?P<gust_sknt_miss>M))\s*\]?\s+
"""
P1_RE = re.compile(basere + additional, re.VERBOSE)
P2_RE = re.compile(basere, re.VERBOSE)
for myre in ["P1_RE", "P2_RE"]:
    statement = "%s.match('%s')" % (myre, mystr)
    res = timeit.timeit(statement, "from __main__ import %s" % (myre,),
                        number=1000)
    print('%s took %.9f per iteration' % (myre, res / 1000.))
# result on my laptop, python 2.6 and 3.3 tested
# P1_RE took 0.001489143 per iteration
# P2_RE took 0.000019991 per iteration
So the only difference between P1_RE and P2_RE is the additional regex. Any ideas as to what I am doing wrong?
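One thing that may be worth checking before comparing timings is whether each pattern actually matches the test string, since a failed match can force the engine to backtrack through many alternatives and may cost far more than a successful one. The sketch below is only an illustration: `time_match` is a hypothetical helper, and the two toy patterns and the sample string are made up for the example, not taken from the regex above.

```python
import re
import timeit


def time_match(pattern, text, number=1000):
    """Return (matched, seconds_per_call) for pattern.match(text).

    Hypothetical helper: it passes a callable to timeit instead of
    building a statement string, so the text is never re-quoted.
    """
    matched = pattern.match(text) is not None
    total = timeit.timeit(lambda: pattern.match(text), number=number)
    return matched, total / number


# Toy patterns (assumptions for illustration only).
base = re.compile(r"(?P<a>\d+)\s+(?P<b>\d+)\s+")
# Same pattern with one extra required field appended.
extended = re.compile(r"(?P<a>\d+)\s+(?P<b>\d+)\s+(?P<c>\d+)\s+")

sample = "12 34 "  # has two numeric fields, not three

for name, pat in [("base", base), ("extended", extended)]:
    matched, per_call = time_match(pat, sample)
    print("%s matched=%s, %.9f s per call" % (name, matched, per_call))
```

If the longer pattern reports `matched=False` while the shorter one reports `matched=True`, the two timings are comparing a failed match against a successful one, which is a different question from how expensive the extra line is on its own.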
/trivial/vs/trivial<withAdditional>/and check for the difference again. As written your code paste is indecipherable and you're not likely to get much help