What would be the Python regex for the following string?
"111(A5)A05209(May)2005"
I want to get the values:
111
A5
A05209
May
2005
Thanks!
Simply use re.split, which is probably the most intuitive method:
>>> import re
>>> re.split(r'[\(\)]', "111(A5)A05209(May)2005")
['111', 'A5', 'A05209', 'May', '2005']
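One caveat with re.split: if the string starts or ends with a delimiter, or has two delimiters in a row, the result contains empty strings. A minimal sketch of filtering them out, in Python 3 syntax (the helper name split_fields is hypothetical, not part of the answer above):

```python
import re

def split_fields(text):
    """Split on parentheses and drop any empty pieces."""
    # re.split yields '' wherever two delimiters are adjacent or the
    # string begins/ends with a delimiter, so filter those out.
    return [part for part in re.split(r'[()]', text) if part]

print(split_fields("111(A5)A05209(May)2005"))
print(split_fields("(A5)(May)"))
```

Inside a character class the parentheses do not need escaping, so r'[()]' behaves the same as r'[\(\)]'.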
Use re.findall and str.join:
>>> import re
>>> strs = "111(A5)A05209(May)2005"
>>> print "\n".join(re.findall(r'\w+',strs))
111
A5
A05209
May
2005
or re.sub:
>>> print re.sub(r'[\W]+','\n',strs)
111
A5
A05209
May
2005
Another alternative is str.translate:
>>> from string import punctuation, whitespace, maketrans
>>> intab = punctuation + whitespace
>>> outtab = "\n"*len(intab)
>>> trantab = maketrans(intab, outtab)
>>> print strs.translate(trantab)
111
A5
A05209
May
2005
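The session above is Python 2. In Python 3, string.maketrans no longer exists and the table is built with str.maketrans instead. A rough equivalent, assuming the same input string:

```python
from string import punctuation, whitespace

strs = "111(A5)A05209(May)2005"

# Map every punctuation and whitespace character to a newline.
intab = punctuation + whitespace
trantab = str.maketrans(intab, "\n" * len(intab))

result = strs.translate(trantab)
print(result)
```

Note that this replaces each delimiter character one for one, so consecutive delimiters produce blank lines, and that an underscore counts as punctuation here whereas \w treats it as a word character.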
In terms of performance, str.translate is far faster than either regex approach on this input (roughly 23x and 47x in the timings below):
>>> strs = "111(A5)A05209(May)2005"*1000
>>> %timeit "\n".join(re.findall(r'\w+',strs))
100 loops, best of 3: 2.19 ms per loop
>>> %timeit re.sub(r'[\W]+','\n',strs)
100 loops, best of 3: 4.43 ms per loop
>>> %timeit strs.translate(trantab)
10000 loops, best of 3: 93.9 us per loop
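%timeit is an IPython feature. Outside IPython, a comparable measurement can be sketched with the standard timeit module. This is a Python 3 sketch, the labels in the candidates dict are made up for illustration, and the numbers will differ by machine and Python version, so the ratios above need not carry over:

```python
import re
import timeit
from string import punctuation, whitespace

strs = "111(A5)A05209(May)2005" * 1000
intab = punctuation + whitespace
trantab = str.maketrans(intab, "\n" * len(intab))

# The three approaches from the answer, wrapped as zero-argument callables.
candidates = {
    "findall + join": lambda: "\n".join(re.findall(r'\w+', strs)),
    "re.sub": lambda: re.sub(r'\W+', '\n', strs),
    "str.translate": lambda: strs.translate(trantab),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats of 100 calls each, reported per call in seconds.
    timings[name] = min(timeit.repeat(func, number=100, repeat=3)) / 100
    print(f"{name}: {timings[name] * 1e6:.1f} us per call")
```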
A regex matching that exact string would be 111\(A5\)A05209\(May\)2005. I think what you're looking for is more likely a way to split the string on a set of delimiters (i.e. re.split())...
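If the layout of the string is fixed, a pattern with capturing groups is another option alongside splitting. The sketch below assumes the shape "digits, (token), token, (token), digits", which is a guess based on the single sample string, and uses Python 3 syntax:

```python
import re

# Hypothetical pattern: digits, (token), token, (token), digits.
pattern = re.compile(r'(\d+)\((\w+)\)(\w+)\((\w+)\)(\d+)')

match = pattern.fullmatch("111(A5)A05209(May)2005")
if match:
    # Each pair of unescaped parentheses in the pattern captures one field.
    print(match.groups())
```

Unlike re.split, this rejects strings that do not follow the expected layout, which may or may not be what you want.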