I tried to use regexes for finding max-length sequences formed from repeated doubled letters, like AABB in the string xAAABBBBy.
As described in the official documentation:
The '*', '+', and '?' quantifiers are all greedy; they match as much text as possible.
When I use the quantifier {n,}, I get a full substring, but + returns only parts:
import re
print(re.findall("((AA|BB){3,})", "xAAABBBBy"))
# [('AABBBB', 'BB')]
print(re.findall("((AA|BB)+)", "xAAABBBBy"))
# [('AA', 'AA'), ('BBBB', 'BB')]
Why is {n,} more greedy than +?
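(AA|BB){3,} is not really equivalent to (AA|BB)+. The equivalent form is (AA|BB){1,}, which will behave the same as +. Both quantifiers are equally greedy; what differs is the minimum number of repetitions. With +, a single AA is already a valid match, so the engine accepts the first AA in AAABB and resumes searching after it. With {3,}, the engine cannot find another AA or BB after the first AA in AAABB, so that attempt fails; it moves to the second A and tries again, and from there it can match either AA or BB at least 3 times (AA, BB, BB).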
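A minimal sketch to check this, using the string from the question. It compares + with {1,}, prints the match spans so the differing start index is visible, and adds a helper named longest_run, which is hypothetical and not part of the re module. The helper assumes the goal is the single longest run and uses a lookahead so that a greedy attempt is made from every start position.

```python
import re

text = "xAAABBBBy"

# '+' and '{1,}' are the same quantifier, so the results are identical.
plus = re.findall(r"((AA|BB)+)", text)
one_or_more = re.findall(r"((AA|BB){1,})", text)
print(plus == one_or_more)  # True

# Spans show where each match starts: '+' accepts the attempt at index 1,
# while '{3,}' has to reject that attempt and only succeeds from index 2.
plus_spans = [m.span() for m in re.finditer(r"(?:AA|BB)+", text)]
min3_spans = [m.span() for m in re.finditer(r"(?:AA|BB){3,}", text)]
print(plus_spans)  # [(1, 3), (4, 8)]
print(min3_spans)  # [(2, 8)]


def longest_run(s):
    """Hypothetical helper: longest run of AA/BB pairs over all start indices."""
    # The lookahead consumes no characters, so the scan advances one index
    # at a time and the greedy group is tried at every position.
    runs = [m.group(1) for m in re.finditer(r"(?=((?:AA|BB)+))", s)]
    return max(runs, key=len, default="")


print(longest_run(text))  # AABBBB
```

The lookahead version does not depend on guessing a minimum count such as 3; it collects the greedy run from each position and keeps the longest one.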