2

I have a database:

database=['3456734567','qqqqgtcgagagagctacgagaqqqqgtcgagagagctacgagaqqqqgtcgagagagctacgaga']

and I want to extract the repeated string from each entry, i.e. '34567' and 'qqqqgtcgagagagctacgaga'.

So I use the following code:

import re

def string(s):
    return re.search(r'(.+?)\1+', s).group(1)

print(string(database[0]))
print(string(database[1]))

However, it only outputs '34567' and 'q'.

How can I change it to get the result 'qqqqgtcgagagagctacgaga'?

4 Answers

3

In this specific case, you can use a greedy operator instead of a non-greedy one:

r'(.+)\1+'

From the documentation:

The *, +, and ? qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
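
As a small illustration of the quoted passage, the sketch below runs the documentation's <H1> example and then the two patterns from this question side by side. The variable names (html, unit, s) are only for illustration, and the second string is built by repeating the unit three times, which should be the same text as database[1].

```python
import re

html = '<H1>title</H1>'

# Greedy: .* takes as much as it can, so the match runs to the last '>'.
greedy = re.match(r'<.*>', html).group(0)

# Non-greedy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.match(r'<.*?>', html).group(0)

print(greedy)  # <H1>title</H1>
print(lazy)    # <H1>

# The same difference with the repeated-unit pattern from the question.
unit = 'qqqqgtcgagagagctacgaga'
s = unit * 3

lazy_unit = re.search(r'(.+?)\1+', s).group(1)   # shortest group that repeats
greedy_unit = re.search(r'(.+)\1+', s).group(1)  # longest group that repeats

print(lazy_unit)    # q
print(greedy_unit)  # qqqqgtcgagagagctacgaga
```

The non-greedy group is satisfied by a single 'q' because the string starts with 'qqqq', which is why the original pattern returns 'q'.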

2

The expression below should give you the required result:

>>> import re
>>> def string(s):
...     return re.search(r'(.+)\1+', s).group(1)
...
>>> print(string(database[0]))
34567
>>> print(string(database[1]))
qqqqgtcgagagagctacgaga

2

Remove the '?' in your group. It makes the + qualifier lazy (non-greedy), but you want a greedy one so that the group captures the longest possible repeated unit.

In [1]: re.match(r'(.+)\1+', 
         'qqqqgtcgagagagctacgagaqqqqgtcgagagagctacgagaqqqqgtcgagagagctacgaga').groups()
Out[1]: ('qqqqgtcgagagagctacgaga',)
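
One caveat worth sketching: because the greedy group takes the longest unit that repeats, a string with four or more repetitions would yield a doubled unit. A possible variant, not taken from any of the answers here, keeps the lazy group but anchors the pattern with $ so the repetitions must cover the whole string. The function name repeated_unit is hypothetical, and the sketch assumes the entire string consists of whole repetitions of one unit.

```python
import re

def repeated_unit(s):
    """Return the shortest unit whose repetitions make up all of s, else None."""
    # The lazy group tries short units first; the trailing $ rejects any unit
    # whose repetitions do not reach the end of the string.
    match = re.match(r'(.+?)\1+$', s)
    return match.group(1) if match else None

unit = 'qqqqgtcgagagagctacgaga'

print(repeated_unit('3456734567'))  # 34567
print(repeated_unit(unit * 3))      # qqqqgtcgagagagctacgaga
print(repeated_unit(unit * 4))      # qqqqgtcgagagagctacgaga

# For comparison, the greedy pattern returns two units joined together
# when the unit occurs four times.
greedy_on_four = re.search(r'(.+)\1+', unit * 4).group(1)
print(greedy_on_four == unit * 2)   # True
```

Strings with no repetition at all return None here instead of raising AttributeError on .group(1).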

2 Comments

You know {1,} is the same as +?
Indeed, I'll change the answer to highlight that it's omitting the ? after the first + that makes the difference.

-1

Using .group(1) returns only the parenthesized part of the expression. You can use .start() and .end() to get the indices in the original string where the match happened:

def string(s):
    match = re.search(r'(.+?)\1+', s)
    return s[match.start() : match.end()] if match is not None else None
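
To make the relationship between these calls concrete, here is a minimal sketch using the first string from the question. The variable names (whole, unit, sliced) are only for illustration.

```python
import re

s = '3456734567'
match = re.search(r'(.+?)\1+', s)

whole = match.group(0)                  # entire matched text, same as match.group()
unit = match.group(1)                   # only the first parenthesized group
sliced = s[match.start():match.end()]   # slice of s covered by the whole match

print(whole)   # 3456734567
print(unit)    # 34567
print(sliced)  # 3456734567
```

Slicing with .start() and .end() therefore gives the full matched run, not the single repeated unit; the unit is what .group(1) holds.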

3 Comments

When running your example with the OP's data, I get qqqq and 3456734567.
if match is not None is the same as if match.
.group(0) gives you the entire match.
