Python find all matching substring patterns and replace substring

Question

I want to search if a sentence has particular pattern or not. Do nothing if not found. If pattern found, substitute pattern with another substring in the string.

line1 = "Who acted as `` Bruce Wayne '' in the movie `` Batman Forever '' ?" 
#Desired Result: Who acted as ``Bruce_Wayne'' in the movie ``Batman_Forever'' ? 

#This is what I have tried..    
def findSubString(raw_string, start_marker, end_marker): 

    start = raw_string.index(start_marker) + len(start_marker)
    end = raw_string.index(end_marker, start)
    return raw_string[start:end]

phrase = findSubString(line1, "``", "''")
newPhrase = phrase.strip(' ').replace(' ', '_')
line1 = line1.replace(phrase, newPhrase)

Current Result: Who acted as ``Bruce_Wayne'' in the movie `` Batman Forever '' ?

So far, I managed to find the first occurrence in the sentence but not the next. How to search for all occurrences with matching pattern?

Can you have newline characters in your string, and between your markers? This has an impact on some possible solutions, because a newline is a natural "end" for a string and can be treated in a special way (for instance by the re module). — Eric O. Lebigot
– Eric O. Lebigot, Commented Jun 19, 2013 at 2:51
@EOL - No the characters between the markers are within the same sentence. No newline characters. — Cryssie
– Cryssie, Commented Jun 19, 2013 at 4:09

falsetru · Accepted Answer · 2013-06-19 02:54:05Z

4

Using regular expression:

import re

def findSubString(raw_string, start_marker, end_marker):
    return re.sub(
        r'(?<={}).*?(?={})'.format(re.escape(start_marker), re.escape(end_marker)),
        lambda m: m.group().strip().replace(' ', '_'),
        raw_string)

line1 = "Who acted as `` Bruce Wayne '' in the movie `` Batman Forever '' ?"
line1 = findSubString(line1, "``", "''")
assert line1 == "Who acted as ``Bruce_Wayne'' in the movie ``Batman_Forever'' ?"

Without regular expression:

def findSubString(raw_string, start_marker, end_marker): 
    result = []
    rest = raw_string
    while True:
        head, sep, tail = rest.partition(start_marker)
        if not sep:
            break
        body, sep, tail = tail.partition(end_marker)
        if not sep:
            break
        result.append(head + start_marker + body.strip().replace(' ', '_') + end_marker)
        rest = tail
    result.append(rest)
    return ''.join(result)

edited Jun 19, 2013 at 2:54

answered Jun 19, 2013 at 2:43

falsetru

371k69 gold badges769 silver badges659 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Eric O. Lebigot Over a year ago

+1 for the regular expression part: very good use of the lookbehind, lookahead and non-greedy matchings, and of escaping!

Collectives™ on Stack Overflow

Python find all matching substring patterns and replace substring

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related