
I want to check whether a sentence contains a particular pattern. If the pattern is not found, nothing should happen. If it is found, the matched substring should be replaced with another substring.

line1 = "Who acted as `` Bruce Wayne '' in the movie `` Batman Forever '' ?" 
#Desired Result: Who acted as ``Bruce_Wayne'' in the movie ``Batman_Forever'' ? 

#This is what I have tried..    
def findSubString(raw_string, start_marker, end_marker): 

    start = raw_string.index(start_marker) + len(start_marker)
    end = raw_string.index(end_marker, start)
    return raw_string[start:end]

phrase = findSubString(line1, "``", "''")
newPhrase = phrase.strip(' ').replace(' ', '_')
line1 = line1.replace(phrase, newPhrase)

Current Result: Who acted as ``Bruce_Wayne'' in the movie `` Batman Forever '' ?

So far, I have managed to handle the first occurrence in the sentence, but not the next one. How can I search for all occurrences that match the pattern?

  • Can you have newline characters in your string, and between your markers? This has an impact on some possible solutions, because a newline is a natural "end" for a string and can be treated in a special way (for instance by the re module). Commented Jun 19, 2013 at 2:51
  • @EOL - No, the characters between the markers are always within the same sentence. There are no newline characters. Commented Jun 19, 2013 at 4:09
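
To illustrate why the newline question matters for regex-based solutions: by default `.` in Python's `re` module does not match a newline, so a lazy `.*?` between the markers stops at the line break unless `re.DOTALL` is passed. The sample text below is made up for illustration and is not from the question.

```python
import re

# Hypothetical input with a newline between the markers
text = "`` Bruce\nWayne ''"
pattern = r"(?<=``).*?(?='')"

# Without re.DOTALL, '.' cannot cross the newline, so nothing matches
default_matches = re.findall(pattern, text)

# With re.DOTALL, '.' also matches '\n', so the whole span is found
dotall_matches = re.findall(pattern, text, flags=re.DOTALL)

print(default_matches)
print(dotall_matches)
```

Since the asker confirmed there are no newlines between the markers, the default behaviour is sufficient here.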

1 Answer

Using a regular expression:

import re

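def findSubString(raw_string, start_marker, end_marker):
    # Match lazily (.*?) between the markers. The lookbehind and lookahead
    # keep the markers themselves out of the match, so only the text
    # between them is rewritten; re.sub handles every occurrence.
    return re.sub(
        r'(?<={}).*?(?={})'.format(re.escape(start_marker), re.escape(end_marker)),
        lambda m: m.group().strip().replace(' ', '_'),
        raw_string)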

line1 = "Who acted as `` Bruce Wayne '' in the movie `` Batman Forever '' ?"
line1 = findSubString(line1, "``", "''")
assert line1 == "Who acted as ``Bruce_Wayne'' in the movie ``Batman_Forever'' ?"
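
As a rough illustration of why the non-greedy `.*?` matters in the pattern above, the sketch below compares it with a greedy `.*` on the same sentence using `re.findall`. The variable names are only for this example.

```python
import re

line = "Who acted as `` Bruce Wayne '' in the movie `` Batman Forever '' ?"
start, end = re.escape("``"), re.escape("''")

# Lazy quantifier: stops at the first closing marker, one match per pair
lazy = re.findall(r'(?<={}).*?(?={})'.format(start, end), line)

# Greedy quantifier: runs to the last closing marker, swallowing both pairs
greedy = re.findall(r'(?<={}).*(?={})'.format(start, end), line)

print(lazy)
print(greedy)
```

The lazy version finds the two phrases separately, while the greedy version returns a single match spanning from the first opening marker to the last closing marker.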

Without a regular expression:

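def findSubString(raw_string, start_marker, end_marker):
    result = []
    rest = raw_string
    while True:
        # Split off everything before the next start marker
        head, sep, tail = rest.partition(start_marker)
        if not sep:
            break  # no more start markers
        # Split the remainder at the matching end marker
        body, sep, tail = tail.partition(end_marker)
        if not sep:
            break  # unterminated marker: leave the rest unchanged
        result.append(head + start_marker + body.strip().replace(' ', '_') + end_marker)
        rest = tail
    # Append whatever is left after the last complete marker pair
    result.append(rest)
    return ''.join(result)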
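
For comparison, here is a minimal sketch of the same idea using `str.find` with a running position instead of `partition`. The name `replace_between` is hypothetical, and the edge-case inputs are made up to check the "do nothing if not found" requirement from the question.

```python
def replace_between(raw_string, start_marker, end_marker):
    """Strip and underscore-join the text between each marker pair."""
    pieces = []
    pos = 0
    while True:
        start = raw_string.find(start_marker, pos)
        if start == -1:
            break  # no further start marker
        body_start = start + len(start_marker)
        end = raw_string.find(end_marker, body_start)
        if end == -1:
            break  # unterminated marker: leave the remainder untouched
        body = raw_string[body_start:end].strip().replace(' ', '_')
        pieces.append(raw_string[pos:body_start] + body + end_marker)
        pos = end + len(end_marker)
    pieces.append(raw_string[pos:])
    return ''.join(pieces)


line1 = "Who acted as `` Bruce Wayne '' in the movie `` Batman Forever '' ?"
print(replace_between(line1, "``", "''"))
print(replace_between("no markers here", "``", "''"))
print(replace_between("open `` but never closed", "``", "''"))
```

Strings without a complete marker pair come back unchanged, which matches the behaviour asked for in the question.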

1 Comment

+1 for the regular expression part: very good use of lookbehind, lookahead, non-greedy matching, and escaping!
