
In Python, is there a way to search, return matched string, and replace matched strings all at the same time? See example below:

a = "[fox] dog turtle [cat]"

Goal:

result1 = "fox" #(first match inside bracket)
result2 = "cat" #(second match inside bracket)
result3 = "dog turtle" #(remaining string after removing matched text inside brackets)

What I have:

result1, result2 = re.findall('\[.*?\]', a)
result3 = re.sub('\[.*?\]', '', a)

It seems redundant and clunky to have to run re twice. Is there a more elegant way to achieve this?

3 Answers


I think your code is elegant and readable enough as it is. There is no function that returns the matches and replaces them at the same time, but if you want to do it in a single pass, you can rely on the fact that re.sub accepts a function as its repl argument. That function receives the match object and must return the replacement string. It is normally used for dynamic replacement (for example, when the replacement depends on the value of the match itself).

import re

a = '[fox] dog turtle [cat]'
matches = []
# list.append() returns None, so `matches.append(...) or ''` always evaluates to ''.
# Each time a match is found, it is added to the matches list and replaced with ''.
# Your expected result has no brackets, so a capture group is used inside the brackets.
text = re.sub(r'\[(.*?)\]', lambda m: matches.append(m.group(1)) or '', a)

print(matches)  # ['fox', 'cat']
print(repr(text))  # ' dog turtle ' (the surrounding spaces remain; use text.strip() to drop them)
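
For anyone unsure what the `or ''` part is doing, here is a minimal sketch of the same technique with a named function instead of the lambda. The name `collect_and_remove` is only illustrative. The behaviour it relies on is that re.sub calls a function repl once per match and uses its return value as the replacement.

```python
import re

a = '[fox] dog turtle [cat]'
matches = []

def collect_and_remove(m):
    # m is the match object for one bracketed chunk
    matches.append(m.group(1))  # record the text inside the brackets
    return ''                   # replace the whole match with an empty string

text = re.sub(r'\[(.*?)\]', collect_and_remove, a)

print(matches)     # ['fox', 'cat']
print(repr(text))  # ' dog turtle '
```

In the lambda version, `matches.append(...)` returns None, so the expression `None or ''` evaluates to `''`, which is what the named function returns explicitly.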

4 Comments

The only thing that you can get from the built-in function is the number of replacements performed by using subn instead of sub. But as you say there is no functionality that keeps track of the replacements performed, but you can easily achieve the same result manually as you have shown.
Thanks! This is exactly what I was looking for. The actual text string is actually much more complicated, so I'm already using subn to count the number of pattern instances. My main concern was not so much readability, but more about run time for duplicating search/replace for lots of lines.
What does the <or ''> part do? Is that part of the lambda function? Or does the lambda function get evaluated first and it becomes something like <None or ''> during execution?
Is this solution stable? I cannot find this behaviour documented anywhere.
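
To illustrate the subn point raised in these comments: re.subn takes the same function argument as re.sub and returns a (new_string, count) tuple, so the matches, the cleaned text, and the count can all come from one pass. Passing a function as repl is described in the re module documentation for re.sub. The variable names below are illustrative only.

```python
import re

a = '[fox] dog turtle [cat]'
matches = []

# subn returns a tuple: (string after replacement, number of replacements)
text, count = re.subn(
    r'\[(.*?)\]',
    lambda m: matches.append(m.group(1)) or '',
    a,
)

print(matches)  # ['fox', 'cat']
print(count)    # 2
```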

You can use this regex:

\[(.*?)\].*?(\w[\w\s]+\w).*?\[(.*?)\]


Python code:

import re

a = '[fox] dog turtle [cat]'
pattern = r'\[(.*?)\].*?(\w[\w\s]+\w).*?\[(.*?)\]'
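res = re.search(pattern, a)
# Groups come back in the order they appear in the text, so r2 is the middle text and r3 is the second bracket
r1, r2, r3 = res.groups()  # ('fox', 'dog turtle', 'cat')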
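
One caveat worth sketching: re.search returns None when the pattern does not match, so unpacking `res.groups()` directly fails on input that lacks either bracket. The wrapper below is a hypothetical helper (the name `parse` is not from the answer) that guards against that case.

```python
import re

pattern = r'\[(.*?)\].*?(\w[\w\s]+\w).*?\[(.*?)\]'

def parse(s):
    # Return the three captured groups, or None if the layout is not matched
    res = re.search(pattern, s)
    if res is None:
        return None
    return res.groups()

print(parse('[fox] dog turtle [cat]'))  # ('fox', 'dog turtle', 'cat')
print(parse('dog turtle'))              # None
```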


1 Comment

Thanks, but my actual text strings are much more complicated, and bracketed words are not guaranteed to exist or to be ordered this way, so rigid patterns like this won't work.

Another way, taking advantage of the fact that re.split() will include capture groups in the result:

import re

a = '[fox] dog turtle [cat] horse'

temp = re.split(r'\[(.*?)\]\s*', a)  # ['', 'fox', 'dog turtle ', 'cat', 'horse']

# Every even-numbered index is part of the modified text:
text = ''.join(temp[::2])  # 'dog turtle horse'

# Every odd-numbered index is a match:
matches = temp[1::2]  # ['fox', 'cat']

I'm assuming it's less efficient than the currently accepted answer though. There may be a better way to de-interleave the elements that I'm not aware of. Notice the \s* I added to the regex to ensure that there isn't any extra whitespace in the end result.
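
As a rough sketch, the split approach can be wrapped in a small helper that also copes with strings containing no brackets at all. The function name `split_matches` is made up for illustration, and it assumes the pattern contains exactly one capture group, since that is what produces the alternating text/match layout.

```python
import re

def split_matches(pattern, s):
    # Assumes exactly one capture group in pattern, so re.split alternates
    # between surrounding text (even indices) and captured matches (odd indices)
    parts = re.split(pattern, s)
    return parts[1::2], ''.join(parts[::2])

pattern = r'\[(.*?)\]\s*'

print(split_matches(pattern, '[fox] dog turtle [cat] horse'))
# (['fox', 'cat'], 'dog turtle horse')

print(split_matches(pattern, 'no brackets here'))
# ([], 'no brackets here')
```

When nothing matches, re.split returns a one-element list holding the original string, so the helper returns an empty match list and the text unchanged.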
