Python regex string match and replace at the same time

Question

In Python, is there a way to search for a pattern, return the matched strings, and replace them, all at the same time? See the example below:

a = "[fox] dog turtle [cat]"

Goal:

result1 = "fox" #(first match inside bracket)
result2 = "cat" #(second match inside bracket)
result3 = "dog turtle" #(remaining string after removing matched text inside brackets)

What I have:

import re

result1, result2 = re.findall(r'\[.*?\]', a)
result3 = re.sub(r'\[.*?\]', '', a)

It seems redundant and clunky to have to run re twice. Is there a more elegant way to achieve this?

Charif DZ · Accepted Answer · 2019-10-14 09:37:42Z

6

I think your code is elegant and readable enough as it is. There is no function that returns the matches and replaces them at the same time, but if you want to do it in a single call, you can take advantage of the fact that re.sub accepts a function as its repl argument. That function receives the match object as its argument and must return the replacement string. It is meant for dynamic replacement (for example, when the replacement depends on the value of the match itself).

import re

a = '[fox] dog turtle [cat]'
matches = []
# list.append() returns None, so `matches.append(...) or ''` always evaluates to ''.
# Each match is therefore recorded in `matches` and replaced with ''.
# Your expected result has no brackets, so a capture group grabs the text inside them.
text = re.sub(r'\[(.*?)\]', lambda m: matches.append(m.group(1)) or '', a)

print(matches)  # ['fox', 'cat']
print(text)  # ' dog turtle ' (the spaces around the removed brackets remain)


4 Comments

Giacomo Alzetta Over a year ago

The only extra thing the built-in functions give you is the number of replacements performed, if you use subn instead of sub. As you say, nothing built in keeps track of the replaced text, but you can easily achieve the same result manually, as you have shown.
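For reference, a minimal sketch of the subn variant mentioned in this comment, combined with a callback that records each match. The function name `collect` and the variable names are illustrative only.

```python
import re

a = '[fox] dog turtle [cat]'
matches = []

def collect(m):
    # Record the text inside the brackets, then replace the whole match with ''.
    matches.append(m.group(1))
    return ''

# re.subn returns a (new_string, number_of_substitutions) tuple.
text, count = re.subn(r'\[(.*?)\]', collect, a)

print(matches)  # ['fox', 'cat']
print(count)    # 2
```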

JJLL Over a year ago

Thanks! This is exactly what I was looking for. The actual text string is actually much more complicated, so I'm already using subn to count the number of pattern instances. My main concern was not so much readability, but more about run time for duplicating search/replace for lots of lines.
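Since run time is the concern here, it may be worth measuring rather than assuming. The following is only a rough sketch using timeit on the short example string; the helper names `two_pass` and `one_pass` are made up for illustration, and results on real data may differ.

```python
import re
import timeit

PATTERN = re.compile(r'\[(.*?)\]')
a = '[fox] dog turtle [cat]'

def two_pass(s):
    # Scan the string twice: once to collect, once to replace.
    return PATTERN.findall(s), PATTERN.sub('', s)

def one_pass(s):
    # Scan once; the callback records each match as it is replaced.
    matches = []
    text = PATTERN.sub(lambda m: matches.append(m.group(1)) or '', s)
    return matches, text

# Both approaches should agree before their timings are compared.
assert two_pass(a) == one_pass(a)

for fn in (two_pass, one_pass):
    seconds = timeit.timeit(lambda: fn(a), number=10_000)
    print(f'{fn.__name__}: {seconds:.4f}s for 10,000 calls')
```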

sushi Over a year ago

What does the <or ''> part do? Is that part of the lambda function? Or does the lambda function get evaluated first and it becomes something like <None or ''> during execution?
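To illustrate what the `or ''` does: it is part of the lambda body. `matches.append(...)` runs first and returns None, and `None or ''` evaluates to `''`, which becomes the replacement string. A sketch of the same logic written as a named function (the name `record_and_remove` is hypothetical):

```python
import re

a = '[fox] dog turtle [cat]'
matches = []

def record_and_remove(m):
    # list.append mutates the list in place and returns None.
    result = matches.append(m.group(1))
    # `None or ''` evaluates to '', so the match is replaced with nothing.
    return result or ''

text = re.sub(r'\[(.*?)\]', record_and_remove, a)

print(matches)  # ['fox', 'cat']
```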

Wang Over a year ago

Is this solution stable? I cannot find this behaviour documented anywhere.
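On the documentation point: the re module documentation does describe passing a function as repl, which is called for each non-overlapping match with a match object and returns the replacement string. If relying on a side effect inside the callback still feels uncomfortable, here is a sketch of an alternative that makes a single scan with re.finditer instead. The helper name `extract_and_strip` is an assumption for illustration, and the pattern is assumed to contain exactly one capture group.

```python
import re

def extract_and_strip(pattern, s):
    """Return (captured groups, s with every match removed) in one scan."""
    matches, parts, pos = [], [], 0
    for m in re.finditer(pattern, s):
        matches.append(m.group(1))      # text captured by the first group
        parts.append(s[pos:m.start()])  # text between the previous match and this one
        pos = m.end()
    parts.append(s[pos:])               # whatever follows the last match
    return matches, ''.join(parts)

matches, text = extract_and_strip(r'\[(.*?)\]', '[fox] dog turtle [cat]')
print(matches)  # ['fox', 'cat']
```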

lagripe · Accepted Answer · 2019-10-14 01:02:34Z

0

You can use this regex:

\[(.*?)\].*?(\w[\w\s]+\w).*?\[(.*?)\]

Python code:

import re

a = '[fox] dog turtle [cat]'
pattern = r'\[(.*?)\].*?(\w[\w\s]+\w).*?\[(.*?)\]'
res = re.search(pattern, a)
# Groups come back in pattern order: 'fox', 'dog turtle', 'cat'
r1, r2, r3 = res.groups()


1 Comment

JJLL Over a year ago

Thanks, but my actual text strings are much more complicated, and bracketed words are not guaranteed to exist or to be ordered this way, so rigid patterns like this won't work.

Inkling · Accepted Answer · 2024-08-13 14:17:23Z

0

Another way, taking advantage of the fact that re.split() will include capture groups in the result:

import re

a = '[fox] dog turtle [cat] horse'

temp = re.split(r'\[(.*?)\]\s*', a)  # ['', 'fox', 'dog turtle ', 'cat', 'horse']

# Every even-numbered index is part of the modified text:
text = ''.join(temp[::2])  # 'dog turtle horse'

# Every odd-numbered index is a match:
matches = temp[1::2]  # ['fox', 'cat']

I'm assuming it's less efficient than the currently accepted answer though. There may be a better way to de-interleave the elements that I'm not aware of. Notice the \s* I added to the regex to ensure that there isn't any extra whitespace in the end result.
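One caveat: the \s* only consumes whitespace after each bracketed group, so with the question's original string (where [cat] comes last) a trailing space is left behind. A minimal sketch of one way to tidy that up, assuming it is acceptable to collapse every run of whitespace in the remaining text to a single space:

```python
import re

a = '[fox] dog turtle [cat]'

temp = re.split(r'\[(.*?)\]\s*', a)  # ['', 'fox', 'dog turtle ', 'cat', '']

matches = temp[1::2]  # ['fox', 'cat']

# str.split() with no arguments drops leading/trailing whitespace and
# splits on any whitespace run, so re-joining normalizes the spacing.
text = ' '.join(''.join(temp[::2]).split())  # 'dog turtle'
```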

edited Aug 13, 2024 at 14:17

answered Aug 13, 2024 at 13:48

