I am trying to replace a matched span of text with a single word taken from inside that span, using regex. I tried re.sub(), but it seems to treat its second argument (the replacement) as a plain string, not as a regex.

Here is my string:

I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> .

And here is my code:

# The regex of the form <ERR targ=...> .. </ERR>
select_text_regex = r"<ERR[^<]+<\/ERR>"

# The regex of the correct word that will replace the selected text of the form <ERR targ=...> .. </ERR>
correct_word_regex = r"targ=([^>]+)>"
line = re.sub(select_text_regex, correct_word_regex, line.rstrip())

I get:

I go to Bridgebrook i go out targ=([^>]+)> on Tuesday night i go to
Youth targ=([^>]+)> .

My goal is:

I go to Bridgebrook i go out sometimes on Tuesday night i go to
Youth club .

Does Python support using a regex to pick the replacement text, rather than a fixed string?

3 Answers

1

Here's another solution (I also rewrote the regex using "non-greedy" modifiers by putting ? after * because I find it more readable).

The group referenced by r"\1" is an unnamed capturing group, created with parentheses. I also used re.compile as a style preference, to reduce the number of arguments:

import re

line = "I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> ."
select_text_regex = re.compile(r"<ERR targ=(.*?)>.*?<\/ERR>")
# sub() returns a new string, so keep the result
line = select_text_regex.sub(r"\1", line)

Named group alternative:

import re

line = "I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> ."
select_text_regex = re.compile(r"<ERR targ=(?P<to_replace>.*?)>.*?<\/ERR>")
# \g<to_replace> refers to the named group in the pattern
line = select_text_regex.sub(r"\g<to_replace>", line)

You can find some docs on group referencing here:

https://docs.python.org/3/library/re.html#regular-expression-syntax
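As a small sketch of the reference forms described on that page: the three replacement strings below all point at the same captured group. The names `text` and `pattern` are mine, chosen for illustration; only the pattern itself comes from the answer.

```python
import re

text = "Youth <ERR targ=club> clob </ERR> ."
pattern = re.compile(r"<ERR targ=(?P<to_replace>.*?)>.*?</ERR>")

# Three ways to refer to the same group in the replacement string
by_number = pattern.sub(r"\1", text)               # numbered backreference
by_g_number = pattern.sub(r"\g<1>", text)          # explicit numbered form
by_name = pattern.sub(r"\g<to_replace>", text)     # named group

print(by_number)  # Youth club .
```

The `\g<1>` form is mainly useful when the reference is followed by a digit, where `\1` followed by `0` would be read as group 10.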

4 Comments

Keep in mind that this is just a regex solution! A lot of people will recommend that you use a proper parser and a library like beautifulsoup if your use case can be more complicated: crummy.com/software/BeautifulSoup/bs4/doc
Is it possible to convert the replaced text to uppercase with re.sub()?
This solution works: replace r"\1" with a function: stackoverflow.com/questions/8934477/…
I used your second implementation: line = select_text_regex.sub(r"\g<to_replace>\1",lambda m: m.group('first').upper(), line) It says: TypeError: 'str' object cannot be interpreted as an integer
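The TypeError in the last comment most likely comes from passing both a replacement string and a function. A compiled pattern's sub() takes (repl, string, count), so the lambda lands in the string slot and line lands in count. Below is a minimal sketch of the function form, assuming the group is named `to_replace` as in the answer (the pattern has no group called `first`); `result` is a name I introduced.

```python
import re

line = "I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> ."
select_text_regex = re.compile(r"<ERR targ=(?P<to_replace>.*?)>.*?</ERR>")

# Pass the function INSTEAD of a replacement string, not alongside it.
# It receives each match object and returns the replacement text.
result = select_text_regex.sub(lambda m: m.group("to_replace").upper(), line)

print(result)
# I go to Bridgebrook i go out SOMETIMES on Tuesday night i go to Youth CLUB .
```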
0

You would need to match the target word in the pattern, as a capturing group - you can't start an entirely new search in the replacement string!

Not tested, but this should do the job:

Replace r"<ERR targ=(.*?)>.*?</ERR>"

With r"\1"
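For what it's worth, here is a short sketch that runs that suggestion against the string from the question, next to the original attempt. It also shows why the original printed the pattern text: the replacement argument is a template in which only backreferences such as \1 are special, so anything else is inserted literally. The names `literal` and `fixed` are mine.

```python
import re

line = "I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> ."
pattern = r"<ERR targ=(.*?)>.*?</ERR>"

# A regex used as the replacement is inserted as literal text
literal = re.sub(pattern, r"targ=([^>]+)>", line)

# A backreference pulls in what group 1 captured for each match
fixed = re.sub(pattern, r"\1", line)

print(literal)
print(fixed)
```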

0

What you're looking for is regex capture groups. Instead of matching with one regex and then trying to replace with another, put the part you want to keep inside parentheses in the search pattern, then refer to it in the replacement with \1 (the number is the index of the group).

import re

line = "I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> ."

# Changed: the targ value is now captured in group 1
select_text_regex = r"<ERR targ=([^<]+)>[^<]+<\/ERR>"
# Changed: the replacement refers back to group 1
correct_word_regex = r"\1"

line = re.sub(select_text_regex, correct_word_regex, line.rstrip())

print(line)
