1

What I would like to do is to make specific substitions in a given text. For example, '<' should be changed to '[', '>' to ']', and so forth. It is similar to the solution given here: How can I do multiple substitutions using regex in python?, which is

import re 

def multiple_replace(dict, text):
  # Create a regular expression  from the dictionary keys
  regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))

  # For each match, look-up corresponding value in dictionary
  return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text) 

Now, the problem is that I would also like to replace regex-matched patterns. For example, I want to replace 'fo.+' with 'foo' and 'ba[rz]*' with 'bar'.

Removing the map(re.escape in the code helps, so that the regex actually matches, but I then receive key errors, because, for example, 'barzzzzzz' would be a match, and something I want to replace, but 'barzzzzzz' isn't a key in the dictionary, the literal string 'ba[rz]*' is. How can I modify this function to work?

(On an unrelated note, where do these 'foo' and 'bar' things come from?)

2
  • Replacing literals is easy to keep straight, but adding regex as match patterns definitely adds some potential ambiguity (esp. since you may have a couple of dozen of them to keep track of). Take care if you have regex that may overlap, that you test and replace them in the intended order. Commented Jul 11, 2013 at 3:14
  • See also: stackoverflow.com/questions/66270091/… Commented Nov 6, 2021 at 20:33

2 Answers 2

2
import re

def multiple_replace(dict, text):
  # Create a regular expression  from the dictionary keys
  regex = re.compile(r'(%s)' % "|".join(dict.keys()))
  return regex.sub(lambda mo: dict[
      [ k for k in dict if
      re.search(k, mo.string[mo.start():mo.end()])
      ][0]], text)

d = { r'ba[rz]*' : 'bar', '<' : '[' }
s = 'barzzzzzz <'

print multiple_replace(d, s)

Gives:

bar [
Sign up to request clarification or add additional context in comments.

Comments

2

Just do multiple sub calls.

On an unrelated note, Jargon File to the rescue: Metasyntactic variables, foo.

4 Comments

Multiple sub calls is an option, but not when I've got a few dozen different replacements to make throughout a text. Thanks for the link, though!
Just to put things in perspective - you don't like doing a few dozen sub calls, but you're okay with doing a few thousand or million search calls? @perreal's solution answers your literal question very well, but I still think if you benchmark it, you might find that just doing multiple sub calls is both simpler and faster.
... Huh. I'll try benchmarking it, but I did initially think that this way would be faster.
Well, I've benchmarked it. As it turns out, the multiple_replace function is significantly slower than multiple re.subs, and it seems to scale exponentially with the number of substitutions.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.