1

I have a Python dictionary of key-value pairs, and I want to replace the words in a string that are keys in the dictionary with their corresponding values.

I have tried some code that I found online. Here is the example:

    import re

    def multiple_replace(dict, text):
        # Join all escaped keys into a single alternation pattern
        regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))
        # Replace each match with the value stored under the matched key
        return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

    test_dict = {'a/a': 'result1', "a/a b/b c/c": "result2"}

    sentence = "<<a/a>> something <<a/a b/b c/c>> something"

    result = multiple_replace(test_dict, sentence)

I expected the result to be <<result1>> something <<result2>> something

The actual output is <<result1>> something <<result1 b/b c/c>> something

3 Answers

1

Your code replaced every a/a it found in the string, including the one inside a/a b/b c/c, so the longer key a/a b/b c/c was never matched as a whole.

If you surround each key with << and >>, search for that, and put the << and >> back around the replacement, you avoid this problem.
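The delimiter idea above could be sketched as follows. This is only an illustration under the assumption that every key to be replaced appears in the text wrapped in << and >>; the function name `multiple_replace_delimited` and the `left`/`right` parameters are hypothetical and not part of the original code.

```python
import re

def multiple_replace_delimited(replacements, text, left="<<", right=">>"):
    # One alternative per key, each wrapped in the literal delimiters,
    # so 'a/a' can only match when it is the entire content of <<...>>.
    pattern = re.compile(
        "|".join(re.escape(left + key + right) for key in replacements)
    )

    def substitute(match):
        matched = match.group(0)
        # Strip the delimiters to recover the key, then put them back
        # around the replacement value.
        key = matched[len(left):len(matched) - len(right)]
        return left + replacements[key] + right

    return pattern.sub(substitute, text)

test_dict = {'a/a': 'result1', "a/a b/b c/c": "result2"}
sentence = "<<a/a>> something <<a/a b/b c/c>> something"

result = multiple_replace_delimited(test_dict, sentence)
print(result)  # <<result1>> something <<result2>> something
```

Because the closing >> is part of every alternative, the order of the keys no longer matters here. Text outside the delimiters is left untouched.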

1

The problem is that the 'a/a': 'result1' rule also matches inside <<a/a b/b c/c>>, producing "<<result1 b/b c/c>>" before the a/a b/b c/c rule gets a chance to match.

You should apply the replacements from the most specific to the least specific. One way to accomplish this is to use an OrderedDict and list the longer key first (on Python 3.7+ a plain dict also preserves insertion order):
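A small experiment makes the ordering effect visible. It is only a sketch using hand-written patterns rather than the generated one: in Python's `re` module, alternatives separated by `|` are tried from left to right, and the first one that matches at a position wins.

```python
import re

text = "<<a/a b/b c/c>>"

# Shorter key listed first: the engine accepts 'a/a' and never tries the longer key.
short_first = re.search(r"a/a|a/a b/b c/c", text).group(0)

# Longer key listed first: the full key is matched.
long_first = re.search(r"a/a b/b c/c|a/a", text).group(0)

print(short_first)  # a/a
print(long_first)   # a/a b/b c/c
```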

import re
from collections import OrderedDict

test_dict = OrderedDict([("a/a b/b c/c", "result2"), ('a/a', 'result1'),])

sentence = "<<a/a>> something <<a/a b/b c/c>> something"

def multiple_replace(dict, text):
    regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))
    return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

result = multiple_replace(test_dict, sentence)
print(result)

The output is: <<result1>> something <<result2>> something

0

Your problem is that the first key, 'a/a', is a prefix of another key, 'a/a b/b c/c'. The longer key is never replaced because the 'a/a' alternative matches first at that position, so the regex never gets to try 'a/a b/b c/c'.

You can avoid this if you sort the keys by decreasing length, so longer ones are replaced first:

import re

def multiple_replace(d, text):
    # sort keys by -len so longer ones come first (you could use reverse=True as well)
    regex = re.compile("(%s)" % "|".join(map(re.escape, 
                                             sorted(d.keys(),key=lambda x:-len(x)))))
    return regex.sub(lambda mo: d[mo.string[mo.start():mo.end()]], text)

test_dict = {'a/a': 'result1', "a/a b/b c/c": "result2"}

sentence = "<<a/a>> something <<a/a b/b c/c>> something"

result = multiple_replace(test_dict, sentence)
print(result)

Output:

<<result1>> something <<result2>> something

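Note that regex.sub scans the text in a single pass, so a replaced value that happens to contain a shorter key is not replaced again. That problem only arises if you apply the replacements one at a time, for example with chained str.replace calls.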
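Whether a replaced value gets replaced again depends on how the replacements are applied. The following sketch compares chained `str.replace` calls with a single `re.sub` pass; the `rules` dictionary and the sample text are made up for illustration.

```python
import re

rules = {"cat": "dog", "dog": "bird"}
text = "cat and dog"

# Chained str.replace: each call rescans the whole string, so the 'dog'
# produced by the first rule is rewritten by the second rule.
chained = text
for key, value in rules.items():
    chained = chained.replace(key, value)

# Single re.sub pass: scanning resumes after each match, so inserted
# values are not examined again.
pattern = re.compile("|".join(map(re.escape, sorted(rules, key=len, reverse=True))))
single_pass = pattern.sub(lambda mo: rules[mo.group(0)], text)

print(chained)      # bird and bird
print(single_pass)  # dog and bird
```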
