I have a dictionary of slang terms and their meanings, and I want to replace every slang term in my text with its meaning.

I have found a partially working solution: https://stackoverflow.com/a/2400577

For now my code looks like this:

import re

myText = 'brb some sample text I lov u. I need some $$ for 2mw.'

dictionary = {
  'brb': 'be right back',
  'lov u': 'love you',
  '$$': 'money',
  '2mw': 'tomorrow'
}

pattern = re.compile(r'\b(' + '|'.join(re.escape(key) for key in dictionary.keys()) + r')\b')
result = pattern.sub(lambda x: dictionary[x.group()], myText)

print(result)

Output:

be right back some sample text I love you. I need some $$ for tomorrow.

As you can see, the $$ signs haven't been replaced, and I know this is due to the \b word boundaries. How can I change my regex to achieve my goal?

1 Answer

Replace the word boundaries with lookarounds that check for word chars on either side of the search phrase:

pattern = re.compile(r'(?<!\w)(' + '|'.join(re.escape(key) for key in dictionary.keys()) + r')(?!\w)')

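A \b only matches where a word char sits on one side and a non-word char (or the start/end of the string) on the other, so it can never match between a space and $. The (?<!\w) negative lookbehind fails the match if there is a word char immediately to the left of the current location, and the (?!\w) negative lookahead fails the match if there is a word char immediately to the right of the current location.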
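For reference, here is a minimal sketch of the question's code with only the pattern swapped for the lookaround version. It reuses the names from the question (myText, dictionary) and assumes nothing beyond the standard re module.

```python
import re

myText = 'brb some sample text I lov u. I need some $$ for 2mw.'

dictionary = {
    'brb': 'be right back',
    'lov u': 'love you',
    '$$': 'money',
    '2mw': 'tomorrow',
}

# Escape each key so that characters such as $ are matched literally.
alternation = '|'.join(re.escape(key) for key in dictionary)

# (?<!\w) and (?!\w) only require that no word char touches the match,
# so keys that start or end with a non-word char (like $$) still match.
pattern = re.compile(r'(?<!\w)(' + alternation + r')(?!\w)')

result = pattern.sub(lambda m: dictionary[m.group()], myText)
print(result)
# be right back some sample text I love you. I need some money for tomorrow.
```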

Replace (?<!\w) with (?<!\S) and (?!\w) with (?!\S) if you need to only match search phrases in between whitespace chars and start/end of string.
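The difference between the two boundary styles can be seen in a small comparison. This is only an illustrative sketch: the sample string and the names word_pattern, space_pattern, and expand are made up for the example and are not part of the question.

```python
import re

dictionary = {'lov u': 'love you', '$$': 'money'}
alternation = '|'.join(re.escape(key) for key in dictionary)

# No word char may touch the match on either side.
word_pattern = re.compile(r'(?<!\w)(' + alternation + r')(?!\w)')
# The match must be surrounded by whitespace or the start/end of the string.
space_pattern = re.compile(r'(?<!\S)(' + alternation + r')(?!\S)')

def expand(pattern, text):
    """Replace every dictionary key matched by pattern with its meaning."""
    return pattern.sub(lambda m: dictionary[m.group()], text)

sample = 'I lov u. Send $$ now'

print(expand(word_pattern, sample))   # I love you. Send money now
print(expand(space_pattern, sample))  # I lov u. Send money now
```

With the whitespace version, "lov u" is left alone because it is immediately followed by a period, so punctuation next to a phrase blocks the replacement. Which behavior is preferable depends on the text being processed.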
