I am using a dictionary that contains regular expressions to substitute portions of different strings, as elegantly described in a previous SO question by @roippi. The first 're.sub' call works perfectly. However, whenever the dictionary key is an actual regular expression (the second 're.sub' call), the substitution doesn't work.

I am very confused as to why this is the case. I have tried both using and removing the 'r' prefix, as well as incorporating lookahead/lookbehind expressions, but nothing seems to work. Any help would be greatly appreciated!

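import re

test_dict = {r'(\d+)': 'THIS IS A NUMBER', 'john_doe': 'THIS IS A NAME'}

# Works: the matched text 'john_doe' is itself a key in test_dict
re.sub('(john_doe)', lambda x: test_dict.get(x.group(1), x.group(1)), 'john_doe_jr')

# Does not substitute: the result is still '999la'
re.sub(r'(\d+)', lambda x: test_dict.get(x.group(1), x.group(1)), '999la')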
  • The <(\d+)> pattern does not match anything in 999la Commented Jan 3, 2018 at 21:13
  • Got problems to understand what this achieves. Care to explain? Commented Jan 3, 2018 at 21:17
  • Hi @WiktorStribiżew, thank you for your response. I had incorrectly added the html tags, which I have removed. However, this expression should work now (I have tried it in Pythex and it's fine) and it still does not when using the "re.sub" expression. Any thoughts as to why? Commented Jan 3, 2018 at 21:45
  • The regex matches 999, which becomes x.group(1). 999 is not a key in test_dict, which causes the .get method to return 999, replacing 999 with 999. Commented Jan 4, 2018 at 9:23

1 Answer

match.group(n) does not return the regular expression that was used to match the nth group, but the nth group itself.

The lambda therefore returns test_dict.get('999', '999'), which returns '999', because '999' is not a key in your dictionary.
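To make the difference concrete, here is a small sketch that prints what the match object actually holds. It reuses test_dict from the question; the variable name match is only for illustration. The pattern source is available as match.re.pattern, which is what would have to be looked up for the dictionary entry to be found.

```python
import re

test_dict = {r'(\d+)': 'THIS IS A NUMBER', 'john_doe': 'THIS IS A NAME'}

match = re.search(r'(\d+)', '999la')

print(match.group(1))    # the matched text: 999
print(match.re.pattern)  # the pattern source: (\d+)

# Looking up the matched text misses, so .get falls back to the default.
print(test_dict.get(match.group(1), match.group(1)))    # 999

# Looking up the pattern source finds the entry.
print(test_dict.get(match.re.pattern, match.group(1)))  # THIS IS A NUMBER
```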

You could iterate over the keys of the dictionary, check whether any key matches your capture group, and replace it if so, but that takes O(n) time per match, where n is the number of keys in the dictionary.

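def replacement(match, d, group=1):
    """Return the value of the first key in d whose pattern matches the captured text."""
    text = match.group(group)
    for key in d:
        # re.match anchors only at the start of the captured text
        if re.match(key, text):
            return d[key]
    # No key matched: leave the text unchanged
    return text

re.sub(r'(\d+)', lambda x: replacement(x, test_dict), '999la')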
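If scanning the dictionary on every match becomes a concern, one possible alternative (not part of the approach above) is to combine all keys into a single alternation, wrap each key in its own named group, and use match.lastgroup to see which key matched. This is only a sketch: build_substituter and the group names k0, k1, ... are hypothetical, and it assumes the keys contain no named groups or backreferences of their own.

```python
import re

test_dict = {r'(\d+)': 'THIS IS A NUMBER', 'john_doe': 'THIS IS A NAME'}

def build_substituter(d):
    # Map each generated group name (k0, k1, ...) to its replacement value.
    names = {f'k{i}': value for i, value in enumerate(d.values())}
    # Wrap every key in its own named group and join them with "|".
    combined = re.compile('|'.join(f'(?P<k{i}>{key})' for i, key in enumerate(d)))

    def substitute(text):
        # lastgroup names the outermost group of the alternative that matched.
        return combined.sub(lambda m: names[m.lastgroup], text)

    return substitute

substitute = build_substituter(test_dict)
print(substitute('999la'))        # THIS IS A NUMBERla
print(substitute('john_doe_jr'))  # THIS IS A NAME_jr
```

The combined pattern is compiled once, so each call makes a single pass over the text rather than retrying every key for every match.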

1 Comment

this function addresses my dilemma perfectly!! Thank you so much.
