Python using dictionary for multiple RegEX re.sub

Question

I am trying to manipulate text from the SNMP sysDescr.0 output with Python 3. I need to use a dictionary that contains the patterns and their replacements, as follows:

myDict = {
    r' \(\/sw.+': '',
    r' \(\/ws.+$': '',
    r'Compiled on.{36}': '',
    r'Ruckus Wireless, Inc. ': '',
    r'Brocade Communications Systems, Inc. ': '',
    r' Switch': '',
    r', ROM': ' - ROM',
    r' revision': 'revision',
    r' IronWare': 'IronWare'
}

I found the code below here, but the first three patterns in the dictionary are not working while the rest are OK, and I don't know why:

import re

def multiple_replace(myDict, text):
    regex = re.compile(r'(%s)' % '|'.join(map(re.escape, myDict.keys())))
    return regex.sub(lambda mo: myDict.get(mo.group(), mo.group()), text)

How can I modify the above function so that it correctly applies the first three regex patterns? I tried most of the similar solutions here, but none of them could handle the first three patterns.

My simple version is below, but I am really interested in seeing how the first solution should be modified to work correctly, as I am new to Python:

import re

def multiple_replace(myDict, text):
    for key, val in myDict.items():
        if re.search(key, text):
            text = re.sub(key, val, text)
    return text

Here is an example of the input (the raw sysDescr.0 output):

HP J9856A 2530-24G-2SFP+ Switch, revision YA.16.05.0004, ROM YA.15.20 (/ws/swbuildm/rel_venice_qaoff/code/build/lakes(swbuildm_rel_venice_qaoff_rel_venice)) (Formerly ProCurve),.1.3.6.1.4.1.11.2.3.7.11.166
ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06 (/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77
Ruckus Wireless, Inc. ICX7250-48-HPOE, IronWare Version 08.0.70aT211 Compiled on Jan 18 2018 at 04:21:25 labeled as SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1

and what I need it to become:

HP J9856A 2530-24G-2SFP+,revision YA.16.05.0004 - ROM YA.15.20,.1.3.6.1.4.1.11.2.3.7.11.166
HP J9088A 2610-48,revision R.11.122 - ROM R.10.06,.1.3.6.1.4.1.11.2.3.7.11.77
ICX7250-48-HPOE,IronWare Version 08.0.70aT211 SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1

Honestly, I have no idea which approach is better or faster; your input is appreciated.

Thanks.

You are mixing regex and literal patterns. map(re.escape, myDict.keys()) escapes the strings and corrupts the regex patterns. – Wiktor Stribiżew, Commented Feb 16, 2019 at 20:52
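To illustrate the comment, here is a small sketch using one sample line from the question. It shows that re.escape turns the regex key into a literal string that never occurs in the device output, and that simply dropping re.escape is not enough either: the lambda looks up the matched text (mo.group()) in the dictionary, and for a regex key the matched text is not itself a key, so the match is put back unchanged. The variable names are illustrative only.

```python
import re

# One line of the sample sysDescr.0 output from the question.
line = ("ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06 "
        "(/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77")

regex_key = r' \(\/sw.+'            # intended as a regular expression
escaped_key = re.escape(regex_key)  # what map(re.escape, ...) turns it into

# The raw key matches, because \( and .+ keep their regex meaning.
print(re.search(regex_key, line) is not None)  # True
# The escaped key only matches the literal characters " \(\/sw.+",
# which never occur in the device output.
print(re.search(escaped_key, line) is None)    # True

# Dropping re.escape is not enough: the lambda looks up the *matched text*,
# which is not a dictionary key, so .get() falls back to the match itself.
my_dict = {regex_key: ''}
regex = re.compile('(%s)' % '|'.join(my_dict.keys()))
result = regex.sub(lambda mo: my_dict.get(mo.group(), mo.group()), line)
print(result == line)  # True: nothing was replaced
```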
Thank you. What should the function look like then? Should I remove re.escape? – elekgeek, Commented Feb 16, 2019 at 21:35
No, make sure your keys are all regexps. However, it looks like your first regexps do not do what you expect them to do. You should try r' \(\/sw.+\)' and r' \(\/ws.+\)'. See ideone.com/iXaHBt – Wiktor Stribiżew, Commented Feb 17, 2019 at 10:16
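A sketch of why the suggested patterns matter, run against two of the sample lines from the question. It assumes the lines are processed together as one string, as in the accepted answer; the variable names are illustrative.

```python
import re

hp_line = ("HP J9856A 2530-24G-2SFP+ Switch, revision YA.16.05.0004, ROM YA.15.20 "
           "(/ws/swbuildm/rel_venice_qaoff/code/build/lakes(swbuildm_rel_venice_qaoff_rel_venice)) "
           "(Formerly ProCurve),.1.3.6.1.4.1.11.2.3.7.11.166")
procurve_line = ("ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06 "
                 "(/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77")

# Original key: .+ runs to the end of the line, so the trailing OID is lost too.
print(re.sub(r' \(\/sw.+', '', procurve_line))
# -> ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06

# Suggested key: .+ is still greedy, but backtracks to the last ')' on the line.
print(re.sub(r' \(\/sw.+\)', '', procurve_line))
# -> ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06,.1.3.6.1.4.1.11.2.3.7.11.77

# Without re.MULTILINE, $ only matches at the end of the whole string,
# so ' \(\/ws.+$' cannot match when hp_line is not the last line.
block = hp_line + "\n" + procurve_line
print(re.search(r' \(\/ws.+$', block))                    # None
print(re.search(r' \(\/ws.+$', block, re.M) is not None)  # True
```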
Thank you very much, but I did not need to do that with the 2nd solution. Also note that it needs a modification to work, as follows: def multiple_replace(mydict, text): for rx, repl in mydict.items(): if re.search(rx, text): text = re.sub(rx, repl, text) return text.strip() – elekgeek, Commented Feb 17, 2019 at 13:09
BTW, adding re.search to the function makes execution faster. – elekgeek, Commented Feb 17, 2019 at 13:15
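Rather than taking the speed claim on trust, the two variants can be measured on your own data. The sketch below is a hypothetical timeit harness (with_search and without_search are made-up names) using the corrected keys. Note that re.sub already returns the string unchanged when nothing matches, so the re.search guard is not needed for correctness; any speed difference will depend on the data and the Python version.

```python
import re
import timeit

PATTERNS = {
    r' \(\/sw.+\)': '',
    r' \(\/ws.+\)': '',
    r'Compiled on.{36}': '',
    re.escape('Ruckus Wireless, Inc. '): '',
    re.escape('Brocade Communications Systems, Inc. '): '',
    re.escape(' Switch'): '',
    re.escape(', ROM'): ' - ROM',
    re.escape(' revision'): 'revision',
    re.escape(' IronWare'): 'IronWare',
}
LINE = ("ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06 "
        "(/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77")

def with_search(d, text):
    for rx, repl in d.items():
        if re.search(rx, text):  # guard: only call re.sub when there is a match
            text = re.sub(rx, repl, text)
    return text

def without_search(d, text):
    for rx, repl in d.items():
        text = re.sub(rx, repl, text)  # unchanged text is returned on no match
    return text

if __name__ == '__main__':
    for fn in (with_search, without_search):
        t = timeit.timeit(lambda: fn(PATTERNS, LINE), number=20000)
        print(f'{fn.__name__}: {t:.3f} s')
```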

Wiktor Stribiżew · Accepted Answer · 2019-02-17 23:59:03Z

3

You should make sure your keys are all compiled re objects. Once your regex-replacement dictionary is ready, iterate over its key-value pairs and perform the replacements one by one with

for rx, repl in myDict.items():
    text = rx.sub(repl, text)

where rx will be the compiled re object and repl is the replacement string.

Full code snippet:

import re
myDict = {
    re.compile(r' \(\/sw.+\)'): '',
    re.compile(r' \(\/ws.+\)'): '',
    re.compile(r'Compiled on.{36}'): '',
    re.compile(re.escape(r'Ruckus Wireless, Inc. ')): '',
    re.compile(re.escape(r'Brocade Communications Systems, Inc. ')): '',
    re.compile(re.escape(r' Switch')): '',
    re.compile(re.escape(r', ROM')): ' - ROM',
    re.compile(re.escape(r' revision')): 'revision',
    re.compile(re.escape(r' IronWare')): 'IronWare'
}
s = """HP J9856A 2530-24G-2SFP+ Switch, revision YA.16.05.0004, ROM YA.15.20 (/ws/swbuildm/rel_venice_qaoff/code/build/lakes(swbuildm_rel_venice_qaoff_rel_venice)) (Formerly ProCurve),.1.3.6.1.4.1.11.2.3.7.11.166
ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06 (/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77
Ruckus Wireless, Inc. ICX7250-48-HPOE, IronWare Version 08.0.70aT211 Compiled on Jan 18 2018 at 04:21:25 labeled as SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1"""

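def multiple_replace(myDict, text):
    # Apply the patterns one by one, in dictionary insertion order
    # (guaranteed since Python 3.7); each sub() sees the previous result.
    for rx, repl in myDict.items():
        text = rx.sub(repl, text)
    return text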

print(multiple_replace(myDict, s))

See the Python demo.

Output:

HP J9856A 2530-24G-2SFP+,revision YA.16.05.0004 - ROM YA.15.20,.1.3.6.1.4.1.11.2.3.7.11.166
ProCurve J9088A 2610-48,revision R.11.122 - ROM R.10.06,.1.3.6.1.4.1.11.2.3.7.11.77
ICX7250-48-HPOE,IronWare Version 08.0.70aT211 SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1
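If you want to keep the single-pass shape of the function from the question, one possible sketch is to wrap every key in its own named group and look the replacement up by mo.lastgroup instead of by the matched text. The group names g0, g1, ... and the function name multiple_replace_single_pass are hypothetical, and the sketch assumes the keys contain no capturing groups of their own.

```python
import re

# Every key is a regular expression; keys meant as plain text go through re.escape.
REPLACEMENTS = {
    r' \(\/sw.+\)': '',
    r' \(\/ws.+\)': '',
    r'Compiled on.{36}': '',
    re.escape('Ruckus Wireless, Inc. '): '',
    re.escape('Brocade Communications Systems, Inc. '): '',
    re.escape(' Switch'): '',
    re.escape(', ROM'): ' - ROM',
    re.escape(' revision'): 'revision',
    re.escape(' IronWare'): 'IronWare',
}

def multiple_replace_single_pass(repl_dict, text):
    # Wrap each key in a named group (hypothetical names g0, g1, ...) so a
    # match can be mapped back to its replacement string.
    by_name = {}
    alternatives = []
    for i, (pattern, repl) in enumerate(repl_dict.items()):
        name = f'g{i}'
        by_name[name] = repl
        alternatives.append(f'(?P<{name}>{pattern})')
    regex = re.compile('|'.join(alternatives))
    # lastgroup is the name of the alternative that matched; look the
    # replacement up by that name, not by the matched text.
    return regex.sub(lambda mo: by_name[mo.lastgroup], text)

s = """HP J9856A 2530-24G-2SFP+ Switch, revision YA.16.05.0004, ROM YA.15.20 (/ws/swbuildm/rel_venice_qaoff/code/build/lakes(swbuildm_rel_venice_qaoff_rel_venice)) (Formerly ProCurve),.1.3.6.1.4.1.11.2.3.7.11.166
ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06 (/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77
Ruckus Wireless, Inc. ICX7250-48-HPOE, IronWare Version 08.0.70aT211 Compiled on Jan 18 2018 at 04:21:25 labeled as SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1"""

print(multiple_replace_single_pass(REPLACEMENTS, s))
```

Unlike the loop in the answer, this scans the text once, a replacement never sees the output of another one, and when several keys could match at the same position the earlier key wins. For the sample input the two approaches produce the same three lines, but in general they are not equivalent.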


2 Comments

Peter Cibulskis Over a year ago

Do you need to compile? Most RE functions are automatically compiled and cached. Does compiling make it clearer? Does it provide sufficient performance improvement? github.com/python/cpython/blob/…

Wiktor Stribiżew Over a year ago

@PeterCibulskis In this case, when using a lot of regexps against lots of strings, compiled patterns are more efficient.
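One way to check this on your own data is a small timeit comparison such as the sketch below (module_level and precompiled are hypothetical names). re.sub with a string pattern goes through the re module's internal pattern cache on every call, while a precompiled pattern object skips that lookup; how much this matters will depend on the number of patterns and strings.

```python
import re
import timeit

LINE = ("ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06 "
        "(/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77")
PATTERN = r' \(\/sw.+\)'
COMPILED = re.compile(PATTERN)

def module_level():
    # re.sub compiles PATTERN on first use, then fetches it from the cache.
    return re.sub(PATTERN, '', LINE)

def precompiled():
    # The pattern object was built once above, so no cache lookup is needed.
    return COMPILED.sub('', LINE)

if __name__ == '__main__':
    for fn in (module_level, precompiled):
        print(f'{fn.__name__}: {timeit.timeit(fn, number=100_000):.3f} s')
```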
