
I am trying to manipulate text from SNMP sysDescr.0 output with Python 3. I need to use a dictionary that maps patterns to their replacements, as follows:

myDict = {
    r' \(\/sw.+': '',
    r' \(\/ws.+$': '',
    r'Compiled on.{36}': '',
    r'Ruckus Wireless, Inc. ': '',
    r'Brocade Communications Systems, Inc. ': '',
    r' Switch': '',
    r', ROM': ' - ROM',
    r' revision': 'revision',
    r' IronWare': 'IronWare'
}

I found the code below here, but the first three patterns in the dictionary do not work while the rest are OK, and I don't know why:

import re

def multiple_replace(myDict, text):
    regex = re.compile(r'(%s)' % '|'.join(map(re.escape, myDict.keys())))
    return regex.sub(lambda mo: myDict.get(mo.group(), mo.group()), text)
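A quick way to see what goes wrong, using the second sample line from the question (the variable names below are illustrative only). It shows three things: the unescaped first key does match, but without a closing \) its greedy .+ runs to the end of the line; after re.escape() the key only matches a literal backslash sequence, so it matches nothing; and even without re.escape the lambda looks replacements up by the matched text, which only works for literal keys.

```python
import re

# Second sample line from the question.
line = ("ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06 "
        "(/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77")
key = r' \(\/sw.+'

# Used as a regex, the key matches, but .+ is greedy and runs to the end
# of the line, so the trailing OID would be removed as well.
print(re.search(key, line).group())
# -> " (/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77"

# re.escape() makes every character literal, so the escaped key looks for
# an actual backslash in the text and finds nothing.
print(re.search(re.escape(key), line))  # -> None

# Even without re.escape, the lambda looks up the *matched text* in the
# dict. For a regex key the matched text is not the key, so .get() falls
# back to mo.group() and each match is "replaced" by itself.
my_dict = {r' \(\/sw.+\)': ''}
regex = re.compile('(%s)' % '|'.join(my_dict))
result = regex.sub(lambda mo: my_dict.get(mo.group(), mo.group()), line)
print(result == line)  # -> True
```

So removing re.escape alone is not enough for that function: the replacement also has to be found by something other than the matched text.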

How can I modify the function above so that it correctly applies the regexes for the first three patterns? I tried most of the similar solutions here, but none of them could handle the first three patterns.

My simple version is below, but since I am new to Python, I am really interested to see how the first function should be modified to work correctly:

import re

def multiple_replace(myDict, text):
    for key, val in myDict.items():
        if re.search(key, text):
            text = re.sub(key, val, text)
    return text

Here is an example of the sysDescr.0 output:

HP J9856A 2530-24G-2SFP+ Switch, revision YA.16.05.0004, ROM YA.15.20 (/ws/swbuildm/rel_venice_qaoff/code/build/lakes(swbuildm_rel_venice_qaoff_rel_venice)) (Formerly ProCurve),.1.3.6.1.4.1.11.2.3.7.11.166
ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06 (/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77
Ruckus Wireless, Inc. ICX7250-48-HPOE, IronWare Version 08.0.70aT211 Compiled on Jan 18 2018 at 04:21:25 labeled as SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1

And here is what I need it to become:

HP J9856A 2530-24G-2SFP+,revision YA.16.05.0004 - ROM YA.15.20,.1.3.6.1.4.1.11.2.3.7.11.166
HP J9088A 2610-48,revision R.11.122 - ROM R.10.06,.1.3.6.1.4.1.11.2.3.7.11.77
ICX7250-48-HPOE,IronWare Version 08.0.70aT211 SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1

Honestly, I have no idea which approach is better or faster; your input is appreciated.

Thanks.

  • You are mixing regex and literal patterns. map(re.escape, myDict.keys()) escapes every key, which turns the regex patterns into literal strings and breaks them. Commented Feb 16, 2019 at 20:52
  • Thank you. How should the function look then? Should I remove re.escape? Commented Feb 16, 2019 at 21:35
  • No, make sure your keys are all regexps. However, it looks like your first regexps do not do what you expect them to do. Try r' \(\/sw.+\)' and r' \(\/ws.+\)'. See ideone.com/iXaHBt Commented Feb 17, 2019 at 10:16
  • Thank you very much, but I did not need to do that with the second solution. Also note that it needs a modification to work, as follows: def multiple_replace(mydict, text): for rx, repl in mydict.items(): if re.search(rx, text): text = re.sub(rx, repl, text) return text.strip() Commented Feb 17, 2019 at 13:09
  • BTW, adding re.search to the function makes execution faster. Commented Feb 17, 2019 at 13:15
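Whether the re.search() guard speeds things up is worth measuring rather than assuming: for a pattern that matches, the guard makes the text be scanned twice, so any gain depends on how many patterns fail to match, and results will vary with the data and Python version. A minimal timeit sketch, assuming the three sample lines and the fixed patterns from this thread (the function names are hypothetical):

```python
import re
import timeit

# The three sample sysDescr.0 lines from the question.
LINES = [
    "HP J9856A 2530-24G-2SFP+ Switch, revision YA.16.05.0004, ROM YA.15.20 "
    "(/ws/swbuildm/rel_venice_qaoff/code/build/lakes(swbuildm_rel_venice_qaoff_rel_venice)) "
    "(Formerly ProCurve),.1.3.6.1.4.1.11.2.3.7.11.166",
    "ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06 "
    "(/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77",
    "Ruckus Wireless, Inc. ICX7250-48-HPOE, IronWare Version 08.0.70aT211 "
    "Compiled on Jan 18 2018 at 04:21:25 labeled as SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1",
]

# Fixed patterns (regex keys unescaped, literal keys escaped), precompiled.
PATTERNS = {
    re.compile(r' \(\/sw.+\)'): '',
    re.compile(r' \(\/ws.+\)'): '',
    re.compile(r'Compiled on.{36}'): '',
    re.compile(re.escape('Ruckus Wireless, Inc. ')): '',
    re.compile(re.escape('Brocade Communications Systems, Inc. ')): '',
    re.compile(re.escape(' Switch')): '',
    re.compile(re.escape(', ROM')): ' - ROM',
    re.compile(re.escape(' revision')): 'revision',
    re.compile(re.escape(' IronWare')): 'IronWare',
}


def replace_plain(patterns, text):
    # Always call sub(); it is a no-op when the pattern does not match.
    for rx, repl in patterns.items():
        text = rx.sub(repl, text)
    return text


def replace_guarded(patterns, text):
    # Only call sub() when search() finds a match first.
    for rx, repl in patterns.items():
        if rx.search(text):
            text = rx.sub(repl, text)
    return text


if __name__ == "__main__":
    for fn in (replace_plain, replace_guarded):
        seconds = timeit.timeit(
            lambda: [fn(PATTERNS, line) for line in LINES], number=5_000)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

Both variants must produce identical output; only the printed timings differ, and no particular result is claimed here.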

1 Answer


Make sure your keys are all compiled re objects. Once the regex-to-replacement dictionary is ready, iterate over its key-value pairs and perform the replacements one by one with

for rx, repl in myDict.items():
    text = rx.sub(repl, text)

where rx is the compiled re object and repl is the replacement string.

Full code snippet:

import re
myDict = {
    re.compile(r' \(\/sw.+\)'): '',
    re.compile(r' \(\/ws.+\)'): '',
    re.compile(r'Compiled on.{36}'): '',
    re.compile(re.escape(r'Ruckus Wireless, Inc. ')): '',
    re.compile(re.escape(r'Brocade Communications Systems, Inc. ')): '',
    re.compile(re.escape(r' Switch')): '',
    re.compile(re.escape(r', ROM')): ' - ROM',
    re.compile(re.escape(r' revision')): 'revision',
    re.compile(re.escape(r' IronWare')): 'IronWare'
}
s = """HP J9856A 2530-24G-2SFP+ Switch, revision YA.16.05.0004, ROM YA.15.20 (/ws/swbuildm/rel_venice_qaoff/code/build/lakes(swbuildm_rel_venice_qaoff_rel_venice)) (Formerly ProCurve),.1.3.6.1.4.1.11.2.3.7.11.166
ProCurve J9088A Switch 2610-48, revision R.11.122, ROM R.10.06 (/sw/code/build/nemo),.1.3.6.1.4.1.11.2.3.7.11.77
Ruckus Wireless, Inc. ICX7250-48-HPOE, IronWare Version 08.0.70aT211 Compiled on Jan 18 2018 at 04:21:25 labeled as SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1"""

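def multiple_replace(myDict, text):
    # Dicts keep insertion order (guaranteed since Python 3.7), so the
    # patterns are applied in the order they are listed above.
    for rx, repl in myDict.items():
        text = rx.sub(repl, text)
    return text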

print(multiple_replace(myDict, s))


Output:

HP J9856A 2530-24G-2SFP+,revision YA.16.05.0004 - ROM YA.15.20,.1.3.6.1.4.1.11.2.3.7.11.166
ProCurve J9088A 2610-48,revision R.11.122 - ROM R.10.06,.1.3.6.1.4.1.11.2.3.7.11.77
ICX7250-48-HPOE,IronWare Version 08.0.70aT211 SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1
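If you want to keep the single-regex structure of the function from the question, one possible modification (a different technique from the loop above, not the answerer's code) is to wrap each pattern in its own named group and look the replacement up by the group that matched (mo.lastgroup) instead of by the matched text. The group names g0, g1, ... are hypothetical. This makes one left-to-right pass, so unlike the sequential loop, a replacement can never create text that a later pattern then matches; for the three sample lines the result is the same.

```python
import re

# Regex keys stay regexes; literal keys are escaped individually.
PATTERNS = {
    r' \(\/sw.+\)': '',
    r' \(\/ws.+\)': '',
    r'Compiled on.{36}': '',
    re.escape('Ruckus Wireless, Inc. '): '',
    re.escape('Brocade Communications Systems, Inc. '): '',
    re.escape(' Switch'): '',
    re.escape(', ROM'): ' - ROM',
    re.escape(' revision'): 'revision',
    re.escape(' IronWare'): 'IronWare',
}


def multiple_replace(patterns, text):
    """Replace every pattern in a single left-to-right pass over text."""
    names_to_repl = {}
    alternatives = []
    for i, (pattern, repl) in enumerate(patterns.items()):
        name = f"g{i}"  # hypothetical group names: g0, g1, ...
        names_to_repl[name] = repl
        alternatives.append(f"(?P<{name}>{pattern})")
    combined = re.compile("|".join(alternatives))
    # mo.lastgroup is the name of the wrapper group whose alternative
    # matched, so the replacement is found by group name, not matched text.
    return combined.sub(lambda mo: names_to_repl[mo.lastgroup], text)
```

Calling print(multiple_replace(PATTERNS, s)) with the s from the answer should print the same three lines as above. One difference from rx.sub(repl, text): strings returned from the lambda are inserted literally, so a backreference such as \1 in a replacement would not be expanded.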

2 Comments

Do you need to compile? Patterns passed to the module-level re functions are compiled and cached automatically. Does compiling make it clearer? Does it provide a sufficient performance improvement? github.com/python/cpython/blob/…
@PeterCibulskis In this case, when using a lot of regexps against lots of strings, compiled patterns are more efficient.
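On the compile question: module-level functions such as re.sub() compile a pattern on first use and keep it in an internal cache, so plain string keys give the same result; explicit compilation mainly skips the per-call cache lookup. A minimal sketch for checking both on your own data (the function names are hypothetical; timings will vary, so none are claimed):

```python
import re
import timeit

# The answer's patterns as plain strings, plus a precompiled copy.
STRING_PATTERNS = {
    r' \(\/sw.+\)': '',
    r' \(\/ws.+\)': '',
    r'Compiled on.{36}': '',
    re.escape('Ruckus Wireless, Inc. '): '',
    re.escape('Brocade Communications Systems, Inc. '): '',
    re.escape(' Switch'): '',
    re.escape(', ROM'): ' - ROM',
    re.escape(' revision'): 'revision',
    re.escape(' IronWare'): 'IronWare',
}
COMPILED_PATTERNS = {re.compile(p): r for p, r in STRING_PATTERNS.items()}

LINE = ("Ruckus Wireless, Inc. ICX7250-48-HPOE, IronWare Version 08.0.70aT211 "
        "Compiled on Jan 18 2018 at 04:21:25 labeled as SPS08070a,.1.3.6.1.4.1.1991.1.3.62.2.2.1.1")


def replace_with_strings(patterns, text):
    # re.sub() compiles each pattern on first use, then reuses its cache.
    for pattern, repl in patterns.items():
        text = re.sub(pattern, repl, text)
    return text


def replace_with_compiled(patterns, text):
    # Pattern objects call sub() directly, with no cache lookup.
    for rx, repl in patterns.items():
        text = rx.sub(repl, text)
    return text


if __name__ == "__main__":
    runs = ((replace_with_strings, STRING_PATTERNS),
            (replace_with_compiled, COMPILED_PATTERNS))
    for fn, pats in runs:
        seconds = timeit.timeit(lambda: fn(pats, LINE), number=20_000)
        print(f"{fn.__name__}: {seconds:.3f}s")
```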
