Replace substrings, with additional processing

Question

I am looking for a Pythonic way to replace substrings in a string, similar to re.sub, but with additional processing of the found text. It can probably be achieved with pure regular expression syntax, but that very quickly becomes unreadable and, worse, really hard to extend and debug.

This is what I need to achieve:

Input string: text1 (2, 100) text2 (34,23) text3

Output: the same string, but with (2, 100) wrapped in HTML that uses the values 2 and 100; the same for (34, 23). Something like:

text1 <span data-coord='{"x": 0.02, "y": 1}'>(2, 100)</span>
text2 <span data-coord='{"x": 0.34, "y": 0.23}'>(34, 23)</span> 
text3

Iteration through matches with re.finditer seems a logical solution, but how do I get the rest of the text?

EDIT: Numbers may be one- to three-digit ones, between 0 and 100.

FOOTNOTE: I'd really prefer a solution where the found groups for x and y are passed to my custom function, so that I have complete freedom over what to do with them. E.g. for error handling: if a number is outside the range 0...100, I may want to highlight it in red. I am sure I can define that behaviour in terms of regex as well, but I find it wrong: regex is for text processing, not number manipulation. And it obscures the logic of the code.

Jan · Accepted Answer · 2017-05-02 18:26:57Z

3

You could use

import re

rx = re.compile(r'\((?P<x>\d+),\s*(?P<y>\d+)\)')

# before
string = "text1 (12, 14) text2 (34,23) text3"

def convert(match):
    return '''<span data-coord='{{"x": 0.{}, "y": 0.{}}}'>{}</span>'''.format(
            match.group('x'),
            match.group('y'),
            match.group(0)
    )

string = rx.sub(convert, string)

print(string)
# text1 <span data-coord='{"x": 0.12, "y": 0.14}'>(12, 14)</span>
# text2 <span data-coord='{"x": 0.34, "y": 0.23}'>(34,23)</span>
# text3

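This works by passing a function (convert) instead of a replacement string to rx.sub; the function is called once per match and builds the replacement text with .format().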
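
Note that the 0.{} formatting only works for two-digit numbers: (2, 100) would come out as 0.2 and 0.100. Since the question's edit allows one- to three-digit values and the footnote asks for error handling, a minimal sketch of a callback that converts the groups to numbers first might look like this. It assumes Python 3 (true division), and the "error" class name is a made-up placeholder, not something from the question.

```python
import json
import re

rx = re.compile(r'\((?P<x>\d+),\s*(?P<y>\d+)\)')

def convert(match):
    # Turn the captured digit strings into integers so they can be checked.
    x = int(match.group('x'))
    y = int(match.group('y'))
    # Assumption: values outside 0..100 are flagged instead of converted.
    if not (0 <= x <= 100 and 0 <= y <= 100):
        # "error" is a hypothetical CSS class used only for illustration.
        return '<span class="error">{}</span>'.format(match.group(0))
    # json.dumps produces valid JSON for the attribute value.
    coord = json.dumps({"x": x / 100, "y": y / 100})
    return "<span data-coord='{}'>{}</span>".format(coord, match.group(0))

result = rx.sub(convert, "text1 (2, 100) text2 (34,23) text3 (5, 250)")
print(result)
```

Because the callback receives the match object, any further logic (different markup, logging, raising an exception) can live in ordinary Python rather than in the pattern.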

edited May 2, 2017 at 18:26

answered May 1, 2017 at 19:26


3 Comments

texnic Over a year ago

I think this is exactly what I've been looking for. This use of a function instead of a replacement pattern is new to me; I missed it in the documentation. Will check and get back.

texnic Over a year ago

@Jan, Eric means that sin is a function while sin(90°) is a number (1), not a function. Though it's a really fine detail in this context, I actually appreciate his remark. I am trying to improve my knowledge and it's good to highlight the use of functions as arguments.

Eric Duminil Over a year ago

@texnic: Exactly. Jan: thanks for the change, your answer is fine!

m0nhawk · Accepted Answer · 2017-05-01 17:50:49Z

1

The regex is pretty simple:

# two runs of one or more digits, separated by a comma and zero or more spaces, wrapped in parentheses
\((\d+),\s*(\d+)\)

Then you can use re.sub with grouping:

>>> import re
>>> text = "text1 (12, 14) text2 (34,23) text3"
>>> print(re.sub(r'\((\d+),\s*(\d+)\)', r'''<span data-coord='{"x": 0.\g<1>, "y": 0.\g<2>}'>(\g<1>, \g<2>)</span>''', text))
text1 <span data-coord='{"x": 0.12, "y": 0.14}'>(12, 14)</span> text2 <span data-coord='{"x": 0.34, "y": 0.23}'>(34, 23)</span> text3
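
A template string cannot do arithmetic on the groups, so if that is needed, the same pattern can also be walked with re.finditer, which the question mentions. The "rest of the text" is recovered by slicing between match.end() of one match and match.start() of the next. This is only a rough sketch: the helper name wrap_coords and the division by 100 are assumptions for illustration, and it assumes Python 3.

```python
import re

rx = re.compile(r'\((\d+),\s*(\d+)\)')

def wrap_coords(text):
    parts = []
    last_end = 0  # index just past the previous match
    for m in rx.finditer(text):
        # Copy the unmatched text between the previous match and this one.
        parts.append(text[last_end:m.start()])
        # Assumption: coordinates are percentages, so divide by 100.
        x, y = (int(g) / 100 for g in m.groups())
        parts.append(
            '''<span data-coord='{{"x": {}, "y": {}}}'>{}</span>'''.format(
                x, y, m.group(0)))
        last_end = m.end()
    # Copy whatever follows the last match.
    parts.append(text[last_end:])
    return ''.join(parts)

wrapped = wrap_coords("text1 (12, 14) text2 (34,23) text3")
print(wrapped)
```

In practice re.sub with a function argument does the same bookkeeping internally, so the explicit loop is mainly useful when the surrounding text needs processing too.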

answered May 1, 2017 at 17:50


1 Comment

texnic Over a year ago

Nice solution, still quite simple. So I may change my mind about leaving the regex world. But please see my edit of the question. I wonder how it could be incorporated.
