I am looking for a Pythonic way to replace substrings in a string, similar to re.sub, but with additional processing of the matched text. It can probably be achieved with pure regular expression syntax, but that very quickly becomes unreadable and, worse, really hard to extend and debug.

This is what I need to achieve:

Input string: text1 (2, 100) text2 (34,23) text3

Output: the same string, but with (2, 100) wrapped in HTML that uses the values 2 and 100, each divided by 100; likewise for (34,23). Something like:

text1 <span data-coord='{"x": 0.02, "y": 1}'>(2, 100)</span>
text2 <span data-coord='{"x": 0.34, "y": 0.23}'>(34, 23)</span> 
text3

Iteration through matches with re.finditer seems a logical solution, but how do I get the rest of the text?

EDIT: The numbers may have one to three digits and lie between 0 and 100.

FOOTNOTE: I'd really prefer a solution where the matched groups for x and y are passed to my own function, so that I have complete freedom in what to do with them. E.g. for error handling: if a number is outside the range 0...100, I may want to highlight it in red. I am sure I could define that behaviour in the regex as well, but that seems wrong to me: regex is for text processing, not number manipulation, and it would obscure the logic of the code.

2 Answers

You could use

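import re

# Named groups capture the two numbers inside the parentheses
rx = re.compile(r'\((?P<x>\d+),\s*(?P<y>\d+)\)')

# before
string = "text1 (12, 14) text2 (34,23) text3"

def convert(match):
    # Called once per match; the return value replaces the matched text.
    # Note: prefixing "0." only scales two-digit numbers correctly.
    return '''<span data-coord='{{"x": 0.{}, "y": 0.{}}}'>{}</span>'''.format(
            match.group('x'),
            match.group('y'),
            match.group(0)
    )

# after: pass the function itself (not a call to it) as the replacement
string = rx.sub(convert, string)

print(string)
# Printed on one line; wrapped here for readability:
# text1 <span data-coord='{"x": 0.12, "y": 0.14}'>(12, 14)</span>
# text2 <span data-coord='{"x": 0.34, "y": 0.23}'>(34,23)</span>
# text3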

This passes the convert function to rx.sub() as the replacement and builds each replacement string with .format().
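
To cover the question's edit (one- to three-digit numbers) and the wish to handle out-of-range values, the callback can convert the groups to integers before formatting. The following is only a sketch under assumptions: the names COORD_RX and wrap_coord, the red inline style, and the use of json.dumps are illustrative choices, not something taken from the question.

```python
import json
import re

# Assumption: each number has one to three digits, as stated in the question's edit.
COORD_RX = re.compile(r'\((?P<x>\d{1,3}),\s*(?P<y>\d{1,3})\)')

def wrap_coord(match):
    """Replacement callback: re.sub calls this once per match."""
    x = int(match.group('x'))
    y = int(match.group('y'))
    if not (0 <= x <= 100 and 0 <= y <= 100):
        # Hypothetical error handling: mark out-of-range pairs in red.
        return '<span style="color: red">{}</span>'.format(match.group(0))
    # Scale with arithmetic instead of gluing "0." in front of the digits.
    coord = json.dumps({"x": x / 100, "y": y / 100})
    return "<span data-coord='{}'>{}</span>".format(coord, match.group(0))

result = COORD_RX.sub(wrap_coord, "text1 (2, 100) text2 (34,23) text3 (7, 250)")
print(result)
```

With this approach 100 is serialized as 1.0 rather than 1, because the division produces a float; if the exact spelling matters, the formatting step would need adjusting.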

3 Comments

I think this is exactly what I've been looking for. This use of function instead of replacement pattern is new to me, I missed it in the documentation. Will check and get back.
@Jan, Eric means that sin is a function while sin(90°) is a number (1), not a function. Though it's a really fine detail in this context, I actually appreciate his remark. I am trying to improve my knowledge and it's good to highlight the use of functions as arguments.
@texnic: Exactly. Jan: thanks for the change, your answer is fine!

The regex is pretty simple:

# two groups of one or more digits, separated by a comma and zero or more spaces, wrapped in parentheses
\((\d+),\s*(\d+)\)

Then you can use re.sub with grouping:

>>> import re
>>> text = "text1 (12, 14) text2 (34,23) text3"
>>> print(re.sub(r'\((\d+),\s*(\d+)\)', r'''<span data-coord='{"x": 0.\g<1>, "y": 0.\g<2>}'>(\g<1>, \g<2>)</span>''', text))
text1 <span data-coord='{"x": 0.12, "y": 0.14}'>(12, 14)</span> text2 <span data-coord='{"x": 0.34, "y": 0.23}'>(34, 23)</span> text3
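
A replacement template can only copy the matched digits, so it cannot turn 2 into 0.02 or 100 into 1. As for the question of how to get the rest of the text when iterating with re.finditer: the text between matches can be sliced out using each match's start() and end(). Here is a rough sketch of that idea; the function name wrap_coords and the {:g} formatting are my own assumptions, and re.sub with a function does the same stitching for you.

```python
import re

COORD_RX = re.compile(r'\((\d+),\s*(\d+)\)')

def wrap_coords(text):
    """Rebuild the string piece by piece around each match (hypothetical helper)."""
    pieces = []
    last_end = 0
    for match in COORD_RX.finditer(text):
        # Text between the previous match and this one is copied unchanged.
        pieces.append(text[last_end:match.start()])
        x, y = (int(group) / 100 for group in match.groups())
        # {:g} drops trailing zeros, so 1.0 is written as 1.
        pieces.append(
            '''<span data-coord='{{"x": {:g}, "y": {:g}}}'>{}</span>'''.format(
                x, y, match.group(0)
            )
        )
        last_end = match.end()
    # Whatever follows the last match.
    pieces.append(text[last_end:])
    return ''.join(pieces)

print(wrap_coords("text1 (2, 100) text2 (34,23) text3"))
```

Unlike the template above, this keeps the original spelling of each pair, so (34,23) stays without the space.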

1 Comment

Nice solution, still quite simple. So I may change my mind about leaving the regex world. But please see my edit of the question. I wonder how it could be incorporated.
