Regex replace (in Python) - a simpler way?

Question

Any time I want to replace a piece of text that is part of a larger piece of text, I always have to do something like:

"(?P<start>some_pattern)(?P<replace>foo)(?P<end>end)"

And then concatenate the start group with the new data for replace and then the end group.

Is there a better method for this?

If you can, tokenize the data beforehand (break it into smaller parts based on regex rules) and do the replacement on those parts, rather than working on the entire text each time you replace something. For example, if you split the <start> and <end> pieces into separate items (arrays) up front, the replacement gets easier. It takes a bit of getting used to in the short term, but it makes this kind of task easier in the long run.
– Rick, Commented Aug 18, 2010 at 19:38
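
One possible reading of the tokenizing suggestion, as a rough sketch only: split the text around the markers first, then swap the middle piece and rejoin. The helper name `replace_between` and the literal markers `"start "` and `" end"` are assumptions made for illustration; they are not from the comment.

```python
import re

def replace_between(text, start, end, replacement):
    """Split text around literal start/end markers, swap the middle, rejoin."""
    # Capturing groups make re.split keep the delimiters in the result list.
    pattern = "(" + re.escape(start) + ")(.*?)(" + re.escape(end) + ")"
    parts = re.split(pattern, text, maxsplit=1)
    if len(parts) == 1:
        return text  # markers not found, nothing to replace
    before, start_tok, _old, end_tok, after = parts
    return before + start_tok + replacement + end_tok + after

result = replace_between("a start foo end b", "start ", " end", "bar")
print(result)  # a start bar end b
```

Whether this is simpler than a single `re.sub` call depends on how often the same pieces get reused.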

Roger Pate · Accepted Answer · 2009-01-29 05:56:21Z

105

>>> import re
>>> s = "start foo end"
>>> s = re.sub("foo", "replaced", s)
>>> s
'start replaced end'
>>> s = re.sub("(?<= )(.+)(?= )", lambda m: "can use a callable for the %s text too" % m.group(1), s)
>>> s
'start can use a callable for the replaced text too end'
>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.
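
If the surrounding text does have to be part of the match, as in the question, the replacement string can refer back to the captured groups, so no manual concatenation is needed. A small sketch, with the literal pattern pieces `start ` and ` end` standing in for the asker's real patterns:

```python
import re

s = "start foo end"
# \g<name> in the replacement re-inserts the text captured by that named group.
pattern = r"(?P<start>start )foo(?P<end> end)"
result = re.sub(pattern, r"\g<start>replaced\g<end>", s)
print(result)  # start replaced end
```

Numbered references such as `\1` work the same way for unnamed groups.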

answered Jan 29, 2009 at 5:56

Roger Pate


3 Comments

Alex Over a year ago

Hi Roger, I've been playing around with that regex string. I understand how the RE part of it works, but I don't understand how the python puts 'start' at the start, and 'end' at the end. Do you think you could help explain that? Thank you! :)

Roger Pate Over a year ago

Those aren't 'matched' by the regex. (Look at m.group(0).) It gets broken down to "start (matched text here) end", and the match is what gets replaced. Looking at it now (almost a year later), I'm not sure why I did this way, except to show basic lookbehind and lookahead syntax.
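
To see this for yourself, inspect the match object directly. This sketch uses the same pattern and the string as it stood before the second `re.sub` call; the variable names are only for illustration.

```python
import re

s = "start replaced end"
m = re.search(r"(?<= )(.+)(?= )", s)

matched = m.group(0)             # only the text between the two spaces
span = m.span()                  # position of that text inside s
untouched_prefix = s[:m.start()] # never part of the match
untouched_suffix = s[m.end():]   # never part of the match

print(repr(matched), span, repr(untouched_prefix), repr(untouched_suffix))
# 'replaced' (6, 14) 'start ' ' end'
```

The lookbehind and lookahead only assert that a space is there. Since `re.sub` replaces just `m.group(0)`, everything outside the span is carried over unchanged.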

haridsv Over a year ago

Note that this does not take advantage of compiled regular expressions, so it should incur the additional expense of compiling the regex every time it is used.
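
A minimal sketch of the precompiled variant, assuming the same pattern gets applied to many strings (the `lines` list is made up for the example). Keep in mind that the re module also caches recently compiled patterns internally, so the saving from compiling by hand is often small.

```python
import re

# Compile once, then reuse the pattern object for every input.
between_spaces = re.compile(r"(?<= )(.+)(?= )")

lines = ["start foo end", "start bar end"]
results = [between_spaces.sub("replaced", line) for line in lines]
print(results)  # ['start replaced end', 'start replaced end']
```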

zenazn · Accepted Answer · 2009-01-29 05:51:21Z

18

Look in the Python re documentation for lookaheads (?=...) and lookbehinds (?<=...) -- I'm pretty sure they're what you want. They match strings, but do not "consume" the bits of the strings they match.

answered Jan 29, 2009 at 5:51

zenazn


3 Comments

Evan Fosmark Over a year ago

The problem with that is that it must be fixed-width. I need something that allows for more complex patterns.

Tomalak Over a year ago

@Evan: In most regex engines, it must be fixed width for look-behind only.

Evan Fosmark Over a year ago

Tomalak, what I need is to be able to have a non-fixed width prefix pattern.
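
The distinction discussed in these comments can be checked quickly. This is a sketch against Python's standard re module, and the sample strings are made up for the example:

```python
import re

# A variable-width lookahead is accepted.
ahead = re.sub(r"bar(?=ba+z)", "quux", "foobarbaaaz")

# A variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=fo+)bar")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

print(ahead, lookbehind_ok)  # fooquuxbaaaz False
```

So a non-fixed-width suffix can stay in a lookahead, while a non-fixed-width prefix has to be captured and put back instead.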

Ben Blank · Accepted Answer · 2009-01-29 15:11:48Z

The short version is that you cannot use variable-width patterns in lookbehinds using Python's re module. There is no way to change this:

>>> import re
>>> re.sub("(?<=foo)bar(?=baz)", "quux", "foobarbaz")
'fooquuxbaz'
>>> re.sub("(?<=fo+)bar(?=baz)", "quux", "foobarbaz")

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    re.sub("(?<=fo+)bar(?=baz)", "quux", string)
  File "C:\Development\Python25\lib\re.py", line 150, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "C:\Development\Python25\lib\re.py", line 241, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern

This means that you'll need to work around it, the simplest solution being very similar to what you're doing now:

>>> re.sub("(fo+)bar(?=baz)", "\\1quux", "foobarbaz")
'fooquuxbaz'
>>>
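>>> # If you need to turn this into a reusable function (start, target and end are literal text):
>>> def replace(start, target, end, replacement, search):
        pattern = "(" + re.escape(start) + ")" + re.escape(target) + "(?=" + re.escape(end) + ")"
        # A callable replacement keeps any backslashes in `replacement` literal.
        return re.sub(pattern, lambda m: m.group(1) + replacement, search)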

This doesn't have the elegance of the lookbehind solution, but it's still a very clear, straightforward one-liner. And if you look at what an expert has to say on the matter (he's talking about JavaScript, which lacks lookbehinds entirely, but many of the principles are the same), you'll see that his simplest solution looks a lot like this one.
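
For the variable-width prefix case, the same workaround can take the prefix as a regex instead of escaping it. This is only a sketch; the function name `replace_after_prefix` and its parameters are assumptions made for illustration.

```python
import re

def replace_after_prefix(prefix_pattern, target, suffix, replacement, text):
    """Replace literal target when preceded by prefix_pattern and followed by literal suffix."""
    # The prefix is used as a regex as-is; target and suffix are treated as literal text.
    pattern = "(" + prefix_pattern + ")" + re.escape(target) + "(?=" + re.escape(suffix) + ")"
    # Group 1 is the captured prefix; put it back in front of the replacement.
    return re.sub(pattern, lambda m: m.group(1) + replacement, text)

result = replace_after_prefix("fo+", "bar", "baz", "quux", "fooooobarbaz")
print(result)  # foooooquuxbaz
```

Because the prefix is captured rather than asserted, it can be any width. The cost is that it is consumed by the match and has to be re-inserted.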

aplavin · Accepted Answer · 2012-07-17 11:14:15Z

5

I believe that the best idea is just to capture in a group whatever you want to replace, and then replace it by using the start and end properties of the captured group.

regards

Adrián

import re

# The pattern contains the expression we want to replace as the first group
pat = r"word1\s(.*)\sword2"
test = "word1 will never be a word2"
repl = "replace"

m = re.search(pat, test)

if m:
    # Splice the replacement into the span covered by group 1
    line = test[:m.start(1)] + repl + test[m.end(1):]
    print(line)
else:
    print("the pattern didn't capture any text")

This will print: word1 replace word2

The group to be replaced could be located in any position of the string.
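
The same idea can be wrapped in a small helper, sketched here under the assumption that only the first match should be changed. The name `replace_group` is made up for this example.

```python
import re

def replace_group(pattern, repl, text, group=1):
    """Return text with the span of one captured group swapped for repl."""
    m = re.search(pattern, text)
    if m is None or m.group(group) is None:
        return text  # no match, or the group did not take part in it
    return text[:m.start(group)] + repl + text[m.end(group):]

result = replace_group(r"word1\s(.*)\sword2", "replace", "word1 will never be a word2")
print(result)  # word1 replace word2
```

Since `m.start()` and `m.end()` also accept group names, `group` can be a name as well as a number.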

edited Jul 17, 2012 at 11:14

aplavin


answered Jan 12, 2010 at 23:39

Adrián Deccico


2 Comments

Jürgen A. Erhard Over a year ago

Just one (late) tip: get rid of "0:" and ":len(test)". They're unnecessary noise.

haridsv Over a year ago

@jae You still need the colon, otherwise it won't slice the string.
