Python regex with negated pattern

Question

I'm trying to write a regex statement in Python with a negated pattern. I want to match a pattern that doesn't start with a U followed by a W and optionally ends with a 1. Below are some examples.

TUW1TH > # regex does not get applied
JUWRG > # regex does not get applied
BUIUW1 > # regex does not get applied
ATWKO > ATW KO # regex applies and space is added after the W
EWRG > E WRG # regex applies and space is added after the W
AGDTWSD > AGDTW SD # regex applies and space is added after the W

Below is the regex statement I tried to use:

 re.sub(ur"[^U]W[^?1]", ur"W ", word)

Where does the space go if there is a 1 after the W (assuming it is not preceded by a U)? For example, "EW1RG" -> ???. You didn't give an example of a case like that.
– RootTwo, Commented Mar 7, 2016 at 3:27

RootTwo · Accepted Answer · 2016-03-07 03:39:51Z

2

I think you are asking to match a 'W' optionally followed by a '1', but only if the 'W' is not preceded by a 'U'. If that is the case, a "negative look behind" is the answer:

import re

testcases = ['TUW1TH', 'JUWRG', 'BUIUW1', 'ATWKO', 'EWRG', 'AGDTWSD', 'W1EF', 'EW1RG']

# The `(W1?)` part matches a 'W' with an optional '1'. The `(?<!U)` part
#     matches the current position only if it isn't preceded by a 'U'
pattern = re.compile(r'(?<!U)(W1?)')

for s in testcases:
    print(pattern.sub(r'\1 ', s))

outputs:

TUW1TH
JUWRG
BUIUW1
ATW KO
EW RG
AGDTW SD
W1 EF
EW1 RG

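Note: [^U] doesn't work at the beginning of a line. It has to match an actual character, so a W in the first position, with nothing before it, is never matched. The lookbehind (?<!U) consumes nothing and also succeeds at the start of the string.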
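
A minimal sketch of the note above, comparing a consuming `[^U]` with the negative lookbehind on two of the test cases. The names `consuming`, `lookbehind` and `add_space` are made up for this illustration and are not part of the original answer.

```python
import re

# [^U] must match a real character in front of the W.
consuming = re.compile(r'([^U]W1?)')
# (?<!U) is zero-width: it only checks what precedes the W.
lookbehind = re.compile(r'(?<!U)(W1?)')

def add_space(pattern, s):
    """Insert a space after every match of `pattern` in `s`."""
    return pattern.sub(r'\1 ', s)

# 'W1EF' starts with the W, so there is no character for [^U] to match.
start_consuming = add_space(consuming, 'W1EF')
start_lookbehind = add_space(lookbehind, 'W1EF')
print(start_consuming)    # W1EF   (no match, unchanged)
print(start_lookbehind)   # W1 EF

# In the middle of a string both patterns agree.
mid_consuming = add_space(consuming, 'EW1RG')
mid_lookbehind = add_space(lookbehind, 'EW1RG')
print(mid_consuming)      # EW1 RG
print(mid_lookbehind)     # EW1 RG
```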


Chris Kitching · Accepted Answer · 2016-03-07 00:25:51Z

0

Looks like you want [^U]W1?

You wrote [^?1], which is a negated character class: it matches any single character other than "?" or "1". To say "optionally a 1", use the token 1? instead.
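
To make the difference concrete, here is a small sketch of how the pattern from the question behaves. It assumes Python 3, so the `ur` prefix is dropped, and the input 'ATW1KO' is a made-up variant of the question's 'ATWKO'.

```python
import re

# Pattern from the question (without the Python 2 `ur` prefix).
attempt = r"[^U]W[^?1]"

# [^?1] matches exactly one character that is neither '?' nor '1'.
matches_k = bool(re.fullmatch(r"[^?1]", "K"))
matches_1 = bool(re.fullmatch(r"[^?1]", "1"))
print(matches_k, matches_1)   # True False

# The whole match 'TWK' is replaced by 'W ', so the T and the K are lost.
result_plain = re.sub(attempt, "W ", "ATWKO")
print(result_plain)           # AW O

# A '1' after the W is excluded by [^?1], so nothing matches at all.
result_with_1 = re.sub(attempt, "W ", "ATW1KO")
print(result_with_1)          # ATW1KO

# With 1? the '1' is optional, but the character matched by [^U] is still
# replaced unless it is captured and put back in the substitution.
result_optional = re.sub(r"[^U]W1?", "W ", "ATWKO")
print(result_optional)        # AW KO
```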


mhawke · Accepted Answer · 2016-03-07 00:56:38Z

Try the regex pattern r'([^U]W)1?' and use it with re.sub() with a substitution that references the captured group, like this:

import re

pattern = re.compile(r'([^U]W)1?')
for s in 'TUW1TH', 'JUWRG', 'BUIUW1', 'ATWKO', 'EWRG', 'AGDTWSD':
    print(pattern.sub(r'\1 ', s))

Output

TUW1TH
JUWRG
BUIUW1
ATW KO
EW RG
AGDTW SD

Note that the output for 'EWRG' differs from your sample... I think that's a typo in your question?

Your question wasn't clear about what to do with the optional 1 following the W and there was no sample to demonstrate. Is the 1 to be removed, or kept? The above code will lose the 1:

>>> print(pattern.sub(r'\1 ', 'TW1TH'))
TW TH

If you want the output to include the 1, change the regex pattern to r'([^U]W)(1?)', which adds a second capturing group for the optional 1, and change the substitution to r'\1 \2':

>>> re.sub(r'([^U]W)(1?)', r'\1 \2', 'TW1TH')
'TW 1TH'
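
One behaviour worth knowing about, sketched below: because `[^U]` consumes the character before the W, that character cannot take part in the next match, whereas a negative lookbehind consumes nothing. The input 'AWWB' is a hypothetical string chosen to show this and does not come from the question; the lookbehind variant with two groups is an adaptation for comparison, not a pattern given in this thread.

```python
import re

# Pattern from this answer: the character before the W is part of the match.
consuming = re.compile(r'([^U]W)(1?)')
# Lookbehind variant with the same two groups, for comparison only.
lookbehind = re.compile(r'(?<!U)(W)(1?)')

s = 'AWWB'  # hypothetical input with two adjacent W's

# 'AW' is matched first; the second W then has no unconsumed character
# in front of it that could start a new match, so it is skipped.
out_consuming = consuming.sub(r'\1 \2', s)
print(out_consuming)    # AW WB

# The lookbehind only inspects the preceding character, so both W's match.
out_lookbehind = lookbehind.sub(r'\1 \2', s)
print(out_lookbehind)   # AW W B

# On the example from this answer the two patterns give the same result.
same_consuming = consuming.sub(r'\1 \2', 'TW1TH')
same_lookbehind = lookbehind.sub(r'\1 \2', 'TW1TH')
print(same_consuming, same_lookbehind)   # TW 1TH TW 1TH
```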
