I'm trying to write a regex statement in Python with a negated pattern. I want to match a pattern that doesn't start with a U followed by a W and optionally ends with a 1. Below are some examples.

TUW1TH > # regex does not get applied
JUWRG > # regex does not get applied
BUIUW1 > # regex does not get applied
ATWKO > ATW KO # regex applies and space is added after the W
EWRG > E WRG # regex applies and space is added after the W
AGDTWSD > AGDTW SD # regex applies and space is added after the W

Below is the regex statement I tried to use:

 re.sub(ur"[^U]W[^?1]", ur"W ", word)
  • Have you tried using 1? instead of [^?1]? Commented Mar 7, 2016 at 0:25
  • Where does the space go if there is a 1 after the W (assuming it is not preceded by a U)? For example, "EW1RG" -> ???. You didn't give an example of a case like that. Commented Mar 7, 2016 at 3:27

3 Answers

2

I think you are asking to match a 'W' optionally followed by a '1', but only if the 'W' is not preceded by a 'U'. If that is the case, a "negative look behind" is the answer:

import re

testcases = ['TUW1TH', 'JUWRG', 'BUIUW1', 'ATWKO', 'EWRG', 'AGDTWSD', 'W1EF', 'EW1RG']

# The `(W1?)` part matches a 'W' with an optional '1'. The `(?<!U)` part
#     matches the current position only if it isn't preceded by a 'U'.
pattern = re.compile(r'(?<!U)(W1?)')

for s in testcases:
    print(pattern.sub(r'\1 ', s))

outputs:

TUW1TH
JUWRG
BUIUW1
ATW KO
EW RG
AGDTW SD
W1 EF
EW1 RG

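Note: [^U] doesn't work when the W is at the beginning of the string, because [^U] has to consume a character in front of the W and there is none. The lookbehind only checks the preceding position without consuming anything, so it still matches there (see 'W1EF' in the test cases).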
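
A minimal sketch of that difference, using 'W1EF' from the test cases. The names `consuming` and `lookbehind` are only illustrative and do not appear in the answer above.

```python
import re

# [^U] must consume one real character before the W
consuming = re.compile(r'([^U]W1?)')
# (?<!U) only asserts that the previous character is not a U
lookbehind = re.compile(r'(?<!U)(W1?)')

word = 'W1EF'
print(consuming.sub(r'\1 ', word))   # W1EF  (nothing precedes the W, so no match)
print(lookbehind.sub(r'\1 ', word))  # W1 EF
```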

0

Looks like you want [^U]W1?

You used [^?1], a negated character class that matches any single character other than "?" or "1", instead of 1?, which means "optionally a 1".
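
A small sketch of how the two tokens behave differently. The sample string is made up for illustration, and the last line reruns the pattern from the question (without the Python 2 `ur` prefix) to show what it consumes.

```python
import re

sample = 'ATWKO AW1B AW'

# [^?1] is a negated class: exactly one character that is neither '?' nor '1'
print(re.findall(r'W[^?1]', sample))  # ['WK']

# 1? is a quantifier: zero or one '1'
print(re.findall(r'W1?', sample))     # ['W', 'W1', 'W']

# The pattern from the question also swallows the characters around the W
print(re.sub(r'[^U]W[^?1]', 'W ', 'ATWKO'))  # AW O
```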

0

Try the regex pattern r'([^U]W)1?' and use it with re.sub() with a substitution that references the captured group, like this:

import re

pattern = re.compile(r'([^U]W)1?')
for s in 'TUW1TH', 'JUWRG', 'BUIUW1', 'ATWKO', 'EWRG', 'AGDTWSD':
    print(pattern.sub(r'\1 ', s))

Output

TUW1TH
JUWRG
BUIUW1
ATW KO
EW RG
AGDTW SD

Note that the output for 'EWRG' differs from your sample... I think that's a typo in your question?

Your question wasn't clear about what to do with the optional 1 following the W and there was no sample to demonstrate. Is the 1 to be removed, or kept? The above code will lose the 1:

>>> print(pattern.sub(r'\1 ', 'TW1TH'))
TW TH

If you wanted the output to include the 1, then you can change the regex pattern to r'([^U]W)(1?)' to add a second capturing group for the optional 1, and change the substitution to r'\1 \2':

>>> re.sub(r'([^U]W)(1?)', r'\1 \2', 'TW1TH')
'TW 1TH'
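
As an alternative sketch (not what the code above does), the same "space directly after the W" result can be had with a negative lookbehind in place of [^U]. That needs no capturing groups and also covers a W at the very start of the string. The helper name `space_after_w` is hypothetical.

```python
import re

def space_after_w(word):
    # Insert a space after every W that is not preceded by a U;
    # a following 1 is left in place after the space.
    return re.sub(r'(?<!U)W', 'W ', word)

print(space_after_w('TW1TH'))   # TW 1TH
print(space_after_w('W1EF'))    # W 1EF
print(space_after_w('TUW1TH'))  # TUW1TH
```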
