Multiple occurrences of same character in a string regexp - Python

Question

Given a string made up of 3 capital letters, 1 lowercase letter and another 3 capital letters, e.g. AAAaAAA

I can't seem to find a regexp that matches only strings that have:

first 3 capital letters all different
any lowercase letter
next 2 capital letters both the same as the very first one
last capital letter the same as the last capital letter in the first "trio"

e.g. ABCaAAC

EDIT:

Turns out I needed something slightly different, e.g. ABCaAAC where 'a' is the lowercase version of the very first character, not just any character

Do you absolutely need this to be a regex? The solution would be way simpler in plain code.
– Sasha Chedygov, Commented Mar 20, 2012 at 22:38
Not particularly, I just thought regexes were "simpler" than writing loops, if statements and everything else. Plus I like regexes :)
– Rupert Cobbe-Warburton, Commented Mar 20, 2012 at 23:45
Refer to this question. In Python, that is ([a-zA-Z])\1{n,} which means match the same character at least n+1 times in a row
– zhy, Commented Apr 14, 2020 at 3:01

Andrew Clark · Accepted Answer · 2012-03-20 22:52:20Z

11

The following should work:

^([A-Z])(?!.?\1)([A-Z])(?!\2)([A-Z])[a-z]\1\1\3$

For example:

>>> import re
>>> regex = re.compile(r'^([A-Z])(?!.?\1)([A-Z])(?!\2)([A-Z])[a-z]\1\1\3$')
>>> regex.match('ABAaAAA')  # fails: first three are not different
>>> regex.match('ABCaABC')  # fails: first two of second three are not first char
>>> regex.match('ABCaAAB')  # fails: last char is not last of first three
>>> regex.match('ABCaAAC')  # matches!
<_sre.SRE_Match object at 0x7fe09a44a880>

Explanation:

^          # start of string
([A-Z])    # match any uppercase character, place in \1
(?!.?\1)   # fail if either of the next two characters is the same as \1
([A-Z])    # match any uppercase character, place in \2
(?!\2)     # fail if the next character is the same as \2
([A-Z])    # match any uppercase character, place in \3
[a-z]      # match any lowercase character
\1         # match capture group 1
\1         # match capture group 1
\3         # match capture group 3
$          # end of string
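
The annotated breakdown above can also be kept in the code itself by compiling it with Python's re.VERBOSE flag, which ignores whitespace and # comments inside the pattern. This is only a sketch of that idea; the name PATTERN is illustrative and not part of the original answer.

```python
import re

# Same pattern as above, written out with re.VERBOSE so the comments
# live next to the pieces they describe.
PATTERN = re.compile(r"""
    ^            # start of string
    ([A-Z])      # first capital, captured as group 1
    (?!.?\1)     # neither of the next two characters may equal group 1
    ([A-Z])      # second capital, captured as group 2
    (?!\2)       # the next character may not equal group 2
    ([A-Z])      # third capital, captured as group 3
    [a-z]        # any lowercase letter
    \1\1         # the first capital, twice
    \3           # the third capital
    $            # end of string
""", re.VERBOSE)

print(bool(PATTERN.match('ABCaAAC')))  # True
print(bool(PATTERN.match('ABAaAAA')))  # False
```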

If you want to pull these matches out from a larger chunk of text, just get rid of the ^ and $ and use regex.search() or regex.findall().
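
As a rough illustration of the unanchored variant, here is the same pattern without ^ and $ run over a made-up piece of text (the sample string and variable names are assumptions for the example). Note that because the pattern contains capture groups, findall() returns tuples of the groups rather than the full matches; finditer() is one way to get the full matched strings.

```python
import re

# Pattern from the answer with the ^ and $ anchors removed.
pattern = re.compile(r'([A-Z])(?!.?\1)([A-Z])(?!\2)([A-Z])[a-z]\1\1\3')

# Made-up sample text containing two valid strings and one invalid one.
text = 'xxABCaAACyy QRSqQQS ABAaAAA'

first = pattern.search(text)                            # first match only
matches = [m.group(0) for m in pattern.finditer(text)]  # full matched strings
groups = pattern.findall(text)                          # tuples of (\1, \2, \3)

print(first.group(0))  # ABCaAAC
print(matches)         # ['ABCaAAC', 'QRSqQQS']
print(groups)          # [('A', 'B', 'C'), ('Q', 'R', 'S')]
```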

You may, however, find the following approach easier to understand. It uses a regex for the basic validation, then normal string operations to test all of the extra requirements:

import re

def validate(s):
    # The regex checks the overall shape; the comparisons check how the letters relate.
    return bool(re.match(r'^[A-Z]{3}[a-z][A-Z]{3}$', s) and s[4] == s[0] and
                s[5] == s[0] and s[-1] == s[2] and len(set(s[:3])) == 3)

>>> validate('ABAaAAA')
False
>>> validate('ABCaABC')
False
>>> validate('ABCaAAB')
False
>>> validate('ABCaAAC')
True
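
The EDIT in the question adds one more requirement: the lowercase letter must be the lowercase version of the first capital. The answer above does not cover that case, but the plain-code approach extends to it with one extra comparison. The function below is a hypothetical sketch (the name validate_edit is made up here), not something from the original answer.

```python
import re

def validate_edit(s):
    """Hypothetical variant of validate() for the question's EDIT:
    the lowercase letter must be the lowercase form of the first capital."""
    return bool(
        re.fullmatch(r'[A-Z]{3}[a-z][A-Z]{3}', s)  # overall shape
        and len(set(s[:3])) == 3      # first three capitals all different
        and s[3] == s[0].lower()      # lowercase letter mirrors the first capital
        and s[4] == s[5] == s[0]      # next two capitals repeat the first
        and s[6] == s[2]              # last capital repeats the third
    )

print(validate_edit('ABCaAAC'))  # True
print(validate_edit('ABCbAAC'))  # False: 'b' is not the lowercase of 'A'
```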

edited Mar 20, 2012 at 22:52

answered Mar 20, 2012 at 22:38


2 Comments

Rupert Cobbe-Warburton Over a year ago

Thanks a lot for the clear explanation. Nice to know I was onto something as I was using something similar. Didn't know about the ?!.? notation.

Andrew Clark Over a year ago

@RupertCobbe-Warburton - Just to make sure you are understanding the (?!.?\1) portion correctly, (?!...) is a negative lookahead, so the match will fail if whatever ... is can match. .?\1 is what we want to fail on, which means optionally match any one character (.?), followed by whatever is in the first capture group (\1).
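
To see the lookahead on its own, the first capture group and its (?!.?\1) check can be isolated into a small test pattern. This is just an illustrative sketch; the name probe is made up for the example.

```python
import re

# Only the first capture group and its negative lookahead, isolated for illustration.
probe = re.compile(r'([A-Z])(?!.?\1)')

print(bool(probe.match('ABC')))  # True: neither of the next two characters is 'A'
print(bool(probe.match('AAB')))  # False: ".?" matches nothing, then \1 matches the next 'A'
print(bool(probe.match('ABA')))  # False: ".?" consumes 'B', then \1 matches 'A'
print(bool(probe.match('A')))    # True: nothing follows, so the lookahead cannot match
```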
