1

I want to capture tokens in a text in the following pattern:

The First 2 characters are alphabets and necessary, ends with [A-Z] or [A-Z][0-9] this is optional anything can come in between.

example:

AA123123A1

AA123123A

AA123123123

i want to match and capture start with ([A-Z][A-Z]) in group 1, end with [A-Z] or [A-Z][0-9] in group 3 and everything else between then in group2

Example:

AA123123A1 => [AA,123123,A1]

AA123123A. => [AA,123123,A]

AA123123123 => [AA,123123123,'']

the following regex is working in python but not in postgres.

regex='^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))([A-Za-z][0-9]{0,1}){0,1}$'

In Postgressql

    select regexp_matches('AA2311121A1',
    '^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))(.*)$','x');

result:

{AA,2311121A1,""}

I am trying to explore why positive lookahead behavior is not the same as python, and how to take make positive lookahead in Postgres work in this case.

0

1 Answer 1

1

You can use

^([A-Za-z]{2})(.*?)([A-Za-z][0-9]?)?$

See the regex demo and a DB fiddle online:

enter image description here

Details:

  • ^ - start of string
  • ([A-Za-z]{2}) - two ASCII letters
  • (.*?) - Group 2: any zero or more chars as few as possible
  • ([A-Za-z][0-9]?)? - Group 3: an optional sequence of an ASCII letter and then an optional digit
  • $ - end of string.
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks @Wiktor This is working good, but the motive of asking this is more on understanding the behaviour of positive lookahead in Postgres and how to make it work like it is working in python.
@YOGENDRASONI So, all you need to know is how lookaheads work? They work the same anywhere where they are supported.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.