Why is Oracle SQL function regexp_substr not returning all matching characters?

Question

Could anyone (with extensive experience in regular-expression matching) please clarify for me why the following query returns (what I consider) unexpected results in Oracle 12?

select regexp_substr('My email: [email protected]', '[^@:space:]+@[^@:space:]+') 
from dual;

Expected result: [email protected]

Actual result: t@t

Another example:

select regexp_substr('Beneficiary email: [email protected]', '[^@:space:]+@[^@:space:]+') 
from dual;

Expected result: [email protected]

Actual result: ry1@gm

EDIT: I double-checked and this is not related to Oracle SQL, but the same behaviour applies to any regex engine. Even when simplifying the regex to [^:space:]+@[^:space:]+ the results are the same. I am curious to know why it does not match all the non-whitespace characters before and after the @ sign. And why sometimes it matches 1 character, other times 2 or 3 or more characters, but not all.

Gary_W · Accepted Answer · 2020-07-14 14:27:38Z

3

The POSIX shortcut you are trying to use is incorrect, you need square brackets around it:

SELECT REGEXP_SUBSTR('Beneficiary email: [email protected]', '[^@[:space:]]+@[^@[:space:]]+') 
FROM dual;

or even simpler, assuming you only want to validate by checking for an '@' and the email address is always at the end of the string, after the last space:

WITH tbl(str) AS (
  SELECT 'My email: [email protected]' FROM dual UNION ALL
  SELECT 'Beneficiary email: [email protected]' FROM dual
)
SELECT REGEXP_REPLACE(str, '.* (.*@.*)', '\1')
from tbl
;

Note: REGEXP_REPLACE() will return the original string if the match is not found, where REGEXP_SUBSTR() will return NULL. Keep that in mind and handle no match found accordingly. Always expect the unexpected!

edited Jul 14, 2020 at 14:27

answered Jul 14, 2020 at 14:17

Gary_W

10.4k1 gold badge26 silver badges42 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Sorin Postelnicu Over a year ago

Thank you! That was so stupid of me. But in my defense, it was not clear in the documentation that the POSIX classes must be enclosed in their own brackets.

Sorin Postelnicu Over a year ago

Also thanks for the example with the differences in behaviour between regexp_replace vs regexp_substr. In my case the emails do not always appear at the end of the text, that was only an example. But thanks!

MdBasha · Accepted Answer · 2020-07-14 13:13:10Z

0

The REGEX is not correct in your SQL code. Try

select regexp_substr('Beneficiary email: [email protected]', '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b') 
from dual;

select regexp_substr('My email: [email protected]', '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b') 
from dual;

It gives the result that you expected.

answered Jul 14, 2020 at 13:13

MdBasha

4515 silver badges16 bronze badges

5 Comments

Sorin Postelnicu Over a year ago

Thanks, but I am curious why is the initial regex not correct? Assuming I'm not looking for valid emails, but anything which is: (not-space one-or-more-times)@(not-space one-or-more-times). Even if I simplify the regex to [^:space:]+@[^:space:]+ I still get the same result. Why is it not matching all non-whitespace characters?

MdBasha Over a year ago

Answer by @Gary_W suffices.

Sorin Postelnicu Over a year ago

Thank you for the example of email regexp. Still, for some strange reason it does not work. Have you tried them? I suppose it's the word boundaries that have a different syntax in Oracle. I have also tried with double-escape \\b but still they don't work.

Sorin Postelnicu Over a year ago

Indeed, according to the excellent regular-expressions info website: "Word boundaries, as described above, are supported by most regular expression flavors. Notable exceptions are the POSIX and XML Schema flavors, which don’t support word boundaries at all."

Sorin Postelnicu Over a year ago

It seems that in Oracle this is the correct regexp: (^|\W)[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}($|\W) as seen here: stackoverflow.com/questions/7567700/…

Collectives™ on Stack Overflow

Why is Oracle SQL function regexp_substr not returning all matching characters?

2 Answers 2

2 Comments

5 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

5 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related