1

Could anyone (with extensive experience in regular-expression matching) please clarify for me why the following query returns (what I consider) unexpected results in Oracle 12?

select regexp_substr('My email: [email protected]', '[^@:space:]+@[^@:space:]+') 
from dual;

Expected result: [email protected]

Actual result: t@t

Another example:

select regexp_substr('Beneficiary email: [email protected]', '[^@:space:]+@[^@:space:]+') 
from dual;

Expected result: [email protected]

Actual result: ry1@gm

EDIT: I double-checked and this is not specific to Oracle SQL; the same behaviour applies to any regex engine. Even when simplifying the regex to [^:space:]+@[^:space:]+ the results are the same. I am curious to know why it does not match all the non-whitespace characters before and after the @ sign, and why it sometimes matches 1 character and other times 2, 3, or more, but never all of them.

2 Answers

3

The POSIX character class you are trying to use is written incorrectly; it needs its own square brackets inside the bracket expression, i.e. [[:space:]] rather than [:space:]:

SELECT REGEXP_SUBSTR('Beneficiary email: [email protected]', '[^@[:space:]]+@[^@[:space:]]+') 
FROM dual;
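To see why the original pattern stops early, here is a minimal sketch that runs both forms side by side. The address user@domain.com is a hypothetical sample (the one in the question is redacted), and the column aliases are only illustrative. Without the inner brackets, [^@:space:] is an ordinary negated list of the literal characters @ : s p a c e, so the match is cut off at the first of those letters on either side of the @.

```sql
-- Hypothetical sample address; not taken from the original post.
SELECT
  -- Plain bracket list: excludes the literal characters @ : s p a c e
  REGEXP_SUBSTR('My email: user@domain.com',
                '[^@:space:]+@[^@:space:]+')     AS literal_list,  -- r@dom
  -- POSIX class nested in the bracket expression: excludes @ and whitespace
  REGEXP_SUBSTR('My email: user@domain.com',
                '[^@[:space:]]+@[^@[:space:]]+') AS posix_class    -- user@domain.com
FROM dual;
```

In the first column the match can only start at "r", because "e" and "s" are in the excluded list, and it ends before the "a" in "domain". That is also why the length of the match varies with the letters of each address.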

Alternatively, if you only want to validate by checking for an '@', and the email address is always at the end of the string after the last space, this is simpler:

WITH tbl(str) AS (
  SELECT 'My email: [email protected]' FROM dual UNION ALL
  SELECT 'Beneficiary email: [email protected]' FROM dual
)
SELECT REGEXP_REPLACE(str, '.* (.*@.*)', '\1')
FROM tbl;

Note: REGEXP_REPLACE() returns the original string if no match is found, whereas REGEXP_SUBSTR() returns NULL. Keep that in mind and handle the no-match case accordingly. Always expect the unexpected!
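As a rough illustration of that difference, the sketch below runs both functions over one row that contains an address and one that does not. The sample strings and the column aliases are hypothetical.

```sql
-- Hypothetical sample rows: one with an address, one without.
WITH tbl(str) AS (
  SELECT 'My email: user@domain.com' FROM dual UNION ALL
  SELECT 'No address in this row'    FROM dual
)
SELECT str,
       -- No match: the input string comes back unchanged
       REGEXP_REPLACE(str, '.* (.*@.*)', '\1')                AS via_replace,
       -- No match: NULL
       REGEXP_SUBSTR(str, '[^@[:space:]]+@[^@[:space:]]+')    AS via_substr
FROM tbl;
```

For the second row via_replace is 'No address in this row' while via_substr is NULL, so a caller that treats any non-NULL value as an address would be misled by the REGEXP_REPLACE() form.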

2 Comments

Thank you! That was so stupid of me. But in my defense, it was not clear in the documentation that the POSIX classes must be enclosed in their own brackets.
Also thanks for the example with the differences in behaviour between regexp_replace vs regexp_substr. In my case the emails do not always appear at the end of the text, that was only an example. But thanks!
0

The regex in your SQL code is not correct. Try:

select regexp_substr('Beneficiary email: [email protected]', '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b') 
from dual;

select regexp_substr('My email: [email protected]', '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b') 
from dual;

It gives the result that you expected.

5 Comments

Thanks, but I am curious why is the initial regex not correct? Assuming I'm not looking for valid emails, but anything which is: (not-space one-or-more-times)@(not-space one-or-more-times). Even if I simplify the regex to [^:space:]+@[^:space:]+ I still get the same result. Why is it not matching all non-whitespace characters?
Answer by @Gary_W suffices.
Thank you for the example of email regexp. Still, for some strange reason it does not work. Have you tried them? I suppose it's the word boundaries that have a different syntax in Oracle. I have also tried with double-escape \\b but still they don't work.
Indeed, according to the excellent regular-expressions info website: "Word boundaries, as described above, are supported by most regular expression flavors. Notable exceptions are the POSIX and XML Schema flavors, which don’t support word boundaries at all."
It seems that in Oracle this is the correct regexp: (^|\W)[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}($|\W) as seen here: stackoverflow.com/questions/7567700/…
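One caveat with the (^|\W) ... ($|\W) form is that the delimiter characters become part of the match. A possible workaround, sketched below on the assumption that the subexpression argument of REGEXP_SUBSTR() is available (Oracle 11g and later), is to wrap the address in its own group and return only that group. The sample strings are hypothetical.

```sql
-- Hypothetical sample rows; the addresses in the original post are redacted.
WITH tbl(str) AS (
  SELECT 'My email: user@domain.com'       FROM dual UNION ALL
  SELECT 'Mail user@domain.com, then wait' FROM dual
)
SELECT str,
       REGEXP_SUBSTR(
         str,
         '(^|\W)([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6})($|\W)',
         1,     -- start searching at the first character
         1,     -- first occurrence
         NULL,  -- default match parameters
         2      -- return only the second subexpression (the address)
       ) AS email
FROM tbl;
```

Both rows are intended to yield user@domain.com without the surrounding space or comma. Like the original pattern, this is a loose filter and not a full validation of email syntax.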
