I am using REGEXP_SUBSTR to extract values that start with 'p' and end before '-'. The 'p' is followed by 6 digits.

My code:

 code, REGEXP_SUBSTR(CODE,'^[p][^-]+')

CODE            REGEXP_SUBSTR(CODE,'^[P][^-]+')
p700401-        p700401
p791701-        p791701
100-,p788001-,  null

This is the result, but I am struggling to handle cases like the one in the 3rd row:

100-,p788001-

Can someone please guide me on how to handle such cases?

3 Answers

If you want to match complete terms in a comma-delimited string then you can use:

SELECT code,
       REGEXP_SUBSTR(code, '(^|,)(p\d{6})-(,|$)', 1, 1, NULL, 2) AS result
FROM   table_name;

Which, for the sample data:

CREATE TABLE table_name (code) as
  SELECT 'p700401-' FROM DUAL UNION ALL
  SELECT 'p791701-' FROM DUAL UNION ALL
  SELECT '100-,p788001-,' FROM DUAL UNION ALL
  SELECT '123-,p456789-xyz,p987654-' FROM DUAL UNION ALL
  SELECT 'p111111-,p222222-not_this,p333333-,p444444-' FROM DUAL;

Outputs:

CODE                                         RESULT
p700401-                                     p700401
p791701-                                     p791701
100-,p788001-,                               p788001
123-,p456789-xyz,p987654-                    p987654
p111111-,p222222-not_this,p333333-,p444444-  p111111

Displaying multiple terms

If you want to remove the non-matching terms from the string then:

SELECT code,
       LTRIM(
         REGEXP_REPLACE(
           ',' || REPLACE(code, ',', ',,') || ',',
           '((,p\d{6})-,)|,.*?,',
           '\2'
         ),
         ','
       ) AS result
FROM   table_name;

Which outputs:

CODE                                         RESULT
p700401-                                     p700401
p791701-                                     p791701
100-,p788001-,                               p788001
123-,p456789-xyz,p987654-                    p987654
p111111-,p222222-not_this,p333333-,p444444-  p111111,p333333,p444444
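
To see why the delimiters are doubled, it can help to look at the intermediate string on its own. This is a minimal sketch rather than part of the original query; it uses one literal from the sample data, and the alias double_delims is borrowed from the row-splitting query in this answer:

```sql
-- Sketch only: shows the string that the regular expression actually scans.
SELECT ',' || REPLACE('100-,p788001-,', ',', ',,') || ',' AS double_delims
FROM   DUAL;
-- double_delims: ,100-,,p788001-,,,
```

Each match consumes the comma on both sides of a term. With single commas, two adjacent terms would have to share one delimiter and the second term would never match; after doubling, every term carries its own leading and trailing comma.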

And if you want to split the list into rows then:

SELECT t.code,
       i.*
FROM   (
         SELECT code,
                ',' || REPLACE(code, ',', ',,') || ',' AS double_delims
         FROM   table_name
       ) t
       INNER JOIN LATERAL (
         SELECT LEVEL AS item,
                REGEXP_SUBSTR(double_delims, ',(p\d{6})-,|,(.*?),', 1, LEVEL, NULL, 1)
                  AS value
         FROM   DUAL
         CONNECT BY LEVEL <= REGEXP_COUNT(double_delims, ',(p\d{6})-,|,(.*?),')
       ) i
       ON (i.value IS NOT NULL);

Which outputs:

CODE                                         ITEM  VALUE
p700401-                                        1  p700401
p791701-                                        1  p791701
100-,p788001-,                                  2  p788001
123-,p456789-xyz,p987654-                       3  p987654
p111111-,p222222-not_this,p333333-,p444444-     1  p111111
p111111-,p222222-not_this,p333333-,p444444-     3  p333333
p111111-,p222222-not_this,p333333-,p444444-     4  p444444

fiddle

4 Comments

+ But did you leave the first number out of the results on purpose in line 4?
@JvdV The first line states "If you want to match complete terms" and the regular expression would not match the complete term for ,p456789-xyz,; so, yes, that row was included to show that it would match the correct term when matching complete terms and it should match the last term in the delimited list and not the middle term.
Thanks, it's clear how you interpreted the data then. I borrowed how you created the schema but could have misinterpreted the desired outcome then.
Thanks, MT0, for the thorough answer. I always appreciate the input.
For the sample data you posted, this returns the result you wanted (i.e. it takes the first "p" that is followed by 6 digits):

with test (code) as
  (select 'p700401-' from dual union all
   select 'p791701-' from dual union all
   select '100-,p788001-,' from dual
  )
select code,
       regexp_substr(code, 'p\d{6}') result
from test;

CODE           RESULT
-------------- --------------
p700401-       p700401
p791701-       p791701
100-,p788001-, p788001
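
The comments below raise inputs that contain more than one matching code, such as 100-,p788001-,100-,p788002-. As a minimal sketch (not part of the original answer), the fourth argument of REGEXP_SUBSTR selects which occurrence is returned. The input literal is taken from that comment with a trailing comma assumed, and the column aliases are made up for illustration:

```sql
-- Sketch only: the literal is an assumed example, not the asker's data.
-- Arguments: source string, pattern, start position, occurrence.
SELECT REGEXP_SUBSTR('100-,p788001-,100-,p788002-,', 'p\d{6}', 1, 1) AS first_code,
       REGEXP_SUBSTR('100-,p788001-,100-,p788002-,', 'p\d{6}', 1, 2) AS second_code,
       REGEXP_SUBSTR('100-,p788001-,100-,p788002-,', 'p\d{6}', 1, 3) AS third_code
FROM   DUAL;
-- first_code: p788001, second_code: p788002, third_code: NULL (no third match)
```

Because the pattern is not anchored to the delimiters, it would also pick up "p" plus 6 digits from inside a longer token, so it fits best when the data is known to be as clean as the sample.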

6 Comments

Can I ask you a question @Littlefoot? Would something like SELECT code, REGEXP_REPLACE (code, '(?:\b[^p,-]\w*-,|,$)', '') result FROM test; work? I was thinking what if data is something like 100-,p788001-,100-,p788002-, but I have no means of testing this.
It wouldn't work, @JvdV, I tested it. That regex returns the "original" code, doesn't "replace" anything.
Thanks for the feedback. I wonder why. Maybe part of the regex would not be supported syntax. The idea would be to return something like this. Maybe the non-capturing group is supposed to be a regular capture group...
I'm not that good at regex, @JvdV; I only know some Oracle syntax. Wiktor Stribiżew is the regex expert here, I guess he'd be able to explain it.
@JvdV Oracle does not support non-capturing groups (?: ) or word boundaries \b, so you cannot use that method.
Right, my two cents is to use REGEXP_REPLACE():

CREATE TABLE tst (code) as
  SELECT 'p700401-' FROM DUAL UNION ALL
  SELECT 'p791701-' FROM DUAL UNION ALL
  SELECT '100-,z123456' FROM DUAL UNION ALL
  SELECT '100-,p788001-,' FROM DUAL UNION ALL
  SELECT 'p788001-,100-' FROM DUAL UNION ALL
  SELECT '123-,p456789-xyz,p987654-' FROM DUAL;

SELECT code,
       REGEXP_REPLACE(
         REGEXP_REPLACE(code, '(p\d{6})-|.', '\1'), '(\d)(p)', '\1,\2'
       ) AS result
FROM   tst;

Results in:

CODE                       RESULT
p700401-                   p700401
p791701-                   p791701
100-,z123456               null
100-,p788001-,             p788001
p788001-,100-              p788001
123-,p456789-xyz,p987654-  p456789,p987654

The statement is nested because Oracle lacks support for some handy regex syntax, as per the given link.

The 1st regex pattern is supposed to remove everything other than what you are after, see an online demo. The 2nd one is there to insert commas back to separate these values, see the demo.
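
The two steps can also be inspected separately. This is a minimal sketch that assumes the last row of the sample table; the CTE name demo and the aliases step_1 and step_2 are made up for illustration:

```sql
-- Sketch only: splits the nested call into its two stages for one sample value.
WITH demo (code) AS (
  SELECT '123-,p456789-xyz,p987654-' FROM DUAL
)
SELECT code,
       -- Stage 1: keep "p" plus 6 digits when followed by "-", drop every other character.
       REGEXP_REPLACE(code, '(p\d{6})-|.', '\1') AS step_1,
       -- Stage 2: put a comma back wherever a digit is directly followed by "p".
       REGEXP_REPLACE(REGEXP_REPLACE(code, '(p\d{6})-|.', '\1'), '(\d)(p)', '\1,\2') AS step_2
FROM   demo;
-- step_1: p456789p987654
-- step_2: p456789,p987654
```

Note that stage 1 keeps p456789 even though its term continues with xyz, which is why this row differs from the result of the complete-term approach in the first answer.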
