I am trying to find a regular expression pattern that can extract a date from any string. Each sample line below shows the input string, then "to", then the expected result:

"""

28/11/22 11-23333 to 28/11/22

11-23333 28/11/22 to 28/11/22

something 20.02.2022 end to 20.02.2022

7-03-21 start to 7-03-21

no date here to null

date is 2023/11/12 something to 2023/11/12

prefix2023-11-12suffix to 2023-11-12

2023.11.12 at start to 2023.11.12

"""

I have tried many approaches, including the one below. However, I want to know which pattern will reliably extract a date from any string, because the text the date has to be extracted from is unpredictable and dynamic. I can't keep changing the code every time a given pattern fails to extract the date.

Pattern I used:

REGEXP_SUBSTR(
        note,
        '((([0-9]{4})([./-])([0-9]{1,2})\4([0-9]{1,2}))|(([0-9]{1,2})([./-])([0-9]{1,2})\9([0-9]{2,4})))'
    ) AS extracted_date

I also want to filter for the rows whose column contains a date pattern, so that the validation works for any dynamic text.

I tried this as well

REGEXP_LIKE(
    r.note,
    '((([0-9]{4})([./-])([0-9]{1,2})\4([0-9]{1,2}))|(([0-9]{1,2})([./-])([0-9]{1,2})\9([0-9]{2,4})))'
);

Is this the right option?

7 Replies

This will be very difficult if the pattern is unpredictable. Maybe an AI could do it.

So is '10/11/12' November 10 in 2012, October 11 in 2012, November 12 in 2010 or December 11 in 2010? How will you ever be able to tell? It looks like pure guesswork to me. Such strings should never have made it into the database if you need to know the dates they contain.
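To make the ambiguity concrete, here is a minimal Python sketch (application-side, not Oracle) that tries the four field orders named above on the same string. The `FORMATS` table and the `interpretations` helper are illustrative names, not part of any library, and the two-digit year is expanded by Python's own `%y` rule.

```python
from datetime import datetime

# Candidate field orders for a two-digit-year string (illustrative list only).
FORMATS = {
    "day/month/year": "%d/%m/%y",
    "month/day/year": "%m/%d/%y",
    "year/month/day": "%y/%m/%d",
    "year/day/month": "%y/%d/%m",
}

def interpretations(text):
    """Return every calendar date that `text` could denote under FORMATS."""
    found = {}
    for label, fmt in FORMATS.items():
        try:
            found[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # this field order does not produce a real date
    return found

for label, value in interpretations("10/11/12").items():
    print(f"{label}: {value.isoformat()}")
# day/month/year: 2012-11-10
# month/day/year: 2012-10-11
# year/month/day: 2010-11-12
# year/day/month: 2010-12-11
```

All four orders parse, so the string alone cannot tell you which date was meant. A string such as '31/12/99' happens to be valid under only one of them.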

If you keep allowing any value from 0 to 99 for days and months, one refinement is still advisable: change the year part at the end to exactly 4 or exactly 2 digits instead of the range {2,4}:

((([0-9]{4})([./-])([0-9]{1,2})\4([0-9]{1,2}))|(([0-9]{1,2})([./-])([0-9]{1,2})\9([0-9]{4}|[0-9]{2})))

Should you further want to restrict the day part to the number of days in a given month, you would have to take into account that February has 28 or 29 days depending on leap years. A regex for the leap years from 1900 to 2050 is:

(19(?:0[48]|[13579][26]|[2468][048])|20(?:0[48]|[13][26]|[24][048])|2000)

Good luck!
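Both patterns in this reply can be sanity-checked outside the database. The sketch below uses Python's `re` module, whose matching rules may differ from Oracle's in corner cases, so treat it as a quick check rather than proof of what `REGEXP_SUBSTR` returns. `first_match` and `regex_says_leap` are hypothetical helper names.

```python
import calendar
import re

# Pattern from the question (year part {2,4}) and the refined one (4 or 2 digits).
ORIGINAL = (r"((([0-9]{4})([./-])([0-9]{1,2})\4([0-9]{1,2}))"
            r"|(([0-9]{1,2})([./-])([0-9]{1,2})\9([0-9]{2,4})))")
REFINED = (r"((([0-9]{4})([./-])([0-9]{1,2})\4([0-9]{1,2}))"
           r"|(([0-9]{1,2})([./-])([0-9]{1,2})\9([0-9]{4}|[0-9]{2})))")

# Leap-year pattern quoted above; (?:...) is a non-capturing group.
LEAP_YEAR = re.compile(
    r"(19(?:0[48]|[13579][26]|[2468][048])|20(?:0[48]|[13][26]|[24][048])|2000)"
)

def first_match(pattern, text):
    """Return the first substring matching `pattern`, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

def regex_says_leap(year):
    """True if the whole four-digit year matches the leap-year pattern."""
    return LEAP_YEAR.fullmatch(str(year)) is not None

# Compare the regex with the calendar rule for every year in its stated range.
mismatches = [y for y in range(1900, 2051)
              if regex_says_leap(y) != calendar.isleap(y)]
print(mismatches)                            # []

print(first_match(ORIGINAL, "ref 1-2-345"))  # 1-2-345 (three-digit year accepted)
print(first_match(REFINED, "ref 1-2-345"))   # 1-2-34  (stops after two digits)
```

Note that the refined pattern still matches inside the longer number. It only stops accepting a three-digit year as a whole.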

Don't do this in SQL or PL/SQL, use your application.

This should have been a regular question - not a discussion.

And is your string example one string? Or 8 different strings? Needs better formatting to be clear.

I agree that this should have been a regular question and not a discussion. But well, as this is a discussion on best practice in this scenario:

  1. If you store a string with dates in your database and you are interested in the dates, you are violating first normal form, because you are not storing atomic data. So, either extract the dates in your app and then store them as dates in your database or remain oblivious to the dates in your DBMS.
  2. If the dates can have any format, then the probability of seeing ambiguous dates is high. You must decide what to do in such cases: either pick the most likely date or ignore the date altogether. For the first option you would have to provide an algorithm that determines the likelihood of each interpretation of a date.
  3. While regular expressions are rather powerful, you should use a programming language here instead. This will give you much more readable and maintainable code. If you even need a likelihood algorithm, regular expressions won't suffice anyway.
  4. AI has been mentioned. This may be a good idea. Your examples all have the starting date before the ending date, but if a string is something like 'The test ended on 2025-10-10 and was started twelve days before', then you'd need to understand the text in order to extract the dates.
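The two options in point 2 (pick one interpretation, or ignore the date) can be sketched in application code. The Python example below is only an illustration under stated assumptions: it handles just the day-or-month-first layout with a four-digit year, and the "likelihood" rule is a fixed, configurable preference (`PREFERRED_ORDER`, a hypothetical setting) rather than a real scoring algorithm.

```python
import re
from datetime import date

# Hypothetical policy: which field order to trust when both are valid dates.
PREFERRED_ORDER = "dmy"

# One or two digits, separator, one or two digits, same separator, 4-digit year.
NUMERIC = re.compile(r"(?<!\d)(\d{1,2})([./-])(\d{1,2})\2(\d{4})(?!\d)")

def candidates(a, b, year):
    """Return the valid dates for 'a sep b sep year' keyed by field order."""
    out = {}
    for label, (day, month) in (("dmy", (a, b)), ("mdy", (b, a))):
        try:
            out[label] = date(year, month, day)
        except ValueError:
            pass  # not a real calendar date in this order
    return out

def resolve(text, preferred=PREFERRED_ORDER):
    """Extract one date from text, or None if absent or left unresolved."""
    m = NUMERIC.search(text)
    if m is None:
        return None
    found = candidates(int(m.group(1)), int(m.group(3)), int(m.group(4)))
    distinct = set(found.values())
    if len(distinct) == 1:
        return distinct.pop()      # only one reading is possible
    if preferred is None:
        return None                # ambiguous or invalid: ignore the date
    return found.get(preferred)    # ambiguous: apply the configured policy

print(resolve("ended 20.02.2022"))               # 2022-02-20
print(resolve("on 05/06/2023"))                  # 2023-06-05
print(resolve("on 05/06/2023", preferred=None))  # None
```

Passing `preferred=None` corresponds to ignoring ambiguous dates. A real likelihood rule (for example, one based on where the data comes from) would replace the fixed preference.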

This should have been a regular question.

Extracting the sub-string that matches a pattern and converting that sub-string to a date in the expected format are two separate problems. The first, extracting sub-strings, is solvable. The second, converting the extracted values to dates, is ambiguous as some strings will match multiple date formats.

Since you appear to ask only about extracting the sub-strings, that is all this answer will cover. Parsing ambiguous date strings to dates is left as a separate exercise for the reader.

Don't try to do it in one regular expression. Use one regular expression for each format you want to match and then use COALESCE to check them one-by-one.

SELECT column_name,
       COALESCE(
         -- YY-MM-DD or YYYY-MM-DD
         REGEXP_SUBSTR(
           column_name,
           '\d{2}\d{2}?([/.-])('
           || '(0?[13578]|1[02])\1(0[1-9]|[12]\d|3[01])'
           || '|(0?[469]|11)\1(0[1-9]|[12]\d|30)'
           || '|0?2\1(0[1-9]|1\d|2[0-9])'
           || ')'
         ),
         -- MM-DD-YY or MM-DD-YYYY
         REGEXP_SUBSTR(
           column_name,
           '(0?[1-9]|1[0-2])([/.-])(0?[1-9]|[12]\d|3[01])\2\d{2}\d{2}?'
         ),
         -- DD-MM-YY or DD-MM-YYYY
         REGEXP_SUBSTR(
           column_name,
           '(0?[1-9]|[12]\d|3[01])([/.-])(0?[1-9]|1[0-2])\2\d{2}\d{2}?'
         )
       ) AS date_str,
       expected
FROM   table_name

Notes:

  1. The output still contains ambiguous dates: 7-03-21 could be 0007-03-21, 2007-03-21, 0021-07-03, 2021-07-03, 0021-03-07 or 2021-03-07, and you have no way of knowing which is correct. However, if you just want to extract the ambiguous date strings, the query above will do it.
  2. The query above does not check for leap years, and the last two patterns do not check that each month has the correct maximum number of days. Adding those checks to the regular expressions is left as an exercise for the reader.
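If the extraction can move out of SQL, the calendar checks from note 2 do not have to live in the regular expression. The following Python sketch is an analogue of the "one pattern per format, first hit wins" idea, not a translation of the Oracle query: the `LAYOUTS` table and both helper names are assumptions made for illustration, and `datetime.strptime` does the leap-year and days-per-month validation.

```python
import re
from datetime import datetime

# One (regex, strptime formats) pair per layout, tried in order. The digit
# lookarounds stop a match from starting or ending inside a longer number.
# "{s}" is replaced by whichever separator the regex captured.
LAYOUTS = [
    # four-digit year first
    (r"(?<!\d)\d{4}([./-])\d{1,2}\1\d{1,2}(?!\d)", ["%Y{s}%m{s}%d"]),
    # day or month first, four-digit year
    (r"(?<!\d)\d{1,2}([./-])\d{1,2}\1\d{4}(?!\d)",
     ["%d{s}%m{s}%Y", "%m{s}%d{s}%Y"]),
    # two-digit year: the field order cannot be known, so accept any valid one
    (r"(?<!\d)\d{1,2}([./-])\d{1,2}\1\d{2}(?!\d)",
     ["%d{s}%m{s}%y", "%m{s}%d{s}%y", "%y{s}%m{s}%d"]),
]

def is_calendar_date(candidate, sep, formats):
    """True if the candidate is a real date under at least one format."""
    for fmt in formats:
        try:
            datetime.strptime(candidate, fmt.format(s=sep))
            return True
        except ValueError:
            continue
    return False

def extract_date_string(text):
    """Return the first date-like substring that is a valid calendar date."""
    for pattern, formats in LAYOUTS:
        for m in re.finditer(pattern, text):
            if is_calendar_date(m.group(0), m.group(1), formats):
                return m.group(0)
    return None

for sample in ["28/11/22 11-23333", "prefix2023-11-12suffix",
               "no date here", "due 29.02.2023"]:
    print(repr(sample), "->", extract_date_string(sample))
# '28/11/22 11-23333' -> 28/11/22
# 'prefix2023-11-12suffix' -> 2023-11-12
# 'no date here' -> None
# 'due 29.02.2023' -> None
```

As in note 1, this returns the matched string and not a resolved date. A two-digit-year match is accepted as soon as any field order yields a real date.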

Which, for the sample data:

CREATE TABLE table_name (column_name, expected) AS
SELECT '28/11/22 11-23333',            '28/11/22'   FROM DUAL UNION ALL
SELECT '11-23333 28/11/22',            '28/11/22'   FROM DUAL UNION ALL
SELECT 'something 20.02.2022 end',     '20.02.2022' FROM DUAL UNION ALL
SELECT '7-03-21 start',                '7-03-21'    FROM DUAL UNION ALL
SELECT 'no date here',                 NULL         FROM DUAL UNION ALL
SELECT 'date is 2023/11/12 something', '2023/11/12' FROM DUAL UNION ALL
SELECT 'prefix2023-11-12suffix',       '2023-11-12' FROM DUAL UNION ALL
SELECT '2023.11.12 at start',          '2023.11.12' FROM DUAL;

Outputs:

COLUMN_NAME                   DATE_STR    EXPECTED
----------------------------  ----------  ----------
28/11/22 11-23333             28/11/22    28/11/22
11-23333 28/11/22             28/11/22    28/11/22
something 20.02.2022 end      20.02.2022  20.02.2022
7-03-21 start                 7-03-21     7-03-21
no date here                  null        null
date is 2023/11/12 something  2023/11/12  2023/11/12
prefix2023-11-12suffix        2023-11-12  2023-11-12
2023.11.12 at start           2023.11.12  2023.11.12

