This should have been a regular question.
Extracting the sub-string that matches a pattern and converting that sub-string to a date in the expected format are two separate problems. The first, extracting sub-strings, is solvable. The second, converting the extracted values to dates, is ambiguous as some strings will match multiple date formats.
Since you appear to be asking only about extracting the sub-strings, that is all this answer covers. Parsing ambiguous date strings into dates is left as a separate exercise for the reader.
Don't try to do it in one regular expression. Use one regular expression for each format you want to match, then use COALESCE to try them one by one and return the first that matches.
SELECT column_name,
       COALESCE(
         -- YY-MM-DD or YYYY-MM-DD
         REGEXP_SUBSTR(
           column_name,
           '\d{2}\d{2}?([/.-])('
           || '(0?[13578]|1[02])\1(0[1-9]|[12]\d|3[01])'
           || '|(0?[469]|11)\1(0[1-9]|[12]\d|30)'
           || '|0?2\1(0[1-9]|1\d|2[0-9])'
           || ')'
         ),
         -- MM-DD-YY or MM-DD-YYYY
         REGEXP_SUBSTR(
           column_name,
           '(0?[1-9]|1[0-2])([/.-])(0?[1-9]|[12]\d|3[01])\2\d{2}\d{2}?'
         ),
         -- DD-MM-YY or DD-MM-YYYY
         REGEXP_SUBSTR(
           column_name,
           '(0?[1-9]|[12]\d|3[01])([/.-])(0?[1-9]|1[0-2])\2\d{2}\d{2}?'
         )
       ) AS date_str,
       expected
FROM   table_name;
Notes:
- The output still contains ambiguous date strings: 7-03-21 could be 0007-03-21, 2007-03-21, 0021-07-03, 2021-07-03, 0021-03-07 or 2021-03-07, and you have no way of knowing which is correct. However, if you just want to extract the ambiguous date strings, then the query above will do it.
- The query above does not check for leap years, and the second and third patterns do not check that each month has the correct maximum number of days. Adding those checks to the regular expressions is left as an exercise for the reader.
Which, for the sample data:
CREATE TABLE table_name (column_name, expected) AS
  SELECT '28/11/22 11-23333', '28/11/22' FROM DUAL UNION ALL
  SELECT '11-23333 28/11/22', '28/11/22' FROM DUAL UNION ALL
  SELECT 'something 20.02.2022 end', '20.02.2022' FROM DUAL UNION ALL
  SELECT '7-03-21 start', '7-03-21' FROM DUAL UNION ALL
  SELECT 'no date here', NULL FROM DUAL UNION ALL
  SELECT 'date is 2023/11/12 something', '2023/11/12' FROM DUAL UNION ALL
  SELECT 'prefix2023-11-12suffix', '2023-11-12' FROM DUAL UNION ALL
  SELECT '2023.11.12 at start', '2023.11.12' FROM DUAL;
Outputs:
| COLUMN_NAME | DATE_STR | EXPECTED |
| --- | --- | --- |
| 28/11/22 11-23333 | 28/11/22 | 28/11/22 |
| 11-23333 28/11/22 | 28/11/22 | 28/11/22 |
| something 20.02.2022 end | 20.02.2022 | 20.02.2022 |
| 7-03-21 start | 7-03-21 | 7-03-21 |
| no date here | null | null |
| date is 2023/11/12 something | 2023/11/12 | 2023/11/12 |
| prefix2023-11-12suffix | 2023-11-12 | 2023-11-12 |
| 2023.11.12 at start | 2023.11.12 | 2023.11.12 |
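To make the ambiguity from the notes concrete, here is a minimal sketch that parses extracted strings under each of the three layouts. It assumes Oracle 12.2 or later (for the DEFAULT ... ON CONVERSION ERROR clause of TO_DATE) and uses RRRR, which accepts both two- and four-digit years. The inline sample strings and the column aliases are illustrative only and are not part of the query above. Without the FX modifier, the '-' in each format model matches any punctuation in the input, so strings using '/' or '.' are parsed the same way.

```sql
-- Sketch only: the inline strings and the aliases below are hypothetical.
-- Each TO_DATE returns NULL when the string is not a valid date under that
-- format model (for example, month 29, or 29 February in a non-leap year).
SELECT date_str,
       TO_DATE(date_str DEFAULT NULL ON CONVERSION ERROR, 'RRRR-MM-DD') AS as_y_m_d,
       TO_DATE(date_str DEFAULT NULL ON CONVERSION ERROR, 'MM-DD-RRRR') AS as_m_d_y,
       TO_DATE(date_str DEFAULT NULL ON CONVERSION ERROR, 'DD-MM-RRRR') AS as_d_m_y
FROM   (
         SELECT '7-03-21'    AS date_str FROM DUAL UNION ALL
         SELECT '29.02.2023' AS date_str FROM DUAL
       );
```

A row with more than one non-NULL column is still ambiguous, and choosing between the readings needs knowledge about the data that the string itself does not carry. How the resulting dates are displayed depends on the session's NLS_DATE_FORMAT.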