Can someone help me with the following?
I'm trying to find specific date and time strings in a text (to be used within VBA Word). I'm currently working with the following regex:
(?:([0-9]{1,2})[ |-])?(?:(jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|jun(?:i)?|jul(?:i)?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?))?(?: |-)?(?(3)(?: around | at | ))?(?:([0-9]{1,2}:[0-9]{1,2})?(?: uur| u|u)?)?
Tested on the following text:
- date with around time: 26 sep 2016 around 09:00u
- date with at time: 1 sep 2016 at 09:00 uur
- date and time u: 1 sep 2018 09:00 u
- time without date: 08:30 uur
- date with time u: 1 sep 2016 at 09:00u
- only time: 09:00
- only month: jan
- month and year: feb 2019
- only day: 02
- only day with '-': 2-
- day and month: 2 jan
- month year: jan 2018
- date with '-': 2-feb-2018 09:00
- other month: 01 sept 2016
- full month: 1 september 2018
- shortened year: jul '18
Rules:
- a date followed by time is valid
- a date followed by text 'around' or 'at', followed by time is valid
- a date without day number is valid
- a date without year is valid
- a date consisting of only a month is not valid
- a day without month or year is not valid
- a date may contain dashes '-'
- a year may be shortened with ', like jun '18
- month name can be short or long
- full match includes ' uur' or 'u' (to highlight the text in ms-Word)
- submatch text from the capture groups has no leading or trailing spaces
example at: [https://regex101.com/r/6CFgBP/1/]
Expected output (when used in VBA Word): a regex Matches collection object in which each Match.SubMatches contains the individual items d, m, y, hh:mm from the capture groups in the regex search string. So for example 1, the SubMatches (or capture groups) contain the values: '26', 'sep', '2016', '09:00'
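A minimal sketch of how such a Matches collection could be read in VBA, assuming late binding to VBScript.RegExp. DEMO_PATTERN and ListDateTimeMatches are hypothetical names, and the pattern is a deliberately simplified stand-in (three month names, four-digit year only), not the full pattern from this question. It also leaves out the (?(3)...) conditional, since the VBScript engine does not support conditional groups as far as I know.

```vba
' Sketch only: print every match and its submatches to the Immediate window.
' Assumption: DEMO_PATTERN is a simplified stand-in for the real pattern.
Sub ListDateTimeMatches()
    Const DEMO_PATTERN As String = _
        "(?:([0-9]{1,2})[ -])?(jan|feb|sep)(?:[ -]([0-9]{4}))?(?: (?:around |at )?([0-9]{1,2}:[0-9]{2}))?(?: uur| u|u)?"
    Dim re As Object, matches As Object, m As Object
    Dim i As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True        ' return all matches, not just the first
    re.IgnoreCase = True
    re.Pattern = DEMO_PATTERN

    Set matches = re.Execute("date with around time: 26 sep 2016 around 09:00u")
    For Each m In matches
        ' m.Value is the full match (the text to highlight in Word)
        Debug.Print "Full match: [" & m.Value & "]"
        For i = 0 To m.SubMatches.Count - 1
            ' Optional groups that did not take part in the match print as empty
            Debug.Print "  SubMatches(" & i & ") = [" & m.SubMatches(i) & "]"
        Next i
    Next m
End Sub
```

In this sketch the groups are ordered day, month, year, time, so SubMatches(0) to SubMatches(3) correspond to d, m, y and hh:mm.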
The regex works fine, but some false positives need to be excluded:
- A day without month/year should be excluded by the regex (examples 9 and 10)
- A month on its own, without day or year, should be excluded (example 7)
(I tried some lookaheads and references such as \1 and ?(1), but was not able to get it working properly...)
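Short of a pure regex fix, the two exclusions could also be applied after matching, as a check on the captured groups. A rough sketch, assuming the day, month, year and time captures are available as separate strings; IsValidHit is a hypothetical helper, and keeping a time-only match as valid is an assumption based on the test text above.

```vba
' Sketch of the two exclusions as a post-match check (hypothetical helper).
' Assumption: each argument is "" when its group did not take part in the match.
Function IsValidHit(ByVal d As String, ByVal mon As String, _
                    ByVal y As String, ByVal t As String) As Boolean
    If Len(mon) > 0 Then
        ' A month needs a day or a year next to it; "jan" alone is rejected
        IsValidHit = (Len(d) > 0 Or Len(y) > 0)
    Else
        ' No month: a bare day ("02", "2-") is rejected, a bare time is kept
        IsValidHit = (Len(d) = 0 And Len(y) = 0 And Len(t) > 0)
    End If
End Function
```

With four capture groups in the order d, m, y, hh:mm, a match m would be kept only if IsValidHit(m.SubMatches(0), m.SubMatches(1), m.SubMatches(2), m.SubMatches(3)) returns True.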
Any advice highly appreciated!