R has a very nice workflow that lets the user set the day/month/year order while otherwise handling the messiness of user-input date strings:

date_str = c('05/03/2022', '14/03/2022', '14.03.2022', '14/03.2022')
lubridate::parse_date_time(date_str, orders = 'dmy')
#> [1] "2022-03-05 UTC" "2022-03-14 UTC" "2022-03-14 UTC" "2022-03-14 UTC"

The closest I've found in Python is:

from dateparser import parse
date_str = ['05/03/2022', '14/03/2022', '14.03.2022', '14/03.2022']
list(map(lambda l: parse(l, date_formats = ['dmy']), date_str))
[datetime.datetime(2022, 5, 3, 0, 0),
 datetime.datetime(2022, 3, 14, 0, 0),
 datetime.datetime(2022, 3, 14, 0, 0),
 datetime.datetime(2022, 3, 14, 0, 0)]

This handles the messiness but swaps day and month in the first element. I think that is because date_formats prioritises explicitly defined formats and otherwise falls back to the (silly) default US month-day-year order?

Is there a nice implementation in Python that can be relied upon to handle messiness while also assuming a day/month ordering?

1 Answer

Well, if dateparser otherwise does what you like, you can gently wrap it to prioritize the format you like:

import dateparser
import datetime
import re

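# Slash-separated day/month/year, e.g. '05/03/2022'.
dmy_re = re.compile(r"^(?P<day>\d+)/(?P<month>\d+)/(?P<year>\d+)$")


def parse_with_dmy_priority(ds):
    # Try the explicit day/month/year pattern first.
    dmy_match = dmy_re.match(ds)
    if dmy_match:
        # Group names match the datetime keyword arguments (day, month, year).
        return datetime.datetime(**{k: int(v) for (k, v) in dmy_match.groupdict().items()})
    # Anything else is left to dateparser's own heuristics.
    return dateparser.parse(ds)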


in_data = ['05/03/2022', '14/03/2022', '14.03.2022', '14/03.2022']
print([parse_with_dmy_priority(d) for d in in_data])
[
  datetime.datetime(2022, 3, 5, 0, 0), 
  datetime.datetime(2022, 3, 14, 0, 0),
  datetime.datetime(2022, 3, 14, 0, 0), 
  datetime.datetime(2022, 3, 14, 0, 0),
]
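
To see which of the sample inputs the regex itself handles, and which are passed on to dateparser, a small check like the following may help. It reuses the pattern from above and needs only the standard library; the name handled_by_regex is just an illustrative variable.

```python
import re

# Same pattern as in the answer: digits separated strictly by '/'.
dmy_re = re.compile(r"^(?P<day>\d+)/(?P<month>\d+)/(?P<year>\d+)$")

in_data = ['05/03/2022', '14/03/2022', '14.03.2022', '14/03.2022']

# True means the wrapper builds the datetime itself; False means the string
# is handed to dateparser.parse unchanged.
handled_by_regex = {d: bool(dmy_re.match(d)) for d in in_data}
print(handled_by_regex)
```

Only the two purely slash-separated strings match, so the dotted and mixed-separator strings still rely on dateparser's own guess at the ordering.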

This generalizes nicely too:

def parse_date(ds, regexps=()):
    for regexp in regexps:
        match = regexp.match(ds)
        if match:
            return datetime.datetime(**{k: int(v) for (k, v) in match.groupdict().items()})
    return dateparser.parse(ds)


print([parse_date(d, regexps=[dmy_re]) for d in in_data])
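
If the goal is to force day-month-year for the messy separators too, one possible variation is to loosen the pattern so that '/', '.' and '-' are all accepted, even mixed. This is only a sketch under that assumption: LOOSE_DMY_RE and parse_dmy_loose are hypothetical names, the pattern assumes a four-digit year, and it uses only the standard library.

```python
import datetime
import re

# Hypothetical pattern: day, month, four-digit year, separated by '/', '.' or '-'.
# The two separators are allowed to differ, as in '14/03.2022'.
LOOSE_DMY_RE = re.compile(
    r"^\s*(?P<day>\d{1,2})[./-](?P<month>\d{1,2})[./-](?P<year>\d{4})\s*$"
)


def parse_dmy_loose(ds):
    """Return a datetime for a day-month-year string, or None if it does not fit."""
    match = LOOSE_DMY_RE.match(ds)
    if match is None:
        return None
    parts = {k: int(v) for k, v in match.groupdict().items()}
    try:
        return datetime.datetime(**parts)
    except ValueError:
        # Matched the pattern but is not a real calendar date, e.g. '31/02/2022'.
        return None


in_data = ['05/03/2022', '14/03/2022', '14.03.2022', '14/03.2022']
print([parse_dmy_loose(d) for d in in_data])
```

Because the function returns None when nothing fits, it can be combined with the fallback from the answer, for example `parse_dmy_loose(ds) or dateparser.parse(ds)`.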