127

How can I extract the date from a string like "monkey 2010-07-10 love banana"? Thanks!

2
  • 3
    Just a hint: it starts and ends with a digit. Let me think about that. Although, regex can be your friend there. Commented Jul 18, 2010 at 15:50
  • .isdigit() @HamishGrubijan is an implementation, though my answer below discusses this in detail with modules for ease. Commented Jun 2, 2021 at 18:38

8 Answers

201

Using python-dateutil:

In [1]: import dateutil.parser as dparser

In [18]: dparser.parse("monkey 2010-07-10 love banana",fuzzy=True)
Out[18]: datetime.datetime(2010, 7, 10, 0, 0)

Invalid dates raise a ValueError:

In [19]: dparser.parse("monkey 2010-07-32 love banana",fuzzy=True)
# ValueError: day is out of range for month

It can recognize dates in many formats:

In [20]: dparser.parse("monkey 20/01/1980 love banana",fuzzy=True)
Out[20]: datetime.datetime(1980, 1, 20, 0, 0)

Note that it makes a guess if the date is ambiguous:

In [23]: dparser.parse("monkey 10/01/1980 love banana",fuzzy=True)
Out[23]: datetime.datetime(1980, 10, 1, 0, 0)

But the way it parses ambiguous dates is customizable:

In [21]: dparser.parse("monkey 10/01/1980 love banana",fuzzy=True, dayfirst=True)
Out[21]: datetime.datetime(1980, 1, 10, 0, 0)

7 Comments

@Hamish: If there are two dates (as in the case of "monkey 10/01/1980 love 7/10/2010 banana"), it may raise a ValueError, or (as in the case of "monkey 10/01/1980 love 2010-07-10 banana") it may misinterpret the second date as denoting hours, minutes, seconds or timezone. fuzzy=True gives it license to guess.
@unutbu With str = "By flufie · October 14, 2010 at 11:22 pm · 26 replies", dateutil gives me "ValueError: hour must be in 0..23".
What happens if there is more than one date in the text?
@alvas: The parse function may raise an exception (even if fuzzy=True), or with fuzzy=True, it may return the first date or a mish-mash composed of parts of both dates. So really, parse should only be called on a string containing one date.
@Kailegh: Yes, it would be possible to deduce the indices using fuzzy_with_tokens=True. If you'd like more clarification, please start a new question.
122

If the date is given in a fixed form, you can simply use a regular expression to extract the date and "datetime.datetime.strptime" to parse the date:

import re
from datetime import datetime

text = "monkey 2010-07-10 love banana"
match = re.search(r'\d{4}-\d{2}-\d{2}', text)
date = datetime.strptime(match.group(), '%Y-%m-%d').date()

Otherwise, if the date is given in an arbitrary form, you can't extract it easily.
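A slightly more defensive sketch of the same approach, assuming the date is always written as YYYY-MM-DD. The helper name extract_iso_date is hypothetical, not part of any library. It returns None when nothing matches or when the matched digits are not a real calendar date:

```python
import re
from datetime import datetime

# Compiled once; matches the digit shape YYYY-MM-DD only.
ISO_DATE = re.compile(r'\d{4}-\d{2}-\d{2}')

def extract_iso_date(text):
    """Return the first valid YYYY-MM-DD date in text, or None."""
    for match in ISO_DATE.finditer(text):
        try:
            return datetime.strptime(match.group(), '%Y-%m-%d').date()
        except ValueError:
            # The digits matched the pattern but are not a real date
            # (for example 2010-07-32), so try the next candidate.
            continue
    return None

print(extract_iso_date("monkey 2010-07-10 love banana"))  # 2010-07-10
print(extract_iso_date("monkey 2010-07-32 love banana"))  # None
```

Checking for None avoids the AttributeError that match.group() raises when re.search finds nothing.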

4 Comments

What if it is in European format, such as 20/01/1980 meaning "Jan 20 1980"? What if months/days/years fall outside of reasonable range?
@lunaryorn In the first statement, does "re" refer to the string where we are searching for our desired pattern?
@vishal.k It refers to the built-in re module, i.e., import re.
In case someone else made same mistake: you need to from datetime import datetime instead of import datetime
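On the European-format question raised in the comments, one possible sketch is to try several (pattern, format) pairs in turn. The pairs and the find_all_dates name below are illustrative assumptions, not a complete solution. strptime rejects out-of-range days and months with a ValueError, so those candidates are skipped:

```python
import re
from datetime import datetime

# Each entry pairs a regex with the strptime format used to validate it.
# This list is an assumption for illustration; extend it to match your data.
CANDIDATES = [
    (re.compile(r'\b\d{4}-\d{2}-\d{2}\b'), '%Y-%m-%d'),  # 2010-07-10
    (re.compile(r'\b\d{2}/\d{2}/\d{4}\b'), '%d/%m/%Y'),  # 20/01/1980, day first
]

def find_all_dates(text):
    """Return every valid date found in text, ordered by position."""
    found = []
    for pattern, fmt in CANDIDATES:
        for match in pattern.finditer(text):
            try:
                parsed = datetime.strptime(match.group(), fmt).date()
            except ValueError:
                continue  # digits matched, but not a real calendar date
            found.append((match.start(), parsed))
    return [parsed for _, parsed in sorted(found)]

print(find_all_dates("monkey 20/01/1980 love 2010-07-10 banana 31/02/1999"))
```

The day-first reading of 20/01/1980 is fixed by the format string here; a string such as 10/01/1980 stays ambiguous, and the format you list decides how it is read.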
41

For extracting a date from a string in Python, the best module available is datefinder.

You can use it in your Python project by following the easy steps given below.

Step 1: Install datefinder Package

pip install datefinder

Step 2: Use It In Your Project

import datefinder

input_string = "monkey 2010-07-10 love banana"
# a generator will be returned by the datefinder module. I'm typecasting it to a list. Please read the note of caution provided at the bottom.
matches = list(datefinder.find_dates(input_string))

if matches:
    # Each match is a datetime.datetime object; only the first is used here.
    date = matches[0]
    print(date)
else:
    print('No dates found')

Note: if you expect a large number of matches, converting the generator to a list is not recommended, because it carries a significant performance overhead.
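To illustrate the generator point without depending on datefinder, here is a sketch using a stand-in generator (iter_iso_dates, a hypothetical helper built on re.finditer). The idea is that next(generator, None) takes the first match lazily instead of materialising every match in a list; the same call pattern should apply to any generator of matches:

```python
import re
from datetime import datetime

def iter_iso_dates(text):
    """Lazily yield each valid YYYY-MM-DD date in text (stand-in generator)."""
    for match in re.finditer(r'\d{4}-\d{2}-\d{2}', text):
        try:
            yield datetime.strptime(match.group(), '%Y-%m-%d')
        except ValueError:
            continue  # skip digit runs that are not real dates

# Only the first match is computed; the rest of the text is never scanned.
first = next(iter_iso_dates("monkey 2010-07-10 love 2011-01-02 banana"), None)
print(first if first is not None else 'No dates found')
```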

4 Comments

I found that datefinder handled ambiguous date matching better than python-dateutil, returning only two possible dates from a random medium.com blog post as opposed to five. Not sure how it handles different locales, however...
This is pretty good, except it somehow doesn't work when there is a colon (:) before the date string: string = "Assessment Date: 17-May-2017 at 13:31" list(datefinder.find_dates(string.lower())) #[] string = "Assessment Date 17-May-2017 at 13:31" list(datefinder.find_dates(string.lower())) #[datetime.datetime(2017, 5, 17, 13, 31)]
agree that datefinder is heaps better than dateparser for ambiguous text
I agree that it's much better than dateutil.parser.parse(text, fuzzy=True). Gonna use it in my infrastructure utils.
3

Using Pygrok, you can define abstracted extensions to the Regular Expression syntax.

The custom patterns can be included in your regex in the format %{PATTERN_NAME}.

You can also create a label for that pattern by separating the two with a colon: %{PATTERN_NAME:matched_string}. If the pattern matches, the value will be returned as part of the resulting dictionary (e.g. result.get('matched_string')).

For example:

from pygrok import Grok

input_string = 'monkey 2010-07-10 love banana'
date_pattern = '%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day}'

grok = Grok(date_pattern)
print(grok.match(input_string))

The resulting value will be a dictionary:

{'month': '07', 'day': '10', 'year': '2010'}

If date_pattern does not match anything in input_string, the return value will be None. By contrast, if your pattern matches but does not have any labels, it will return an empty dictionary {}.
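For comparison, the standard library's named groups give a similar labelled result without an extra dependency. This is a stdlib analogue, not Pygrok itself; the pattern below is an assumption that only roughly mirrors the YEAR, MONTHNUM and MONTHDAY patterns, and match_date is a hypothetical helper name:

```python
import re

# Named groups play the role of the labels in %{PATTERN_NAME:label}.
DATE_PATTERN = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

def match_date(text):
    """Return a dict of the labelled groups, or None if the pattern is absent."""
    match = DATE_PATTERN.search(text)
    return match.groupdict() if match else None

print(match_date('monkey 2010-07-10 love banana'))
print(match_date('monkey love banana'))
```

As with Grok, a pattern without named groups would yield an empty dictionary from groupdict().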


1 Comment

This lib is very Python 2
3

Hands Down The Best Ways

There are two good modules, available on PyPI and GitHub, that make this task easier. They are:

  1. DATEFINDER Module, useful for finding dates in strings of text.

Installation: pip install datefinder

EXAMPLE

import datefinder

input_string = "monkey 2010-07-10 love banana"
# datefinder returns a generator; it is converted to a list here for simplicity.
matches = list(datefinder.find_dates(input_string))

if matches:
    # Each match is a datetime.datetime object; only the first is used here.
    date = matches[0]
    print(date)
else:
    print('No dates found')

SOURCE: Finny Abraham

  2. DATEPARSER, extremely useful for scraping dates from HTML files in different languages. It also supports the Hijri and Jalali calendars, and handles 200+ languages in different formats.

Features

  • Generic parsing of dates in over 200 language locales, plus numerous formats, in a language-agnostic fashion.
  • Generic parsing of relative dates like: '1 min ago', '2 weeks ago', '3 months, 1 week and 1 day ago', 'in 2 days', 'tomorrow'.

Advanced Features

  • Generic parsing of dates with time zone abbreviations or UTC offsets like: 'August 14, 2015 EST', 'July 4, 2013 PST', '21 July 2013 10:15 pm +0500'.
  • Date lookup in longer texts.
  • Support for non-Gregorian calendar systems. See Supported Calendars.
  • Extensive test coverage.

SOURCE CODE [Example]

>>> from dateparser import parse
>>> parse('1 hour ago')
datetime.datetime(2015, 5, 31, 23, 0)
>>> parse('Il ya 2 heures')  # French (2 hours ago)
datetime.datetime(2015, 5, 31, 22, 0)
>>> parse('1 anno 2 mesi')  # Italian (1 year 2 months)
datetime.datetime(2014, 4, 1, 0, 0)
>>> parse('yaklaşık 23 saat önce')  # Turkish (23 hours ago)
datetime.datetime(2015, 5, 31, 1, 0)
>>> parse('Hace una semana')  # Spanish (a week ago)
datetime.datetime(2015, 5, 25, 0, 0)
>>> parse('2小时前')  # Chinese (2 hours ago)
datetime.datetime(2015, 5, 31, 22, 0)

2 Comments

It would be helpful if you provided links for the libraries you mentioned. At least for the second one.
1

You could also try the dateparser module, which may be slower than datefinder on free text but which should cover more potential cases and date formats, as well as a significant number of languages.

Comments

0

HARD MODE:

If your dates are not separated by whitespace from surrounding text, combining datefinder with wordninja will solve this problem:

>>> import datefinder
>>> import wordninja
>>> example = '04.02.22ILeftMyHeartInSF ---> I Left My Heart In Sf - blah blah blah'
>>> list(datefinder.find_dates(' '.join(wordninja.split(example))))
[datetime.datetime(2022, 4, 22, 0, 0)]

Well sorta. That date was actually February 2004 not April 2022, but any tool would have to guess.

Just to be clear, this is what wordninja does to squishedtogethertext:

>>> wordninja.split(example)
['04', '02', '22', 'I', 'Left', 'My', 'Heart', 'In', 'SF', 'I', 'Left', 'My', 'Heart', 'In', 'Sf', 'blah', 'blah', 'blah']

Comments

-9

If you know the position of the date object in the string (for example in a log file), you can use .split()[index] to extract the date without fully knowing the format.

For example:

>>> string = 'monkey 2010-07-10 love banana'
>>> date = string.split()[1]
>>> date
'2010-07-10'
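Since .split()[index] only returns a string, a sketch like the following can confirm the token really is a date. It assumes the ISO form YYYY-MM-DD and uses date.fromisoformat (Python 3.7+); date_at is a hypothetical helper name:

```python
from datetime import date

def date_at(line, index):
    """Return the whitespace-separated token at index as a date, or None."""
    tokens = line.split()
    if not -len(tokens) <= index < len(tokens):
        return None  # the line has fewer fields than expected
    try:
        return date.fromisoformat(tokens[index])
    except ValueError:
        return None  # the token is not a valid YYYY-MM-DD date

print(date_at('monkey 2010-07-10 love banana', 1))  # 2010-07-10
print(date_at('monkey 2010-07-10 love banana', 0))  # None
```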

Comments
