
I have several strings in which I have identified a few date formats, and I would like to recognize the date in each string:

an_2011_02_12_azar.mp3 ->this is yyyy_mm_dd
20121112_Marcel.mp3    ->this is yyyymmdd
cdani_270607.mp3       ->this is ddmmyy
lica_07_03_15.mp3      ->this is dd_mm_yy

To do so, I have:

import datetime
import re

foo = """
an_2011_02_12_azar.mp3
20121112_Marcel.mp3
cdani_270607.mp3
lica_07_03_15.mp3
"""
lines = foo.splitlines()
for line in lines:
    print(line)
    # deals with the 2011_02_12 format only
    match = re.search(r'\d{4}_\d{2}_\d{2}', line)
    # match is None for lines in any other format, so match.group() fails there
    date = datetime.datetime.strptime(match.group(), '%Y_%m_%d').date()
    print(date)

How can I apply several regular expressions so that the date is recognized in every format?

  • Loop over the patterns? Commented May 16, 2015 at 17:05
  • When looping, if a string does not have the %Y_%m_%d format I get an error at date = datetime.datetime.strptime(match.group(), '%Y_%m_%d').date(): AttributeError: 'NoneType' object has no attribute 'group' Commented May 16, 2015 at 17:06
  • So either check for None or handle the error and move on, and break/return when one parses successfully. Commented May 16, 2015 at 17:07
  • Is there a way to apply OR in a regular expression? Commented May 16, 2015 at 17:07
  • No, the pipe goes in the pattern. Try reading the Python re docs, or just use a loop outside of regex. Commented May 16, 2015 at 17:10
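The loop suggested in the comments could look roughly like the sketch below. It is not code from the thread: the PATTERNS list, the order of its entries, and the find_date helper name are all assumptions made for illustration. Each regex is paired with the strptime format that parses it, a missing match is skipped by checking for None, and the first pattern that parses successfully wins.

```python
import datetime
import re

# (regex, strptime format) pairs, tried in order. The list and its order are
# assumptions; order matters when more than one pattern could match a name.
PATTERNS = [
    (r'\d{4}_\d{2}_\d{2}', '%Y_%m_%d'),  # yyyy_mm_dd
    (r'\d{8}', '%Y%m%d'),                # yyyymmdd
    (r'\d{2}_\d{2}_\d{2}', '%d_%m_%y'),  # dd_mm_yy
    (r'\d{6}', '%d%m%y'),                # ddmmyy
]


def find_date(name):
    """Return the first date that parses from name, or None (hypothetical helper)."""
    for regex, fmt in PATTERNS:
        match = re.search(regex, name)
        if match is None:
            continue  # this format is not present, try the next one
        try:
            return datetime.datetime.strptime(match.group(), fmt).date()
        except ValueError:
            continue  # digits matched but do not form a valid date
    return None


for name in ['an_2011_02_12_azar.mp3', '20121112_Marcel.mp3',
             'cdani_270607.mp3', 'lica_07_03_15.mp3']:
    print(name, '->', find_date(name))
```

The longer, more specific patterns are listed first so that a four-digit year is not misread as part of a two-digit-year date.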

1 Answer

If you remove the underscores:

datestr = line.replace('_', '')

then there would be only two date formats to deal with: yyyymmdd or ddmmyy. Furthermore, every date string would consist of either 8 or 6 digits, which you could find using the regex pattern r'\d{8}|\d{6}':

datestr = re.search(r'\d{8}|\d{6}', datestr).group()
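The order of the two alternatives matters here, because Python's re module tries alternatives from left to right and accepts the first one that matches. A small sketch of the difference, using one of the sample file names (the variable names eight_first and six_first are only for illustration):

```python
import re

datestr = 'an_2011_02_12_azar.mp3'.replace('_', '')  # 'an20110212azar.mp3'

# 8-digit alternative first: the whole yyyymmdd run is captured.
eight_first = re.search(r'\d{8}|\d{6}', datestr).group()

# 6-digit alternative first: only the first six digits are captured.
six_first = re.search(r'\d{6}|\d{8}', datestr).group()

print(eight_first)  # 20110212
print(six_first)    # 201102
```

Putting \d{8} first is therefore what keeps a four-digit-year date from being truncated to six digits.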

The datestr could then be parsed with either

date = DT.datetime.strptime(datestr, '%d%m%y')

or

date = DT.datetime.strptime(datestr, '%Y%m%d')

The pattern r'\d{8}|\d{6}' would also capture some strings that are not dates, such as digits that represent an invalid date. We could deal with those cases by using try..except to catch the ValueError that strptime raises.


import re
import datetime as DT

foo = """\
an_2011_02_12_azar.mp3
20121112_Marcel.mp3   
cdani_270607.mp3     
lica_07_03_15.mp3  
an_2011_13_12_azar.mp3
"""

for line in foo.splitlines():
    datestr = line.replace('_', '')
    datestr = re.search(r'\d{8}|\d{6}', datestr).group()
    try:
        # %y matches 2-digit years
        date = DT.datetime.strptime(datestr, '%d%m%y')
    except ValueError:
        try:
            # %Y matches 4-digit years
            date = DT.datetime.strptime(datestr, '%Y%m%d')
        except ValueError:
            # handle the error case
            date = None
    print('{:23} --> {}'.format(line, date))

yields

an_2011_02_12_azar.mp3  --> 2011-02-12 00:00:00
20121112_Marcel.mp3     --> 2012-11-12 00:00:00
cdani_270607.mp3        --> 2007-06-27 00:00:00
lica_07_03_15.mp3       --> 2015-03-07 00:00:00
an_2011_13_12_azar.mp3  --> None
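One case the loop above does not cover is a line with no run of six or eight digits at all: re.search then returns None and calling .group() raises AttributeError. A possible variant, wrapped in a hypothetical helper named extract_date (the name and the choice to return a date rather than a datetime are assumptions), checks for that before parsing:

```python
import datetime as DT
import re


def extract_date(name):
    """Return a datetime.date parsed from name, or None if nothing parses.

    extract_date is a hypothetical helper name used only for this sketch.
    """
    match = re.search(r'\d{8}|\d{6}', name.replace('_', ''))
    if match is None:
        return None  # no run of 6 or 8 digits at all
    for fmt in ('%d%m%y', '%Y%m%d'):
        try:
            return DT.datetime.strptime(match.group(), fmt).date()
        except ValueError:
            continue  # wrong format or invalid date, try the next format
    return None


print(extract_date('cdani_270607.mp3'))        # 2007-06-27
print(extract_date('an_2011_13_12_azar.mp3'))  # None
print(extract_date('notes.mp3'))               # None
```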