I have a dataset whose date strings do not follow a single fixed format.

For example:

Fri, 13 Apr 2018 13:13:12 +0000 (UTC)
Mon, 26 Mar 2018 06:32:59 +0100
Tue, 05 Dec 2017 11:03:34 GMT
08 Dec 2016 12:00:24

How can I get the day, the hour (plus offset), and the minute from strings like these without hand-written regex code?

  • What is your desired output format? And how do you differentiate between manual code and regex? Commented Feb 20, 2019 at 9:33
  • I just want to extract these values and convert them to categorical features (for ML). @user5173426 Commented Feb 20, 2019 at 9:35
  • Check the strptime docs and this question. In your case you would probably need to implement an additional fallback strategy to switch between the different format types (e.g. loop over a list of formats and call strptime inside try ... except). Commented Feb 20, 2019 at 9:36
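
The fallback strategy suggested in the last comment could look roughly like this. It is a minimal sketch using only the standard library: FORMATS and parse_date are hypothetical names, and the format list covers only the four sample strings from the question, so real data would likely need more entries.

```python
from datetime import datetime

# Hypothetical list of candidate formats, one per sample string in the question.
FORMATS = [
    "%a, %d %b %Y %H:%M:%S %z (%Z)",  # Fri, 13 Apr 2018 13:13:12 +0000 (UTC)
    "%a, %d %b %Y %H:%M:%S %z",       # Mon, 26 Mar 2018 06:32:59 +0100
    "%a, %d %b %Y %H:%M:%S %Z",       # Tue, 05 Dec 2017 11:03:34 GMT
    "%d %b %Y %H:%M:%S",              # 08 Dec 2016 12:00:24
]


def parse_date(text):
    """Try each known format in turn and return the first successful parse."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"no known format matches {text!r}")


dt = parse_date("Mon, 26 Mar 2018 06:32:59 +0100")
print(dt.day, dt.hour, dt.minute, dt.utcoffset())
```

Note that %a and %b depend on the current locale, so this sketch assumes English day and month names.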

1 Answer

Using timestring:

import timestring

dt_1 = "Fri, 13 Apr 2018 13:13:12 +0000 (UTC)"
dt_2 = "Mon, 26 Mar 2018 06:32:59 +0100"
dt_3 = "Tue, 05 Dec 2017 11:03:34 GMT"
dt_4 = "08 Dec 2016 12:00:24"

print(timestring.Date(dt_1))
print(timestring.Date(dt_2))
print(timestring.Date(dt_3))
print(timestring.Date(dt_4))

EDIT:

While I was at it, here is another cooler method:

Using dateutil.parser (imported here as dparser):

import dateutil.parser as dparser

dt_1 = "Fri, 13 Apr 2018 13:13:12 +0000 (UTC)"
dt_2 = "Mon, 26 Mar 2018 06:32:59 +0100"
dt_3 = "Tue, 05 Dec 2017 11:03:34 GMT"
dt_4 = "08 Dec 2016 12:00:24"


print(dparser.parse(dt_1, fuzzy=True))
print(dparser.parse(dt_2, fuzzy=True))
print(dparser.parse(dt_3, fuzzy=True))
print(dparser.parse(dt_4, fuzzy=True))

OUTPUT:

2018-04-13 13:13:12+00:00
2018-03-26 06:32:59+01:00
2017-12-05 11:03:34+00:00
2016-12-08 12:00:24

EDIT 2:

Why is dparser cooler?

Invalid dates raise a ValueError:

invalid_dt = "Fri, 35 Apr 2018 13:13:12 +0000 (UTC)"
print(dparser.parse(invalid_dt, fuzzy=True))

OUTPUT:

ValueError: day is out of range for month
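
Since invalid dates raise a ValueError, bad rows in a dataset can be caught and marked instead of aborting the whole run. This is a minimal sketch, assuming the python-dateutil package is installed; safe_parse is a hypothetical helper name, not part of dateutil.

```python
import dateutil.parser as dparser


def safe_parse(text):
    """Return a datetime, or None if the string is not a valid date."""
    try:
        return dparser.parse(text, fuzzy=True)
    except (ValueError, OverflowError):
        # dateutil's parse errors are ValueError subclasses
        return None


rows = ["Fri, 35 Apr 2018 13:13:12 +0000 (UTC)", "08 Dec 2016 12:00:24"]
parsed = [safe_parse(row) for row in rows]
print(parsed)
```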

EDIT 3:

To get the day, month, year, hour, minute or second:

print(dparser.parse(dt_1, fuzzy=True).day)     # 13
print(dparser.parse(dt_2, fuzzy=True).month)   # 3
print(dparser.parse(dt_3, fuzzy=True).year)    # 2017
print(dparser.parse(dt_4, fuzzy=True).hour)    # 12
print(dparser.parse(dt_4, fuzzy=True).minute)  # 0
print(dparser.parse(dt_4, fuzzy=True).second)  # 24
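
For the categorical-feature use case mentioned in the comments, these attributes can be collected once per string. This is a rough sketch under assumptions: date_features and its dictionary keys are hypothetical, python-dateutil is assumed to be installed, and reporting the UTC offset as a separate field is only one possible reading of "hour + offset".

```python
import dateutil.parser as dparser


def date_features(text):
    """Parse a date string and return a few fields usable as categorical features."""
    dt = dparser.parse(text, fuzzy=True)
    offset = dt.utcoffset()  # None when the string carries no time zone information
    return {
        "weekday": dt.weekday(),  # Monday is 0, Sunday is 6
        "day": dt.day,
        "hour": dt.hour,
        "minute": dt.minute,
        "utc_offset_hours": None if offset is None else offset.total_seconds() / 3600,
    }


print(date_features("Mon, 26 Mar 2018 06:32:59 +0100"))
```

Parsing once and reading several attributes from the same datetime avoids calling the parser repeatedly for each field.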

EDIT 4:

If you want to get the name of the Day:

print(dparser.parse(dt_1, fuzzy=True).strftime("%a"))  # Fri

8 Comments

The output doesn't look right. For example, Fri 13 Apr gives 2019-02-15.
Thanks for the help, that returns a good string. Any idea about the day?
@BaltschunAli Of course, just use the day attribute: print(dparser.parse(dt_1, fuzzy=True).day) returns 13.
Sorry, I mean like Mon, Fri, etc.
Sure, you can use print(dparser.parse(dt_1, fuzzy=True).strftime("%A")), which returns Friday, or use %a if you want Fri.