1

The date variable in my data is in multiple formats such as DD/MM/YYYY, D/MM/YY and DD/M/YYYY, for example: 12/8/2017, 27/08/17, 8/9/2017, 10/9/2017, 15/09/17.

I need to convert all of these into one single format, DD/MM/YYYY.

I tried to create a parsing function:

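# assumed import: the original snippet does not show how dt is defined
from datetime import datetime as dt

def parse_date(date):
    if date == '':
        return None
    else:
        return dt.strptime(date, '%d/%m/%y').date()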

When I apply this function to my dataset, it throws the following error:

ValueError Traceback (most recent call last) in ()
----> 1 data.Date = data.Date.apply(parse_date)

ValueError: unconverted data remains: 17

How can I solve the unconverted data remains error?

5
  • What is the date that causes that error? It could be the %y doesn't like the 2 digit year. Commented Jan 22, 2018 at 14:58
  • I would split the string on the slashes, and construct another string with the correct number of digits. Commented Jan 22, 2018 at 14:59
  • When I changed %y to %Y, I get the following error : ValueError: time data '13/08/17' does not match format '%d/%m/%Y' Commented Jan 22, 2018 at 15:00
  • @AndyG there are around 10000 records with dates ranging from 01/01/2000 - 12/01/2018 in multiple date formats.. Commented Jan 22, 2018 at 15:03
  • 10,000 isn't such a huge number. Commented Jan 22, 2018 at 15:10

3 Answers

5

You can use the dateutil module to do this:

import dateutil.parser as dparser
a = ["12/8/2017", "27/08/17", "8/9/2017", "10/9/2017", "15/09/17"]

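for i in a:
    # dayfirst=True because the data is DD/MM/YYYY; by default dateutil
    # reads ambiguous dates such as 12/8/2017 as month first
    print(dparser.parse(i, fuzzy=True, dayfirst=True).date())

Result:

2017-08-12
2017-08-27
2017-09-08
2017-09-10
2017-09-15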

1

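This is because %y expects a two-digit year (%Y is the four-digit one). For a value like 12/8/2017, %y consumes "20" and the trailing "17" is left over, hence "unconverted data remains: 17".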

To cover multiple date formats, you can have a look at the dateparser library.

Otherwise you will have to manually go through possible types or extend the dates yourself. If you are sure you only need to extend the year part you can do something like this before feeding the string to the parser:

date_parts = date.split('/')
if len(date_parts[2]) == 2:
    date_parts[2] = "20" + date_parts[2]
date = '/'.join(date_parts)

I think using the dateparser library is the way to go, as it is more extensible.
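As a rough illustration of "manually going through possible types", the sketch below tries each candidate format in turn using only the standard library. The names FORMATS, parse_date and to_ddmmyyyy are made up for this example, and it assumes the data contains only the day-first variants listed in the question.

```python
from datetime import datetime

# Assumption: only these two day-first variants occur in the data.
# %Y needs a four-digit year, %y a two-digit year, so each string
# matches exactly one of them.
FORMATS = ("%d/%m/%Y", "%d/%m/%y")

def parse_date(text):
    """Return a date for a day-first string, or None for an empty string."""
    if text == "":
        return None
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not fit, try the next one
    raise ValueError("no known format matches %r" % text)

def to_ddmmyyyy(text):
    """Re-render a parsed date as DD/MM/YYYY (None stays None)."""
    parsed = parse_date(text)
    return None if parsed is None else parsed.strftime("%d/%m/%Y")

for raw in ["12/8/2017", "27/08/17", "8/9/2017"]:
    print(raw, "->", to_ddmmyyyy(raw))
```

Raising on an unknown format, rather than returning None, is a deliberate choice here so that unexpected values in the dataset are not silently dropped.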

3 Comments

When I tried this on all the date values in the dataset, I got an error: 'Series' object has no attribute 'split'
I tried date = Data.Date.Split('/'), where Data is my actual dataset.
You are getting this error because your Date is not a string as implied in the question but a series object. You will need to iterate over the Series and apply this element-wise.
0

A basic approach is to split the strings on the slashes, and then re-join them with the correct number of digits:

date = "12/8/2017"

parts = date.split("/")

print(parts) # ['12', '8', '2017']

if len(parts[0]) == 1:
    parts[0] = "0" + parts[0]
if len(parts[1]) == 1:
    parts[1] = "0" + parts[1]
if len(parts[2]) == 2:
    parts[2] = "20" + parts[2]
newDate = "/".join(parts)
# or 
newDate = parts[0] + "/" + parts[1] + "/" + parts[2]

print(newDate) # 12/08/2017

Then you have a consistent date format throughout. (An additional check is required if your dates extend into the last century.)

I would test this first, and consider the other answers' approaches if this is not performant.
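To run the steps above over many values rather than a single string, they can be wrapped in a function and applied element by element. This is only a sketch: normalize_date is a name invented here, and it keeps the same assumption that every two-digit year belongs to the 2000s.

```python
def normalize_date(text):
    """Pad a D/M/YY-style string to DD/MM/YYYY."""
    day, month, year = text.split("/")
    if len(year) == 2:
        year = "20" + year  # assumption: no dates before 2000
    # zfill(2) adds a leading zero to single-digit days and months
    return "/".join([day.zfill(2), month.zfill(2), year])

dates = ["12/8/2017", "27/08/17", "8/9/2017", "10/9/2017", "15/09/17"]
normalized = [normalize_date(d) for d in dates]
print(normalized)
# ['12/08/2017', '27/08/2017', '08/09/2017', '10/09/2017', '15/09/2017']
```

With a pandas Series, something like data.Date.apply(normalize_date) should do the same thing element-wise, assuming every entry is a non-empty string.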

1 Comment

This works perfectly for a single value. But when I tried it on the whole series of date values from the dataset, it showed an error. I will work on it and update. Thanks!
