Pandas to_datetime parsing error: Unknown string format

Question

I have a column in my pandas dataframe:

Start Date
1/7/13
1/7/13
1/7/13
16/7/13
16/7/13

When I convert it into a date format, I am getting the following error:

data['Start Date']= pd.to_datetime(data['Start Date'],dayfirst=True)
...
...
/Library/Python/2.7/site-packages/pandas/tseries/tools.pyc in _convert_listlike(arg, box, format, name)
    381                 return DatetimeIndex._simple_new(values, name=name, tz=tz)
    382             except (ValueError, TypeError):
--> 383                 raise e
    384 
    385     if arg is None:

ValueError: Unknown string format

What am I missing here?

You can pass errors='coerce' (or coerce=True in earlier versions), and then see which item is converted to NaT (and so could not be converted to a datetime).
– joris, Commented Dec 29, 2015 at 8:49
The type is object. So I guess it is considered to be a string?
– pheno, Commented Dec 29, 2015 at 11:19
This is probably indeed a string column (but it can also be mixed). Did you try my suggestion?
– joris, Commented Dec 29, 2015 at 16:00

Halee · Accepted Answer · 2018-07-19 19:10:30Z

22

I think the problem is in the data: at least one string cannot be parsed. One way to find it is to check the length of the strings in column Start Date:

import pandas as pd
import io

temp=u"""Start Date
1/7/13
1/7/1
1/7/13 12 17
16/7/13
16/7/13"""

data = pd.read_csv(io.StringIO(temp), sep=";", parse_dates=False)

#data['Start Date']= pd.to_datetime(data['Start Date'],dayfirst=True)
print(data)

     Start Date
0        1/7/13
1         1/7/1
2  1/7/13 12 17
3       16/7/13
4       16/7/13

#check if the length is more than 7
print(data[data['Start Date'].str.len() > 7])

     Start Date
2  1/7/13 12 17

Or you can try to find the problematic rows a different way, e.g. take only part of the column and check whether it parses as datetime:

#read first 3 rows (copy to avoid a SettingWithCopyWarning on assignment)
data = data.iloc[:3].copy()

data['Start Date'] = pd.to_datetime(data['Start Date'], dayfirst=True)

These are only tips, though.

EDIT:

Thanks to joris for the suggestion to add the parameter errors='coerce' to to_datetime:

temp=u"""Start Date
1/7/13
1/7/1
1/7/13 12 17
16/7/13
16/7/13 12 04"""

data = pd.read_csv(io.StringIO(temp), sep=";")
#add parameter errors coerce
data['Start Date']= pd.to_datetime(data['Start Date'], dayfirst=True, errors='coerce')
print(data)

  Start Date
0 2013-07-01
1 2001-07-01
2        NaT
3 2013-07-16
4        NaT

#index of the rows that are null (NaT), stored in variable idx
idx = data[data['Start Date'].isnull()].index
print(idx)

Int64Index([2, 4], dtype='int64')

#read csv again
data = pd.read_csv(io.StringIO(temp), sep=";")

#find problematic rows, where datetime is not parsed
print(data.iloc[idx])

      Start Date
2   1/7/13 12 17
4  16/7/13 12 04
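The same check can be done without reading the CSV a second time, by parsing into a separate Series and using it as a mask. This is only a sketch: the sample data is made up, the names parsed and bad_rows are illustrative, and an explicit format= is assumed here instead of dayfirst=True so that the result does not depend on how a given pandas version infers the format.

```python
import io
import pandas as pd

temp = """Start Date
1/7/13
1/7/13 12 17
16/7/13
16/7/13 12 04"""

data = pd.read_csv(io.StringIO(temp), sep=";")

# Parse into a separate Series so the original strings stay available.
parsed = pd.to_datetime(data["Start Date"], format="%d/%m/%y", errors="coerce")

# Keep rows that had a value but could not be parsed (NaT after coercion).
bad_rows = data.loc[parsed.isna() & data["Start Date"].notna(), "Start Date"]
print(bad_rows)
```

Once the offending strings are fixed or dropped, parsed can be assigned back to the column.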

edited Jul 19, 2018 at 19:10

Halee


answered Dec 29, 2015 at 8:34

jezrael



4 Comments

pheno Over a year ago

The data is from a CSV file. When I looked at the data in Excel, it seemed fine, like a regular date format. The lengths of the dates are not unusual.

pheno Over a year ago

Upon drilling down into each record, I realised there were some whitespace encoding issues, which needed trimming. It turns out the 'length' way of verification helped in this case. Thanks

deadcode Over a year ago

How do you trim unusual whitespace? I guess my whole column has an extra whitespace character.

jezrael Over a year ago

@deadcode - use data['Start Date']= pd.to_datetime(data['Start Date'].str.strip(), dayfirst=True)
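A minimal sketch of the trimming approach from the comment above, assuming the stray characters are ordinary spaces or non-breaking spaces (\u00a0). The sample values are hypothetical, and the explicit format= is an assumption made to keep the result independent of the pandas version.

```python
import pandas as pd

# Hypothetical values: a trailing space and a leading non-breaking space.
data = pd.DataFrame({"Start Date": ["1/7/13 ", "\u00a016/7/13", "16/7/13"]})

# repr() makes otherwise invisible characters visible for inspection.
print(data["Start Date"].map(repr).tolist())

# Normalise non-breaking spaces to plain spaces, then trim both ends.
cleaned = (
    data["Start Date"]
    .str.replace("\u00a0", " ", regex=False)
    .str.strip()
)

data["Start Date"] = pd.to_datetime(cleaned, format="%d/%m/%y")
print(data)
```

Other invisible characters (for example zero-width spaces) would need their own replace step, which is why inspecting the repr first is useful.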

cottontail · Accepted Answer · 2023-04-17 17:39:11Z

Recent pandas versions (e.g. 1.5.0) throw a helpful error that shows where the unparsable string is in the column. That said, a common way to solve this error is to pass the correct format=, especially if the dates have an "exotic" format, such as underscores (_), em dashes (—), or unexpected white space as separators (for a list of the available format codes, check https://strftime.org/).

df = pd.DataFrame({'date': ['19_Mar_2017']})
pd.to_datetime(df['date'])                      # error
pd.to_datetime(df['date'], format='%d_%b_%Y')   # OK

df = pd.DataFrame({'date': ['19.Mar. 2017']})
pd.to_datetime(df['date'])                      # error
pd.to_datetime(df['date'], format='%d.%b. %Y')  # OK
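To see what the error reports for a given pandas installation, the exception can be caught and printed. This is a sketch with made-up data; the exact wording of the message differs between pandas versions, so no particular text is assumed here.

```python
import pandas as pd

# Hypothetical column with one value that does not match the format.
df = pd.DataFrame({"date": ["19/03/2017", "not a date"]})

message = None
try:
    pd.to_datetime(df["date"], format="%d/%m/%Y")
except ValueError as exc:
    # The message text is version-dependent; newer versions tend to
    # include the offending value and its position.
    message = str(exc)

print(message)
```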

If the datetimes have mixed formats, parse the column once per format with errors='coerce', so that strings that do not match a format become NaT, and then combine the results using fillna(). For example,

df = pd.DataFrame({'date': ['19.Mar. 2017', '20.Mar.2017']})  # <--- has mixed format
first = pd.to_datetime(df['date'], format='%d.%b. %Y', errors='coerce')
second = pd.to_datetime(df['date'], format='%d.%b.%Y', errors='coerce')
df['date'] = first.fillna(second)
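With more than two formats, the same fillna() idea can be wrapped in a loop. The helper name parse_with_formats is hypothetical (it is not a pandas function), and the sketch assumes an English locale for the %b month abbreviations.

```python
import pandas as pd

def parse_with_formats(values, formats):
    """Try each format in turn; keep the first successful parse per row."""
    if not formats:
        raise ValueError("at least one format is required")
    result = None
    for fmt in formats:
        # Strings that do not match this format become NaT.
        parsed = pd.to_datetime(values, format=fmt, errors="coerce")
        # Only fill rows that earlier formats failed to parse.
        result = parsed if result is None else result.fillna(parsed)
    return result

df = pd.DataFrame({"date": ["19.Mar. 2017", "20.Mar.2017", "21_Mar_2017"]})
df["date"] = parse_with_formats(df["date"], ["%d.%b. %Y", "%d.%b.%Y", "%d_%b_%Y"])
print(df)
```

Rows that match none of the formats stay NaT, so they can still be located with isna() afterwards.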

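As an aside, if the data is imported from a file, the pandas functions that read such files into dataframes have a parse_dates= parameter that can be used to parse the dates. Note that parse_dates=True only tries to parse the index; to parse a regular column, pass a list of column names (the column name 'date' below is only an example).

# from csv
df = pd.read_csv('foo.csv', parse_dates=['date'])

# from Excel
df = pd.read_excel('foo.xlsx', parse_dates=['date'])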
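For day-first data like the dates in the question, read_csv also accepts dayfirst=True alongside parse_dates. A small self-contained sketch, using an in-memory CSV with made-up contents in place of a real file:

```python
import io
import pandas as pd

# Hypothetical file contents; the Value column is only there for illustration.
csv_text = """Start Date,Value
1/7/13,10
16/7/13,20"""

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Start Date"],  # parse this column, not just the index
    dayfirst=True,               # treat 1/7/13 as 1 July, not 7 January
)
print(df.dtypes)
print(df)
```

If the column contains a string that cannot be parsed, read_csv typically leaves it as object dtype instead of raising, so checking df.dtypes after loading is a quick sanity test.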
