Convert DataFrame column type from string to datetime

Question

How can I convert a DataFrame column of strings (in dd/mm/yyyy format) to datetime dtype?

Trenton McKinney · Accepted Answer · 2022-10-27 18:31:20Z

741

The easiest way is to use to_datetime:

df['col'] = pd.to_datetime(df['col'])

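It also offers a dayfirst argument for day-first (European-style) dates, but beware that it isn't strict: a string that cannot be read day-first, such as '05/23/2005', is still parsed month-first.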

Here it is in action:

In [11]: pd.to_datetime(pd.Series(['05/23/2005']))
Out[11]:
0   2005-05-23 00:00:00
dtype: datetime64[ns]

You can pass a specific format:

In [12]: pd.to_datetime(pd.Series(['05/23/2005']), format="%m/%d/%Y")
Out[12]:
0   2005-05-23
dtype: datetime64[ns]
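For the dd/mm/yyyy strings in the question, the two options above can be sketched as follows. The sample values and variable names are made up for illustration; only to_datetime, dayfirst and format come from the answer.

```python
import pandas as pd

# Hypothetical sample data in dd/mm/yyyy format
s = pd.Series(['23/05/2005', '01/02/2006'])

# Explicit format: every value must match %d/%m/%Y
strict = pd.to_datetime(s, format='%d/%m/%Y')

# dayfirst=True: prefers day-first parsing, but is not enforced
lenient = pd.to_datetime(s, dayfirst=True)

print(strict)
print(lenient)
```

Both calls read '01/02/2006' as 1 February 2006. The explicit format is the safer choice when the whole column is known to be day-first.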

edited Oct 27, 2022 at 18:31

Trenton McKinney

63.2k41 gold badges169 silver badges212 bronze badges

answered Jun 16, 2013 at 15:18

Andy Hayden

378k110 gold badges640 silver badges546 bronze badges


sigurdb · Accepted Answer · 2023-11-28 16:23:29Z

79

If your date column contains strings in the format '2017-01-01', you can use pandas astype to convert it to datetime.

df['date'] = df['date'].astype('datetime64[ns]')

or use datetime64[D] if you want Day precision and not nanoseconds

print(type(df['date'].iloc[0]))

yields

<class 'pandas._libs.tslib.Timestamp'>

the same as when you use pandas.to_datetime

You can try it with formats other than '%Y-%m-%d', but at least this one works.
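A minimal sketch comparing the two routes on ISO-formatted strings; the DataFrame contents here are assumed sample data, not from the answer.

```python
import pandas as pd

# Hypothetical sample data in YYYY-MM-DD format
df = pd.DataFrame({'date': ['2017-01-01', '2017-06-30']})

via_astype = df['date'].astype('datetime64[ns]')
via_to_datetime = pd.to_datetime(df['date'])

# Both routes yield the same timestamps for this format
print((via_astype == via_to_datetime).all())
print(via_astype.dtype)
```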

edited Nov 28, 2023 at 16:23

answered Jun 26, 2017 at 14:35

sigurdb

1,49515 silver badges13 bronze badges

1 Comment

fantabolous Over a year ago

fyi when timezone is specified in the string it ignores it

campeterson · Accepted Answer · 2018-10-11 21:16:28Z

62

You can use the following if you want to specify tricky formats:

df['date_col'] = pd.to_datetime(df['date_col'], format='%d/%m/%Y')

More details on the format codes are in the Python documentation for strftime() and strptime().
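When some rows may not match the format, one option is errors='coerce', which turns non-matching values into NaT instead of raising. This is a sketch with invented sample data; the column name date_col follows the answer.

```python
import pandas as pd

# Hypothetical data: the middle value does not match dd/mm/yyyy
df = pd.DataFrame({'date_col': ['31/12/2020', '2020-12-31', '15/01/2021']})

# Values that do not match the format become NaT rather than raising
parsed = pd.to_datetime(df['date_col'], format='%d/%m/%Y', errors='coerce')

# Inspect the raw strings that failed to parse
bad_rows = df.loc[parsed.isna(), 'date_col']
print(parsed)
print(bad_rows)
```

Checking bad_rows before overwriting the column makes it easier to spot values that were silently dropped.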

edited Oct 11, 2018 at 21:16

campeterson

3,7492 gold badges28 silver badges28 bronze badges

answered May 2, 2018 at 8:14

Ekhtiar

1,10311 silver badges9 bronze badges


wjandrea · Accepted Answer · 2023-10-24 00:33:27Z

26

If you have a mixture of formats in your date column, don't forget to set infer_datetime_format=True to make life easier.

df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)

Source: pd.to_datetime

or if you want a customized approach:

from datetime import datetime

def autoconvert_datetime(value):
    formats = ['%m/%d/%Y', '%m-%d-%y']  # input formats to try, in order
    result_format = '%d-%m-%Y'  # output format (the result is still a string)
    for dt_format in formats:
        try:
            dt_obj = datetime.strptime(value, dt_format)
            return dt_obj.strftime(result_format)
        except (ValueError, TypeError):  # value doesn't match this format or isn't a string
            pass
    return value  # leave the value unchanged if no format matches

df['date'] = df['date'].apply(autoconvert_datetime)

edited Oct 24, 2023 at 0:33

wjandrea

33.8k10 gold badges69 silver badges105 bronze badges

answered Jul 28, 2019 at 1:04

otaku

1,0393 gold badges18 silver badges40 bronze badges

2 Comments

Asclepius Over a year ago

A customized approach can be used without resorting to .apply which has no fast cache, and will struggle when converting a billion values. An alternative, but not a great one, is

col = pd.concat([pd.to_datetime(col, errors='coerce', format=f) for f in formats], axis='columns').bfill(axis='columns').iloc[:, 0]

Asclepius Over a year ago

If you have a mixture of formats, you should not use infer_datetime_format=True as this assumes a single format. Just skip this argument. To understand why, try pd.to_datetime(pd.Series(['1/5/2015 8:08:00 AM', '1/4/2015 11:24:00 PM']), infer_datetime_format=True) with and without errors='coerce'. See this issue.
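The vectorized idea from the comments above can also be written as a loop that tries each format with errors='coerce' and fills the remaining gaps. This is only a sketch: parse_with_formats is a hypothetical helper name, and the sample data is invented.

```python
import pandas as pd

def parse_with_formats(col, formats):
    """Try each format in order; keep the first successful parse per value."""
    result = None
    for fmt in formats:
        # Values that do not match fmt become NaT
        parsed = pd.to_datetime(col, format=fmt, errors='coerce')
        # Fill only the positions that earlier formats could not parse
        result = parsed if result is None else result.fillna(parsed)
    return result

# Hypothetical column mixing two formats plus one unparseable value
col = pd.Series(['05/23/2005', '05-23-05', 'not a date'])
converted = parse_with_formats(col, ['%m/%d/%Y', '%m-%d-%y'])
print(converted)
```

Unlike the .apply version, the result is a datetime column, and values that match none of the formats end up as NaT rather than being kept as strings.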

Trenton McKinney · Accepted Answer · 2023-05-12 19:11:34Z

Multiple datetime columns

To convert multiple string columns to datetime, use apply().

df[['date1', 'date2']] = df[['date1', 'date2']].apply(pd.to_datetime)

You can pass parameters to to_datetime as kwargs.

df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(pd.to_datetime, format="%m/%d/%Y")

Calling apply on the DataFrame without specifying axis still converts the values of each column in a vectorized way. apply is needed here because pd.to_datetime converts only one column of strings at a time. To convert several columns, either use an explicit for-loop or pass pd.to_datetime to apply. On the other hand, calling apply on a single column (e.g. df['date'].apply(pd.to_datetime)) invokes pd.to_datetime once per value; that is not vectorized and should be avoided.
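The two options mentioned above, apply and an explicit for-loop, can be sketched side by side. The DataFrame contents and the label column are assumed for illustration.

```python
import pandas as pd

# Hypothetical data with two date columns and one unrelated column
raw = pd.DataFrame({
    'start_date': ['01/15/2023', '02/20/2023'],
    'end_date': ['01/31/2023', '03/05/2023'],
    'label': ['a', 'b'],
})
date_cols = ['start_date', 'end_date']

# Option 1: DataFrame.apply calls pd.to_datetime once per column
df_apply = raw.copy()
df_apply[date_cols] = df_apply[date_cols].apply(pd.to_datetime, format='%m/%d/%Y')

# Option 2: explicit loop over the columns
df_loop = raw.copy()
for name in date_cols:
    df_loop[name] = pd.to_datetime(df_loop[name], format='%m/%d/%Y')

print(df_apply.dtypes)
print(df_loop.dtypes)
```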

Use `format=` to speed up

If the column contains a time component and you know the datetime format, passing it explicitly speeds up the conversion significantly. There's barely any difference if the column contains only dates, though. In my project, for a column with 5 million rows, the difference was huge: ~2.5 min vs 6 s.

It turns out that explicitly specifying the format is about 25x faster. The following runtime plot shows a large performance gap depending on whether format is passed.

The code used to produce the plot:

import random

import pandas as pd
import perfplot

# Value ranges for month, day, year, hour and minute
mdYHM = range(1, 13), range(1, 29), range(2000, 2024), range(24), range(60)
perfplot.show(
    kernels=[lambda x: pd.to_datetime(x), lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M')],
    labels=['pd.to_datetime(x)', "pd.to_datetime(x, format='%m/%d/%Y %H:%M')"],
    n_range=[2**k for k in range(19)],
    # Build a Series of n random datetime strings
    setup=lambda n: pd.Series([f"{m}/{d}/{Y} {H}:{M}"
                               for m, d, Y, H, M in zip(*[random.choices(e, k=n) for e in mdYHM])]),
    equality_check=pd.Series.equals,
    xlabel='len(df)'
)
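If perfplot is not installed, a rough comparison can be made with the standard library's timeit. This is a sketch under assumptions: the series length, seed and repeat count are arbitrary, and the measured gap depends heavily on the pandas version, so no particular speedup is implied.

```python
import random
import timeit

import pandas as pd

random.seed(0)  # arbitrary seed so the sample data is reproducible
n = 10_000      # arbitrary size; increase for a more realistic measurement

# Random zero-padded strings such as '05/23/2005 14:07'
strings = pd.Series([
    f"{random.randint(1, 12):02d}/{random.randint(1, 28):02d}/{random.randint(2000, 2023)} "
    f"{random.randint(0, 23):02d}:{random.randint(0, 59):02d}"
    for _ in range(n)
])
fmt = '%m/%d/%Y %H:%M'

# Both calls should produce the same timestamps
without_format = pd.to_datetime(strings)
with_format = pd.to_datetime(strings, format=fmt)

t_without = timeit.timeit(lambda: pd.to_datetime(strings), number=3)
t_with = timeit.timeit(lambda: pd.to_datetime(strings, format=fmt), number=3)
print(f"no format: {t_without:.3f}s, explicit format: {t_with:.3f}s")
```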

Scarlett · Accepted Answer · 2022-11-01 19:43:05Z

2

Try this solution:

Change '2022-12-31 00:00:00' to '2022-12-31 00:00:01'
Then run this code: pandas.to_datetime(pandas.Series(['2022-12-31 00:00:01']))
Output: 2022-12-31 00:00:01

answered Nov 1, 2022 at 19:43

Scarlett

291 bronze badge

1 Comment

wjandrea Over a year ago

"Change '2022–12–31 00:00:00' to '2022–12–31 00:00:01'" - What does that have to do with the question?

Mainland · Accepted Answer · 2023-10-24 00:10:44Z

-2

print(df1.shape)
(638765, 95)

%timeit df1['Datetime'] = pd.to_datetime(df1['Date']+" "+df1['HOUR'])
473 ms ± 8.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df1['Datetime'] = pd.to_datetime(df1['Date']+" "+df1['HOUR'], format='mixed')
688 ms ± 3.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df1['Datetime'] = pd.to_datetime(df1['Date']+" "+df1['HOUR'], format='%Y-%m-%d %H:%M:%S')
470 ms ± 7.31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
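The timings above parse a date column joined with a time column. A small sketch of that pattern follows; the column names Date and HOUR come from the answer, while the values are assumed.

```python
import pandas as pd

# Hypothetical data: separate date and time-of-day string columns
df1 = pd.DataFrame({
    'Date': ['2023-10-01', '2023-10-01'],
    'HOUR': ['00:00:00', '13:30:00'],
})

# Join the strings with a space, then parse with an explicit format
df1['Datetime'] = pd.to_datetime(
    df1['Date'] + ' ' + df1['HOUR'],
    format='%Y-%m-%d %H:%M:%S',
)
print(df1)
```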

answered Oct 24, 2023 at 0:10

Mainland

4,7025 gold badges39 silver badges87 bronze badges
