Below is the fastest function I know of for converting strings to dates. It is specialised for a Series in which the same dates are repeated many times, e.g. financial time series data with sub-day granularity. If you are working with 1-minute bars and have separate date and time columns, that is a lot of repeated date strings.
def str_to_date(s):
    """
    This is an extremely fast approach to datetime parsing.
    For large data, the same dates are often repeated. Rather than
    re-parse these, we store all unique dates, parse them, and
    use a lookup to convert all dates.
    """
    # Create a dictionary with unique dates as keys and their corresponding
    # parsed datetime objects as values
    dates = {date: pd.to_datetime(date, format="%Y-%m-%d") for date in s.unique()}
    # Map the original dates to their parsed values using the lookup dictionary
    return s.map(dates).dt.date
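As a quick sanity check, here is a minimal sketch of the function applied to a small Series. The `sample` data is made up for illustration, and the function is repeated (without its docstring) only so the snippet runs on its own. It compares the lookup result against parsing every row directly with `pd.to_datetime`.

```python
import pandas as pd


def str_to_date(s):
    # Same lookup idea as above: parse each unique string once, then map.
    dates = {date: pd.to_datetime(date, format="%Y-%m-%d") for date in s.unique()}
    return s.map(dates).dt.date


# Hypothetical sample: five rows but only two distinct date strings.
sample = pd.Series(
    ["2020-01-02", "2020-01-02", "2020-01-03", "2020-01-02", "2020-01-03"]
)

parsed = str_to_date(sample)
direct = pd.to_datetime(sample, format="%Y-%m-%d").dt.date

# Number of strings that actually get parsed by the lookup approach.
n_unique = sample.nunique()
# Whether the lookup result agrees with parsing every row.
matches = parsed.equals(direct)
print(n_unique, matches)
```

The fewer unique strings there are relative to the number of rows, the more parsing work the lookup saves.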
We can then re-run all the timings with the same large DataFrame used in this answer: https://stackoverflow.com/a/66862336/3253628. First, create the DataFrame:
import pandas as pd
df = pd.DataFrame({'a': ['2020-01-02', '2020-01-02'] * 5000})
I will pass the date format to each of the old approaches to keep the comparison fair.
Time the first approach:
%%timeit
df['a'].apply(pd.to_datetime, format="%Y-%m-%d").dt.date
For which we get:
621 ms ± 4.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Time the second approach:
%%timeit
pd.to_datetime(df['a'], format="%Y-%m-%d").dt.date
For which we get:
4.02 ms ± 37.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
And now the new approach using the function defined above:
%%timeit
str_to_date(df['a'])
For which we get:
2.66 ms ± 28.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
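So that is a pretty good speed-up: roughly 1.5x faster than the vectorised pd.to_datetime call (4.02 ms vs 2.66 ms) and over 200x faster than the apply approach (621 ms).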
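If you are not in IPython and so cannot use `%%timeit`, a rough equivalent can be sketched with the standard-library `timeit` module. The `candidates` dictionary and the repeat counts below are my own choices for illustration, and I have left out the slow `apply` variant to keep the run short. The numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd


def str_to_date(s):
    # Parse each unique date string once, then map rows through the lookup.
    dates = {date: pd.to_datetime(date, format="%Y-%m-%d") for date in s.unique()}
    return s.map(dates).dt.date


df = pd.DataFrame({"a": ["2020-01-02", "2020-01-02"] * 5000})

# Hypothetical harness: name -> zero-argument callable to time.
candidates = {
    "to_datetime": lambda: pd.to_datetime(df["a"], format="%Y-%m-%d").dt.date,
    "lookup": lambda: str_to_date(df["a"]),
}

# Make sure both approaches agree before comparing their speed.
reference = candidates["to_datetime"]()
for name, fn in candidates.items():
    assert fn().equals(reference), name

# Best of 3 repeats, 10 calls per repeat, reported per call.
calls = 10
timings = {
    name: min(timeit.repeat(fn, number=calls, repeat=3)) / calls
    for name, fn in candidates.items()
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```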
The same logic can also be applied when there are a lot of repeated time strings and you wish to convert them to time deltas.
def str_to_time(s):
    """
    This is an extremely fast approach to time parsing.
    For large data, the same times are often repeated. Rather than
    re-parse these, we store all unique times, parse them, and
    use a lookup to convert all times.
    """
    # Create a dictionary with unique times as keys and their corresponding
    # parsed timedelta objects as values
    times = {time: pd.to_timedelta(time) for time in s.unique()}
    # Map the original times to their parsed values using the lookup dictionary
    return s.map(times)
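For the separate date and time columns case mentioned at the start, the two lookups can be combined into a single timestamp column. This is only a sketch: the column names `date`, `time` and `timestamp` and the sample rows are assumptions for illustration. Note that it maps dates to full Timestamps rather than calling `.dt.date`, because plain `date` objects would drop the time part when a timedelta is added.

```python
import pandas as pd


def str_to_time(s):
    # Parse each unique time string once, then map rows through the lookup.
    times = {time: pd.to_timedelta(time) for time in s.unique()}
    return s.map(times)


# Hypothetical 1-minute bars with separate date and time string columns.
df = pd.DataFrame(
    {
        "date": ["2020-01-02", "2020-01-02", "2020-01-03"],
        "time": ["09:30:00", "09:31:00", "09:30:00"],
    }
)

# Same lookup trick for the dates, but keeping Timestamps (midnight of each day).
date_lookup = {
    d: pd.to_datetime(d, format="%Y-%m-%d") for d in df["date"].unique()
}

# Midnight timestamp plus time-of-day offset gives the full timestamp.
df["timestamp"] = df["date"].map(date_lookup) + str_to_time(df["time"])
print(df["timestamp"])
```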