I have a large pandas DataFrame (more than 1,000,000 rows). For each pair of consecutive rows (n and n+1), each of which has a date column, I need to compute, as fast as possible, the duration between the two dates counted over business days only (weekends excluded). The result, in seconds, should be stored in row n of the same DataFrame in a column called 'duration'.

I am using the code below, which is the fastest approach I know of (any better way is welcome ;-) ).

    tmp_df['duration'] = (
        tmp_df['origin_tick_generation_time_stamp'].shift(-1)
        - tmp_df['origin_tick_generation_time_stamp']
    )

I would like to exclude weekends from this duration. I read that np.busday_count(date1, date2) does exactly that, but I do not know how to apply it in my case. Is there a way to do it?

Many thanks

  • You have 1 million dates? Commented Oct 3, 2019 at 18:25
  • I have many more. These are not just dates: each row has the date of an operation, and I am computing the duration between two operations. An operation is an activity on a transaction in a bank, and I need the duration in business days. Commented Oct 3, 2019 at 22:56

1 Answer

Use pandas.Series.diff:

tmp_df['duration'] = tmp_df['origin_tick_generation_time_stamp'].diff(-1)*-1

or

tmp_df['duration'] = tmp_df['origin_tick_generation_time_stamp'].diff().shift(-1)

Either form is somewhat faster than subtracting the shifted column.

Example:

import time

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['a'] = np.arange(1000000)

start_time = time.time()
df['a'].shift(-1) - df['a']
elapsed_time = time.time() - start_time
print(elapsed_time)
#0.023838520050048828

start_time = time.time()
df['a'].diff(-1)*-1
elapsed_time = time.time() - start_time
print(elapsed_time)
#0.008615493774414062

start_time = time.time()
df['a'].diff().shift(-1)
elapsed_time = time.time() - start_time
print(elapsed_time)
#0.011868000030517578
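
For the business-day part of the question, here is a minimal sketch of how np.busday_count could be applied to consecutive rows in one vectorized call. The sample data and the column name business_days are assumptions made for illustration; only origin_tick_generation_time_stamp comes from the question. Note that np.busday_count works at day precision and counts the business days in the half-open interval [row n, row n+1), using a Monday-to-Friday week by default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
tmp_df = pd.DataFrame({
    "origin_tick_generation_time_stamp": pd.to_datetime([
        "2019-10-03 09:00:00",  # Thursday
        "2019-10-04 17:30:00",  # Friday
        "2019-10-07 08:15:00",  # Monday
        "2019-10-09 12:00:00",  # Wednesday
    ])
})

# np.busday_count needs day precision, so truncate the timestamps to datetime64[D].
days = (
    tmp_df["origin_tick_generation_time_stamp"]
    .to_numpy()
    .astype("datetime64[D]")
)

# Business days between row n and row n+1, for all rows at once.
counts = np.busday_count(days[:-1], days[1:])

# The last row has no successor, so it gets NaN (the column becomes float).
# "business_days" is an illustrative column name, not one from the question.
tmp_df["business_days"] = np.append(counts, np.nan)
print(tmp_df)
```

Because the whole column is passed to NumPy at once, there is no Python-level loop over the rows, which should matter at the row counts mentioned in the question.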

1 Comment

Thank you, I will try it. Would you know how to get, with this approach, the difference between two rows in business days rather than full calendar days?
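
If the duration should stay in seconds, as in the original question, one possible approach is sketched below. It is not from the answer above: the helper name business_seconds and the sample data are hypothetical, and it assumes the timestamps are sorted ascending, timezone-naive, and that only Saturdays and Sundays are excluded (no holidays). The idea is to count whole business days with np.busday_count and then correct for the time of day at both ends, ignoring any time that falls on a weekend.

```python
import numpy as np
import pandas as pd


def business_seconds(ts: pd.Series) -> pd.Series:
    """Seconds between row n and row n+1, skipping Saturdays and Sundays.

    Hypothetical helper; assumes `ts` is sorted ascending and timezone-naive.
    """
    t = ts.to_numpy().astype("datetime64[s]")
    d = t.astype("datetime64[D]")

    # Seconds since midnight, counted only when the day is a business day.
    into_day = (t - d).astype("int64") * np.is_busday(d)

    # Whole business days from midnight of row n to midnight of row n+1.
    whole_days = np.busday_count(d[:-1], d[1:]) * 86_400

    # Add the part of the end day, remove the already elapsed part of the start day.
    out = whole_days + into_day[1:] - into_day[:-1]

    # The last row has no successor, so it gets NaN.
    return pd.Series(np.append(out, np.nan), index=ts.index)


# Hypothetical sample data.
stamps = pd.Series(pd.to_datetime([
    "2019-10-03 09:00:00",  # Thursday
    "2019-10-04 17:30:00",  # Friday
    "2019-10-07 08:15:00",  # Monday
]))
result = business_seconds(stamps)
print(result)
```

In this sample, the Friday-to-Monday gap counts only Friday evening and Monday morning; the weekend in between contributes nothing.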
