0

Something was active for some period of time (e.g. from 01-01-2022 to 01-02-2022). I have a list of those items, every one was active for a different period of time. I want to count how many items were active every day. I know only first and last day for each of them.

df = {'name': ['a','b','c','d','e','f'], 'first_day': ['01-01-2022', '15-02-2022', '20-01-2022', '5-02-2022', '19-02-2022', '21-02-2022'], 'last_day':['09-01-2022', '20-02-2022', '27-02-2022', '27-02-2022','19-02-2022', '28-02-2022']}

Result: on 01-01-2022 active was only 1 item (a), on 02-01-2022 = 1 (a), [...], on 26-02-2022 = 3 (c, d, f), [...], on 28-02-2022 = 1 (f). I care only about the count of items and I have no idea how to do it.

2 Answers 2

0

Assuming this is for pandas dataframe: create a list of days for each element, explode list into multiple rows and finally use groupby:

df = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'first_day': ['01-01-2022', '15-02-2022', '20-01-2022', '5-02-2022', '19-02-2022', '21-02-2022'],
                   'last_day': ['09-01-2022', '20-02-2022', '27-02-2022', '27-02-2022', '19-02-2022',
                                '28-02-2022']})
df['first'] = pd.to_datetime(df['first_day'], format="%d-%m-%Y")
df['last'] = pd.to_datetime(df['last_day'], format="%d-%m-%Y")
df['days'] = df.apply(lambda x: pd.date_range(x['first'], x['last'], freq='D').tolist(), axis=1)
result = df.explode('days').groupby('days').size()
Sign up to request clarification or add additional context in comments.

Comments

0

You can use datetime, for example:

from datetime import datetime

df = {'name': ['a', 'b', 'c', 'd', 'e', 'f'],
      'first_day': ['01-01-2022', '15-02-2022', '20-01-2022', '5-02-2022', '19-02-2022', '21-02-2022'],
      'last_day': ['09-01-2022', '20-02-2022', '27-02-2022', '27-02-2022', '19-02-2022', '28-02-2022']}

first_day = [datetime.strptime(date, '%d-%m-%Y') for date in df['first_day']]
last_day = [datetime.strptime(date, '%d-%m-%Y') for date in df['last_day']]
time_delta = [last-first for first, last in zip(first_day, last_day)]
active_days = [delta.days for delta in time_delta]

print(active_days) # [8, 5, 38, 22, 0, 7]

EDIT: if you are producing the dictionary df, format your date string as '2022-07-28' so you can use numpy which is more efficient for big lists:

import numpy as np

df = {'name': ['a', 'b', 'c', 'd', 'e', 'f'],
      'first_day': ['2022-01-01', '2022-02-15', '2022-01-20', '2022-02-05', '2022-02-19', '2022-02-21'],
      'last_day': ['2022-01-09', '2022-02-20', '2022-02-27', '2022-02-27', '2022-02-19', '2022-02-28']}

first_day = np.array(df['first_day'], dtype='datetime64[D]')
last_day = np.array(df['last_day'], dtype='datetime64[D]')
active_days = last_day-first_day 

print(active_days) # [ 8  5 38 22  0  7]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.