1

Let's say I have the starting input table below.

Name _leavestartdate _leaveenddate
Jerry Rice 2023-12-26 2023-12-26
Pablo Picasso 2023-08-15 2023-08-21
Ray Ramon 2023-03-16 2023-03-17

The Python code to create it is:

import pandas as pd
df = pd.DataFrame({'name':["Jerry Rice","Pablo Picasso","Ray Ramon"],'_leavestartdate':["2023-12-26","2023-08-15","2023-03-16"],'_leaveenddate':["2023-12-26","2023-08-21","2023-03-17"]})

First, I want to get all dates between the _leavestartdate and _leaveenddate split based on Name column. The list of dates would populate under new column "Date". I also want to exclude any weekend dates in the newly reformatted table.

So my desired output table is shown below. Note that it only shows weekday dates and excludes the 2 dates (8/19/2023 and 8/20/2023) from Pablo Picasso that are weekend dates.

Name Date _leavestartdate _leaveenddate
Jerry Rice 2023-12-26 2023-12-26 2023-12-26
Pablo Picasso 2023-08-15 2023-08-15 2023-08-21
Pablo Picasso 2023-08-16 2023-08-15 2023-08-21
Pablo Picasso 2023-08-17 2023-08-15 2023-08-21
Pablo Picasso 2023-08-18 2023-08-15 2023-08-21
Pablo Picasso 2023-08-21 2023-08-15 2023-08-21
Ray Ramon 2023-03-16 2023-03-16 2023-03-17
Ray Ramon 2023-03-17 2023-03-16 2023-03-17

I looked at previous solutions in StackOverflow and tried the following code:

date = pd.DataFrame(pd.date_range(start=df.min()._leavestartdate, 
                     end=df.max()._leaveenddate), columns=['Date'])

df_final = pd.merge(left=date, right=df, left_on='Date', right_on='_leavestartdate', 
         how='outer').fillna(method='ffill')

But the solution produces the following wrong table because it's not accounting for the "Name" column:

enter image description here

How can I fix the code to get my desired output table?

4
  • One way of doing this is to do a cross join between date and df then filter to return only rows where Date is >=_leavestartdate and <=_leaveendddate. Not sure the exact syntax off the top of my head and it's not very efficient in the tables are large. Commented Nov 18, 2023 at 2:26
  • Filtering your result with this condition may be sufficient. Commented Nov 18, 2023 at 2:27
  • @David: Do you mean filter "Date" column so that it is within the startdate and enddate columns? But I think this would still have the weekend dates included ... which I probably could flag in a different column and drop them. Commented Nov 18, 2023 at 2:28
  • Yeah you can add an additional criteria to the filter, you should be able to use df['Date'].dt.day_name() to do the weekend filtering (using isin() to check if it's in Saturday/Sunday). Commented Nov 18, 2023 at 2:33

1 Answer 1

0
import pandas as pd

df['_leavestartdate'] = pd.to_datetime(df['_leavestartdate'])
df['_leaveenddate'] = pd.to_datetime(df['_leaveenddate'])


df_final = pd.concat([(pd.DataFrame({'Date':
                             pd.date_range(df.loc[i, '_leavestartdate'],
                                           df.loc[i, '_leaveenddate']),
                                     'name': df.loc[i, 'name']}))
                      for i in range(len(df))])

df_final = df_final[df_final['Date'].dt.dayofweek < 5]


print(df_final.merge(df, how='left', on='name'))

Here, in the list comprehension, the necessary rows of dataframes are generated and concatenate. Then the lines with Saturday and Sunday were removed and a left-hand merge was made by name.

You can even do without a merge and write _leavestartdate, _leaveenddate in df_final:

df_final = pd.concat([(pd.DataFrame({'Date':
                                         pd.date_range(df.loc[i, '_leavestartdate'],
                                                       df.loc[i, '_leaveenddate']),
                                     'name': df.loc[i, 'name'],
                                     '_leavestartdate': df.loc[i, '_leavestartdate'],
                                     '_leaveenddate': df.loc[i, '_leaveenddate']}))
                      for i in range(len(df))])

df_final = df_final[df_final['Date'].dt.dayofweek < 5]

Output

        Date           name _leavestartdate _leaveenddate
0 2023-12-26     Jerry Rice      2023-12-26    2023-12-26
1 2023-08-15  Pablo Picasso      2023-08-15    2023-08-21
2 2023-08-16  Pablo Picasso      2023-08-15    2023-08-21
3 2023-08-17  Pablo Picasso      2023-08-15    2023-08-21
4 2023-08-18  Pablo Picasso      2023-08-15    2023-08-21
5 2023-08-21  Pablo Picasso      2023-08-15    2023-08-21
6 2023-03-16      Ray Ramon      2023-03-16    2023-03-17
7 2023-03-17      Ray Ramon      2023-03-16    2023-03-17

If I didn’t take into account anything, please write.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.