I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'KEY': ['1','1','1','1','1','1','1','2','2'], 'DATE': ['2020-01-01','2020-01-01','2020-01-01','2020-01-08','2020-01-08','2020-01-08','2020-01-08','2020-02-01','2020-02-01'], 'ENDNO': ['1000','1000','1000','2000','2000','2000','2000','400','400'], 'ITEM': ['PAPERCLIPS','BINDERS','STAPLES','PAPERCLIPS','BINDERS','STAPLES','TAPE','PENCILS','PENS']})

KEY DATE        ENDNO ITEM
1   2020-01-01  1000  PAPERCLIPS
1   2020-01-01  1000  BINDERS   
1   2020-01-01  1000  STAPLES   
1   2020-01-08  2000  PAPERCLIPS
1   2020-01-08  2000  BINDERS   
1   2020-01-08  2000  STAPLES
1   2020-01-08  2000  TAPE
2   2020-02-01  400   PENCILS   
2   2020-02-01  400   PENS      

I need to add a new column called "STARTNO" and populate it based on multiple conditions:

if KEY <> KEY of row above, STARTNO = 0
else
   (if DATE = DATE of row above, STARTNO = STARTNO of row above
    else STARTNO = ENDNO of row above)
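
For reference, the rule above can be written as a plain row-by-row loop. This is only a minimal sketch: the helper name start_numbers is hypothetical, it assumes the rows are already sorted by KEY and then DATE, and it keeps STARTNO as a string because ENDNO is stored as strings in the dataframe above.

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1', '1', '1', '1', '1', '1', '1', '2', '2'],
    'DATE': ['2020-01-01', '2020-01-01', '2020-01-01',
             '2020-01-08', '2020-01-08', '2020-01-08', '2020-01-08',
             '2020-02-01', '2020-02-01'],
    'ENDNO': ['1000', '1000', '1000', '2000', '2000', '2000', '2000', '400', '400'],
    'ITEM': ['PAPERCLIPS', 'BINDERS', 'STAPLES', 'PAPERCLIPS', 'BINDERS',
             'STAPLES', 'TAPE', 'PENCILS', 'PENS'],
})


def start_numbers(frame):
    """Hypothetical helper: apply the nested rule one row at a time.

    Assumes the rows are sorted by KEY, then DATE.
    """
    starts = []
    prev = None
    for row in frame.itertuples(index=False):
        if prev is None or row.KEY != prev.KEY:
            # First row overall, or KEY differs from the row above
            start = '0'
        elif row.DATE == prev.DATE:
            # Same KEY and DATE: reuse STARTNO of the row above
            start = starts[-1]
        else:
            # Same KEY, new DATE: take ENDNO of the row above
            start = prev.ENDNO
        starts.append(start)
        prev = row
    return starts


df['STARTNO'] = start_numbers(df)
print(df)
```

A loop like this is easy to check against the rule, but it will be slow on large dataframes compared with a groupby-based approach.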

It should end up looking something like this:

KEY DATE        STARTNO ENDNO ITEM
1   2020-01-01  0       1000  PAPERCLIPS
1   2020-01-01  0       1000  BINDERS   
1   2020-01-01  0       1000  STAPLES   
1   2020-01-08  1000    2000  PAPERCLIPS
1   2020-01-08  1000    2000  BINDERS   
1   2020-01-08  1000    2000  STAPLES
1   2020-01-08  1000    2000  TAPE   
2   2020-02-01  0       400   PENCILS   
2   2020-02-01  0       400   PENS      

If I were evaluating just one condition, I know I could use a lambda, but I'm not sure how to write a nested condition in Pandas that references the row above.

Would someone please point me in the right direction?

Thanks!

ETA:

Quang Hoang's answer almost got me what I needed. I realized I missed one aspect of my initial list.

I've added a new item called "TAPE" and updated the dataframe script above.

Applying the groupby clause works well for all items except TAPE. With TAPE, it puts the STARTNO back at 0; however, I actually need the STARTNO to be the same as the ENDNO for the previous items with the same KEY and DATE. If I change the code to:

df['STARTNO'] = df.groupby(['KEY','DATE'])['ENDNO'].shift(fill_value=0)

it starts the STARTNO back at 0 whenever the date changes, which is incorrect.

How do I change the code so that it takes the ENDNO for the previous row when the KEY and DATE match?

  • If ITEM is not important, then should ENDNO be the same for the same DATE and KEY? Commented Feb 9, 2021 at 17:07
  • Yes, that's correct. Commented Feb 9, 2021 at 17:51

1 Answer

I think this is groupby().shift():

df['STARTNO'] = df.groupby(['KEY','ITEM'])['ENDNO'].shift(fill_value=0)

Output:

  KEY        DATE ENDNO        ITEM STARTNO
0   1  2020-01-01  1000  PAPERCLIPS       0
1   1  2020-01-01  1000     BINDERS       0
2   1  2020-01-01  1000     STAPLES       0
3   1  2020-01-08  2000  PAPERCLIPS    1000
4   1  2020-01-08  2000     BINDERS    1000
5   1  2020-01-08  2000     STAPLES    1000
6   2  2020-02-01   400     PENCILS       0
7   2  2020-02-01   400        PENS       0
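
For the updated dataframe with the TAPE row, one possible variant is to shift at the (KEY, DATE) level instead of per ITEM. The following is a sketch, not a tested drop-in fix: it assumes ENDNO is constant within each KEY and DATE pair (as confirmed in the comments on the question), that rows are sorted by DATE within each KEY, and that converting ENDNO to integers is acceptable. The name per_date is introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1', '1', '1', '1', '1', '1', '1', '2', '2'],
    'DATE': ['2020-01-01', '2020-01-01', '2020-01-01',
             '2020-01-08', '2020-01-08', '2020-01-08', '2020-01-08',
             '2020-02-01', '2020-02-01'],
    'ENDNO': ['1000', '1000', '1000', '2000', '2000', '2000', '2000', '400', '400'],
    'ITEM': ['PAPERCLIPS', 'BINDERS', 'STAPLES', 'PAPERCLIPS', 'BINDERS',
             'STAPLES', 'TAPE', 'PENCILS', 'PENS'],
})

# Assumption: numeric ENDNO is fine, so the fill value 0 has a matching type
df['ENDNO'] = df['ENDNO'].astype(int)

# Keep one row per (KEY, DATE); relies on ENDNO being the same within each pair
per_date = df.drop_duplicates(subset=['KEY', 'DATE'])[['KEY', 'DATE', 'ENDNO']].copy()

# Within each KEY, STARTNO is the ENDNO of the previous DATE (0 for the first DATE)
per_date['STARTNO'] = per_date.groupby('KEY')['ENDNO'].shift(fill_value=0)

# Broadcast the per-date STARTNO back onto every item row
df = df.merge(per_date[['KEY', 'DATE', 'STARTNO']], on=['KEY', 'DATE'], how='left')
print(df)
```

Because the shift runs on one row per date, an item that appears on only one date, such as TAPE, gets the same STARTNO as the other rows for that KEY and DATE.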
1 Comment

Quang, your answer almost got me what I needed; however, I realized I missed one aspect in my dataframe. I've updated the question and added a new row to the dataframe.
