Pandas: Create new column and populate with value from previous row based on conditions

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'KEY': ['1','1','1','1','1','1','1','2','2'],
    'DATE': ['2020-01-01','2020-01-01','2020-01-01','2020-01-08','2020-01-08','2020-01-08','2020-01-08','2020-02-01','2020-02-01'],
    'ENDNO': ['1000','1000','1000','2000','2000','2000','2000','400','400'],
    'ITEM': ['PAPERCLIPS','BINDERS','STAPLES','PAPERCLIPS','BINDERS','STAPLES','TAPE','PENCILS','PENS'],
})

KEY DATE        ENDNO ITEM
1   2020-01-01  1000  PAPERCLIPS
1   2020-01-01  1000  BINDERS   
1   2020-01-01  1000  STAPLES   
1   2020-01-08  2000  PAPERCLIPS
1   2020-01-08  2000  BINDERS   
1   2020-01-08  2000  STAPLES
1   2020-01-08  2000  TAPE
2   2020-02-01  400   PENCILS   
2   2020-02-01  400   PENS

I need to add a new column called "STARTNO" and populate it based on multiple conditions:

if KEY <> KEY of row above, STARTNO = 0
else
   (if DATE = DATE of row above, STARTNO = STARTNO of row above
    else STARTNO = ENDNO of row above)
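
Read literally, the rule above can be sketched as a plain row-by-row loop. This is only an illustration of the stated conditions, not a recommended pandas idiom. It assumes the rows are already in the order shown, treats the first row (which has no row above) as a KEY change, and uses the string '0' so that STARTNO matches the string type of ENDNO in the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1','1','1','1','1','1','1','2','2'],
    'DATE': ['2020-01-01','2020-01-01','2020-01-01','2020-01-08','2020-01-08',
             '2020-01-08','2020-01-08','2020-02-01','2020-02-01'],
    'ENDNO': ['1000','1000','1000','2000','2000','2000','2000','400','400'],
    'ITEM': ['PAPERCLIPS','BINDERS','STAPLES','PAPERCLIPS','BINDERS',
             'STAPLES','TAPE','PENCILS','PENS'],
})

startno = []
prev = None  # the "row above"; None for the very first row
for row in df.itertuples(index=False):
    if prev is None or row.KEY != prev.KEY:
        # KEY changed (or there is no row above): start at 0
        startno.append('0')
    elif row.DATE == prev.DATE:
        # same KEY and DATE: carry over STARTNO of the row above
        startno.append(startno[-1])
    else:
        # same KEY, new DATE: take ENDNO of the row above
        startno.append(prev.ENDNO)
    prev = row

df['STARTNO'] = startno
print(df[['KEY', 'DATE', 'STARTNO', 'ENDNO', 'ITEM']])
```

A loop like this is easy to check against the rule but slow on large frames; a grouped, vectorized version is usually preferable.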

It should end up looking something like this:

KEY DATE        STARTNO ENDNO ITEM
1   2020-01-01  0       1000  PAPERCLIPS
1   2020-01-01  0       1000  BINDERS   
1   2020-01-01  0       1000  STAPLES   
1   2020-01-08  1000    2000  PAPERCLIPS
1   2020-01-08  1000    2000  BINDERS   
1   2020-01-08  1000    2000  STAPLES
1   2020-01-08  1000    2000  TAPE   
2   2020-02-01  0       400   PENCILS   
2   2020-02-01  0       400   PENS

If I were evaluating just one condition, I know I could use a lambda, but I'm not sure how to write a nested condition in pandas that references the row above.

Would someone please point me in the right direction?

Thanks!

ETA:

Quang Hoang's answer almost got me what I needed. I realized I missed one aspect of my initial list.

I've added a new item called "TAPE" and updated the dataframe script above.

Applying the groupby clause works well for all items except TAPE. With TAPE, it puts the STARTNO back at 0; however, I actually need the STARTNO to be the same as the ENDNO for the previous items with the same KEY and DATE. If I change the code to:

df['STARTNO'] = df.groupby(['KEY','DATE'])['ENDNO'].shift(fill_value=0)

it starts the STARTNO back at 0 whenever the date changes, which is incorrect.

How do I change the code so that it takes the ENDNO for the previous row when the KEY and DATE match?

If ITEM is not important, should ENDNO be the same for the same DATE and KEY?
– Quang Hoang, Commented Feb 9, 2021 at 17:07

Quang Hoang · Accepted Answer · 2021-02-09 16:18:25Z

4

I think this is groupby().shift():

df['STARTNO'] = df.groupby(['KEY','ITEM'])['ENDNO'].shift(fill_value=0)

Output:

  KEY        DATE ENDNO        ITEM STARTNO
0   1  2020-01-01  1000  PAPERCLIPS       0
1   1  2020-01-01  1000     BINDERS       0
2   1  2020-01-01  1000     STAPLES       0
3   1  2020-01-08  2000  PAPERCLIPS    1000
4   1  2020-01-08  2000     BINDERS    1000
5   1  2020-01-08  2000     STAPLES    1000
6   2  2020-02-01   400     PENCILS       0
7   2  2020-02-01   400        PENS       0
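
The output above predates the TAPE row added in the question's update. Because TAPE appears only once under its KEY, grouping by ['KEY','ITEM'] gives it no earlier row to shift from, so it falls back to 0. One possible adaptation (a sketch, not part of the original answer) is to shift at the level of one row per (KEY, DATE) and then merge the result back. It assumes ENDNO is constant within each (KEY, DATE) pair and that rows are already ordered by DATE within each KEY; `per_date` and `result` are illustrative names, and '0' is used as the fill value to match the string type of ENDNO.

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1','1','1','1','1','1','1','2','2'],
    'DATE': ['2020-01-01','2020-01-01','2020-01-01','2020-01-08','2020-01-08',
             '2020-01-08','2020-01-08','2020-02-01','2020-02-01'],
    'ENDNO': ['1000','1000','1000','2000','2000','2000','2000','400','400'],
    'ITEM': ['PAPERCLIPS','BINDERS','STAPLES','PAPERCLIPS','BINDERS',
             'STAPLES','TAPE','PENCILS','PENS'],
})

# One row per (KEY, DATE); assumes ENDNO does not vary within a pair.
per_date = df.drop_duplicates(['KEY', 'DATE'])[['KEY', 'DATE', 'ENDNO']]

# Within each KEY, STARTNO is the ENDNO of the previous DATE ('0' for the first).
per_date = per_date.assign(
    STARTNO=per_date.groupby('KEY')['ENDNO'].shift(fill_value='0')
)

# Attach STARTNO to every item row sharing that KEY and DATE.
result = df.merge(per_date[['KEY', 'DATE', 'STARTNO']],
                  on=['KEY', 'DATE'], how='left')
print(result)
```

With this grouping the ITEM column plays no part, so TAPE gets the same STARTNO as the other items on 2020-01-08.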


1 Comment

Heather Over a year ago

Quang, your answer almost got me what I needed; however, I realized I missed one aspect in my dataframe. I've updated the question and added a new row to the dataframe.
