Pandas: Create Column Values Based on Condition and Previous Rows Values

Question

I would like to create a column, 'amount' in this example, in a Pandas dataframe 'df', where the value of each row depends on the previous rows and on the value of another column, 'id'. For example, if an 'id' has already been assigned the value 30 in the 'amount' column on an earlier row, the new value should be 0; otherwise it should be 30.

The expected outcome is shown below:

id  amount
a   30
b   30
a   0
a   0
c   30
a   0
c   0
b   0
b   0
a   0
a   0

I thought I could accomplish this through some combination of groupby and lambda, but sadly I've repeatedly hit a wall.

What I tried out was:

df['amount'] = df.apply(
    lambda x: 30 if df.groupby('id')['amount'].cumsum()<30
        else 0)

This gives me the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I apologize in advance if the solution is obvious, but unfortunately, I haven't been able to find anything so far that would solve this.
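For context, here is a minimal sketch of where this error comes from, assuming the 'id' values from the expected outcome above and a placeholder 'amount' column of zeros (both are assumptions for illustration). Comparing a Series with a scalar produces one boolean per row, while Python's `x if cond else y` needs a single True or False.

```python
import pandas as pd

# Sample ids taken from the expected outcome; 'amount' is a placeholder column.
df = pd.DataFrame({'id': list('abaacacbbaa')})
df['amount'] = 0

# Comparing a Series with a scalar returns one boolean per row, not a single bool.
mask = df.groupby('id')['amount'].cumsum() < 30

# Using that Series where Python expects a single True/False raises the ValueError.
try:
    result = 30 if mask else 0
except ValueError as exc:
    error_message = str(exc)
```

Vectorized helpers such as `np.where` or `Series.where` accept the whole boolean Series at once, which is why the answers below rely on them.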

df['amount'] = np.where(df.groupby('id')['amount'].cumsum() < 30, 30, 0)
– Quang Hoang, Commented Sep 2, 2022 at 17:26
Thanks, Quang. I tried your approach and, unfortunately, it populates the value 30 for every single row, regardless of whether it already exists for an 'id'.
– Francis, Commented Sep 2, 2022 at 19:50
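One possible explanation for that result, sketched under the assumption that 'amount' starts out as a column of zeros (the thread does not say how it was initialized): the per-id running total then never reaches 30, so the condition holds on every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': list('abaacacbbaa')})
df['amount'] = 0  # assumption: the column starts out as all zeros

# The running total per id never leaves 0, so the condition is True on every row.
running_total = df.groupby('id')['amount'].cumsum()
filled = np.where(running_total < 30, 30, 0)
```

Under this assumption `filled` contains 30 in every position, matching the behavior described in the comment.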

Francisco Gonzalvo · Accepted Answer · 2022-09-02 17:35:17Z

You can use a helper column that holds the previous row's value, like this:

import numpy as np

# Previous row's 'amount': NaN for the first row, then every value but the last.
df1["pastcol"] = [np.nan] + list(df1["amount"])[:-1]

Output:

   id  amount  pastcol
0   a      30      NaN
1   b      30     30.0
2   a       0     30.0
3   a       0      0.0
4   c      30      0.0
5   a       0     30.0
6   c       0      0.0
7   b       0      0.0
8   b       0      0.0
9   a       0      0.0
10  a       0      0.0
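The same helper column can also be built with `Series.shift()`. The sketch below assumes `df1` holds the sample data from the question; the column name `past_same_id` is hypothetical and only illustrates shifting within each 'id' group.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'id': list('abaacacbbaa'),
    'amount': [30, 30, 0, 0, 30, 0, 0, 0, 0, 0, 0],
})

# List-based construction from the answer above, kept for comparison.
manual = pd.Series([np.nan] + list(df1['amount'])[:-1], index=df1.index)

# Series.shift() moves every value down one row and fills the first row with NaN.
df1['pastcol'] = df1['amount'].shift()

# Shifting within each 'id' group gives the previous amount recorded for the same id.
df1['past_same_id'] = df1.groupby('id')['amount'].shift()
```

The grouped variant is NaN on the first row of each id, which is the information the question is really after.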

Francis Over a year ago

Thanks, Francisco! I'm trying to avoid using a separate column for this. The idea would be to create the 'amount' column from scratch without any values at first. Hope this makes sense.

Francis · Accepted Answer · 2022-09-02 22:13:15Z

I thankfully was able to answer my own question. For anyone who is interested, I was successful with the following approach:

df['amount'] = df['amount'].where(df.groupby('id')['amount'].shift().cumsum() < 30, 30)

Thanks to everyone who shared their ideas!
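A quick check of this one-liner on the sample ids, assuming 'amount' is first created as a column of zeros (the answer does not state the initial values, so this is an assumption). The sketch also shows one alternative that needs no initial column: the hypothetical `amount_alt` column uses `duplicated()` to flag every repeat of an id.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': list('abaacacbbaa')})
expected = [30, 30, 0, 0, 30, 0, 0, 0, 0, 0, 0]

# Assumption: 'amount' is created as a column of zeros before applying the one-liner.
df['amount'] = 0
df['amount'] = df['amount'].where(
    df.groupby('id')['amount'].shift().cumsum() < 30, 30
)

# Alternative sketch: duplicated() is False only on the first occurrence of each id.
df['amount_alt'] = np.where(df['id'].duplicated(), 0, 30)
```

With a zero-initialized column, the grouped shift is NaN only on the first row of each id, so the condition fails there and `where` fills in 30. Other starting values may give different results.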

edited Sep 2, 2022 at 22:13

answered Sep 2, 2022 at 22:07

Leon Li · Accepted Answer · answered 2022-09-03 22:26, edited by tripleee 2024-05-14 09:30:24Z

Please try this code:

# Import modules
import pandas as pd
import numpy as np

# Data preparation and preprocessing
df = pd.DataFrame({'id': ['a','b','a','a','c','a','c','b','b','a','a'],
                   'amount': [30,30,0,0,30,0,0,0,0,0,0]})
# Remember the original row order
df['Orig_Index'] = df.reset_index().index
# Running count per id: 1 marks the first occurrence
df['Dup_Seq'] = df.groupby(['id']).cumcount() + 1
# Keep only the first occurrence of each id
df_required = df.loc[df['Dup_Seq'] == 1].copy()
df_final = pd.merge(df, df_required[['Orig_Index','Dup_Seq']], left_on='Orig_Index', right_on='Orig_Index', how='left')
# Dup_Seq_y is 1 on first-occurrence rows and NaN elsewhere after the left merge
df_final['amount_v2'] = np.where(df_final['Dup_Seq_y'] == 1, 30, 0)
df_final.drop(['amount','Orig_Index','Dup_Seq_x','Dup_Seq_y'], axis=1, inplace=True)
df_final.rename(columns={'amount_v2': 'amount'}, inplace=True)

# Data display
df_final
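The same first-occurrence idea can be sketched without the merge step, assuming only the 'id' column exists. The name `first_seen` is hypothetical; `cumcount()` is the same call the code above already uses.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': list('abaacacbbaa')})

# cumcount() numbers the rows of each id 0, 1, 2, ... in order of appearance.
first_seen = df.groupby('id').cumcount() == 0

# 30 on the first occurrence of an id, 0 on every later occurrence.
df['amount'] = np.where(first_seen, 30, 0)
```

This keeps the original row order and avoids the temporary index and sequence columns.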

answered Sep 3, 2022 at 22:26

Leon Li

Francis Over a year ago

I appreciate it, Leon! I was able to solve my problem with the following line of code: df['amount'] = df['amount'].where(df.groupby('id')['amount'].shift().cumsum() < 30, 30)
