
I would like to create a column ('amount' in this example) in a pandas DataFrame 'df', where the value of each row depends on the preceding rows as well as on the value in another column, 'id'. For example: if an 'id' has already been assigned the value 30 in the 'amount' column in an earlier row, the new value should be 0; otherwise it should be 30.

The expected outcome shown below:

id  amount
a   30
b   30
a   0
a   0
c   30
a   0
c   0
b   0
b   0
a   0
a   0

I thought I could accomplish this through some combination of groupby and lambda, but sadly I've repeatedly hit a wall.

What I tried out was:

df['amount'] = df.apply(
    lambda x: 30 if df.groupby('id')['amount'].cumsum()<30
        else 0)

This gives me the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I apologize in advance if the solution is obvious, but unfortunately, I haven't been able to find anything so far that would solve this.

  • df['amount'] = np.where(df.groupby('id')['amount'].cumsum() < 30, 30, 0) Commented Sep 2, 2022 at 17:26
  • Thanks, Quang. I tried your approach and, unfortunately, it populates the value '30' for every single row, regardless of whether it already exists for an 'id'. Commented Sep 2, 2022 at 19:50

3 Answers


You can add a helper column that holds the previous row's 'amount':

import numpy as np
df1["pastcol"] = [np.nan] + list(df1["amount"])[:-1]

Output:

   id  amount  pastcol
0   a      30      NaN
1   b      30     30.0
2   a       0     30.0
3   a       0      0.0
4   c      30      0.0
5   a       0     30.0
6   c       0      0.0
7   b       0      0.0
8   b       0      0.0
9   a       0      0.0
10  a       0      0.0
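
As a hedged aside, the same "previous row" column can probably be built with pandas' own `Series.shift()` instead of slicing a list. The sketch below rebuilds the sample frame from the question (the name `df1` follows this answer; `pastcol_by_id` is a hypothetical column name added only for illustration) and also shows the grouped variant, which looks at the previous row of the same 'id' rather than the previous row overall.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "id": ["a", "b", "a", "a", "c", "a", "c", "b", "b", "a", "a"],
    "amount": [30, 30, 0, 0, 30, 0, 0, 0, 0, 0, 0],
})

# Manual construction used in the answer above
manual = [np.nan] + list(df1["amount"])[:-1]

# Equivalent: shift every value down by one row (first row becomes NaN)
df1["pastcol"] = df1["amount"].shift()

# Variant (hypothetical column name): previous value within the same 'id'
df1["pastcol_by_id"] = df1.groupby("id")["amount"].shift()

print(df1)
```

The grouped variant is NaN on the first row of every 'id', which is the signal the question is ultimately after.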

1 Comment

Thanks, Francisco! I'm trying to avoid using a separate column for this. The idea would be to create the 'amount' column from scratch without any values at first. Hope this makes sense.

Thankfully, I was able to answer my own question. For anyone who is interested, the following approach worked for me:

df['amount'] = df['amount'].where(df.groupby('id')['amount'].shift().cumsum() < 30, 30)

Thanks to everyone who shared their ideas!
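
For readers who want to reproduce this, here is a minimal sketch of the line above on the sample data. It rests on one assumption that the answer does not state: the 'amount' column already exists and is filled with 0 before the line runs. Under that assumption the grouped `shift()` is NaN on the first row of each 'id', the comparison with 30 is False there, and `where` fills in 30. The `amount_alt` column is a hypothetical name for a cross-check using `Series.duplicated()`, which may be a simpler way to mark first occurrences.

```python
import numpy as np
import pandas as pd

# Sample ids copied from the question
df = pd.DataFrame({"id": ["a", "b", "a", "a", "c", "a", "c", "b", "b", "a", "a"]})

# Assumption: 'amount' starts out as a column of zeros
df["amount"] = 0

# NaN on the first row of each id, 0 on later rows (cumsum skips NaN)
prev_total = df.groupby("id")["amount"].shift().cumsum()

# Keep the existing 0 where the condition holds; otherwise write 30
df["amount"] = df["amount"].where(prev_total < 30, 30)

# Cross-check (hypothetical column): 30 on the first occurrence of an id, else 0
df["amount_alt"] = np.where(df["id"].duplicated(), 0, 30)

print(df)
```

If 'amount' starts with other values, the result of the `where` line may differ, so the cross-check is worth running on real data.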


Please try this code:

# Import modules
import pandas as pd
import numpy as np

# Data preparation and preprocessing
df = pd.DataFrame({'id': ['a','b','a','a','c','a','c','b','b','a','a'],
                   'amount': [30,30,0,0,30,0,0,0,0,0,0]})
df['Orig_Index'] = df.reset_index().index          # remember the original row order
df['Dup_Seq'] = df.groupby(['id']).cumcount() + 1  # 1 for the first occurrence of each id
df_required = df.loc[df['Dup_Seq'] == 1].copy()    # keep only the first occurrences
df_final = pd.merge(df, df_required[['Orig_Index', 'Dup_Seq']], on='Orig_Index', how='left')
# Dup_Seq_y is 1 for first occurrences and NaN for all other rows
df_final['amount_v2'] = np.where(df_final['Dup_Seq_y'] == 1, 30, 0)
df_final.drop(['amount', 'Orig_Index', 'Dup_Seq_x', 'Dup_Seq_y'], axis=1, inplace=True)
df_final.rename(columns={'amount_v2': 'amount'}, inplace=True)

# Data display
df_final
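
The merge above boils down to "is this the first row for its id?". A more compact sketch of the same idea, assuming only the 'id' column exists, is to test the zero-based `groupby(...).cumcount()` directly. This is a condensed illustration, not the code from this answer.

```python
import numpy as np
import pandas as pd

# Sample ids copied from the question; no 'amount' column is needed up front
df = pd.DataFrame({"id": ["a", "b", "a", "a", "c", "a", "c", "b", "b", "a", "a"]})

# cumcount() numbers the rows of each id starting at 0,
# so 0 marks the first occurrence of that id
is_first = df.groupby("id").cumcount() == 0

# 30 on the first occurrence of each id, 0 on every later row
df["amount"] = np.where(is_first, 30, 0)

print(df)
```

This avoids the helper columns and the merge, and it creates 'amount' from scratch, which is what the comment on the first answer asked for.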

1 Comment

I appreciate it, Leon! I was able to solve my problem with the following line of code: df['amount'] = df['amount'].where(df.groupby('id')['amount'].shift().cumsum() < 30, 30)
