Create new column in dataframe based on conditions in existing columns

Question

I have the following data:

import pandas as pd

dict1 = {"Code": [3, 3, 3, 1, 1, 2, 2, 3, 3, 3], "Num": [10, 10, 5, 5, 5, 5, 10, 5, 25, 25]}

df1 = pd.DataFrame(dict1)

which results in a DataFrame with two columns, Code and Num, and ten rows.

I want to create a new column (End Balance) whose value is based on the existing Code and Num columns.

If Code is 1, then End Balance is equal to Num.

If Code is 2, then End Balance is the running sum of Num values where Code is 2.

If Code is 3, then End Balance is the running sum of Num values where Code is 3.

I use iterrows and I have the following script:

mylist1=[]
mylist2=[]
for index, row in df1.iterrows():
    if row["Code"]==1:
        end_balance=row["Num"]  
    elif row["Code"]==2:
        mylist1.append(row["Num"])
        end_balance=sum(mylist1) 
    elif row["Code"]==3:
        mylist2.append(row["Num"])
        end_balance=sum(mylist2)
    df1.loc[index,"End_Balance"]=end_balance

which outputs:

   Code Num End_Balance
0   3   10  10.00
1   3   10  20.00
2   3   5   25.00
3   1   5   5.00
4   1   5   5.00
5   2   5   5.00
6   2   10  15.00
7   3   5   30.00
8   3   25  55.00
9   3   25  80.00

The problem with this output is that in the second subset where Code is 3 (rows 7 to 9), the End_Balance summation continues from the first subset where Code is 3 instead of starting over. I want mylist2 to be cleared after the first subset of Code = 3, so that when a new subset with Code = 3 begins, the summation in End_Balance starts from zero. The expected output is:

   Code Num End_Balance
0   3   10  10.00
1   3   10  20.00
2   3   5   25.00
3   1   5   5.00
4   1   5   5.00
5   2   5   5.00
6   2   10  15.00
7   3   5   5.00
8   3   25  30.00
9   3   25  55.00

Please keep your suggestions to the same logic, using iterrows. I know I can probably do this with a groupby, but here I need a solution with iterrows.
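For reference, here is a minimal sketch of an iterrows loop that restarts the sum whenever Code changes from one row to the next. The names running_total, previous_code and end_balances are hypothetical, and the else branch assumes Code only ever takes the values 1, 2 and 3, as in the sample data.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Code": [3, 3, 3, 1, 1, 2, 2, 3, 3, 3],
    "Num": [10, 10, 5, 5, 5, 5, 10, 5, 25, 25],
})

running_total = 0      # hypothetical name: sum of Num within the current run of equal Codes
previous_code = None   # hypothetical name: Code seen on the previous row
end_balances = []      # hypothetical name: collected values for the new column

for index, row in df1.iterrows():
    # A change in Code marks the start of a new subset, so the sum starts over
    if row["Code"] != previous_code:
        running_total = 0
    if row["Code"] == 1:
        # Code 1 simply copies Num
        end_balance = row["Num"]
    else:
        # Assumption: any other Code (2 or 3 in the sample) accumulates Num
        running_total += row["Num"]
        end_balance = running_total
    end_balances.append(end_balance)
    previous_code = row["Code"]

# Assign the whole column once instead of writing cell by cell inside the loop
df1["End_Balance"] = end_balances
print(df1)
```

Tracking the previous Code replaces the two separate lists, so the total is reset without needing to know in advance how many subsets there are.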

Mayank Porwal · Accepted Answer · 2020-10-25 19:37:53Z

3

You can also use np.select:

import numpy as np

# Label each run of consecutive equal Code values with its own block id
blocks = (df1.Code != df1.Code.shift()).cumsum()
# Code 1 keeps Num; Codes 2 and 3 get the running sum of Num within their block
conditions = [df1.Code.eq(1), df1.Code.eq(2) | df1.Code.eq(3)]
choices = [df1.Num, df1.groupby(blocks)['Num'].cumsum()]

df1['End_Balance'] = np.select(conditions, choices)

print(df1)

    Code  Num  End_Balance
0     3   10           10
1     3   10           20
2     3    5           25
3     1    5            5
4     1    5            5
5     2    5            5
6     2   10           15
7     3    5            5
8     3   25           30
9     3   25           55
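
The grouping key is the part that makes the sum start over, so it may help to look at it on its own. This is a small sketch using the question's data; the intermediate names changed, blocks and running are illustrative and not part of the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Code": [3, 3, 3, 1, 1, 2, 2, 3, 3, 3],
    "Num": [10, 10, 5, 5, 5, 5, 10, 5, 25, 25],
})

# True wherever Code differs from the row above (the first row compares against NaN)
changed = df1["Code"] != df1["Code"].shift()

# The cumulative sum of those flags gives every consecutive run its own id
blocks = changed.cumsum()
print(blocks.tolist())   # [1, 1, 1, 2, 2, 3, 3, 4, 4, 4]

# Running sum of Num restarted at each block boundary
running = df1.groupby(blocks)["Num"].cumsum()
print(running.tolist())  # [10, 20, 25, 5, 10, 5, 15, 5, 30, 55]
```

Note that the rows with Code 1 also get a running sum here (5, 10), which is why the Code 1 condition has to overwrite them with Num.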

edited Oct 25, 2020 at 19:37

answered Oct 25, 2020 at 19:25

ansev · Accepted Answer · 2020-10-25 19:30:46Z

3

If I understand correctly, you can use np.where with GroupBy.cumsum:

import numpy as np
# Label each run of consecutive equal Code values with its own block id
blocks = df1['Code'].ne(df1['Code'].shift()).cumsum()
# Keep Num where Code is 1; otherwise take the running sum of Num within the block
df1['End_balance'] = np.where(df1['Code'].eq(1), df1['Num'], df1.groupby(blocks)['Num'].cumsum())

print(df1)

   Code  Num  End_balance
0     3   10           10
1     3   10           20
2     3    5           25
3     1    5            5
4     1    5            5
5     2    5            5
6     2   10           15
7     3    5            5
8     3   25           30
9     3   25           55

Or Series.where:

df1['End_balance'] = df1['Num'].where(df1['Code'].eq(1),
                                      df1.groupby(blocks)['Num'].cumsum())

edited Oct 25, 2020 at 19:30

answered Oct 25, 2020 at 19:17

Andrej Kesely · Accepted Answer · 2020-10-25 19:25:42Z

2

import numpy as np
blocks = (df1.Code != df1.Code.shift(1)).cumsum()  # one id per run of consecutive equal Code values
df1['End_Balance'] = np.where(df1.Code == 1, df1.Num, df1.groupby(blocks)['Num'].transform('cumsum'))
print(df1)

Prints:

   Code  Num  End_Balance
0     3   10           10
1     3   10           20
2     3    5           25
3     1    5            5
4     1    5            5
5     2    5            5
6     2   10           15
7     3    5            5
8     3   25           30
9     3   25           55
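
This answer calls transform('cumsum') where the others call cumsum() directly on the grouped column. As a quick check on the question's data, the sketch below compares the two; the names via_cumsum and via_transform are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Code": [3, 3, 3, 1, 1, 2, 2, 3, 3, 3],
    "Num": [10, 10, 5, 5, 5, 5, 10, 5, 25, 25],
})

# One id per run of consecutive equal Code values
blocks = (df1["Code"] != df1["Code"].shift()).cumsum()

# Both calls return a Series aligned to df1's index, one value per row
via_cumsum = df1.groupby(blocks)["Num"].cumsum()
via_transform = df1.groupby(blocks)["Num"].transform("cumsum")

# Compare the two results value by value
same = via_cumsum.tolist() == via_transform.tolist()
print(same)
```

Since cumsum already returns one value per input row, wrapping it in transform should not change the result for this data, and either spelling can be passed to np.where.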

answered Oct 25, 2020 at 19:25
