2

I have the following data:

dict1={"Code":[3,3,3,1,1,2,2,3,3,3],"Num":[10,10,5,5,5,5,10,5,25,25]}

df1=pd.DataFrame(dict1)

which results in:

   Code Num
0   3   10
1   3   10
2   3   5
3   1   5
4   1   5
5   2   5
6   2   10
7   3   5
8   3   25
9   3   25

I want to create a new column (End Balance) which value is based on the existing Code and Num columns.

If Code value is 1 then End Balance is equal to Num

If Code is 2 then End Balance value is the sum of Num values where Code is 2

If Code is 3 then End Balance value is the sum of Num values where Code is 3

I use iterrows and I have the following script:

mylist1=[]
mylist2=[]
for index, row in df1.iterrows():
    if row["Code"]==1:
        end_balance=row["Num"]  
    elif row["Code"]==2:
        mylist1.append(row["Num"])
        end_balance=sum(mylist1) 
    elif row["Code"]==3:
        mylist2.append(row["Num"])
        end_balance=sum(mylist2)
    df1.loc[index,"End_Balance"]=end_balance

which output is

   Code Num End_Balance
0   3   10  10.00
1   3   10  20.00
2   3   5   25.00
3   1   5   5.00
4   1   5   5.00
5   2   5   5.00
6   2   10  15.00
7   3   5   30.00
8   3   25  55.00
9   3   25  80.00

The problem I have with this output is that at the second subset where Code = 3 the End_Balance column starts summation taking into account the first subset where Code is 3. You can see that easily. I want somehow mylist2 in the script to be erased after the first subset of Code=3 and when a new subset with Code = 3 comes the summation in column End_Balance should start over. Expected output is:

   Code Num End_Balance
0   3   10  10.00
1   3   10  20.00
2   3   5   25.00
3   1   5   5.00
4   1   5   5.00
5   2   5   5.00
6   2   10  15.00
7   3   5   5.00
8   3   25  30.00
9   3   25  55.00

May your suggestions follow the same logic - using iterrows. I know that probably with a groupby I can do what I want but here I need a solution with iterrows.

3 Answers 3

3

You can also use np.select:

In [2062]: import numpy as np

In [2063]: conditions = [df1.Code.eq(1), df1.Code.eq(2) | df1.Code.eq(3)]
In [2061]: choices = [df1.Num, df1.groupby((df1.Code != df1.Code.shift()).cumsum())['Num'].cumsum()]

In [2065]: df1['End_Balance'] = np.select(conditions, choices)

In [2066]: df1
Out[2066]: 
    Code  Num  End_Balance
0     3   10           10
1     3   10           20
2     3    5           25
3     1    5            5
4     1    5            5
5     2    5            5
6     2   10           15
7     3    5            5
8     3   25           30
9     3   25           55
Sign up to request clarification or add additional context in comments.

Comments

3

IIUC, np.where + GroupBy.cumsum

import numpy as np
blocks = df['Code'].ne(df['Code'].shift()).cumsum()
df['End_balance'] = np.where(df['Code'].eq(1), df['Num'], df.groupby(blocks)['Num'].cumsum())

print(df)

   Code  Num  End_balance
0     3   10           10
1     3   10           20
2     3    5           25
3     1    5            5
4     1    5            5
5     2    5            5
6     2   10           15
7     3    5            5
8     3   25           30
9     3   25           55

Or Series.where:

df['End_balance'] = df['Num'].where(df['Code'].eq(1),
                                    df.groupby(blocks)['Num'].cumsum())

Comments

2
df1['End_Balance'] = np.where(df1.Code == 1, df1.Num, df1.groupby((df1.Code != df1.Code.shift(1)).cumsum())['Num'].transform('cumsum') )
print(df1)

Prints:

   Code  Num  End_Balance
0     3   10           10
1     3   10           20
2     3    5           25
3     1    5            5
4     1    5            5
5     2    5            5
6     2   10           15
7     3    5            5
8     3   25           30
9     3   25           55

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.