I have the following data:
dict1={"Code":[3,3,3,1,1,2,2,3,3,3],"Num":[10,10,5,5,5,5,10,5,25,25]}
df1=pd.DataFrame(dict1)
which results in:
Code Num
0 3 10
1 3 10
2 3 5
3 1 5
4 1 5
5 2 5
6 2 10
7 3 5
8 3 25
9 3 25
I want to create a new column (End Balance) which value is based on the existing Code and Num columns.
If Code value is 1 then End Balance is equal to Num
If Code is 2 then End Balance value is the sum of Num values where Code is 2
If Code is 3 then End Balance value is the sum of Num values where Code is 3
I use iterrows and I have the following script:
mylist1=[]
mylist2=[]
for index, row in df1.iterrows():
if row["Code"]==1:
end_balance=row["Num"]
elif row["Code"]==2:
mylist1.append(row["Num"])
end_balance=sum(mylist1)
elif row["Code"]==3:
mylist2.append(row["Num"])
end_balance=sum(mylist2)
df1.loc[index,"End_Balance"]=end_balance
which output is
Code Num End_Balance
0 3 10 10.00
1 3 10 20.00
2 3 5 25.00
3 1 5 5.00
4 1 5 5.00
5 2 5 5.00
6 2 10 15.00
7 3 5 30.00
8 3 25 55.00
9 3 25 80.00
The problem I have with this output is that at the second subset where Code = 3 the End_Balance column starts summation taking into account the first subset where Code is 3. You can see that easily. I want somehow mylist2 in the script to be erased after the first subset of Code=3 and when a new subset with Code = 3 comes the summation in column End_Balance should start over. Expected output is:
Code Num End_Balance
0 3 10 10.00
1 3 10 20.00
2 3 5 25.00
3 1 5 5.00
4 1 5 5.00
5 2 5 5.00
6 2 10 15.00
7 3 5 5.00
8 3 25 30.00
9 3 25 55.00
May your suggestions follow the same logic - using iterrows. I know that probably with a groupby I can do what I want but here I need a solution with iterrows.