I have a DataFrame like this, and I split it by each unique value that appears in columns "a" and "b". DF:
a b c d
0 red green 1 2
1 brown red 4 5
2 black grey 0 0
3 red blue 6 1
4 green blue 0 3
5 black brown 2 8
6 red grey 4 6
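To make the example reproducible, the DataFrame above can be built like this (the values are copied from the table; the default RangeIndex 0..6 matches the printed index):

```python
import pandas as pd

# Same data as the DF table above
df = pd.DataFrame({
    "a": ["red", "brown", "black", "red", "green", "black", "red"],
    "b": ["green", "red", "grey", "blue", "blue", "brown", "grey"],
    "c": [1, 4, 0, 6, 0, 2, 4],
    "d": [2, 5, 0, 1, 3, 8, 6],
})
print(df)
```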
Code:
import pandas as pd

colors = pd.unique(df[['a', 'b']].values.ravel('K'))
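Then, for each color's sub-DataFrame, I calculate a new column "e": for every row, the mean of the values from the later rows, taking "c" where "a" is the color and "d" where "b" is the color: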
df_list = []
for color in colors:
    # Rows where this color appears in either "a" or "b"
    current_df = df[(df.a == color) | (df.b == color)].copy()
    # e = mean of later "c" values (color in "a") and later "d" values (color in "b")
    current_df["e"] = current_df.apply(
        lambda x: (
            current_df[(current_df.a == color)].loc[x.name + 1 :, "c"].sum()
            + current_df[(current_df.b == color)].loc[x.name + 1 :, "d"].sum()
        )
        / (
            current_df[(current_df.a == color)].loc[x.name + 1 :, "c"].size
            + current_df[(current_df.b == color)].loc[x.name + 1 :, "d"].size
        ),
        axis=1,
    )
    df_list.append(current_df)
df_list:
[ a b c d e
0 red green 1 2 5.0
1 brown red 4 5 5.0
3 red blue 6 1 4.0
6 red grey 4 6 NaN,
a b c d e
1 brown red 4 5 8.0
5 black brown 2 8 NaN,
a b c d e
2 black grey 0 0 2.0
5 black brown 2 8 NaN,
a b c d e
0 red green 1 2 0.0
4 green blue 0 3 NaN,
a b c d e
2 black grey 0 0 6.0
6 red grey 4 6 NaN,
a b c d e
3 red blue 6 1 3.0
4 green blue 0 3 NaN]
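For reference, the same "e" values can be computed without the row-wise apply. The sketch below is a swapped-in technique (reversed cumulative sums), not the code above; the function name later_mean is hypothetical. It assumes the index is sorted ascending, as it is here, so "later rows" means "rows further down the sub-DataFrame":

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["red", "brown", "black", "red", "green", "black", "red"],
    "b": ["green", "red", "grey", "blue", "blue", "brown", "grey"],
    "c": [1, 4, 0, 6, 0, 2, 4],
    "d": [2, 5, 0, 1, 3, 8, 6],
})


def later_mean(df, color):
    """Hypothetical helper: mean of contributions from strictly later rows.

    A row contributes "c" when "a" equals the color and "d" when "b"
    equals it (both, if both columns match).
    """
    sub = df[(df.a == color) | (df.b == color)]
    in_a = sub.a == color
    in_b = sub.b == color
    # Per-row total of contributions and number of contributions
    contrib_sum = sub["c"].where(in_a, 0) + sub["d"].where(in_b, 0)
    contrib_cnt = in_a.astype(int) + in_b.astype(int)
    # Reversed cumulative sums give totals from each row to the end;
    # shift(-1) excludes the row itself and leaves NaN on the last row
    later_sum = contrib_sum.iloc[::-1].cumsum().iloc[::-1].shift(-1)
    later_cnt = contrib_cnt.iloc[::-1].cumsum().iloc[::-1].shift(-1)
    return later_sum / later_cnt


print(later_mean(df, "red").tolist())  # [5.0, 5.0, 4.0, nan]
```

The values for "red" match the first sub-DataFrame printed above.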
This code works well, but when I combine the list of DataFrames with dfconcat = pd.concat(df_list), the result is not what I expect.
dfconcat:
a b c d e
0 red green 1 2 5.0
1 brown red 4 5 5.0
3 red blue 6 1 4.0
6 red grey 4 6 NaN
1 brown red 4 5 8.0
5 black brown 2 8 NaN
2 black grey 0 0 2.0
5 black brown 2 8 NaN
0 red green 1 2 0.0
4 green blue 0 3 NaN
2 black grey 0 0 6.0
6 red grey 4 6 NaN
3 red blue 6 1 3.0
4 green blue 0 3 NaN
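A quick check of where the repeated rows come from: in this data every row has two different colors in "a" and "b", so it is selected by two of the color masks and copied into two sub-DataFrames. pd.concat only stacks those copies, so each index label appears twice (14 rows in total):

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["red", "brown", "black", "red", "green", "black", "red"],
    "b": ["green", "red", "grey", "blue", "blue", "brown", "grey"],
    "c": [1, 4, 0, 6, 0, 2, 4],
    "d": [2, 5, 0, 1, 3, 8, 6],
})
colors = pd.unique(df[["a", "b"]].values.ravel("K"))

# For every row, count how many color masks select it,
# i.e. how many sub-DataFrames it is copied into
membership = sum(((df.a == color) | (df.b == color)).astype(int) for color in colors)
print(membership.tolist())  # [2, 2, 2, 2, 2, 2, 2]
```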
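Expected result (e1 is "e" from the sub-DataFrame of the color in "a", e2 is "e" from the sub-DataFrame of the color in "b"):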
a b c d e1 e2
0 red green 1 2 5.0 0.0
1 brown red 4 5 8.0 5.0
2 black grey 0 0 2.0 6.0
3 red blue 6 1 4.0 3.0
4 green blue 0 3 NaN NaN
5 black brown 2 8 NaN NaN
6 red grey 4 6 NaN NaN
It seems that pd.concat duplicates the rows: each row appears once for every color group it belongs to. How can I fix this? Do I have to change the whole code, or is it possible by replacing pd.concat with merge, for example?
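One possible approach, sketched under the assumption that e1 should hold "e" from the group whose color is in "a" and e2 "e" from the group whose color is in "b" (as in the expected result): keep the loop, split "e" into two columns inside it with Series.where, then collapse the concatenated copies by index label with groupby(level=0).first(), which takes the first non-null value per column. No merge is needed in this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["red", "brown", "black", "red", "green", "black", "red"],
    "b": ["green", "red", "grey", "blue", "blue", "brown", "grey"],
    "c": [1, 4, 0, 6, 0, 2, 4],
    "d": [2, 5, 0, 1, 3, 8, 6],
})
colors = pd.unique(df[["a", "b"]].values.ravel("K"))

df_list = []
for color in colors:
    current_df = df[(df.a == color) | (df.b == color)].copy()
    # Same "e" calculation as in the question (the last row of each group
    # divides 0 by 0, which gives NaN with a RuntimeWarning)
    current_df["e"] = current_df.apply(
        lambda x: (
            current_df[current_df.a == color].loc[x.name + 1 :, "c"].sum()
            + current_df[current_df.b == color].loc[x.name + 1 :, "d"].sum()
        )
        / (
            current_df[current_df.a == color].loc[x.name + 1 :, "c"].size
            + current_df[current_df.b == color].loc[x.name + 1 :, "d"].size
        ),
        axis=1,
    )
    # Keep "e" in e1 only where this group's color is in "a",
    # and in e2 only where it is in "b"; elsewhere they are NaN
    current_df["e1"] = current_df["e"].where(current_df.a == color)
    current_df["e2"] = current_df["e"].where(current_df.b == color)
    df_list.append(current_df)

dfconcat = pd.concat(df_list)
# Collapse the copies of each original row: first() picks the first
# non-null e1 / e2 among the copies sharing an index label
e_cols = dfconcat.groupby(level=0)[["e1", "e2"]].first()
result = df.join(e_cols)
print(result)
```

Because both copies of a row share the same index label, grouping on the index is enough to line them up again.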