Split DF based on multiple columns concatenate duplicate rows of DF list

Question

I have a DataFrame like the one below, and I split it into one sub-DataFrame per unique value found in columns "a" and "b". DF:

    a     b     c   d
0   red   green 1   2
1   brown red   4   5
2   black grey  0   0
3   red   blue  6   1
4   green blue  0   3
5   black brown 2   8
6   red   grey  4   6
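
For reproducibility, the frame above can be rebuilt roughly like this. It is only a sketch: the values are copied from the table, and the integer dtype for "c" and "d" is an assumption.

```python
import pandas as pd

# Values copied from the table above; the default RangeIndex gives labels 0..6.
df = pd.DataFrame(
    {
        "a": ["red", "brown", "black", "red", "green", "black", "red"],
        "b": ["green", "red", "grey", "blue", "blue", "brown", "grey"],
        "c": [1, 4, 0, 6, 0, 2, 4],
        "d": [2, 5, 0, 1, 3, 8, 6],
    }
)
print(df)
```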

Code:

colors = pd.unique(df[['a', 'b']].values.ravel('K'))

Then I calculate a new column "e" for each sub-DataFrame:

df_list = []
for color in colors:
    current_df = df[(df.a == color) | (df.b == color)].copy()
    current_df["e"] = current_df.apply(
        lambda x: (
            current_df[(current_df.a == color)].loc[x.name + 1 :, "c"].sum()
            + current_df[(current_df.b == color)].loc[x.name + 1 :, "d"].sum()
        )
        / (
            current_df[(current_df.a == color)].loc[x.name + 1 :, "c"].size
            + current_df[(current_df.b == color)].loc[x.name + 1 :, "d"].size
        ),
        axis=1,
    )
    df_list.append(current_df)

df_list


[       a      b  c  d    e
 0    red  green  1  2  5.0
 1  brown    red  4  5  5.0
 3    red   blue  6  1  4.0
 6    red   grey  4  6  NaN,
        a      b  c  d    e
 1  brown    red  4  5  8.0
 5  black  brown  2  8  NaN,
        a      b  c  d    e
 2  black   grey  0  0  2.0
 5  black  brown  2  8  NaN,
        a      b  c  d    e
 0    red  green  1  2  0.0
 4  green   blue  0  3  NaN,
        a     b  c  d    e
 2  black  grey  0  0  6.0
 6    red  grey  4  6  NaN,
        a     b  c  d    e
 3    red  blue  6  1  3.0
 4  green  blue  0  3  NaN]

This code works well, but when I combine the list of DataFrames with dfconcat = pd.concat(df_list), the result is not what I expect:

dfconcat:

    a     b     c   d   e
0   red   green 1   2   5.0
1   brown red   4   5   5.0
3   red   blue  6   1   4.0
6   red   grey  4   6   NaN
1   brown red   4   5   8.0
5   black brown 2   8   NaN
2   black grey  0   0   2.0
5   black brown 2   8   NaN
0   red   green 1   2   0.0
4   green blue  0   3   NaN
2   black grey  0   0   6.0
6   red   grey  4   6   NaN
3   red   blue  6   1   3.0
4   green blue  0   3   NaN

Expected Result:

    a      b    c   d   e1  e2
0   red   green 1   2   5.0 0.0
1   brown red   4   5   8.0 5.0
2   black grey  0   0   2.0 6.0
3   red   blue  6   1   4.0 3.0
4   green blue  0   3   NaN NaN
5   black brown 2   8   NaN NaN
6   red   grey  4   6   NaN NaN

It seems the rows get duplicated when I pass the list to pd.concat. How can I fix this? Do I have to change the whole code, or is it possible by replacing pd.concat with a merge, for example?

Can you explain what you are trying to do? Apologies, but I can't make sense of your code.
– sammywemmy, Commented May 25, 2021 at 11:19

Nk03 · Accepted Answer · 2021-05-25 11:21:01Z


You can perform a groupby on dfconcat, collect the "e" values of each duplicated row into a list, and then expand those lists horizontally into separate columns.

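# Group on every column except the last one ("e") and collect the "e" values into lists.
# Note: use dfconcat.columns here; the original df has no "e" column.
df1 = dfconcat.groupby(dfconcat.columns[:-1].to_list()).agg(list)
# Expand each list into its own columns (e_0, e_1, ...) and turn a, b, c, d back into columns.
df1 = (pd.concat(
    [df1[c].apply(pd.Series).add_prefix(c + "_")
     for c in df1],
    axis=1)
 ).reset_index()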

output:

       a      b  c  d  e_0  e_1
0  black  brown  2  8  NaN  NaN
1  black   grey  0  0  2.0  6.0
2  brown    red  4  5  5.0  8.0
3  green   blue  0  3  NaN  NaN
4    red   blue  6  1  4.0  3.0
5    red  green  1  2  5.0  0.0
6    red   grey  4  6  NaN  NaN
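
If the original row order (0 to 6) matters, one possible alternative is to number the repeated index labels with cumcount and unstack "e" on that counter. This is a minimal sketch, not the only way to do it: dfconcat is rebuilt by hand from the values shown in the question, and the names occurrence, e_wide, base and result are made up for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
# dfconcat rebuilt by hand from the question's output (assumption: same values, same index labels).
dfconcat = pd.DataFrame(
    {
        "a": ["red", "brown", "red", "red", "brown", "black", "black",
              "black", "red", "green", "black", "red", "red", "green"],
        "b": ["green", "red", "blue", "grey", "red", "brown", "grey",
              "brown", "green", "blue", "grey", "grey", "blue", "blue"],
        "c": [1, 4, 6, 4, 4, 2, 0, 2, 1, 0, 0, 4, 6, 0],
        "d": [2, 5, 1, 6, 5, 8, 0, 8, 2, 3, 0, 6, 1, 3],
        "e": [5.0, 5.0, 4.0, nan, 8.0, nan, 2.0, nan, 0.0, nan, 6.0, nan, 3.0, nan],
    },
    index=[0, 1, 3, 6, 1, 5, 2, 5, 0, 4, 2, 6, 3, 4],
)

# 0 for the first time an index label appears, 1 for the second time.
occurrence = dfconcat.groupby(level=0).cumcount()

# Move the occurrence counter into the index, then pivot it into columns e_0, e_1.
e_wide = (
    dfconcat.set_index(occurrence, append=True)["e"]
    .unstack()
    .add_prefix("e_")
)

# Keep one copy of each original row, in original order, and attach the wide "e" columns.
base = dfconcat.loc[~dfconcat.index.duplicated(), ["a", "b", "c", "d"]].sort_index()
result = base.join(e_wide)
print(result)
```

Here e_0 and e_1 follow the order in which each row appears in dfconcat, so row 1 gets 5.0 and then 8.0. To tie each value to the colour it was computed for, the sub-frames would need to carry that colour along, for example with pd.concat(df_list, keys=colors).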

