
I have a DataFrame like this one, and I split it by each unique value that appears in columns "a" and "b". DF:

    a     b     c   d
0   red   green 1   2
1   brown red   4   5
2   black grey  0   0
3   red   blue  6   1
4   green blue  0   3
5   black brown 2   8
6   red   grey  4   6

Code:

import pandas as pd

colors = pd.unique(df[['a', 'b']].values.ravel('K'))
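To make the snippet reproducible, the sample DataFrame can be rebuilt from the table above; this is only a sketch, with the values copied from the question. Note that `ravel('K')` reads the values in the order they sit in memory, so the exact order of `colors` may depend on the array's layout, while the set of colors does not.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame(
    {
        "a": ["red", "brown", "black", "red", "green", "black", "red"],
        "b": ["green", "red", "grey", "blue", "blue", "brown", "grey"],
        "c": [1, 4, 0, 6, 0, 2, 4],
        "d": [2, 5, 0, 1, 3, 8, 6],
    }
)

# Every distinct color found in either "a" or "b"; pd.unique keeps
# first-appearance order (in the order ravel yields) and does not sort
colors = pd.unique(df[["a", "b"]].values.ravel("K"))
print(colors)
```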

Then, for each sub-DataFrame, I compute a new column "e":

df_list = []
for color in colors:
    # Keep the rows where the color appears in either "a" or "b"
    current_df = df[(df.a == color) | (df.b == color)].copy()
    # e = mean of the "c" values (rows with the color in "a") and the "d" values
    # (rows with the color in "b") over the rows labelled after the current one
    current_df["e"] = current_df.apply(
        lambda x: (
            current_df[(current_df.a == color)].loc[x.name + 1 :, "c"].sum()
            + current_df[(current_df.b == color)].loc[x.name + 1 :, "d"].sum()
        )
        / (
            current_df[(current_df.a == color)].loc[x.name + 1 :, "c"].size
            + current_df[(current_df.b == color)].loc[x.name + 1 :, "d"].size
        ),
        axis=1,
    )
    df_list.append(current_df)

df_list
[       a      b  c  d    e
 0    red  green  1  2  5.0
 1  brown    red  4  5  5.0
 3    red   blue  6  1  4.0
 6    red   grey  4  6  NaN,
        a      b  c  d    e
 1  brown    red  4  5  8.0
 5  black  brown  2  8  NaN,
        a      b  c  d    e
 2  black   grey  0  0  2.0
 5  black  brown  2  8  NaN,
        a      b  c  d    e
 0    red  green  1  2  0.0
 4  green   blue  0  3  NaN,
        a     b  c  d    e
 2  black  grey  0  0  6.0
 6    red  grey  4  6  NaN,
        a     b  c  d    e
 3    red  blue  6  1  3.0
 4  green  blue  0  3  NaN]
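Reading the lambda, "e" for a row appears to be the mean of all "c" values in later rows where the color is in "a", pooled with all "d" values in later rows where the color is in "b" (NaN when no later rows exist). A plain-loop restatement of that reading, using a hypothetical helper `e_for_color` that is not part of the original code, might look like this:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "a": ["red", "brown", "black", "red", "green", "black", "red"],
        "b": ["green", "red", "grey", "blue", "blue", "brown", "grey"],
        "c": [1, 4, 0, 6, 0, 2, 4],
        "d": [2, 5, 0, 1, 3, 8, 6],
    }
)


def e_for_color(df, color):
    """Hypothetical helper: "e" for every row of one color's sub-DataFrame."""
    sub = df[(df.a == color) | (df.b == color)]
    values = []
    for idx in sub.index:
        # Rows of this sub-DataFrame whose label comes after the current row
        later = sub.loc[sub.index > idx]
        c_vals = later.loc[later.a == color, "c"]  # color in "a" -> use "c"
        d_vals = later.loc[later.b == color, "d"]  # color in "b" -> use "d"
        n = len(c_vals) + len(d_vals)
        # Mean of the pooled values, NaN when nothing comes later
        values.append((c_vals.sum() + d_vals.sum()) / n if n else np.nan)
    return pd.Series(values, index=sub.index, name="e")


print(e_for_color(df, "red"))
```

For "red" this should give 5.0, 5.0, 4.0, NaN, matching the first DataFrame above. Comparing labels with `sub.index > idx` instead of `x.name + 1` avoids assuming consecutive integer labels, but it still assumes the index is sorted.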

This code works well, but when I combine all the DataFrames in the list with dfconcat = pd.concat(df_list), the result is not what I expect.

dfconcat:

    a     b     c   d   e
0   red   green 1   2   5.0
1   brown red   4   5   5.0
3   red   blue  6   1   4.0
6   red   grey  4   6   NaN
1   brown red   4   5   8.0
5   black brown 2   8   NaN
2   black grey  0   0   2.0
5   black brown 2   8   NaN
0   red   green 1   2   0.0
4   green blue  0   3   NaN
2   black grey  0   0   6.0
6   red   grey  4   6   NaN
3   red   blue  6   1   3.0
4   green blue  0   3   NaN

Expected Result:

    a      b    c   d   e1  e2
0   red   green 1   2   5.0 0.0
1   brown red   4   5   8.0 5.0
2   black grey  0   0   2.0 6.0
3   red   blue  6   1   4.0 3.0
4   green blue  0   3   NaN NaN
5   black brown 2   8   NaN NaN
6   red   grey  4   6   NaN NaN

It seems that the rows get duplicated when I pass them to pd.concat. How can I fix this? Do I have to change the whole code, or is it possible to replace pd.concat with merge, for example?
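A quick check suggests pd.concat is not creating new rows: since "a" and "b" never hold the same color within one row, every row belongs to exactly two sub-DataFrames (one for its "a" color, one for its "b" color), and pd.concat simply stacks both copies with their original labels. A minimal sketch, rebuilding df from the table above and leaving out column "e":

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "a": ["red", "brown", "black", "red", "green", "black", "red"],
        "b": ["green", "red", "grey", "blue", "blue", "brown", "grey"],
        "c": [1, 4, 0, 6, 0, 2, 4],
        "d": [2, 5, 0, 1, 3, 8, 6],
    }
)
colors = pd.unique(df[["a", "b"]].values.ravel("K"))

# Same split as in the question, without computing "e"
df_list = [df[(df.a == color) | (df.b == color)] for color in colors]
dfconcat = pd.concat(df_list)

# Each original label shows up twice: once under its "a" color, once under its "b" color
print(dfconcat.index.value_counts().sort_index())
```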

  • Can you explain what you are trying to do? Apologies, but I can't make sense of your code. Commented May 25, 2021 at 11:19

1 Answer

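You can perform a groupby on dfconcat over all columns except "e", which collects the two "e" values of each row into a list, and then expand those lists horizontally into separate columns.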

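import pandas as pd

# Group identical rows on every column except "e" and collect their "e" values into lists
df1 = dfconcat.groupby(dfconcat.columns[:-1].to_list()).agg(list)
# Expand each list column into numbered columns (e_0, e_1, ...) and turn a, b, c, d back into columns
df1 = pd.concat(
    [df1[c].apply(pd.Series).add_prefix(c + "_") for c in df1],
    axis=1,
).reset_index()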

output:

       a      b  c  d  e_0  e_1
0  black  brown  2  8  NaN  NaN
1  black   grey  0  0  2.0  6.0
2  brown    red  4  5  5.0  8.0
3  green   blue  0  3  NaN  NaN
4    red   blue  6  1  4.0  3.0
5    red  green  1  2  5.0  0.0
6    red   grey  4  6  NaN  NaN
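One thing to note: e_0/e_1 here follow the order in which the sub-DataFrames were concatenated, so the (brown, red) row gets 5.0, 8.0, whereas the expected result lists 8.0, 5.0. In the expected table, e1 appears to be the value computed in the sub-DataFrame of the row's "a" color and e2 the one from its "b" color. Assuming that reading, a sketch that looks the values up by color and keeps the original index could be the following; `e_for_color` and `e_by_color` are hypothetical names that restate the question's calculation.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "a": ["red", "brown", "black", "red", "green", "black", "red"],
        "b": ["green", "red", "grey", "blue", "blue", "brown", "grey"],
        "c": [1, 4, 0, 6, 0, 2, 4],
        "d": [2, 5, 0, 1, 3, 8, 6],
    }
)
colors = pd.unique(df[["a", "b"]].values.ravel("K"))


def e_for_color(df, color):
    """Hypothetical helper restating the question's "e" calculation for one color."""
    sub = df[(df.a == color) | (df.b == color)]
    values = []
    for idx in sub.index:
        later = sub.loc[sub.index > idx]  # rows labelled after the current one
        c_vals = later.loc[later.a == color, "c"]
        d_vals = later.loc[later.b == color, "d"]
        n = len(c_vals) + len(d_vals)
        values.append((c_vals.sum() + d_vals.sum()) / n if n else np.nan)
    return pd.Series(values, index=sub.index, name="e")


# One "e" Series per color, indexed by the original row labels
e_by_color = {color: e_for_color(df, color) for color in colors}

result = df.copy()
# e1: value from the sub-DataFrame of the row's "a" color; e2: from its "b" color
result["e1"] = [e_by_color[color].at[idx] for idx, color in result["a"].items()]
result["e2"] = [e_by_color[color].at[idx] for idx, color in result["b"].items()]
print(result)
```

Unlike the groupby route, this keeps the original row order and index, so no sorting or reset_index is needed.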