0

I have two pandas dataframe

import pandas as pd 
import numpy as np
import datetime

data = {'group'      :["A","A","B","B"],
        'val': ["AA","AB","B1","B2"],
        'cal1'     :[4,5,7,6],
        'cal2'     :[10,100,100,10]
       } 

df1 = pd.DataFrame(data) 
df1

    group   val    cal1   cal2
0   A       AA     4      10
1   A       AB     5      100
2   B       B1     7      100
3   B       B2     6      10

data = {'group'      :["A","A","A","B","B","B","B", "B", "B", "B"],
        'flag' : [1,0,0,1,0,0,0, 1, 0, 0],
        'var1': [1,2,3,7,8,9,10, 15, 20, 30]
       } 

# Create DataFrame 
df2 = pd.DataFrame(data) 
df2

    group   flag    var1
0   A       1       1
1   A       0       2
2   A       0       3
3   B       1       7
4   B       0       8
5   B       0       9
6   B       0       10
7   B       1       15
8   B       0       20
9   B       0       30

Step 1: CReate columns in df2(with suffix "_new") based on unique "val" in df1 like below:

unique_val = df1['val'].unique().tolist()
new_cols = [t + '_new' for t in unique_val]
for i in new_cols:
    df2[i] = 0
df2
    group   flag    var1    AA_new  AB_new  B1_new  B2_new
0   A       1        1       0      0       0        0
1   A       0        2       0      0       0        0
2   A       0        3       0      0       0        0
3   B       1        7       0      0       0        0
4   B       0        8       0      0       0        0
5   B       0        9       0      0       0        0
6   B       0        10      0      0       0        0
7   B       1        15      0      0       0        0
8   B       0        20      0      0       0        0
9   B       0        30      0      0       0        0

Step 2: for row where flag = 1, AA_new will be calculated as var1(from df2)* value of 'cal1' from df1 for group "A" and val "AA" * value of 'cal2' from df1 for group "A" and val "AA", similarly AB_new will be calculated as var1(from df2) * value of 'cal1' from df1 for group "A" and val "AB" * value of 'cal2' from df1 for group "A" and val "AB"

My expected output should look like below:

    group   flag    var1    AA_new  AB_new  B1_new   B2_new
0   A       1       1       40.0    500.0   0.0      0.0
1   A       0       2       0.0     0.0     0.0      0.0
2   A       0       3       0.0     0.0     0.0      0.0
3   B       1       7       0.0     0.0     4900.0   420.0
4   B       0       8       0.0     0.0     0.0      0.0
5   B       0       9       0.0     0.0     0.0      0.0
6   B       0       10      0.0     0.0     0.0      0.0
7   B       1       15      0.0     0.0     10500.0  900.0
8   B       0       20      0.0     0.0     0.0      0.0
9   B       0       30      0.0     0.0     0.0      0.0

Below solution based on the other stackflow question works partially:

df2.assign(**df1.assign(mul_cal = df1['cal1'].mul(df1['cal2']))
                .pivot_table(columns='val',
                             values='mul_cal',
                             index = ['group', df2.index])
                .add_suffix('_new')
                .groupby(level=0)
               .apply(lambda x: x.bfill().ffill()) 
                .reset_index(level='group',drop='group')
                .fillna(0)
                .mul(df2['var1'], axis=0)
                .where(df2['flag'].eq(1), 0)
)

1 Answer 1

1

Flexible Columns

If you want this works when we add several rows more in df1, you can do this.

combinations = df1.groupby(['group','val'])['cal3'].sum().reset_index()

for index_, row_ in combinations.iterrows():
    for index, row in df2.iterrows():
        if row['flag'] == 1:
            if row['group'] == row_['group']:
                df2.loc[index, row_['val'] + '_new'] = row['var1'] * df1[(df1['group'] == row_['group']) & (df1['val'] == row_['val'])]['cal3'].values[0]

Hard Code

You can use iteration to dataframe and change its specific column in each iteration, you can do something like this (but you need to add new column into your df1 first).

df1['cal3'] = df1['cal1'] * df1['cal2']

for index, row in df2.iterrows():
    if row['flag'] == 1:
        if row['group'] == 'A':
            df2.loc[index, 'AA_new'] = row['var1'] * df1[(df1['group'] == 'A') & (df1['val'] == 'AA')]['cal3'].values[0]
            df2.loc[index, 'AB_new'] = row['var1'] * df1[(df1['group'] == 'A') & (df1['val'] == 'AB')]['cal3'].values[0]

        elif row['group'] == 'B':
            df2.loc[index, 'B1_new'] = row['var1'] * df1[(df1['group'] == 'B') & (df1['val'] == 'B1')]['cal3'].values[0]
            df2.loc[index, 'B2_new'] = row['var1'] * df1[(df1['group'] == 'B') & (df1['val'] == 'B2')]['cal3'].values[0]

This is the result I got.

enter image description here

Sign up to request clarification or add additional context in comments.

2 Comments

This solution is hard-coded based on 'group' and 'val', for example if row['group'] == 'A'. What if there are more group values other than "A" and "B" and more categories in "val" column. Is there any way of mapping the two dataframes without hardcoding group and val?
I added some codes if you want it to be flexible, let me know if that what you want

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.