0

I have two separate data frames, and I want to do the same thing to both. I want to pair the columns according to the first substring before the underscore (a, b, x, y), and then if a value in the first column contains a word, but the corresponding row in the totals column is null, i want to update the total to a zero. I want to update the data frames and then output them both separately.

import pandas as pd
import numpy as np

d1 = pd.DataFrame(data={'a':['yes', 'no', 'maybe', 'sometimes', np.nan],
                        'a_total': [5,12,4,np.nan,0],
                        'b': ['blue','orange','pink', np.nan, np.nan],
                        'b_total': [12,6,0,0, np.nan]})

d2 = pd.DataFrame(data={'y':['frog', 'snail', 'snake', 'spider', 'pig'],
                        'y_total': [182,32,13, np.nan,8],
                        'z': ['car','bike','walk', np.nan, np.nan],
                        'z_total': [12,6,np.nan,np.nan, np.nan]})

then i want to do something to both data frames, and then output the updated versions separately. My current code copied below is not outputting properly. I am trying to output a dicitonary of dataframes, but if I can just output the two data frames (d1 and d2) somehow that would also be good.

out = {}
for i, df in enumerate([d1, d2]):        
    key_id = [*df.loc[:,~df.columns.str.endswith('total')].columns]
    totals = [*df.loc[:,df.columns.str.endswith('total')].columns]
    for col in key_id: 
        pairs = df.loc[:, df.columns.str.startswith(col)]
        pairs[col+'_total'].loc[(pairs[col].notnull()) & (pairs[col+'_total'].isnull())] = 0
    out[i] = pd.concat([pairs], axis=1)

thank you for looking

1 Answer 1

1

Not sure I exactly understand what your need in your output, but maybe this works?

import pandas as pd
import numpy as np

d1 = pd.DataFrame(data={'a':['yes', 'no', 'maybe', 'sometimes', np.nan],
                        'a_total': [5,12,4,np.nan,0],
                        'b': ['blue','orange','pink', np.nan, np.nan],
                        'b_total': [12,6,0,0, np.nan]})

d2 = pd.DataFrame(data={'y':['frog', 'snail', 'snake', 'spider', 'pig'],
                        'y_total': [182,32,13, np.nan,8],
                        'z': ['car','bike','walk', np.nan, np.nan],
                        'z_total': [12,6,np.nan,np.nan, np.nan]})

#show d1 before making changes
print(d1)

#make the changes directly to d1 and d2
for i, df in enumerate([d1, d2]):        
    cols = [c for c in df.columns if not c.endswith('total')]
    
    for col in cols: 
        tot_col = col+'_total'    
        df.loc[df[col].notnull() & df[tot_col].isnull(), tot_col] = 0
    
    
#show d1 after making changes
print(d1)

d1 before changes:

           a  a_total       b  b_total
0        yes      5.0    blue     12.0
1         no     12.0  orange      6.0
2      maybe      4.0    pink      0.0
3  sometimes      NaN     NaN      0.0
4        NaN      0.0     NaN      NaN

d1 after changes:

           a  a_total       b  b_total
0        yes      5.0    blue     12.0
1         no     12.0  orange      6.0
2      maybe      4.0    pink      0.0
3  sometimes      0.0     NaN      0.0
4        NaN      0.0     NaN      NaN
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.