process multiple data frames in loop and output separately pandas

Question

I have two separate data frames, and I want to do the same thing to both. I want to pair the columns according to the first substring before the underscore (a, b, x, y), and then if a value in the first column contains a word, but the corresponding row in the totals column is null, i want to update the total to a zero. I want to update the data frames and then output them both separately.

import pandas as pd
import numpy as np

d1 = pd.DataFrame(data={'a':['yes', 'no', 'maybe', 'sometimes', np.nan],
                        'a_total': [5,12,4,np.nan,0],
                        'b': ['blue','orange','pink', np.nan, np.nan],
                        'b_total': [12,6,0,0, np.nan]})

d2 = pd.DataFrame(data={'y':['frog', 'snail', 'snake', 'spider', 'pig'],
                        'y_total': [182,32,13, np.nan,8],
                        'z': ['car','bike','walk', np.nan, np.nan],
                        'z_total': [12,6,np.nan,np.nan, np.nan]})

then i want to do something to both data frames, and then output the updated versions separately. My current code copied below is not outputting properly. I am trying to output a dicitonary of dataframes, but if I can just output the two data frames (d1 and d2) somehow that would also be good.

out = {}
for i, df in enumerate([d1, d2]):        
    key_id = [*df.loc[:,~df.columns.str.endswith('total')].columns]
    totals = [*df.loc[:,df.columns.str.endswith('total')].columns]
    for col in key_id: 
        pairs = df.loc[:, df.columns.str.startswith(col)]
        pairs[col+'_total'].loc[(pairs[col].notnull()) & (pairs[col+'_total'].isnull())] = 0
    out[i] = pd.concat([pairs], axis=1)

thank you for looking

mitoRibo · Accepted Answer · 2021-11-19 18:22:15Z

Not sure I exactly understand what your need in your output, but maybe this works?

import pandas as pd
import numpy as np

d1 = pd.DataFrame(data={'a':['yes', 'no', 'maybe', 'sometimes', np.nan],
                        'a_total': [5,12,4,np.nan,0],
                        'b': ['blue','orange','pink', np.nan, np.nan],
                        'b_total': [12,6,0,0, np.nan]})

d2 = pd.DataFrame(data={'y':['frog', 'snail', 'snake', 'spider', 'pig'],
                        'y_total': [182,32,13, np.nan,8],
                        'z': ['car','bike','walk', np.nan, np.nan],
                        'z_total': [12,6,np.nan,np.nan, np.nan]})

#show d1 before making changes
print(d1)

#make the changes directly to d1 and d2
for i, df in enumerate([d1, d2]):        
    cols = [c for c in df.columns if not c.endswith('total')]
    
    for col in cols: 
        tot_col = col+'_total'    
        df.loc[df[col].notnull() & df[tot_col].isnull(), tot_col] = 0
    
    
#show d1 after making changes
print(d1)

d1 before changes:

           a  a_total       b  b_total
0        yes      5.0    blue     12.0
1         no     12.0  orange      6.0
2      maybe      4.0    pink      0.0
3  sometimes      NaN     NaN      0.0
4        NaN      0.0     NaN      NaN

d1 after changes:

           a  a_total       b  b_total
0        yes      5.0    blue     12.0
1         no     12.0  orange      6.0
2      maybe      4.0    pink      0.0
3  sometimes      0.0     NaN      0.0
4        NaN      0.0     NaN      NaN

Collectives™ on Stack Overflow

process multiple data frames in loop and output separately pandas

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related