I have two separate data frames, and I want to do the same thing to both. I want to pair the columns according to the first substring before the underscore (a, b, x, y), and then if a value in the first column contains a word, but the corresponding row in the totals column is null, i want to update the total to a zero. I want to update the data frames and then output them both separately.
import pandas as pd
import numpy as np
d1 = pd.DataFrame(data={'a':['yes', 'no', 'maybe', 'sometimes', np.nan],
'a_total': [5,12,4,np.nan,0],
'b': ['blue','orange','pink', np.nan, np.nan],
'b_total': [12,6,0,0, np.nan]})
d2 = pd.DataFrame(data={'y':['frog', 'snail', 'snake', 'spider', 'pig'],
'y_total': [182,32,13, np.nan,8],
'z': ['car','bike','walk', np.nan, np.nan],
'z_total': [12,6,np.nan,np.nan, np.nan]})
then i want to do something to both data frames, and then output the updated versions separately. My current code copied below is not outputting properly. I am trying to output a dicitonary of dataframes, but if I can just output the two data frames (d1 and d2) somehow that would also be good.
out = {}
for i, df in enumerate([d1, d2]):
key_id = [*df.loc[:,~df.columns.str.endswith('total')].columns]
totals = [*df.loc[:,df.columns.str.endswith('total')].columns]
for col in key_id:
pairs = df.loc[:, df.columns.str.startswith(col)]
pairs[col+'_total'].loc[(pairs[col].notnull()) & (pairs[col+'_total'].isnull())] = 0
out[i] = pd.concat([pairs], axis=1)
thank you for looking