Pandas: apply a function to multiple columns of different data-frames

Question

I have a class, which returns a value by comparing different values. The class is:

class feasible:
    def __init__(self,old_difference, for_value, back_value, fall_back_value):
        self.diff=abs(for_value-back_value)
        for_diff=abs(for_value-fall_back_value)
        back_diff=abs(back_value-fall_back_value)
        if self.diff < old_difference:
            self.value=(for_value+back_value)/2
        elif for_diff<back_diff:
            self.value=(for_value)
        else:
            self.value=(back_value)

How can I apply this class and return the value if the inputs are columns from different data-frames?

All the input frames are in the following format:

   x         y          theta
0  0.550236 -4.621542   35.071022
1  5.429449 -0.374795   74.884065
2  4.590866 -4.628868  110.697109

I tried the following, but it raises an error (ValueError: The truth value of a Series is ambiguous) because of the comparison involved.

feasible_x=feasible(diff_frame.x,for_frame.x,back_frame.x,filler_frame.x)
filler_frame.x=feasible_x.value

Some spacing in the code would make it easier to read @Ashok =)
– Todd, Commented Mar 15, 2020 at 19:54

Parfait · Accepted Answer · 2020-03-15 17:57:56Z

Currently, your method expects scalar values, but you pass pandas Series (i.e., columns of data frames) into it. The if logic would therefore have to evaluate every element of the Series (a structure of many same-type values) rather than a single value, and Python cannot reduce that element-wise result to one True or False. That is why you receive the ambiguous truth value error. Newcomers to pandas coming from general-purpose Python often run into it, because pandas and NumPy follow a different object model than general Python.
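As a minimal sketch of where the error comes from, the snippet below uses a made-up two-element Series (the values are illustrative, not taken from the question). The comparison produces one boolean per element, and the if statement then asks for a single truth value that pandas refuses to guess.

```python
import pandas as pd

s = pd.Series([1.0, 5.0])   # hypothetical values
mask = s < 3.0              # element-wise comparison -> Series of booleans

try:
    if mask:                # Python needs one True/False here
        pass
except ValueError as err:
    # pandas raises instead of guessing between any() and all()
    message = str(err)

print(mask.tolist())
print(message)
```

The fix is either to reduce the mask explicitly (mask.any() or mask.all()) or, as in this case, to keep the logic element-wise.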

To resolve this, because you are essentially calculating new fields with conditional logic, consider binding all Series parameters into one data frame. Then replace the general Python construct of if...elif...else with numpy.where, which applies the logic element-wise across higher-dimensional objects such as arrays.

import numpy as np
import pandas as pd

class feasible:
    def __init__(self, old_difference, for_value, back_value, fall_back_value):
        # HORIZONTAL MERGE (OUTER JOIN) ON INDEX
        # (set_axis RETURNS A NEW FRAME; ITS inplace ARGUMENT WAS REMOVED IN PANDAS 2.0)
        x_frame = (pd.concat([old_difference, for_value, back_value, fall_back_value], axis = 1)
                     .set_axis(['old_difference', 'for_value', 'back_value', 'fall_back_value'],
                               axis = 'columns')
                  )

        # ASSIGN NEW CALCULATED COLUMNS
        x_frame['diff'] = (x_frame['for_value'] - x_frame['back_value']).abs()
        x_frame['for_diff'] = (x_frame['for_value'] - x_frame['fall_back_value']).abs()
        x_frame['back_diff'] = (x_frame['back_value'] - x_frame['fall_back_value']).abs()

        # ASSIGN FINAL SERIES BY NESTED CONDITIONAL LOGIC
        self.value = np.where(x_frame['diff'] < x_frame['old_difference'],
                              (x_frame['for_value'] + x_frame['back_value'])/2,
                              np.where(x_frame['for_diff'] < x_frame['back_diff'],
                                       x_frame['for_value'],
                                       x_frame['back_value']
                                      )
                              )
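To trace the nested numpy.where on concrete numbers, here is a small sketch with made-up Series (hypothetical values, chosen so that each row takes a different branch). It applies the same three-way logic directly to the Series rather than going through the class, and assumes all inputs share the same index.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: one row per branch of the conditional logic
old_difference  = pd.Series([1.0, 1.0, 1.0])
for_value       = pd.Series([2.0, 5.0, 9.0])
back_value      = pd.Series([2.5, 0.0, 4.0])
fall_back_value = pd.Series([0.0, 4.0, 5.0])

diff      = (for_value - back_value).abs()        # 0.5, 5.0, 5.0
for_diff  = (for_value - fall_back_value).abs()   # 2.0, 1.0, 4.0
back_diff = (back_value - fall_back_value).abs()  # 2.5, 4.0, 1.0

value = np.where(diff < old_difference,
                 (for_value + back_value) / 2,     # row 0: average
                 np.where(for_diff < back_diff,
                          for_value,               # row 1: forward value
                          back_value))             # row 2: backward value

print(value)  # row 0 -> 2.25, row 1 -> 5.0, row 2 -> 4.0
```

Note that numpy.where returns a NumPy array, not a Series, so the result carries no index of its own.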

Depending on the row counts of the four data frames, the result must be handled differently. Specifically, pd.concat with axis = 1 uses join='outer' by default, so all rows are retained in the horizontal merge, with NaN filled in for unmatched rows.
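The outer-join behavior can be seen in a short sketch with two made-up Series of different lengths (the names short and long_ are hypothetical):

```python
import pandas as pd

short = pd.Series([1.0, 2.0], name='short')          # index 0, 1
long_ = pd.Series([10.0, 20.0, 30.0], name='long')   # index 0, 1, 2

# axis=1 aligns on the index; join='outer' is the default
merged = pd.concat([short, long_], axis=1)

print(merged.shape)                    # three rows, two columns
print(merged['short'].isna().tolist()) # last row has no match in short
```

Comparisons involving NaN evaluate to False, so in the nested numpy.where such unmatched rows fall through to the final back_value branch.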

If filler_frame (the data frame you intend to add a column to) contains the most rows of all four data frames, or ties for the most, then a simple assignment works.

# IF filler_frame CONTAINS THE MOST ROWS (OR EQUIVALENT TO MOST) OF ALL FOUR DFs
feasible_x = feasible(diff_frame.x, for_frame.x, back_frame.x, filler_frame.x)
filler_frame['x_new'] = feasible_x.value

If not, a left join for the new column, x_new, is required. The code below works in all cases, including the one above.

# IF filler_frame DOES NOT CONTAIN MOST ROWS OF ALL FOUR DFs
feasible_x = feasible(diff_frame.x, for_frame.x, back_frame.x, filler_frame.x)
filler_frame = filler_frame.join(pd.Series(feasible_x.value).rename('x_new'), how = 'left')
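A small sketch of the left join, using a made-up two-row filler_frame and a hypothetical three-element result array standing in for feasible_x.value:

```python
import numpy as np
import pandas as pd

filler_frame = pd.DataFrame({'x': [0.5, 5.4]})   # 2 rows (hypothetical)
result = np.array([1.0, 2.0, 3.0])               # 3 computed values (hypothetical)

# pd.Series(result) gets a default 0..n-1 index; the left join keeps
# only the labels that exist in filler_frame
joined = filler_frame.join(pd.Series(result).rename('x_new'), how='left')

print(joined)
```

Because the wrapped array receives a default integer index, this matching assumes filler_frame also uses the default 0..n-1 index; with custom index labels the join would need the result to carry the same labels.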
