Python Pandas: Apply function using column names as named arguments

Question

Is there a way in pandas to apply a function to a dataframe using the column names as argument names? For example, I have a function and a dataframe.

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[1,2,3],
                   'C':[1,2,3],
                   'D':[1,2,3]})

def f(A,B,C):
    # Pretend the code is more complicated
    return A + B + C

Is there a way I can do something like

df.apply(f)

and have pandas match the columns to named arguments?

I know I can rewrite the function to take a row instead of named arguments, but keep in mind that f is just a toy example and my real function is more complicated.

EDIT:

Figured it out based on @juanpa.arrivillaga's answer:

df[list(f.__code__.co_varnames)].apply((lambda row: f(**row)), axis=1)

Is there a way to do that without having to rewrite the function f?
– Jack, Commented Oct 18, 2019 at 16:53
The function f is just a toy example. My real function is more complicated.
– Jack, Commented Oct 18, 2019 at 16:57
It's probably more helpful to be explicit with the example function.
– Trenton McKinney, Commented Oct 18, 2019 at 16:59
Right now, your apply is acting column-wise. In that case it's far more logical to just pass it the Series as you have it. You could make it row-wise, but honestly in most cases you can avoid looping with apply in favor of vectorized operations.
– ALollz, Commented Oct 18, 2019 at 17:05
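
To illustrate the column-wise versus row-wise point in the comment above, here is a minimal sketch (not part of the original thread) showing what apply hands to the function for each value of axis. It reuses the question's df; the variable names per_column and per_row are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})

# axis=0 (the default): the function is called once per column and
# receives that column as a Series whose .name is the column label.
per_column = df.apply(lambda s: s.name)

# axis=1: the function is called once per row and receives that row
# as a Series indexed by the column labels.
per_row = df.apply(lambda s: ','.join(s.index), axis=1)

print(per_column.tolist())
print(per_row.tolist())
```

So df.apply(f) as written in the question would call f once per column with a single Series argument, which is why f(A, B, C) cannot be passed directly.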

Quang Hoang · Accepted Answer · 2019-10-18 17:05:21Z

7

The function passed to apply must accept a row or a column of df (a Series) as its argument, depending on whether axis is 1 or 0. It cannot take column names as separate arguments. You can write a wrapper for this purpose.

def wrapper(x, A, B, C):
    return f(x[A], x[B], x[C])

df.apply(wrapper, axis=1, args=('A','B','C'))

Output:

0    3
1    6
2    9
dtype: int64
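
The wrapper idea can also be written without hard-coding three arguments. The following is a minimal sketch, assuming f takes its inputs positionally in the same order as the column names are listed; call_with_columns is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})

def f(A, B, C):
    return A + B + C

def call_with_columns(row, func, *columns):
    # Look up each named column in the row and pass the values
    # to func positionally, in the order the names were given.
    return func(*(row[c] for c in columns))

# apply forwards the entries of `args` to the function after the row.
result = df.apply(call_with_columns, axis=1, args=(f, 'A', 'B', 'C'))
print(result.tolist())
```

The extra column D is simply never looked up, so it does not need to be dropped first.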

answered Oct 18, 2019 at 17:05


Alexey · Accepted Answer · 2019-10-18 16:58:40Z

4

If you are interested in using the apply function, here is one way:

df = pd.DataFrame({'A':[1,2,3],
                  'B':[1,2,3],
                  'C':[1,2,3],
                  'D':[1,2,3]})     


def func(row):
    row['result'] = row['A'] + row['B'] + row['C']
    return row

df.apply(func, axis = 1)


    Out[67]: 
       A  B  C  D  result
    0  1  1  1  1       3
    1  2  2  2  2       6
    2  3  3  3  3       9

UPD

If you have to use the function f and don't want to change it, maybe this:

df['res'] = f(df['A'], df['B'], df['C'])
df

    Out[70]: 
       A  B  C  D  res
    0  1  1  1  1    3
    1  2  2  2  2    6
    2  3  3  3  3    9

edited Oct 18, 2019 at 16:58

answered Oct 18, 2019 at 16:50


2 Comments

Jack Over a year ago

That works, but is there any way to do that without having to rewrite the function f?

Jack Over a year ago

That works and is simple, so upvoted! But generally most functions can't operate on Series, and my true function in particular has if statements in it.
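
The concern about if statements can be illustrated with a short sketch. The function g below is a hypothetical stand-in for a function with branching logic, not the asker's real function. Passing whole columns to it raises a ValueError because the truth value of a Series is ambiguous, while a row-wise call works. For a simple branch like this one, numpy.where gives a vectorized equivalent.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})

def g(A, B, C):
    # Hypothetical function with a branch (an assumption for illustration).
    if A > 1:
        return A + B + C
    return 0

# Passing whole columns fails: `if` needs a single True/False value.
try:
    g(df['A'], df['B'], df['C'])
    failed = False
except ValueError:
    failed = True

# Row by row, each argument is a scalar, so the branch works.
row_wise = df.apply(lambda row: g(row['A'], row['B'], row['C']), axis=1)

# A vectorized version of the same branch, written with numpy.where.
vectorized = np.where(df['A'] > 1, df['A'] + df['B'] + df['C'], 0)

print(failed, row_wise.tolist(), vectorized.tolist())
```

Whether a vectorized rewrite is practical depends on how complex the real branching is; the row-wise call is the fallback that leaves the function unchanged.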

Jack · Accepted Answer · 2019-10-18 17:56:42Z

2

Figured it out, building off of @juanpa.arrivillaga's answer.

df[list(f.__code__.co_varnames)].apply((lambda row: f(**row)), axis=1)
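
One caveat with this approach: co_varnames lists a function's local variables as well as its parameters, so a function that assigns any local name would make the column selection fail with a KeyError. A hedged alternative, assuming every parameter of f corresponds to a column of df, is to read the parameter names with inspect.signature. The local variable total below is added only to demonstrate the difference.

```python
import inspect
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})

def f(A, B, C):
    # `total` is a local variable; it also shows up in f.__code__.co_varnames.
    total = A + B + C
    return total

# Parameter names only, in declaration order.
params = list(inspect.signature(f).parameters)

# Select just those columns, then unpack each row as keyword arguments.
result = df[params].apply(lambda row: f(**row), axis=1)

print(f.__code__.co_varnames)
print(params)
print(result.tolist())
```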

answered Oct 18, 2019 at 17:56


juanpa.arrivillaga · Accepted Answer · 2019-10-18 17:08:29Z

There is no good way in general. However, if your column names align exactly with the parameter names, you can wrap the function in another function that splats the row into your function as keyword arguments, because Series objects are mappings!

So given:

>>> import pandas as pd
>>> df = pd.DataFrame({'A':[1,2,3],
...                'B':[1,2,3],
...                'C':[1,2,3],
...                'D':[1,2,3]})
>>> df
   A  B  C  D
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
>>> def f(A, B, C): return A + B + C
...

We could almost do:

>>> df.apply(lambda row: f(**row), axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/frame.py", line 6014, in apply
    return op.get_result()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 142, in get_result
    return self.apply_standard()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 248, in apply_standard
    self.apply_series_generator()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 277, in apply_series_generator
    results[i] = self.f(v)
  File "<stdin>", line 1, in <lambda>
TypeError: ("f() got an unexpected keyword argument 'D'", 'occurred at index 0')

If you know which columns you need, you can select/drop to get the correct series:

>>> df.drop('D',axis=1).apply(lambda row: f(**row), axis=1)
0    3
1    6
2    9
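
The same keyword-splat idea also works without apply. This is a sketch rather than part of the original answer, assuming the needed columns are known up front: select them, turn each row into a dict with to_dict(orient='records'), and unpack each dict into f.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})

def f(A, B, C):
    return A + B + C

# One dict per row, containing only the columns f expects.
records = df[['A', 'B', 'C']].to_dict(orient='records')

# Unpack each dict as keyword arguments and keep the original index.
result = pd.Series([f(**rec) for rec in records], index=df.index)

print(result.tolist())
```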

Yeah I threw that column D in because there typically are extra columns. I like the trick with the ** though
