Is there a way in pandas to apply a function to a dataframe using the column names as argument names? For example, I have a function and a dataframe.

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})

def f(A, B, C):
    # Pretend the code is more complicated
    return A + B + C

Is there a way I can do something like

df.apply(f)

and have pandas match the columns to named arguments?

I know I can rewrite the function to take a row instead of named arguments, but keep in mind that f is just a toy example and my real function is more complicated.

EDIT:

Figured it out based on @juanpa.arrivillaga's answer:

df[list(f.__code__.co_varnames)].apply((lambda row: f(**row)), axis=1)

  • ?df.eval('A+B+C') Commented Oct 18, 2019 at 16:47
  • Is there a way to do that without having to rewrite the function f? Commented Oct 18, 2019 at 16:53
  • the function f is just a toy example. My real function is more complicated. Commented Oct 18, 2019 at 16:57
  • It's probably more helpful to be explicit with the example function. Commented Oct 18, 2019 at 16:59
  • Right now, your apply is acting column-wise. In that case it's far more logical to just pass it the Series as you have it. You could make it row-wise but honestly in most cases you can avoid looping with apply in favor of vectorized operations. Commented Oct 18, 2019 at 17:05

4 Answers

The function passed to apply receives a whole column of df (axis=0) or a whole row (axis=1) as its argument, not the column names. To call f with named columns, you can write a wrapper for this purpose.

def wrapper(x, A, B, C):
    return f(x[A], x[B], x[C])

df.apply(wrapper, axis=1, args=('A','B','C'))

Output:

0    3
1    6
2    9
dtype: int64

If you are interested in using the apply function, here is one way:

df = pd.DataFrame({'A':[1,2,3],
                  'B':[1,2,3],
                  'C':[1,2,3],
                  'D':[1,2,3]})     


def func(row):
    row['result'] = row['A'] + row['B'] + row['C']
    return row

df.apply(func, axis = 1)


    Out[67]: 
       A  B  C  D  result
    0  1  1  1  1       3
    1  2  2  2  2       6
    2  3  3  3  3       9

UPD

If you have to use the function f and don't want to change it, maybe this:

df['res'] = f(df['A'], df['B'], df['C'])
df

    Out[70]: 
       A  B  C  D  res
    0  1  1  1  1    3
    1  2  2  2  2    6
    2  3  3  3  3    9
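
One caveat with calling f directly on whole columns: it only works when the body of the function uses operations that a Series supports elementwise. The sketch below illustrates this with a hypothetical function g that contains a branch (g is not from the original post; it is only an assumption about what a "more complicated" function might look like). Calling it on columns raises a ValueError, while row-wise apply passes scalars and works.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})

def g(A, B, C):
    # Hypothetical function with a branch, used only for illustration
    if A > 1:
        return A + B + C
    return 0

# Passing whole columns fails: `if` needs a single True/False,
# but `A > 1` is a boolean Series here.
try:
    g(df['A'], df['B'], df['C'])
    failed = False
except ValueError:
    failed = True

# Row-wise apply hands g one scalar per argument, so the branch works.
row_result = df.apply(lambda row: g(row['A'], row['B'], row['C']), axis=1)
```

So the direct call is the simplest option when the function is purely arithmetic, and a row-wise approach is a fallback when it is not.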

2 Comments

That works, but is there any way to do that without having to rewrite the function f?
That works and is simple, so upvoted! But generally most functions can't operate on Series, and my true function in particular has if statements in it.

Figured it out, building off of @juanpa.arrivillaga's answer.

df[list(f.__code__.co_varnames)].apply((lambda row: f(**row)), axis=1)
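
A note on the one-liner above: co_varnames lists a function's local variable names as well as its arguments, so it can pick up names that are not columns once the function body assigns any locals. A possible alternative, sketched here, reads the parameter names with inspect.signature instead. The helper name apply_by_name and the function h are hypothetical and only serve the illustration.

```python
import inspect
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1, 2, 3],
                   'C': [1, 2, 3],
                   'D': [1, 2, 3]})

def f(A, B, C):
    return A + B + C

def h(A, B):
    # Hypothetical function with a local variable, for illustration only
    total = A + B
    return total

# co_varnames includes the local `total`, which is not a column name
varnames = h.__code__.co_varnames

def apply_by_name(frame, func):
    # Hypothetical helper: select the columns named after func's
    # parameters, then unpack each row into keyword arguments.
    params = list(inspect.signature(func).parameters)
    return frame[params].apply(lambda row: func(**row), axis=1)

result_f = apply_by_name(df, f)
result_h = apply_by_name(df, h)
```

This assumes every parameter of the function has a matching column; a parameter without one makes the column selection raise a KeyError.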


There is no good way in general. However, if your column names align exactly with the argument names, you can wrap the function in another function that splats the row into your function as keyword arguments, because Series objects behave like mappings!

So given:

>>> import pandas as pd
>>> df = pd.DataFrame({'A':[1,2,3],
...                'B':[1,2,3],
...                'C':[1,2,3],
...                'D':[1,2,3]})
>>> df
   A  B  C  D
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
>>> def f(A, B, C): return A + B + C
...

We could almost do:

>>> df.apply(lambda row: f(**row), axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/frame.py", line 6014, in apply
    return op.get_result()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 142, in get_result
    return self.apply_standard()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 248, in apply_standard
    self.apply_series_generator()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 277, in apply_series_generator
    results[i] = self.f(v)
  File "<stdin>", line 1, in <lambda>
TypeError: ("f() got an unexpected keyword argument 'D'", 'occurred at index 0')

If you know which columns you need, you can select or drop columns to get the correct Series:

>>> df.drop('D',axis=1).apply(lambda row: f(**row), axis=1)
0    3
1    6
2    9

1 Comment

Yeah I threw that column D in because there typically are extra columns. I like the trick with the ** though
