Pandas DataFrame Apply function, multiple arguments

Question

I have a Pandas dataframe and one of the columns is a string. I imported a function from an external module to do some RegEx checking and reduce this string to a short classification.

This works:

df['PageCLass'] = df['PageClass'].apply(lambda x: PageClassify.page_classify(x))

However, what I would really like to do is incorporate another column, 'Rev', into the checking. Its values are either a float or NaN.

When I did this:

df['PageCLass'] = df['PageClass'].apply(lambda x: PageClassify.page_classify(x,df['Rev']))

and I was doing logical checks inside the classification function on the 2nd argument, I got this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What I am looking for is a way to capture the 2nd argument value by value, just as lambda x: captures the first argument value by value.

Andrew L · Accepted Answer · 2017-03-28 23:21:23Z

2

The concatenation approach in the other answer is OK, I guess, if it worked. In my opinion, though, it does not answer the question, because it concatenates two arguments into one.

One way to pass two arguments through apply:

df['PageCLass'] = df[['PageClass','Rev']].apply(lambda x: PageClassify.page_classify(*x), axis=1)

I don't know what the page_classify method looks like but if it takes two arguments the above should work. Does this work for you?
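As a minimal sketch of the row-wise call above: the page_classify function, the sample data, and the 'Label' column name below are all made up for illustration, since the real PageClassify module is not shown in the question. The point is only that with axis=1 the lambda receives one row at a time, so the second argument arrives as a scalar and ordinary truth tests on it no longer raise the "truth value of a Series is ambiguous" error.

```python
import numpy as np
import pandas as pd


def page_classify(page, rev):
    """Hypothetical stand-in for PageClassify.page_classify (not the asker's code)."""
    # rev is a single scalar here, so a plain `if` on it is safe.
    if pd.isna(rev):
        return "unrevised"
    return "home" if "home" in page else "other"


# Made-up sample data: a string column and a float-or-NaN column.
df = pd.DataFrame({
    "PageClass": ["/home/index", "/about", "/home/news"],
    "Rev": [1.0, np.nan, 2.5],
})

# axis=1 passes one row per call; *row unpacks it into (PageClass, Rev).
df["Label"] = df[["PageClass", "Rev"]].apply(
    lambda row: page_classify(*row), axis=1
)

# An alternative not taken from the answer: pair the columns with zip.
labels_via_zip = [page_classify(p, r) for p, r in zip(df["PageClass"], df["Rev"])]
```

Without axis=1, apply calls the lambda once per column, so *row would unpack an entire column into the function's arguments.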

edited Mar 28, 2017 at 23:21

answered Mar 28, 2017 at 20:30

Andrew L


3 Comments

Mark Ginsburg Over a year ago

I changed the page_classify method to take two arguments (instead of splitting one argument on the \t tab as above) and changed the main program to use your line. I got this message: TypeError: ('page_classify() takes exactly 2 arguments (120467 given)', u'occurred at index PageClass')

Mark Ginsburg Over a year ago

I added axis=1 and things ran, but strangely my target column df['PageClass'] did not update, i.e. the returned value from the function did not get assigned to it. Inspecting it, it seems to be unchanged.

Mark Ginsburg Over a year ago

Good call, right when you were commenting this, I was trying it out and sure enough df['blargh'] receives the returned value! I wonder why it didn't like to assign "in place."

gaw89 · Accepted Answer · 2017-03-28 19:05:27Z

1

Assuming you want to just do this row by row, the following should work:

df['PageCLass'] = (df['PageClass'] + df['Rev'].apply(str)).apply(lambda x: PageClassify.page_classify(x))

Here, you are simply concatenating the two dataframe columns into one string column and then applying the function to each value in it. If you need to check the values of PageClass and Rev as separate arguments, you could also add a delimiter (e.g. '\t') to the concatenation and then split on that inside the function:

df['PageCLass'] = (df['PageClass'] + '\t' + df['Rev'].apply(str)).apply(lambda x: PageClassify.page_classify(x))
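To illustrate the delimiter variant, here is a rough sketch of what the receiving function might look like. The page_classify body, the sample data, and the 'Label' column name are assumptions for illustration only; the asker's real function is not shown. Note that Rev arrives as text, so it has to be parsed back into a float before any numeric or NaN check.

```python
import numpy as np
import pandas as pd


def page_classify(combined):
    """Hypothetical classifier that receives 'page<TAB>rev' as a single string."""
    # Split on the last tab only, in case the page string itself contains one.
    page, rev_text = combined.rsplit("\t", 1)
    # str() turns NaN into 'nan', and float('nan') parses it back to NaN.
    rev = float(rev_text)
    if np.isnan(rev):
        return "unrevised"
    return "home" if "home" in page else "other"


# Made-up sample data: a string column and a float-or-NaN column.
df = pd.DataFrame({
    "PageClass": ["/home/index", "/about", "/home/news"],
    "Rev": [1.0, np.nan, 2.5],
})

# Build one tab-delimited string per row, then classify each string.
combined = df["PageClass"] + "\t" + df["Rev"].apply(str)
df["Label"] = combined.apply(page_classify)
```

The trade-off is a round trip through strings: it keeps the function to one argument, at the cost of re-parsing Rev on every call.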

Hope this helps!

answered Mar 28, 2017 at 19:05

gaw89


1 Comment

Mark Ginsburg Over a year ago

I used the \t and split inside the function, it works great. This gets the columns in "lock-step" for logical processing by the function.
