
I have a Pandas DataFrame, and one of its columns contains strings. I imported a function from an external module that does some RegEx checking and reduces each string to a short classification.

This works:

df['PageCLass'] = df['PageClass'].apply(lambda x: PageClassify.page_classify(x))
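A minimal sketch of this one-argument pattern, assuming a hypothetical `page_classify` (the real `PageClassify.page_classify` is not shown) with made-up regexes and labels. `Series.apply` calls the function once per element, so passing the function directly is equivalent to wrapping it in a lambda.

```python
import re

import pandas as pd


def page_classify(page):
    """Hypothetical stand-in for PageClassify.page_classify:
    reduce a page string to a short class label with a regex."""
    if re.search(r"^/blog/", page):
        return "blog"
    if re.search(r"^/docs/", page):
        return "docs"
    return "other"


df = pd.DataFrame({"PageClass": ["/blog/post-1", "/docs/intro", "/about"]})

# Series.apply calls page_classify once per element, so the lambda is optional.
with_lambda = df["PageClass"].apply(lambda x: page_classify(x))
without_lambda = df["PageClass"].apply(page_classify)

print(without_lambda.tolist())  # ['blog', 'docs', 'other']
```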

However, what I would really like to do is incorporate another column, 'Rev', into the checking. Its values are either floats or NaN.

When I did this:

df['PageCLass'] = df['PageClass'].apply(lambda x: PageClassify.page_classify(x,df['Rev']))

with the classification function doing logical checks on the 2nd argument, I got this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What I am looking for is a way to capture the 2nd argument value by value, just as lambda x: captures the first argument value by value.
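For reference, the error can be reproduced in isolation. `df['Rev']` is the entire column, so every call receives a Series as its second argument, and an `if` on a Series (or on a test built from one) cannot be reduced to a single True/False. A minimal sketch, assuming a hypothetical two-argument `page_classify` with made-up labels:

```python
import pandas as pd


def page_classify(page, rev):
    """Hypothetical classifier that branches on its second argument."""
    if pd.isna(rev):  # for a Series, pd.isna returns a Series of booleans
        return "unrevised"
    return "revised" if rev > 1 else "original"


df = pd.DataFrame({"PageClass": ["a", "b", "c"], "Rev": [1.0, float("nan"), 2.0]})

message = None
try:
    # df["Rev"] is the whole column, so every call receives a Series for rev.
    df["PageClass"].apply(lambda x: page_classify(x, df["Rev"]))
except ValueError as err:
    message = str(err)
    print(message)  # The truth value of a Series is ambiguous. ...
```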

2 Answers


The method in the other answer is OK, I guess, if it works, but in my opinion it does not answer the question, because it concatenates the two arguments into one.

Here is a way to pass two arguments through apply:

df['PageCLass'] = df[['PageClass','Rev']].apply(lambda x: PageClassify.page_classify(*x), axis=1)

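I don't know what the page_classify method looks like, but if it takes two arguments, the above should work: with axis=1, apply passes each row to the lambda, and *x unpacks that row's PageClass and Rev values into the two arguments. Does this work for you?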
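A minimal sketch of the row-wise approach, assuming a hypothetical two-argument `page_classify` (the real one is not shown) that handles NaN with `pd.isna` on the scalar it receives; the labels are made up. The result is assigned back to `PageClass`.

```python
import pandas as pd


def page_classify(page, rev):
    """Hypothetical two-argument classifier; rev is a float or NaN."""
    if pd.isna(rev):  # rev is a scalar here, so this is a plain bool
        return page + ":norev"
    return page + (":major" if rev >= 2 else ":minor")


df = pd.DataFrame(
    {"PageClass": ["home", "blog", "docs"], "Rev": [1.0, float("nan"), 3.0]}
)

# axis=1 hands each row (a Series indexed by column name) to the lambda;
# *x unpacks that row's values in column order: PageClass, then Rev.
df["PageClass"] = df[["PageClass", "Rev"]].apply(
    lambda x: page_classify(*x), axis=1
)

print(df["PageClass"].tolist())  # ['home:minor', 'blog:norev', 'docs:major']
```

Indexing the row by label, `lambda row: page_classify(row['PageClass'], row['Rev'])`, is equivalent and does not depend on column order.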


3 Comments

I changed the page_classify method to take two arguments (instead of splitting one argument on the \t tab, as in the other answer) and changed the main program to use your line. I got this message: TypeError: ('page_classify() takes exactly 2 arguments (120467 given)', u'occurred at index PageClass')
I added axis=1 and it ran, but strangely my target column df['PageClass'] did not update: the value returned by the function was not assigned to it, and on inspection the column appears unchanged.
Good call! Just as you were commenting this, I was trying it out, and sure enough df['blargh'] receives the returned value. I wonder why it didn't assign "in place."
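Two details in this exchange can be checked directly. Without axis=1, `DataFrame.apply` calls the function once per column and passes the whole column, so `page_classify(*x)` receives one argument per row (consistent with "120467 given"). Separately, the code in the thread assigns to `'PageCLass'` with a capital L; column labels are case-sensitive, so that assignment creates a new column and leaves `'PageClass'` unchanged, which likely explains why a different name such as `df['blargh']` appeared to work. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {"PageClass": ["home", "blog", "docs"], "Rev": [1.0, float("nan"), 3.0]}
)
subset = df[["PageClass", "Rev"]]

# Default axis=0: the function is called once per column and receives the
# whole column, so page_classify(*x) would get len(df) arguments.
per_column = subset.apply(lambda x: len(x))
print(per_column.tolist())  # [3, 3]

# axis=1: called once per row, receiving exactly two values.
per_row = subset.apply(lambda x: len(x), axis=1)
print(per_row.tolist())  # [2, 2, 2]

# Column labels are case-sensitive, so this creates a new column
# instead of overwriting "PageClass".
df["PageCLass"] = "classified"
print(df.columns.tolist())  # ['PageClass', 'Rev', 'PageCLass']
print(df["PageClass"].tolist())  # ['home', 'blog', 'docs']
```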

Assuming you want to just do this row by row, the following should work:

df['PageCLass'] = (df['PageClass'] + df['Rev'].apply(str)).apply(lambda x: PageClassify.page_classify(x))

Here, you are simply concatenating the two columns and then applying the function to each element of the combined Series. Note that apply(str) turns Rev into text, so a float such as 2.0 arrives as '2.0' and NaN arrives as 'nan'. If you need to check the values of PageClass and Rev separately, you can add a delimiter (e.g. '\t') to the concatenation and then split on it inside the function:

df['PageCLass'] = (df['PageClass'] + '\t' + df['Rev'].apply(str)).apply(lambda x: PageClassify.page_classify(x))
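A minimal sketch of the delimiter-and-split approach, assuming a hypothetical one-argument `page_classify` (the real one is not shown) with made-up labels. It uses `rsplit('\t', 1)` so only the last tab is treated as the separator, and `float('nan')` parses back to NaN, so a missing Rev survives the round trip through text.

```python
import math

import pandas as pd


def page_classify(combined):
    """Hypothetical one-argument classifier: recover PageClass and Rev
    from the tab-joined string (Rev arrives as text, e.g. '3.0' or 'nan')."""
    page, rev_text = combined.rsplit("\t", 1)  # Rev itself never contains a tab
    rev = float(rev_text)  # float('nan') parses, so NaN round-trips
    if math.isnan(rev):
        return page + ":norev"
    return page + (":major" if rev >= 2 else ":minor")


df = pd.DataFrame(
    {"PageClass": ["home", "blog", "docs"], "Rev": [1.0, float("nan"), 3.0]}
)

combined = df["PageClass"] + "\t" + df["Rev"].apply(str)
print(combined.tolist())  # ['home\t1.0', 'blog\tnan', 'docs\t3.0']

df["PageClass"] = combined.apply(page_classify)
print(df["PageClass"].tolist())  # ['home:minor', 'blog:norev', 'docs:major']
```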

Hope this helps!

1 Comment

I used the \t delimiter and split on it inside the function, and it works great. This keeps the columns in "lock-step" for logical processing by the function.
