
I am able to add a new column in pandas by defining a user function and then using apply. However, I want to do this using a lambda; is there a way to do that?

For example, df has two columns, a and b. I want to create a new column c that holds, for each row, the greater of the string lengths of a and b.

import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'], 'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

Something like:

df['c'] = df.apply(lambda x, len(df['a']) if len(df['a']) > len(df['b']) or len(df['b']) )

One approach:

df['c'] = df.apply(lambda x: max([len(x) for x in [df['a'], df['b']]]))

which gives a column of NaNs.

      a       b   c
0   dfg      sd NaN
1     f     dfg NaN
2   fff     edr NaN
3  fgrf      df NaN
4  fghj  fghjky NaN

2 Answers


You can use map to compute the string lengths and np.where to select between them:

print(df)
#     a     b
#0  aaa  rrrr
#1   bb     k
#2  ccc     e
# np.where(condition, x, y): where the condition is True take the length of
# column a, otherwise take the length of column b
df['c'] = np.where(df['a'].map(len) > df['b'].map(len), df['a'].map(len), df['b'].map(len))
print(df)
#     a     b  c
#0  aaa  rrrr  4
#1   bb     k  2
#2  ccc     e  3
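As a self-contained sketch of the np.where approach, here it is applied to the DataFrame from the question. The intermediate names len_a and len_b are only illustrative; they avoid computing each length column more than once.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# Compute each length column once (len_a / len_b are illustrative names).
len_a = df['a'].map(len)
len_b = df['b'].map(len)

# Where a is longer take its length, otherwise take the length of b.
df['c'] = np.where(len_a > len_b, len_a, len_b)
print(df)
```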

Another solution is to use apply with the parameter axis=1:

axis = 1 or ‘columns’: apply function to each row

df['c'] = df.apply(lambda x: max(len(x['a']), len(x['b'])), axis=1)
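To see what axis=1 changes, here is a minimal sketch on the question's data. With axis=1 the lambda receives one row at a time as a Series indexed by column name, so x['a'] and x['b'] are plain strings. The variable first_row is only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# With axis=1, each call to the lambda gets a row like this one,
# indexed by the column names.
first_row = df.iloc[0]
print(first_row.index.tolist())  # ['a', 'b']

df['c'] = df.apply(lambda x: max(len(x['a']), len(x['b'])), axis=1)
print(df)
```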

1 Comment

For future readers: the mistake was forgetting axis=1. Without it, the lambda is applied to each column, so it sees the row index [0, 1, 2, 3, 4] instead of the labels 'a' and 'b', which is what causes KeyError: 'a'. Also, jezrael's solution #2 is a bit neater, since the lambda already loops through the rows.
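A small sketch of both failure modes, assuming the DataFrame from the question. The names raised_key_error and per_column are hypothetical, used only to make the behaviour visible.

```python
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# Default axis=0: the lambda is called once per column, so x is a whole
# column indexed by 0..4 and the label 'a' cannot be found.
try:
    df.apply(lambda x: max(len(x['a']), len(x['b'])))
    raised_key_error = False
except KeyError:
    raised_key_error = True

# The attempt from the question ignores x and measures whole columns, so it
# returns one value per column, indexed by 'a' and 'b'.
per_column = df.apply(lambda x: max(len(col) for col in [df['a'], df['b']]))
print(per_column)

# Assignment aligns on the index; 'a' and 'b' never match 0..4, hence NaN.
df['c'] = per_column
print(df)
```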

Working on strings is a bit of a special case: string operations in pandas are not optimized, so a Python loop may actually perform better than vectorized pandas methods. A list comprehension is therefore a viable method; it's readable and very fast:

df['c'] = [max(len(a), len(b)) for a, b in zip(df['a'], df['b'])]
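Run against the question's data, the list comprehension looks like this. One assumption worth stating: neither column contains missing values, since len() would raise a TypeError on a NaN.

```python
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# zip pairs up the two columns row by row; the resulting list has one
# entry per row, in order, so it can be assigned directly as a column.
df['c'] = [max(len(a), len(b)) for a, b in zip(df['a'], df['b'])]
print(df['c'].tolist())  # [3, 3, 3, 4, 6]
```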

For slightly shorter code, you can use DataFrame.map() (or applymap() for pandas < 2.1.0):

df['c'] = df.map(len).max(axis=1)
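A sketch that works on either side of the 2.1.0 rename, assuming the question's data. The names cols and elementwise are illustrative. Selecting only a and b is a deliberate choice here: if df already has an integer column c from an earlier attempt, len would fail on it.

```python
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# Restrict to the string columns so len is only applied to strings.
cols = df[['a', 'b']]

# DataFrame.map exists from pandas 2.1.0; older versions only have applymap.
elementwise = cols.map if hasattr(cols, 'map') else cols.applymap

# Apply len to every cell, then take the row-wise maximum.
df['c'] = elementwise(len).max(axis=1)
print(df)
```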

If you're applying a lambda that uses an if condition, make sure to also supply the else branch, because a conditional expression is not valid without one.

df['c'] = df.apply(lambda row: len(row['a']) if len(row['a']) > len(row['b']) else len(row['b']), axis=1)

In general, you should avoid using a lambda wherever possible, because pandas has a whole host of optimized operations that work directly on the columns. For example, if you need to find the maximum value of each row, you can simply call max(axis=1), like: df[['a', 'b']].max(axis=1).
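For the string-length problem in the question, one possible lambda-free variant combines the .str.len() accessor with max(axis=1). This is a sketch of the general idea rather than a claim about which approach is fastest; the name lengths is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# Build a two-column frame of string lengths without any lambda.
lengths = pd.concat([df['a'].str.len(), df['b'].str.len()], axis=1)

# Row-wise maximum of the two length columns.
df['c'] = lengths.max(axis=1)
print(df)
```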
