Creating a new pandas column by using lambda function on two existing columns

Question

I can add a new column in pandas by defining a function and then using apply. However, I want to do this with a lambda; is there a way to do that?

For example, df has two columns, a and b. I want to create a new column c that holds, for each row, the greater of the string lengths in a and b.

df = pd.DataFrame({'a':['dfg','f','fff','fgrf','fghj'], 'b' : ['sd','dfg','edr','df','fghjky']})

Something like:

df['c'] = df.apply(lambda x, len(df['a']) if len(df['a']) > len(df['b']) or len(df['b']) )

One approach:

df['c'] = df.apply(lambda x: max([len(x) for x in [df['a'], df['b']]]))

which gives a column of NaNs.

      a       b   c
0   dfg      sd NaN
1     f     dfg NaN
2   fff     edr NaN
3  fgrf      df NaN
4  fghj  fghjky NaN

Community · Accepted Answer · 2017-05-23 12:09:35Z

You can use map to compute the string lengths and np.where to select between them:

print(df)
#     a     b
#0  aaa  rrrr
#1   bb     k
#2  ccc     e
# If the condition is True, take the length of column a; otherwise take the length of column b
df['c'] = np.where(df['a'].map(len) > df['b'].map(len), df['a'].map(len), df['b'].map(len))
print(df)
#     a     b  c
#0  aaa  rrrr  4
#1   bb     k  2
#2  ccc     e  3
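As a minimal, self-contained sketch of the same np.where idea applied to the data from the question (the names len_a and len_b are introduced here only to avoid computing each length twice):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# Compute each column's string lengths once and reuse them
len_a = df['a'].map(len)
len_b = df['b'].map(len)

# Where a is longer take len_a, otherwise take len_b
df['c'] = np.where(len_a > len_b, len_a, len_b)
print(df['c'].tolist())  # [3, 3, 3, 4, 6]
```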

Another solution is to use apply with the parameter axis=1:

axis = 1 or ‘columns’: apply function to each row

df['c'] = df.apply(lambda x: max(len(x['a']), len(x['b'])), axis=1)

edited May 23, 2017 at 12:09

CommunityBot

answered Nov 12, 2015 at 20:44

jezrael

1 Comment

Fed Over a year ago

For future readers: the mistake was forgetting axis=1 and using df['a'], df['b'] instead of the row's own values. Without axis=1, apply passes each column to the lambda, so a lookup of 'a' searches the row index [0, 1, 2, 3, 4] and raises KeyError: 'a'. Jezrael's second solution is also a bit neater, since the lambda is already applied row by row.
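The point in this comment can be checked with a short sketch on the question's data. It is only an illustration; the names per_column and raised are hypothetical helpers, not part of either answer.

```python
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# Default axis=0: the lambda is called once per column, so the result
# is indexed by column name ('a', 'b'), not by row label.
per_column = df.apply(lambda col: len(col))
print(per_column.index.tolist())  # ['a', 'b']

# Assignment aligns on index labels; 'a' and 'b' match no row label,
# which is why the question ended up with a column of NaNs.
df['c'] = per_column
print(df['c'].isna().all())  # True

# Looking up x['a'] on a column searches the row labels 0..4 and fails.
try:
    df[['a', 'b']].apply(lambda x: max(len(x['a']), len(x['b'])))
    raised = False
except KeyError:
    raised = True

# axis=1: the lambda is called once per row, so x['a'] and x['b'] are strings.
df['c'] = df.apply(lambda x: max(len(x['a']), len(x['b'])), axis=1)
print(df['c'].tolist())  # [3, 3, 3, 4, 6]
```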

cottontail · Accepted Answer · 2025-10-13 04:28:55Z

Working on strings is a bit of a special case: string operations in pandas are not optimized, so a Python loop may actually perform better than vectorized pandas methods. A list comprehension is therefore a viable method; it's readable and very fast:

df['c'] = [max(len(a), len(b)) for a, b in zip(df['a'], df['b'])]
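To measure the speed difference on your own machine, a rough sketch along these lines may help. The enlarged frame big, the repeat count, and the helper names are assumptions made for illustration, and timings will vary with pandas version and hardware, so no numbers are claimed here.

```python
import timeit

import pandas as pd

base = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                     'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})
# Repeat the five sample rows to get a frame large enough to time (5,000 rows)
big = pd.concat([base] * 1000, ignore_index=True)

def with_comprehension():
    # Plain Python loop over the two columns in parallel
    return [max(len(a), len(b)) for a, b in zip(big['a'], big['b'])]

def with_apply():
    # Row-wise apply, which builds a Series for every row
    return big.apply(lambda row: max(len(row['a']), len(row['b'])), axis=1).tolist()

# Both approaches should agree before their timings are compared
same = with_comprehension() == with_apply()

t_comp = timeit.timeit(with_comprehension, number=3)
t_apply = timeit.timeit(with_apply, number=3)
print(f"same result: {same}")
print(f"list comprehension: {t_comp:.4f}s, apply(axis=1): {t_apply:.4f}s")
```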

For slightly shorter code, you can try DataFrame.map() (or applymap() for pandas < 2.1.0):

df['c'] = df.map(len).max(axis=1)
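If the code has to run on both sides of the 2.1.0 boundary, one possible sketch is to pick the method at runtime. The name elementwise is a hypothetical helper introduced here, not a pandas API.

```python
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# DataFrame.map is available from pandas 2.1.0; fall back to applymap before that
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# Apply len to every cell, then take the row-wise maximum
df['c'] = elementwise(len).max(axis=1)
print(df['c'].tolist())  # [3, 3, 3, 4, 6]
```

Note that the lengths are computed before column c exists; running the same line again on the updated frame would fail, because len cannot be applied to the integers in c.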

If you're applying a lambda that uses an if condition, make sure to also supply the else branch:

df['c'] = df.apply(lambda row: len(row['a']) if len(row['a']) > len(row['b']) else len(row['b']), axis=1)

In general, you should avoid using a lambda wherever possible, because pandas has a whole host of optimized operations that work directly on the columns. For example, to find the maximum value of each row, you can simply call max(axis=1): df[['a', 'b']].max(axis=1).
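For the string-length case in this question, a lambda-free sketch could combine the .str.len() accessor with max(axis=1). This is a swapped-in variant rather than code from the answer, and given the earlier caveat about string operations it is not necessarily faster than the list comprehension. The intermediate frame lengths is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# Build a frame of per-cell string lengths using the .str accessor
lengths = pd.DataFrame({'a': df['a'].str.len(), 'b': df['b'].str.len()})

# Row-wise maximum of the two length columns
df['c'] = lengths.max(axis=1)
print(df['c'].tolist())  # [3, 3, 3, 4, 6]
```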
