1

Given a pandas DataFrame with multiple columns:

import pandas as pd
df = pd.DataFrame({'name': ['Bob', 'Alice'], 'age': [20, 40], 'height': [2.0, 2.1]})

    name  age  height
0    Bob   20     2.0
1  Alice   40     2.1

And a function that takes multiple parameters:

def example_hash(name: str, age: int) -> str:
    return "In 10 years {} will be {}".format(name, age+10)

How can the DataFrame be updated with an additional column which contains the result of applying a function to a subset of the other columns?

The resulting DataFrame would be the result of applying example_hash to the name & age columns:

    name  age  height                          hash
0    Bob   20     2.0    In 10 years Bob will be 30
1  Alice   40     2.1  In 10 years Alice will be 50

I'm interested in a pandas-centric answer. I understand that it's possible to iterate over the rows, append each result to a Python list, and then assign that list as the new column.

Thank you in advance for your consideration and response.


2 Answers

4

You can do this without changing your example_hash() function by wrapping it with np.vectorize:

In [2204]: import numpy as np

In [2200]: def example_hash(name: str, age: int) -> str:
      ...:     return "In 10 years {} will be {}".format(name, age+10)
      ...:

In [2202]: df['new'] = np.vectorize(example_hash)(df['name'], df['age'])

In [2203]: df
Out[2203]: 
    name  age  height                           new
0    Bob   20     2.0    In 10 years Bob will be 30
1  Alice   40     2.1  In 10 years Alice will be 50
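
For reference, the session above can be written as a self-contained script. This is a minimal sketch rather than a definitive implementation: the `otypes=[object]` argument and the `hash_values` name are choices made here, not part of the original answer. Note that the NumPy documentation describes np.vectorize as a convenience wrapper that is essentially a for loop, so it should not be expected to run faster than an explicit loop.

```python
import numpy as np
import pandas as pd


def example_hash(name: str, age: int) -> str:
    return "In 10 years {} will be {}".format(name, age + 10)


df = pd.DataFrame({"name": ["Bob", "Alice"], "age": [20, 40], "height": [2.0, 2.1]})

# np.vectorize wraps the scalar function so that it is called once per
# (name, age) pair taken from the two columns.
# otypes=[object] is an assumption made for this sketch: it keeps the results
# as Python strings instead of a fixed-width NumPy string array.
vectorized_hash = np.vectorize(example_hash, otypes=[object])
hash_values = vectorized_hash(df["name"], df["age"])

df["new"] = hash_values
print(df)
```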

Alternatively, use df.apply with a lambda, again without changing your function:

In [2207]: df['new'] = df.apply(lambda x: example_hash(x['name'], x['age']), axis=1)

In [2208]: df
Out[2208]: 
    name  age  height                           new
0    Bob   20     2.0    In 10 years Bob will be 30
1  Alice   40     2.1  In 10 years Alice will be 50
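
If the same pattern is needed for several functions or column subsets, the apply call can be wrapped in a small helper. The sketch below is illustrative only: `add_column_from` and its parameter names are hypothetical and not part of pandas. It assumes the function's positional parameters are in the same order as the listed columns.

```python
from typing import Callable, Sequence

import pandas as pd


def example_hash(name: str, age: int) -> str:
    return "In 10 years {} will be {}".format(name, age + 10)


def add_column_from(
    df: pd.DataFrame, func: Callable, source_cols: Sequence[str], new_col: str
) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with new_col holding
    func(*values) for the values of source_cols in each row."""
    result = df.copy()
    # Restrict to the requested columns, then unpack each row's values
    # as positional arguments to func.
    result[new_col] = df[list(source_cols)].apply(lambda row: func(*row), axis=1)
    return result


df = pd.DataFrame({"name": ["Bob", "Alice"], "age": [20, 40], "height": [2.0, 2.1]})
df = add_column_from(df, example_hash, ["name", "age"], "new")
print(df)
```

Like the lambda version, this still calls the function once per row, so it is a convenience rather than a speed-up.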

3

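You can use the apply function to iterate over the rows and add a new column. Note that this version redefines example_hash to take a whole row instead of separate arguments.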

In [139]: df = pd.DataFrame({'name': ['Bob', 'Alice'], 'age': [20, 40], 'height': [2.0, 2.1]})

In [140]: df
Out[140]:
    name  age  height
0    Bob   20     2.0
1  Alice   40     2.1


In [142]: def example_hash(row):
     ...:     row['hash']= "In 10 years {} will be {}".format(row['name'], row['age']+10)
     ...:     return row
     ...:

In [143]: df = df.apply(example_hash, axis=1)

In [144]: df
Out[144]:
    name  age  height                          hash
0    Bob   20     2.0    In 10 years Bob will be 30
1  Alice   40     2.1  In 10 years Alice will be 50
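
For this particular function, the column can also be built from whole-column operations instead of a per-row call. This is a sketch of an alternative technique, not the method used in the answer above, and it only works because example_hash is simple enough to express as string concatenation and integer addition.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Bob", "Alice"], "age": [20, 40], "height": [2.0, 2.1]})

# Add 10 to the whole age column, convert it to strings, and concatenate
# with the name column and the fixed text, all as column-wise operations.
df["hash"] = "In 10 years " + df["name"] + " will be " + (df["age"] + 10).astype(str)
print(df)
```

For arbitrary Python functions that cannot be rewritten this way, a row-wise apply remains the general option.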
