Python DataFrame: Change values of a column based on another column?

Question

I have a DataFrame with two columns, a and b. I want to change the NaN values in column b. For example, for the value 123 in column a, column b has both abc and NaN. I want both to be abc:

df
         a        b
0     123       NaN
1     123       abc
2     456       def
3     456       NaN

My expected result is:

df
        a         b
0     123       abc
1     123       abc
2     456       def
3     456       def

Sample data:

import pandas as pd
from io import StringIO

s = '''\
a,b
123,NaN
123,abc
456,def
456,NaN
'''
df = pd.read_csv(StringIO(s))

Describing the issue and what I have tried:

df.loc[df.a == 123, 'b'] = "abc"

Here I am able to change column b only for one particular value, i.e., replace 'b' with abc if 'a' is 123.

But for rows where df.a == 123 and 'b' is NaN, I also wanted 'b' to be updated to abc.

So I tried this:

df.loc[df.a == NaN, 'b'] = "abc"

But this changed all the empty values in df to abc.

So how do I proceed from here?
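One way to combine the two conditions from the attempts above is sketched below. It assumes the four-row sample data from the question, and the name `mask` is only illustrative. The point it shows is that NaN never compares equal to anything, so an `==` test cannot select missing values, while `isna()` can.

```python
import numpy as np
import pandas as pd

# Same four rows as the sample data in the question
df = pd.DataFrame({'a': [123, 123, 456, 456],
                   'b': [np.nan, 'abc', 'def', np.nan]})

# NaN is never equal to anything, itself included, so == cannot find it
print((df['b'] == np.nan).any())

# isna() finds the missing values; & restricts the update to rows where a == 123
mask = (df['a'] == 123) & df['b'].isna()
df.loc[mask, 'b'] = 'abc'
print(df)
```

This still hard-codes one key and one replacement value, so it would have to be repeated for 456 and every other value of a.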

Edit 2: Sample data 2

import numpy as np

raw_data = {'a': [123, 123, 456, 456, 789],
            'b': [np.nan, 'abc', 'def', np.nan, np.nan],
            'c': [np.nan, np.nan, 0, np.nan, np.nan]}

df = pd.DataFrame(raw_data, columns=['a', 'b', 'c'])

Answer:

df['b'] = df['a'].map(df.groupby('a')['b'].first()).fillna(df['b'])
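The one-liner above can be unpacked into steps, as in the following sketch on "Sample data 2". The intermediate name `lookup` is an assumption added for readability and is not part of the original line.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [123, 123, 456, 456, 789],
                   'b': [np.nan, 'abc', 'def', np.nan, np.nan],
                   'c': [np.nan, np.nan, 0, np.nan, np.nan]})

# first() skips NaN, so this holds the first non-missing b for each value of a
lookup = df.groupby('a')['b'].first()

# map() looks up each row's a in that table; fillna keeps the original b
# wherever the lookup has no value for that key
df['b'] = df['a'].map(lookup).fillna(df['b'])
print(df)
```

Column c is left untouched, and the row with a == 789 stays NaN because that group has no non-missing b to copy from.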

What is the issue, exactly? Have you tried anything, done any research? You could at the very least provide the data in a more convenient format.
– AMC, Commented Apr 15, 2020 at 16:53
Issue: I have described it above. Please read the question and description. What I have tried: I tried replace and iloc but couldn't succeed. Research: I worked with replace and iloc. Sample data: I updated the question with sample data. Any other suggestions, @AMC?
– user11509999, Commented Apr 15, 2020 at 17:30
"Issue: I have described it above." No, you described the goal/objective, not a specific problem or obstacle. "What I have tried: I tried replace and iloc but couldn't succeed." Then why not show that?
– AMC, Commented Apr 15, 2020 at 17:45
Updated with the issue and what I have tried. Any other suggestions, @AMC?
– user11509999, Commented Apr 15, 2020 at 18:03

JvdV · Accepted Answer · 2020-04-16 08:15:30Z

2

Maybe first sort your dataframe, then use ffill. Something like:

df = df.sort_values(by=['a','b']).ffill()

To do this when you have NaN values you don't want to overwrite (your "edit2"), you can also use groupby:

df['b'] = df.sort_values(by=['a','b','c']).groupby('a')['b'].ffill()
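The group-wise fill can be checked on "Sample data 2" with the sketch below. The second approach (`transform` with `ffill().bfill()`) is a variation that is not part of the answer, included only to show that the sort can be avoided. The names `sorted_fill` and `no_sort` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [123, 123, 456, 456, 789],
                   'b': [np.nan, 'abc', 'def', np.nan, np.nan],
                   'c': [np.nan, np.nan, 0, np.nan, np.nan]})

# Approach from the answer: sorting puts NaN last within each group of a,
# so a forward fill inside the group copies the known value downwards
sorted_fill = df.sort_values(by=['a', 'b', 'c']).groupby('a')['b'].ffill()

# Variation (assumption, not from the answer): fill in both directions
# inside each group, which makes the sort unnecessary
no_sort = df.groupby('a')['b'].transform(lambda s: s.ffill().bfill())

# Assignment aligns on the index, so the sorted result lands on the right rows
df['b'] = sorted_fill
print(df)
```

Both results leave the a == 789 row as NaN, since the fill never crosses group boundaries, and neither touches column c.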

edited Apr 16, 2020 at 8:15

answered Apr 15, 2020 at 16:23

JvdV


3 Comments

user11509999 Over a year ago

When there are multiple columns, ffill fills data into the other columns too. Is there any way to select only certain columns for ffill?

user11509999 Over a year ago

If there is another row, it changes that row's value too. I added sample data to the question.

user11509999 Over a year ago

Found the issue. Updated.

MachineLearner · Accepted Answer · 2020-04-15 16:27:06Z

0

Here is a solution that is using the pandas apply function. It will apply a specific function (here: my_function) to a specific column. You can change the rules of how to map the values inside my_function. This will allow you to solve more difficult problems.

import pandas as pd
import numpy as np

# generate some data
df = pd.DataFrame({'A': [123, 123, 124, 456, 456], 'B': [np.nan, 'abc', 'def', 1, np.nan]})

# define function that maps missing values to 'abc'
def my_function(value):
    # NaN never compares equal to anything, so test for it with pd.isna
    if pd.isna(value):
        return 'abc'
    else:
        return value

# apply function to column 'B'
df['mapped_B'] = df['B'].apply(my_function)

# check output
df.head()

#   A   B   mapped_B
# 0 123 NaN abc
# 1 123 abc abc
# 2 124 def def
# 3 456 1   1
# 4 456 NaN abc

edited Apr 15, 2020 at 16:27

answered Apr 15, 2020 at 16:24

MachineLearner


1 Comment

UJIN Over a year ago

Please use np.isnan instead of value == np.NaN as explained here. Also, with the proposed fix, this would only work for 'abc', not for 'def'. Consider heavily editing this answer.
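The difference between these NaN checks can be seen in a short sketch. One caveat worth noting: `np.isnan` only accepts numeric input, so on a column that mixes strings and NaN, `pd.isna` is likely the safer choice. The variable names here are illustrative.

```python
import numpy as np
import pandas as pd

# An equality test against NaN is always False, even NaN against itself
same = (np.nan == np.nan)
print(same)

# pd.isna works on any scalar, including strings
print(pd.isna(np.nan), pd.isna('abc'))

# np.isnan raises TypeError when given a string
raised = False
try:
    np.isnan('abc')
except TypeError:
    raised = True
print(raised)
```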
