I have a DataFrame with two columns, a and b. I want to change the NaN values in column b. E.g., for the value 123 in column a, column b has both abc and NaN. I want both to become abc:

df
         a        b
0     123       NaN
1     123       abc
2     456       def
3     456       NaN

My expected result is:

df
        a         b
0     123       abc
1     123       abc
2     456       def
3     456       def

Sample data:

import pandas as pd
from io import StringIO

s = '''\
a,b
123,NaN
123,abc
456,def
456,NaN
'''
df = pd.read_csv(StringIO(s))

The issue and what I have tried:

df.loc[df.a == 123, 'b'] = "abc"

Here I'm able to change 'b' only for one particular value, i.e., set 'b' to abc wherever 'a' is 123.

But for rows where df.a == 123 and 'b' is NaN, I also want 'b' updated to abc.

So I tried this:

df.loc[df.a == NaN, 'b'] = "abc"

But this changed all the empty values in df to abc.

So how do I proceed from here?
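For reference, a minimal sketch of the "a is 123 and b is NaN" condition, using the sample data above. NaN never compares equal to anything (including itself), so the missing-value test goes through isna() and is combined with the other condition using &. The name mask is only illustrative.

```python
import pandas as pd
from io import StringIO

s = '''\
a,b
123,NaN
123,abc
456,def
456,NaN
'''
df = pd.read_csv(StringIO(s))

# Rows where a is 123 AND b is missing; isna() is used because
# an equality test against NaN is always False.
mask = df['a'].eq(123) & df['b'].isna()

# Only the matching rows are overwritten; the NaN under 456 is untouched.
df.loc[mask, 'b'] = 'abc'
print(df)
```

This still hard-codes one value of a, so it only illustrates the masking step, not the general per-group fill.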

Edit 2: Sample data 2

import numpy as np
import pandas as pd

raw_data = {'a': [123, 123, 456, 456, 789],
            'b': [np.nan, 'abc', 'def', np.nan, np.nan],
            'c': [np.nan, np.nan, 0, np.nan, np.nan]}

df = pd.DataFrame(raw_data, columns=['a', 'b', 'c'])

Ans:

df['b'] = df['a'].map(df.groupby('a')['b'].first()).fillna(df['b'])
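A self-contained sketch of how that line behaves on sample data 2. groupby('a')['b'].first() yields the first non-missing b for each value of a, and map looks that value up for every row. The intermediate name first_b is only for illustration.

```python
import numpy as np
import pandas as pd

raw_data = {'a': [123, 123, 456, 456, 789],
            'b': [np.nan, 'abc', 'def', np.nan, np.nan],
            'c': [np.nan, np.nan, 0, np.nan, np.nan]}
df = pd.DataFrame(raw_data, columns=['a', 'b', 'c'])

# First non-missing 'b' per value of 'a' (a Series indexed by 'a').
first_b = df.groupby('a')['b'].first()

# Look up each row's 'a' in that Series; keep the original 'b'
# wherever the lookup itself comes back missing.
df['b'] = df['a'].map(first_b).fillna(df['b'])
print(df)
```

The group 789 has no non-missing b at all, so its row stays missing, and column c is never touched.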
  • What is the issue, exactly? Have you tried anything, done any research? You could at the very least provide the data in a more convenient format. Commented Apr 15, 2020 at 16:53
  • Issue: I have described it above. Please read the question and description. What I have tried: I tried replace and iloc but couldn't succeed. Research: I did it with replace and iloc. Sample data: updated the question with sample data. Any other suggestions, @AMC? Commented Apr 15, 2020 at 17:30
  • "Issue: I have described in above." No, you described the goal/objective, not a specific problem or obstacle. "What i have tried: I tried replace and iloc couldn't succeed." Then why not show that? Commented Apr 15, 2020 at 17:45
  • Updated with the issue and what I have tried. Any other suggestions, @AMC? Commented Apr 15, 2020 at 18:03

2 Answers

Maybe first sort your dataframe, then use ffill. Something like:

df = df.sort_values(by=['a','b']).ffill()

To do this when you have NaN values you don't want to overwrite (your "edit2"), you can also use groupby:

df['b'] = df.sort_values(by=['a','b','c']).groupby('a')['b'].ffill()
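A runnable sketch of the groupby version on the question's second sample. Sorting puts the NaN values last within each group of a (the default na_position), so the forward fill has a real value to carry down. The assignment lines up by index, so the original row order is kept.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [123, 123, 456, 456, 789],
                   'b': [np.nan, 'abc', 'def', np.nan, np.nan],
                   'c': [np.nan, np.nan, 0, np.nan, np.nan]})

# Sort so missing 'b' values come after the real ones inside each 'a' group,
# then forward-fill 'b' within each group only.
filled_b = df.sort_values(by=['a', 'b', 'c']).groupby('a')['b'].ffill()

# Index alignment puts every filled value back on its original row.
df['b'] = filled_b
print(df)
```

Only column b is filled here. Column c keeps its NaN values, and the 789 row stays missing because its group has nothing to fill from.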

3 Comments

When there are multiple columns, ffill also fills data in the other columns. Is there any way to restrict ffill to certain columns?
If there is another row, its value gets changed too. I added sample data to the question.
Found the issue; updated.
Here is a solution that is using the pandas apply function. It will apply a specific function (here: my_function) to a specific column. You can change the rules of how to map the values inside my_function. This will allow you to solve more difficult problems.

import pandas as pd
import numpy as np

# generate some data
df = pd.DataFrame({'A': [123, 123, 124, 456, 456], 'B': [np.NaN, 'abc', 'def', 1, np.NaN]})

# define function that maps np.NaN to 'abc'
def my_function(value):
    if value == np.NaN:
        return 'abc'
    else:
        return value

# apply function to column 'B'
df['mapped_B'] = df['B'].apply(my_function)

# check output
df.head()

#   A   B   mapped_B
# 0 123 NaN NaN
# 1 123 abc abc
# 2 124 def def
# 3 456 1   1
# 4 456 NaN NaN

1 Comment

Please use np.isnan instead of value == np.NaN as explained here. Also, with the proposed fix, this would only work for 'abc', not for 'def'. Consider heavily editing this answer.
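A hedged sketch of what such an edit could look like, not the answerer's own code. It assumes pd.isna for the missing-value test, since np.isnan raises a TypeError on strings such as 'abc'. The names fill_missing, default and group_B are made up for illustration, and the group-aware column uses groupby(...).transform('first') as one possible way to cover 'def' as well.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [123, 123, 124, 456, 456],
                   'B': [np.nan, 'abc', 'def', 1, np.nan]})

# NaN is never equal to itself, so an == test cannot detect it.
assert not (np.nan == np.nan)

def fill_missing(value, default='abc'):
    """Return `default` for missing values, otherwise the value itself."""
    return default if pd.isna(value) else value

# Element-wise mapping: every missing 'B' becomes the fixed default 'abc'.
df['mapped_B'] = df['B'].apply(fill_missing)

# Group-aware alternative: first non-missing 'B' within each value of 'A'.
df['group_B'] = df.groupby('A')['B'].transform('first')
print(df)
```

The element-wise version fills the last row with 'abc' even though its group holds 1, which is the limitation the comment points out. The group-aware column fills it with 1 instead.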
