Replace some values with a string in a column in Python

Thanks, aws_apprentice! I tried it that way, but apparently I made some mistakes.

Georgina Skibinski · Accepted Answer · 2020-02-11 07:35:55Z

1

Try (and group by Country):

import numpy as np

df["Country"]=np.where(df["Country"].eq("Mainland China"), "Mainland China", "Other")
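
For illustration, here is a minimal sketch of that line followed by the group-by step. The sample rows and the "Cases" column are made up for the example; only "Country" and the two labels come from the line above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "Cases" is an assumed column, not from the question.
df = pd.DataFrame({
    "Country": ["Mainland China", "Japan", "Mainland China", "Italy"],
    "Cases": [100, 5, 200, 7],
})

# Keep "Mainland China" as is and collapse every other value into "Other".
df["Country"] = np.where(df["Country"].eq("Mainland China"), "Mainland China", "Other")

# Aggregate per label after the replacement.
totals = df.groupby("Country")["Cases"].sum()
print(totals)
```

The same pattern should work with any other aggregation (count, mean, and so on) in place of sum.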

Edit

Timing with timeit (please note that I didn't include .loc[], as a lambda doesn't support assignment; feel free to suggest a way of adding it):

import pandas as pd
import numpy as np
import timeit
from timeit import Timer

#proportion-wise that's the dataframe, as per OP's question

df=pd.DataFrame({"Country": ["Mainland China"]*398+["a", "b","c"]*124})

df["otherCol"]=2
df["otherCol2"]=3

#shuffle

df2=df.copy().sample(frac=1)
df3=df2.copy()
df4=df3.copy()

op2=Timer(lambda: np.where(df2["Country"].eq("Mainland China"), "Mainland China", "Other"))
op3=Timer(lambda: df3.Country.map(lambda x: x if x == 'Mainland China' else 'Others'))
op4=Timer(lambda: df4["Country"].apply(lambda x: x if x == "Mainland China" else "Others"))

print(op2.timeit(number=1000))
print(op3.timeit(number=1000))
print(op4.timeit(number=1000))

Returns:

2.1856687490362674 #numpy
2.2388894270407036 #map
2.4437739049317315 #apply
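
One possible way to add .loc[] to the comparison, as a sketch rather than a definitive benchmark: wrap the assignment in a regular function (the name loc_replace is made up here) and pass that to Timer. Because .loc[] assigns in place, the function copies the column first, so its timing also includes the copy and is not strictly comparable to the three numbers above.

```python
import numpy as np
import pandas as pd
from timeit import Timer

# Same proportions as the dataframe above, shuffled.
df = pd.DataFrame({"Country": ["Mainland China"] * 398 + ["a", "b", "c"] * 124})
df5 = df.sample(frac=1)

def loc_replace(frame):
    # Hypothetical helper: work on a copy so repeated runs start from the same data.
    out = frame["Country"].copy()
    out.loc[out.ne("Mainland China")] = "Others"
    return out

op5 = Timer(lambda: loc_replace(df5))
elapsed = op5.timeit(number=1000)
print(elapsed)  # seconds for 1000 runs; varies by machine

# Sanity check: the result matches the np.where version.
expected = np.where(df5["Country"].eq("Mainland China"), "Mainland China", "Others")
assert (loc_replace(df5).to_numpy() == expected).all()
```

No timing result is quoted here on purpose; run it next to the other three on the same machine to compare.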

edited Feb 11, 2020 at 7:35

answered Feb 9, 2020 at 21:14

Georgina Skibinski

7 Comments

Georgina Skibinski Over a year ago

Thanks Grazegorz, even though your solution came later than those of the other two, I now know one more way of solving this problem. Thank you :D

No worries. Time them, and you will have some criteria to compare ;) I would expect np.where to be a bit faster than .loc[...]; .apply(...) is out of the competition here.

What is the advantage of using this over .loc[], aside from a tiny performance gain?

Georgina Skibinski Over a year ago

Looking at stackoverflow.com/a/31173785/5082048, performance might be lower than for .map(lambda x: ...) for small datasets. List comprehensions scored best in that benchmark.
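
For reference, a list-comprehension version of the same replacement could look like the sketch below. The sample values are made up, and no timing is claimed here; the linked benchmark is the source for the performance comparison.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Country": ["Mainland China", "Japan", "Italy", "Mainland China"]})

# Build the new column in plain Python, then assign it back.
df["Country"] = [
    c if c == "Mainland China" else "Others"
    for c in df["Country"]
]
print(df["Country"].tolist())
```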

I benchmarked all the methods except .loc[] - please see above.

Oliver Ni · Accepted Answer · 2020-02-09 21:06:32Z

-1

Try using apply:

dataframe["Country"] = dataframe["Country"].apply(lambda x: x if x == "Mainland China" else "Others")

answered Feb 9, 2020 at 21:06

Oliver Ni

4 Comments

Thanks, your solution works perfectly as well. I have already accepted another solution, but I sincerely appreciate your help!

@AMC Thanks for joining the discussion! I suppose that we should respect everybody's efforts. What do you think? :)

@almo I agree entirely, my statement was in no way related to the answerer’s person or character.

Arco Bast · Accepted Answer · 2020-02-09 21:06:32Z

-2

Assuming df is your pandas dataframe.

You could do:

df['Country'] = df.Country.map(lambda x: x if x == 'Mainland China' else 'Others')

answered Feb 9, 2020 at 21:06

Arco Bast

6 Comments

Thanks, perfect :D. I had wasted nearly an hour.

@AMC on the plus side, it is quite flexible if other categories need to be defined in the future.

@ArcoBast You could just use .map() and a dictionary, which is likely the most flexible solution.

@AMC I thought about this, but mapping everything except 'Mainland China' to a single value is not straightforward with a dictionary. I could have suggested using a defaultdict, of course, but considered that to be overkill. From an analyst's point of view, when working with a small dataset like this one, flexibility beats speed in my experience.
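
To make the dictionary idea from these comments concrete, here are two sketches, both with made-up sample data. The first uses a plain dict and fills the unmapped values afterwards; the second uses a defaultdict, relying on the documented behavior that Series.map uses the default of a dict subclass that defines __missing__. Note that the fillna version would also turn values that were already missing into "Others".

```python
from collections import defaultdict

import pandas as pd

# Hypothetical sample data for illustration.
s = pd.Series(["Mainland China", "Japan", "Italy", "Mainland China"])

# Variant 1: plain dict; values not in the dict become NaN, then get filled.
mapping = {"Mainland China": "Mainland China"}
via_fillna = s.map(mapping).fillna("Others")

# Variant 2: defaultdict; keys that are not present fall back to "Others".
default_mapping = defaultdict(lambda: "Others", mapping)
via_defaultdict = s.map(default_mapping)

print(via_fillna.tolist())
print(via_defaultdict.tolist())
```

With either variant, defining another category later means adding one more key to mapping.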