I have a data frame df with a column called "Num_of_employees", which has values like 50-100, 200-500, etc. A few values in my data are wrong: wherever the employee range should be 1-10, the data has 10-Jan, and wherever it should be 11-50, the data has Nov-50. How can I fix this using pandas?

1 Answer

A clean syntax for this kind of "find and replace" passes a dict to replace, mapping each bad value to its correction:

df.Num_of_employees = df.Num_of_employees.replace({"10-Jan": "1-10",
                                                   "Nov-50": "11-50"})
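
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample rows are made up for illustration and are not the asker's data; only the two bad values and their corrections come from the question.

```python
import pandas as pd

# Hypothetical sample data: two mangled ranges plus two that are already fine.
df = pd.DataFrame(
    {"Num_of_employees": ["10-Jan", "Nov-50", "50-100", "200-500"]}
)

# Map each offending value to its correction.
fixes = {"10-Jan": "1-10", "Nov-50": "11-50"}

# A dict passed to Series.replace matches whole cell values, so entries
# such as "50-100" are left untouched. Assign the result back to the column.
df["Num_of_employees"] = df["Num_of_employees"].replace(fixes)

print(df["Num_of_employees"].tolist())
```

Because the dict matches entire values rather than substrings, there is no risk of "50" inside "50-100" being rewritten.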

8 Comments

If you have a large data set, it might be impossible to use replace like this manually.
@JoeR Right! Is there a way to do this on a large data set?
I ran this over 100,000,000 rows and it finished in a couple of seconds. IMO, this is your solution.
@user6461192 Yes. There cannot be very many distinct values. You can find them all with df.Num_of_employees.unique() or df.Num_of_employees.value_counts(), then create a dictionary with all offending entries and the corresponding corrections.
You might not be assigning the result back to the column. df.Num_of_employees.replace({'10-Jan': '1-10', 'Nov-50': '11-50'}) will display the results, but you have to capture them with df.Num_of_employees = df.Num_of_employees.replace({'10-Jan': '1-10', 'Nov-50': '11-50'}). You can check before you write your file with print(df.to_csv()).
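
The workflow suggested in the comments above (list the distinct values, pick out the bad ones, replace, then check) could look roughly like the sketch below. The sample data is hypothetical, and flagging any value that contains a letter is an assumption about what "offending" means here, since valid ranges in the question are purely numeric.

```python
import pandas as pd

# Hypothetical sample data with repeated bad values.
df = pd.DataFrame(
    {"Num_of_employees": ["10-Jan", "Nov-50", "50-100", "10-Jan", "200-500"]}
)

def has_letters(value):
    # Assumed heuristic: a valid range such as "50-100" contains no letters.
    return any(ch.isalpha() for ch in str(value))

# value_counts() lists every distinct value with its frequency.
counts = df["Num_of_employees"].value_counts()
suspects = sorted(v for v in counts.index if has_letters(v))
print(suspects)

# Build the correction dict by hand from the suspects, then assign back.
fixes = {"10-Jan": "1-10", "Nov-50": "11-50"}
df["Num_of_employees"] = df["Num_of_employees"].replace(fixes)

# Verify that nothing suspicious is left before writing the file.
remaining = [v for v in df["Num_of_employees"].unique() if has_letters(v)]
print(remaining)
```

The number of distinct values is usually small even when the row count is large, so the dict stays short enough to write manually.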