2

I have a list of assorted values which I need to replace with a single value (Drive-by). I did my research but the closest post I could find is the below attached link which is not using Pandas. What is the most viable way to get this accomplished?

Python replace multiple strings

fourth = pd.read_csv('C:/infocentertracker.csv')
fourth = fourth.rename(columns={'Phone Number: ': 'Phone Number:'})
fourth['Source:'] = fourth['Source:'].replace('......', 'Drive-by')

fourth.to_csv(.............)

Drive By
Drive-By
Drive-by; Return Visitor
Drive/LTX.com/Internes Srch                  Replace all with Drive-by
Driving By/Lantana Website
Drive by
Driving By/Return Visitor
Drive by/Resident Referral
Driving by
Drive- by
Driving by/LTX Website
Driving By
Driving by/Return Visitor
Drive By/Return Visitor
Drive By/LTX Website
2
  • is it safe to assume that only target values start with "Driv"? Commented Feb 10, 2017 at 14:29
  • Yes, it is safe to assume. Commented Feb 10, 2017 at 14:29

3 Answers 3

2

You can use boolean mask by str.startswith for replace all values starts with Driv and idea is from comment of Marat:

df.loc[df.col.str.startswith('Driv'), 'col'] = 'Drive-by'

Sample:

print (fourth)
                            col
0                      Drive By
1                      Drive-By
2      Drive-by; Return Visitor
3   Drive/LTX.com/Internes Srch
4    Driving By/Lantana Website
5                      Drive by
6     Driving By/Return Visitor
7    Drive by/Resident Referral
8                    Driving by
9                     Drive- by
10       Driving by/LTX Website
11                   Driving By
12    Driving by/Return Visitor
13      Drive By/Return Visitor
14         Drive By/LTX Website
15                          aaa
fourth.loc[fourth['Source:'].str.startswith('Driv'), 'Source:'] = 'Drive-by'
print (fourth)
     Source:
0   Drive-by
1   Drive-by
2   Drive-by
3   Drive-by
4   Drive-by
5   Drive-by
6   Drive-by
7   Drive-by
8   Drive-by
9   Drive-by
10  Drive-by
11  Drive-by
12  Drive-by
13  Drive-by
14  Drive-by
15       aaa

Another solution with Series.mask:

fourth['Source:']=fourth['Source:'].mask(fourth['Source:'].str.startswith('Driv', na=False),
                                       'Drive-by')
print (fourth)
     Source:
0   Drive-by
1   Drive-by
2   Drive-by
3   Drive-by
4   Drive-by
5   Drive-by
6   Drive-by
7   Drive-by
8   Drive-by
9   Drive-by
10  Drive-by
11  Drive-by
12  Drive-by
13  Drive-by
14  Drive-by
15       aaa
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks, sorry if this may sound stupid, I tried fourth.loc[fourth.col.str.startswith('Driv'), 'Source:'] = 'Drive-by' but it threw out an error.....'DataFrame' object has no attribute 'col'
It is column nmae, I change it by your column name to Source:
1

One option is the following as you requested a pandas method:

fourth.ix[fourth['column name with values'].str.contains('driv', case=False, na=False), 'column name with values'] = 'Drive-by'

I would prefer to use regex which is not necessarily requiring pandas:

import re

[re.sub('(Driv.+)', 'Drive-by', i) for i in fourth['column name']]

2 Comments

Thanks, I get an error...ValueError: cannot index with vector containing NA / NaN values
@Pythoner I added an extra parameter in the str.contains which is na=False. All native pandas functions. Just wasn't sure what your data looked like
1

You can replace multiple values (a list) with a single value in Pandas

govt_alias = ['govt', 'govern']
df['installer'].str.replace('|'.join(govt_alias), 'government')

In your specific case, the other answers are more optimal, but the method I've shown is generalize-able.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.