Question (score 16)

I have a dataframe:

import pandas as pd

df = pd.DataFrame({'id': ['abarth 1.4 a', 'abarth 1 a', 'land rover 1.3 r', 'land rover 2',
                          'land rover 5 g', 'mazda 4.55 bl'],
                   'series': ['a', 'a', 'r', '', 'g', 'bl']})

I would like to remove each row's 'series' string from the corresponding id, so that the final result is:

'id': ['abarth 1.4', 'abarth 1', 'land rover 1.3', 'land rover 2', 'land rover 5', 'mazda 4.55']

Currently I am using df.apply:

df.id = df.apply(lambda x: x['id'].replace(x['series'], ''), axis=1)

But this removes every occurrence of the series string, even inside other words, producing: 'id': ['brth 1.4', 'brth 1', 'land ove 1.3', 'land rover 2', 'land rover 5', 'mazda 4.55']
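To see why the apply version misbehaves, here is a plain-Python sketch over the same id/series pairs: str.replace removes every occurrence of the substring, while str.removesuffix (Python 3.9+) removes it only when it is the trailing ' <series>' token. This assumes the series is always the last word of id, which holds for the sample data but may not hold for real data.

```python
ids = ['abarth 1.4 a', 'abarth 1 a', 'land rover 1.3 r', 'land rover 2',
       'land rover 5 g', 'mazda 4.55 bl']
series = ['a', 'a', 'r', '', 'g', 'bl']

# str.replace removes every occurrence, including letters inside 'abarth' and 'rover'.
naive = [i.replace(s, '') for i, s in zip(ids, series)]

# Strip only a trailing " <series>" token; rows with an empty series are kept as-is.
# str.removesuffix requires Python 3.9 or later.
cleaned = [i.removesuffix(' ' + s) if s else i for i, s in zip(ids, series)]

print(naive)
print(cleaned)
```

Assigning the list back with df['id'] = cleaned gives the desired column.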

Should I somehow combine a regex with the per-row variable inside df.apply, like so?

df.id = df.apply(lambda x: x['id'].replace(r'\b' + x['series'], ''), axis=1)
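The attempt above will not work as written: Python's str.replace treats its first argument literally, so r'\b' + x['series'] looks for a backslash followed by 'b'. Word boundaries need the re module. A minimal sketch, assuming the same DataFrame, uses re.sub inside the same df.apply, with re.escape so the series is matched literally and \b so it only matches as a whole word. An empty series is skipped, because an empty pattern between boundaries would strip the spaces out of the id. The helper name strip_series is hypothetical.

```python
import re

import pandas as pd

df = pd.DataFrame({'id': ['abarth 1.4 a', 'abarth 1 a', 'land rover 1.3 r', 'land rover 2',
                          'land rover 5 g', 'mazda 4.55 bl'],
                   'series': ['a', 'a', 'r', '', 'g', 'bl']})


def strip_series(row):
    """Remove the row's series token from its id, but only as a whole word."""
    series = row['series']
    if not series:
        # An empty series would match at every word boundary, so leave id alone.
        return row['id']
    # re.escape treats the series literally; \b stops matches inside words like 'abarth'.
    pattern = r'\s*\b' + re.escape(series) + r'\b'
    return re.sub(pattern, '', row['id'])


df['id'] = df.apply(strip_series, axis=1)
print(df['id'].tolist())
```

Note that \b only behaves as a boundary when the series starts and ends with word characters, as it does in the sample data.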

5 Answers

Answer (score 33)

Use str.split and str.get, and assign with loc only where df.make == '':

df.loc[df.make == '', 'make'] = df.id.str.split().str.get(0)

print(df)

               id    make
0      abarth 1.4  abarth
1        abarth 1  abarth
2  land rover 1.3   rover
3    land rover 2   rover
4    land rover 5   rover
5      mazda 4.55   mazda
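A self-contained sketch of the same loc-based fill. The question's DataFrame has no make column, so the data below is hypothetical. Note that str.get(0) returns the first whitespace-separated word, so an empty make for a 'land rover …' id is filled with 'land'.

```python
import pandas as pd

# Hypothetical data: some make cells are empty strings.
df = pd.DataFrame({'id': ['abarth 1.4', 'abarth 1', 'land rover 1.3',
                          'land rover 2', 'land rover 5', 'mazda 4.55'],
                   'make': ['abarth', '', '', 'land rover', '', 'mazda']})

# Fill only the empty make cells with the first word of id; other rows are untouched.
mask = df['make'] == ''
df.loc[mask, 'make'] = df['id'].str.split().str.get(0)
print(df)
```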

Answer (score 13)

It's simple: split on spaces and take the first token. Note that this overwrites every row of make, not only the empty ones:

df['make'] = df['id'].str.split(' ').str[0]


Answer (score 3)

I don't know why, but with either of the lines below

df.loc[df.make == '', 'make']

or

df.loc[df['make'] == '', 'make']

I get the error KeyError: 'make'.

So instead I did the following (in case someone sees the same error):

df['make'] = df.id.str.split().str.get(0)

Worked for me.
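The KeyError arises because df['make'] is looked up before a make column exists (the question's example DataFrame only has id and series). A minimal sketch, assuming the column may be missing, creates it with empty strings first so the conditional loc fill still only touches empty cells. The data is hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame without a make column.
df = pd.DataFrame({'id': ['abarth 1.4', 'land rover 2', 'mazda 4.55']})

# Create the column first if it is missing, so the boolean mask below can be built.
if 'make' not in df.columns:
    df['make'] = ''

df.loc[df['make'] == '', 'make'] = df['id'].str.split().str.get(0)
print(df['make'].tolist())  # ['abarth', 'land', 'mazda']
```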


Answer (score 1)

Consider a regex solution with loc that extracts everything before the first space (the non-greedy .*? stops at the first space, whereas a greedy .* would run to the last one):

df.loc[df['make'] == '', 'make'] = df['id'].str.extract(r'(.*?) ', expand=False)

Alternatively, use numpy's where, which expresses the if/then/else logic directly:

import numpy as np

df['make'] = np.where(df['make'] == '',
                      df['id'].str.extract(r'(.*?) ', expand=False),
                      df['make'])
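Whether a pattern like '(.*) ' stops at the first or the last space comes down to greediness: .* is greedy and runs to the last space, while .*? stops at the first. A small check on made-up ids shows the difference; it also shows that str.extract returns NaN for an id with no space at all, so a single-word id would leave make as NaN.

```python
import pandas as pd

# Made-up ids for illustration; 'mazda' has no space.
ids = pd.Series(['abarth 1.4', 'land rover 1.3', 'mazda'])

greedy = ids.str.extract(r'(.*) ', expand=False)   # everything before the LAST space
lazy = ids.str.extract(r'(.*?) ', expand=False)    # everything before the FIRST space

print(greedy.tolist())
print(lazy.tolist())
```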


Answer (score 0)

If I understand the question correctly, you can just use the replace function:

df.make = df.make.replace("", df.id)

Comment: The OP requires the first word of the id column.
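A sketch of the conditional fill the comment asks for, using Series.mask in place of replace (a swapped-in technique, not this answer's): mask keeps make where the condition is False and takes the first word of id where it is True. The DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical data with some empty make cells.
df = pd.DataFrame({'id': ['abarth 1.4', 'land rover 2', 'mazda 4.55'],
                   'make': ['abarth', '', '']})

first_word = df['id'].str.split().str[0]
# Series.mask swaps in values from first_word (aligned on index) wherever make is empty.
df['make'] = df['make'].mask(df['make'] == '', first_word)
print(df['make'].tolist())
```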
