1

I couldn't find this on Stack Overflow, so I wanted to ask the question.

Let's assume that I have two columns, A and B, in a data frame, each consisting of just a bunch of words, and I want to create a new column C which is just True/False based on the following rule:

 If word in B = word in A + 'ing', then it's True, or vice versa
 If word in B = word in A + 'ment', then it's True, or vice versa

So I defined the following function:

def parts_of_speech(s1, s2):
    return s1+'ing'==s2 or s1+'ment'==s2 or s1+s1[-1]+'ing'==s2

For instance:

A              B            C
Engage         Engagement   True
Go             Going        True
Axe            Axis         False
Management     Manage       True

I tried the following:

df['C']=df.apply(lambda x: parts_of_speech(x.A, x.B) or 
                           parts_of_speech(x.B, x.A) )

or

df['C']=df.apply(parts_of_speech(df['A'], df['B']) or 
                           parts_of_speech(df['A'], df['B']) )

I get the same error:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I don't know what I did incorrectly. Is there an easy fix for this?

Any help would be greatly appreciated.

1
  • s1+s1[-1]+'ing'==s2 is not correct. You will get something like Manageeing. Use s1[:-1] + 'ing' instead. Commented Sep 19, 2019 at 17:49

2 Answers

2

.apply works on columns by default (axis=0), so the lambda receives each column rather than each row. The only change needed in your example is to add axis=1 so that the function is applied to each row:

df['C']=df.apply(lambda x: parts_of_speech(x.A, x.B) or parts_of_speech(x.B, x.A),
                 axis=1)
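
For reference, here is a minimal self-contained sketch of that fix, assuming the frame holds exactly the sample rows from the question (the construction of df below is my assumption, the function is copied from the question unchanged):

```python
import pandas as pd

def parts_of_speech(s1, s2):
    # True when s2 is s1 plus 'ing' or 'ment', or s1 with its last letter doubled plus 'ing'
    return s1 + 'ing' == s2 or s1 + 'ment' == s2 or s1 + s1[-1] + 'ing' == s2

# Sample data rebuilt from the table in the question (assumed layout)
df = pd.DataFrame({
    'A': ['Engage', 'Go', 'Axe', 'Management'],
    'B': ['Engagement', 'Going', 'Axis', 'Manage'],
})

# axis=1 hands each row to the lambda, so row.A and row.B are single strings
df['C'] = df.apply(
    lambda row: parts_of_speech(row.A, row.B) or parts_of_speech(row.B, row.A),
    axis=1,
)
print(df)
```

Checking both argument orders inside the lambda is what covers the "vice versa" part of the rule.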

1

For your sample data:

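import numpy as np

# sort each row alphabetically, so a base word lands in A and its longer derived form in B
df[['A','B']] = np.sort(df[['A','B']])

# strip a trailing suffix from B and compare what is left with A
df['B'].str.extract(r'(\w+)(ment|ing)$',expand=True)[0].eq(df['A'])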

Or use your approach, but vectorized:

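# sort each row alphabetically, so a base word lands in A and its longer derived form in B
df[['A','B']] = np.sort(df[['A','B']])

df['A-ing'] = df['A'] + 'ing'
df['A-ment'] = df['A'] + 'ment'

# True where B equals either of the two candidate words built from A
df[['A-ing','A-ment']].eq(df['B'], axis=0).any(axis=1)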

Output:

0     True
1     True
2    False
3     True
dtype: bool
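
If you would rather not overwrite A and B or add helper columns, the same comparison can be sketched as below. This is only one possible variant, not the code above: the helper name derived and the suffixes tuple are hypothetical, and the data is assumed to be the sample from the question.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumed layout)
df = pd.DataFrame({
    'A': ['Engage', 'Go', 'Axe', 'Management'],
    'B': ['Engagement', 'Going', 'Axis', 'Manage'],
})

suffixes = ('ing', 'ment')  # hypothetical name; extend as needed

def derived(base, word):
    # Element-wise check: word == base + suffix for at least one suffix
    result = pd.Series(False, index=base.index)
    for suffix in suffixes:
        result = result | (base + suffix).eq(word)
    return result

# Test both directions so the order of the two columns does not matter
c = derived(df['A'], df['B']) | derived(df['B'], df['A'])
print(c)
```

This leaves the original columns untouched and only uses vectorized string concatenation and comparison.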

3 Comments

There are some more complex cases, like hop + ing = hopping. I guess the question is not about the language processing part
Sure, but then it's about the logic, and your answer doesn't work as well.
> I guess the question is not about the language processing part
