My list contains words such as ['orange', 'cool', 'app', ....], and I want to extract every one of them that appears as an exact whole word in a description column of a DataFrame.

I have also attached a sample picture with the code. I used str.findall(). As the picture shows, it extracts add from additional and app from apple. I do not want that: a word should only be output if it matches a whole word.

1 Answer

You can fix the code by wrapping the alternation in word boundaries:

df['exactmatch'] = df['text'].str.findall(fr"\b({'|'.join(list1)})\b").str.join(", ")

Or, if the words in list1 can contain special characters, escape them and use lookarounds instead of \b (this requires import re):

df['exactmatch'] = df['text'].str.findall(fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)").str.join(", ")

The patterns created by fr"\b({'|'.join(list1)})\b" and fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)" will look like this, respectively:

\b(orange|cool|app)\b
(?<!\w)(orange|cool|app)(?!\w)
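To see why the second form matters, here is a minimal sketch using only the re module. The sample sentences and the word "c++" are made up for illustration and are not from the question. It shows that \b cannot match after a trailing non-word character such as "+", while the lookarounds still can.

```python
import re

list1 = ["orange", "cool", "app"]  # sample words from the question

# Word-boundary pattern: fine when every word starts and ends with a word character.
plain = r"\b(" + "|".join(list1) + r")\b"
text = "An additional apple, a cool app and an orange."
plain_matches = re.findall(plain, text)
print(plain_matches)  # ['cool', 'app', 'orange']

# Hypothetical list with a special character, to compare the two boundary styles.
special = ["c++", "app"]
alternation = "|".join(map(re.escape, special))
with_b = r"\b(" + alternation + r")\b"
with_lookarounds = r"(?<!\w)(" + alternation + r")(?!\w)"

text2 = "I use c++ and an app"
b_matches = re.findall(with_b, text2)
lookaround_matches = re.findall(with_lookarounds, text2)
print(b_matches)           # ['app']  (no \b between '+' and the space)
print(lookaround_matches)  # ['c++', 'app']
```

Neither "additional" nor "apple" produces a match in the first sentence, because the character after "app" in "apple" is a word character.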

See the regex demo. Note .str.join(", ") is considered faster than .apply(", ".join).
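For a quick end-to-end check, something like the following should work. It is only a sketch: the three sample rows are invented, and the column names text and exactmatch are the ones used in the answer above.

```python
import re

import pandas as pd

list1 = ["orange", "cool", "app"]  # sample words from the question

# Invented sample data standing in for the description column.
df = pd.DataFrame({"text": ["additional apple pie", "a cool app", "orange and app"]})

# Escape each word, then require that no word character sits on either side.
pattern = fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)"

# findall returns a list per row; .str.join turns each list into one string.
df["exactmatch"] = df["text"].str.findall(pattern).str.join(", ")
print(df)
```

Rows with no whole-word match end up with an empty string, since joining an empty list gives "".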


4 Comments

Thank you! However, my text also has hyphenated words, e.g. additional-material, and plurals, e.g. apples. How can I modify the search so that, without putting "additional-material" and "apples" in list1, I still get the output additional material and apple? Thanks!
@ShrestR Try r"(?<!\w)(" + '|'.join([re.escape(x).replace('\\ ', r'[\s-]') for x in list1]) + r")"
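A rough sketch of how that suggestion behaves, using only re. The list entries and the sample sentence are assumptions for illustration, and the final normalisation step is an addition that is not part of the comment above.

```python
import re

# Hypothetical list: multi-word entries are written with a space.
list1 = ["additional material", "apple"]

# re.escape turns the space into "\ ", which is then swapped for "space or hyphen".
parts = [re.escape(w).replace("\\ ", r"[\s-]") for w in list1]

# No trailing (?!\w), so "apple" is also found at the start of "apples".
pattern = r"(?<!\w)(" + "|".join(parts) + r")"

text = "See the additional-material on apples."
matches = re.findall(pattern, text)
print(matches)  # ['additional-material', 'apple']

# Optional: map the hyphenated variants back to the spelling used in list1.
normalized = [re.sub(r"[\s-]+", " ", m) for m in matches]
print(normalized)  # ['additional material', 'apple']
```

Dropping the trailing boundary is what lets plurals through, but it also means "apple" would match at the start of any longer word, such as "applesauce".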
Hi, how can I do the same exact-match operation on a PySpark DataFrame? The following is for pandas: df['exactmatch'] = df['text'].str.findall(fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)").str.join(", ")
@ShrestR I do not know pyspark well, but I think you should use pyspark.sql.functions.regexp_replace, like regexp_replace(col, fr"(?s)(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)|.?", r"\1, "), and then clean up that value with regexp_replace(<the-result-of the previous substitution>, '^(?:, )+|(?:, )+$|(, )+', r'\1')
