I want to check whether a substring exists in a large text. So far, I was simply using:

if pattern in text: ...

But I want to ensure that the occurrence of "pattern" in "text" is not immediately preceded or followed by a letter. It's alright if it is preceded or followed by special characters, numbers, or whitespace.

So, if the pattern is "abc", a search on "some text abc" or "random texts, abc, cde" should return True, while a search on "some textabc" or "random abctexts" should return False (because "abc" is directly preceded or followed by a letter).

What is the best way to perform this operation?

  • r'(?:[^a-zA-Z])(abc)(?:[^a-zA-Z])' will capture only abc. (?: ...) indicates a non-capturing group, so the surrounding non-alphabetic characters are not captured. Note that this pattern requires one character on each side, so it does not match abc at the very start or end of the text. You can check this community guide on regex and feel free to experiment with tools like regex101. Commented Oct 11, 2022 at 17:19

1 Answer

How about this:

import re

string = "random texts, abc, cde"

match = re.search(r'(^|[^a-zA-Z])abc([^a-zA-Z]|$)', string)
# If-statement after search() tests if it succeeded
if match:
    print('found', match.group())
else:
    print('did not find')

"(^|[^a-zA-Z])" means: beginning of string OR any non-alphabetic character. "([^a-zA-Z]|$)" is the same idea for the other side: any non-alphabetic character OR end of string.

To explain a bit more: "|" means OR, so (^|d) means "beginning of the string or a d". The parentheses define which alternatives the OR operator applies to. You wanted your abc string not to be directly surrounded by any alphabetic character. If you broaden the restriction a little, so that 0-9 and the underscore are also forbidden next to abc, you get a simpler regex: r'(^|\W)abc(\W|$)'. Keep in mind that this version no longer matches when abc is directly next to a digit, which the question wanted to allow.
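As an alternative to the grouped version above, the same check can probably be written with lookarounds, which assert what is next to the match without consuming it, so the start and end of the string need no special case. The following is only a sketch: the helper name contains_standalone is made up for illustration, and it assumes that "alphabets" means ASCII letters a-z and A-Z. re.escape is used so that a pattern containing regex metacharacters is treated literally.

```python
import re


def contains_standalone(pattern: str, text: str) -> bool:
    """Return True if `pattern` occurs in `text` with no ASCII letter
    directly before or after it (hypothetical helper, for illustration)."""
    # (?<![a-zA-Z]) : the previous character, if any, is not a letter
    # (?![a-zA-Z])  : the next character, if any, is not a letter
    regex = r'(?<![a-zA-Z])' + re.escape(pattern) + r'(?![a-zA-Z])'
    return re.search(regex, text) is not None


samples = [
    "some text abc",
    "random texts, abc, cde",
    "some textabc",
    "random abctexts",
]
for sample in samples:
    print(repr(sample), contains_standalone("abc", sample))
```

With this form, match.group() on the underlying search would contain only the pattern itself, not the neighbouring characters, and digits next to the pattern are still allowed, as the question asked.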
