Substring search with a (regex?) condition Python

Question

I have a situation where I want to check whether a substring exists in a large text. So far, I was simply using:

if pattern in text: ...

However, I want to ensure that the occurrence of "pattern" in "text" is not immediately preceded or followed by a letter. It is fine if it is preceded or followed by special characters, digits, or whitespace.

So, if pattern is "abc", a search on "some text abc" or "random texts, abc, cde" should return True, while a search on "some textabc" or "random abctexts" should return False (because "abc" is directly preceded or followed by a letter).

What is the best way to perform this operation?

r'(?:[^a-zA-Z])(abc)(?:[^a-zA-Z])' will capture only abc. (?: ...) indicates a non-capturing group, so you don't capture the non-alphabetic characters. You can check this community guide on regex, and feel free to experiment with tools like regex101.
– Ignatius Reilly, Commented Oct 11, 2022 at 17:19

score 1 · Accepted Answer · 2022-10-13 10:28:58Z

How about this:

import re

text = "random texts, abc, cde"

# search() returns a match object on success and None otherwise
match = re.search(r'(^|[^a-zA-Z])abc([^a-zA-Z]|$)', text)
if match:
    # group() also contains the neighbouring non-letter characters, here ' abc,'
    print('found', match.group())
else:
    print('did not find')

"(^|[^a-zA-Z])" means: beginning of the string OR any non-alphabetic character; "([^a-zA-Z]|$)" is the same for the end of the string.
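As a minimal sketch, the pattern above could be wrapped in a small helper and checked against the four examples from the question. The function name contains_standalone is made up for illustration, and the use of re.escape is an added assumption so that search terms containing regex metacharacters are treated literally.

```python
import re


def contains_standalone(pattern: str, text: str) -> bool:
    """Return True if `pattern` occurs in `text` with no ASCII letter
    directly before or after it (hypothetical helper, not from the answer)."""
    # re.escape makes the search term literal, e.g. "a.c" only matches "a.c"
    regex = r'(^|[^a-zA-Z])' + re.escape(pattern) + r'([^a-zA-Z]|$)'
    return re.search(regex, text) is not None


# The four sample texts from the question
for sample in ["some text abc", "random texts, abc, cde",
               "some textabc", "random abctexts"]:
    print(repr(sample), contains_standalone("abc", sample))
```

The first two samples should report True and the last two False, which is the behaviour asked for in the question.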

To explain a bit more: "|" means OR, so (^|d) means "beginning of the string or a d". The parentheses define which arguments the OR operator applies to. You wanted your abc-string not to be enclosed by any alphabetic character. If you broaden this a little, so that 0-9 and the underscore are forbidden as well, you get a simpler regex: r'(^|\W)abc(\W|$)'. Note that the question explicitly allows digits next to the match, so this shorter form is only suitable if rejecting those cases is acceptable.
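To make the difference between the two patterns concrete, here is a small comparison, offered as a sketch. The variable names and sample strings are invented for illustration; only the two regular expressions come from the answer.

```python
import re

# Pattern from the answer: only ASCII letters are forbidden as neighbours
letters_only = re.compile(r'(^|[^a-zA-Z])abc([^a-zA-Z]|$)')
# Shorter variant: any word character (letters, digits, underscore) is forbidden
word_chars = re.compile(r'(^|\W)abc(\W|$)')

samples = [
    "some text abc",   # both patterns match
    "item abc1",       # digit after abc: only letters_only matches
    "x_abc_y",         # underscores around abc: only letters_only matches
    "\u00e9abc",       # accented letter before abc: only letters_only matches
]

for sample in samples:
    print(repr(sample),
          bool(letters_only.search(sample)),
          bool(word_chars.search(sample)))
```

The last sample shows a side effect in the other direction: for str patterns in Python 3, \W is Unicode-aware, so an accented letter counts as a word character there, while [^a-zA-Z] treats it as a non-letter.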
