Imagine you are trying to pattern match "stackoverflow".

You want the following:

 this is stackoverflow and it rocks [MATCH]

 stackoverflow is the best [MATCH]

 i love stackoverflow [MATCH]

 typostackoverflow rules [NO MATCH]

 i love stackoverflowtypo [NO MATCH]

I know how to match stackoverflow if it has spaces on both sides using:

/\s(stackoverflow)\s/

The same goes if it's at the start or end of a string:

/^(stackoverflow)\s/

/\s(stackoverflow)$/

But how do you specify "space or end of string" and "space or start of string" using a regular expression?

4 Answers

263

You can use any of the following:

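\b      # A word boundary: matches between a word character and a non-word character, or at the start/end of the string.
(^|\s)  # | means "or"; () is a capturing group. Matches the start of the string or a whitespace character.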


/\b(stackoverflow)\b/

Also, if you don't want to include the space in your match, you can use lookbehinds and lookaheads.

(?<=\s|^)         #to look behind the match
(stackoverflow)   #the string you want. () optional
(?=\s|$)          #to look ahead.
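
As a rough illustration, here is how the lookaround form might be tried in Python. This is only a sketch: Python's `re` module rejects `(?<=\s|^)` because the alternatives have different widths, so the lookbehind is split per alternative as suggested in the comments. The names `PATTERN` and `find_word` are made up for this example.

```python
import re

# Python's re rejects (?<=\s|^) ("look-behind requires fixed-width pattern"),
# so each alternative gets its own lookbehind.
PATTERN = re.compile(r"(?:(?<=\s)|(?<=^))stackoverflow(?=\s|$)")


def find_word(text):
    """Return the matched text, or None if 'stackoverflow' is not a standalone token."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in (
        "this is stackoverflow and it rocks",
        "stackoverflow is the best",
        "i love stackoverflow",
        "typostackoverflow rules",
        "i love stackoverflowtypo",
    ):
        print(repr(sample), "->", find_word(sample))
```

Because the lookarounds are zero-width, the returned match contains only the word itself and never the surrounding spaces.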

10 Comments

\b is a zero-width assertion; it never consumes any characters. There's no need to wrap it in a lookaround.
Note that in most regexp implementations, \b is standard ASCII only, that is to say, it has no Unicode support. If you need to match Unicode words you have no choice but to use this instead: stackoverflow.com/a/6713327/1329367
An easier way to keep the group from being captured is the non-capturing form (?:^|\s). Note that the space is still part of the overall match.
For Python, replace (?<=\s|^) with (?:(?<=\s)|(?<=^)). Otherwise, you get the error: look-behind requires fixed-width pattern
The \b would consider other characters -- such as "." as word-breakers, whereas the asker specifically said "space". @gordy's solution seems better.
103

(^|\s) would match space or start of string and ($|\s) for space or end of string. Together it's:

(^|\s)stackoverflow($|\s)
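
A minimal Python sketch of this pattern, including the replacement case where the captured whitespace has to be put back. The names `ANCHORED` and `replace_word` are hypothetical, chosen only for this example.

```python
import re

# Group 1 is the start of the string or a whitespace character;
# group 2 is the end of the string or a whitespace character.
ANCHORED = re.compile(r"(^|\s)stackoverflow($|\s)")


def replace_word(text, replacement):
    """Replace standalone 'stackoverflow', restoring the whitespace the groups consumed."""
    return ANCHORED.sub(lambda m: m.group(1) + replacement + m.group(2), text)


if __name__ == "__main__":
    print(replace_word("i love stackoverflow", "SO"))
    print(replace_word("typostackoverflow rules", "SO"))
    # The shared space is consumed by the first match, so the second
    # occurrence is not seen by this pattern.
    print(replace_word("stackoverflow stackoverflow", "SO"))
```

One thing to be aware of: because the groups consume the whitespace, two occurrences separated by a single space cannot both match in one pass. A lookaround-based pattern does not have this limitation.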

3 Comments

If you use this pattern to replace, remember to keep the spaces in the replaced result by replacing with the pattern $1string$2.
This is the only one that works for me too. Word boundaries never seem to do what I want. For one, they match some characters besides whitespace (like dashes). This solved it for me because I'd been trying to put $ and ^ into a character class, but this shows they can just be put into a regular pattern group.
This works quite nicely but if you are not interested in capturing the spaces use this: (?:^|\s)stackoverflow(?:$|\s)
29

Here's what I would use:

 (?<!\S)stackoverflow(?!\S)

In other words, match "stackoverflow" if it's not preceded by a non-whitespace character and not followed by a non-whitespace character.

This is neater (IMO) than the "space-or-anchor" approach, and it doesn't assume the string starts and ends with word characters like the \b approach does.
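
To check this against the examples from the question, a short Python sketch could look like the following. `TOKEN` and `is_match` are illustrative names, not part of any library.

```python
import re

# Not preceded by a non-whitespace character and not followed by one.
# Both lookarounds are fixed-width, so Python's re accepts them as written.
TOKEN = re.compile(r"(?<!\S)stackoverflow(?!\S)")


def is_match(text):
    """True if 'stackoverflow' appears as a whitespace-delimited token."""
    return TOKEN.search(text) is not None


if __name__ == "__main__":
    samples = [
        "this is stackoverflow and it rocks",
        "stackoverflow is the best",
        "i love stackoverflow",
        "typostackoverflow rules",
        "i love stackoverflowtypo",
    ]
    for s in samples:
        print("MATCH   " if is_match(s) else "NO MATCH", s)
```

Since nothing around the word is consumed, adjacent occurrences such as "stackoverflow stackoverflow" are both found, and punctuation directly after the word (as in "stackoverflow.") prevents a match.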

2 Comments

Good explanation of why to use this. I would have picked this; however, the string being tested is ALWAYS a single line.
@LawrenceDol, did you mean (?<=\S)...(?=\S)? Note that the uppercase \S matches any character that's NOT whitespace. So the negative lookarounds will match if there IS a whitespace character there, or if there's no character at all.
12

\b matches at word boundaries (without actually matching any characters), so the following should do what you want:

\bstackoverflow\b
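
A small Python sketch of how \b behaves, including the caveat raised in the comments on the other answers that punctuation also counts as a boundary. `WORD` and `has_word` are names invented for this example.

```python
import re

# Raw string so that \b reaches the regex engine as a word boundary,
# not as a backspace character.
WORD = re.compile(r"\bstackoverflow\b")


def has_word(text):
    """True if 'stackoverflow' appears with a word boundary on each side."""
    return WORD.search(text) is not None


if __name__ == "__main__":
    for s in (
        "i love stackoverflow",       # boundary at a space and at end of string
        "typostackoverflow rules",    # no boundary before the word
        "visit stackoverflow.com",    # '.' is a non-word character, so this matches
        "my-stackoverflow-clone",     # '-' is a non-word character, so this matches
        "stackoverflow_typo",         # '_' is a word character, so this does not match
    ):
        print(has_word(s), s)
```

If only whitespace should delimit the word, the dot and dash cases show why \b may be too permissive.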

1 Comment

For Python it helps to specify it as a raw string, e.g. mystr = r'\bstack overflow\b'
