Regex: Specify "space or start of string" and "space or end of string"

Question

Imagine you are trying to pattern match "stackoverflow".

You want the following:

 this is stackoverflow and it rocks [MATCH]

 stackoverflow is the best [MATCH]

 i love stackoverflow [MATCH]

 typostackoverflow rules [NO MATCH]

 i love stackoverflowtypo [NO MATCH]

I know how to match stackoverflow if it has spaces on both sides using:

/\s(stackoverflow)\s/

The same goes if it's at the start or end of a string:

/^(stackoverflow)\s/

/\s(stackoverflow)$/

But how do you specify "space or end of string" and "space or start of string" using a regular expression?

Chuck Le Butt · Accepted Answer · 2019-04-17 23:09:09Z

263

You can use any of the following:

\b      # A word boundary: matches between a word character and a space, or at the start or end of the string.
(^|\s)  # The | means "or"; () is a capturing group.


/\b(stackoverflow)\b/

Also, if you don't want to include the space in your match, you can use lookbehinds and lookaheads.

(?<=\s|^)         #to look behind the match
(stackoverflow)   #the string you want. () optional
(?=\s|$)          #to look ahead.
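For readers working in Python, here is a minimal sketch of the lookaround version run against the five sample strings from the question. It assumes Python's built-in `re` module, which does not accept `(?<=\s|^)` because the two alternatives have different widths, so the lookbehind is rewritten as `(?:(?<=\s)|^)`. The names `PATTERN`, `SAMPLES`, `matches` and `matched_text` are illustrative only.

```python
import re

# Assumption: Python's re module. It rejects (?<=\s|^) because \s has
# width 1 and ^ has width 0, so the lookbehind is split into two branches.
PATTERN = re.compile(r"(?:(?<=\s)|^)stackoverflow(?=\s|$)")

# The five sample strings from the question and whether they should match.
SAMPLES = {
    "this is stackoverflow and it rocks": True,
    "stackoverflow is the best": True,
    "i love stackoverflow": True,
    "typostackoverflow rules": False,
    "i love stackoverflowtypo": False,
}

def matches(text):
    """Return True if the word appears delimited by whitespace or string ends."""
    return PATTERN.search(text) is not None

def matched_text(text):
    """Return the matched text, or None; lookarounds keep spaces out of it."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

for text, expected in SAMPLES.items():
    print(f"{text!r}: {matches(text)} (expected {expected})")
```

Because the lookarounds are zero-width, `matched_text` returns the bare word without the surrounding spaces.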

edited Apr 17, 2019 at 23:09

Chuck Le Butt


answered Jul 15, 2011 at 21:32

Jacob Eggers


10 Comments

Alan Moore Over a year ago

\b is a zero-width assertion; it never consumes any characters. There's no need to wrap it in a lookaround.

Mahn Over a year ago

Note that in most regexp implementations, \b is standard ASCII only, that is to say, no unicode support. If you need to match unicode words you have no choice but to use this instead: stackoverflow.com/a/6713327/1329367

sam2426679 Over a year ago

The easier way to exclude the group selection from the match is (?:^|\s)

sam2426679 Over a year ago

for python, replace (?<=\s|^) with (?:(?<=\s)|(?<=^)). Otherwise, you get error: look-behind requires fixed-width pattern

Mikhail T. Over a year ago

The \b would consider other characters -- such as "." as word-breakers, whereas the asker specifically said "space". @gordy's solution seems better.
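The difference this comment describes can be checked with a short Python sketch. The inputs in `TRICKY` are hypothetical examples chosen for illustration, and `compare` is an illustrative helper; the whitespace-only pattern is the non-capturing form of the space-or-anchor idea.

```python
import re

WORD_BOUNDARY = re.compile(r"\bstackoverflow\b")
WHITESPACE_ONLY = re.compile(r"(?:^|\s)stackoverflow(?:\s|$)")

# Hypothetical inputs where punctuation, not whitespace, surrounds the word.
TRICKY = [
    "visit stackoverflow.com",
    "my-stackoverflow-page",
    "(stackoverflow)",
]

def compare(text):
    # Returns (matched by word boundary, matched by whitespace-or-anchor).
    return (
        WORD_BOUNDARY.search(text) is not None,
        WHITESPACE_ONLY.search(text) is not None,
    )

for text in TRICKY:
    print(text, compare(text))
```

In each of these inputs the word-boundary pattern matches and the whitespace-only pattern does not, so the choice depends on whether punctuation should count as a delimiter.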


gordy · Accepted Answer · 2011-07-15 21:28:41Z

103

(^|\s) matches a space or the start of the string, and ($|\s) matches a space or the end of the string. Together it's:

(^|\s)stackoverflow($|\s)
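A minimal sketch of this pattern in Python, checked against the five sample strings from the question. The `SAMPLES` list and `is_match` helper are illustrative names. Note that the overall match includes any whitespace the two groups consumed.

```python
import re

# The pattern as given; each group captures a whitespace character,
# or an empty string when it matched at the start or end of the input.
PATTERN = re.compile(r"(^|\s)stackoverflow($|\s)")

# The five sample strings from the question, with the expected outcome.
SAMPLES = [
    ("this is stackoverflow and it rocks", True),
    ("stackoverflow is the best", True),
    ("i love stackoverflow", True),
    ("typostackoverflow rules", False),
    ("i love stackoverflowtypo", False),
]

def is_match(text):
    return PATTERN.search(text) is not None

for text, expected in SAMPLES:
    print(f"{text!r}: {is_match(text)} (expected {expected})")

# Inspect what the full match and the two groups contain.
m = PATTERN.search("this is stackoverflow and it rocks")
print(repr(m.group(0)), repr(m.group(1)), repr(m.group(2)))
```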

answered Jul 15, 2011 at 21:28

gordy


3 Comments

Mahn Over a year ago

If you use this pattern to replace, remember to keep the spaces in the replaced result by replacing with the pattern $1string$2.
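As a sketch of this advice in Python: `re.sub` writes group references as `\1` or `\g<1>` rather than `$1`. The replacement word `SO` and the helper names below are placeholders for illustration.

```python
import re

PATTERN = re.compile(r"(^|\s)stackoverflow($|\s)")

def replace_keeping_spaces(text, replacement="SO"):
    # \g<1> and \g<2> put back whatever whitespace the groups consumed.
    return PATTERN.sub(rf"\g<1>{replacement}\g<2>", text)

def replace_dropping_spaces(text, replacement="SO"):
    # Without the group references, the surrounding spaces are lost.
    return PATTERN.sub(replacement, text)

print(replace_keeping_spaces("i love stackoverflow and more"))
print(replace_dropping_spaces("i love stackoverflow and more"))
```

The `\g<1>` form is used here instead of `\1` so that a replacement starting with a digit cannot be misread as part of the group number. The placeholder replacement is assumed to contain no backslashes.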

felwithe Over a year ago

This is the only one that works for me too. Word boundaries never seem to do what I want. For one, they match some characters besides whitespace (like dashes). This solved it for me because I'd been trying to put $ and ^ into a character class, but this shows they can just be put into a regular pattern group.

Vlax Over a year ago

This works quite nicely but if you are not interested in capturing the spaces use this: (?:^|\s)stackoverflow(?:$|\s)

Alan Moore · Accepted Answer · 2011-07-15 21:44:32Z

29

Here's what I would use:

 (?<!\S)stackoverflow(?!\S)

In other words, match "stackoverflow" if it's not preceded by a non-whitespace character and not followed by a non-whitespace character.

This is neater (IMO) than the "space-or-anchor" approach, and it doesn't assume the string starts and ends with word characters like the \b approach does.
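One practical consequence can be sketched in Python. Because the negative lookarounds consume nothing, two occurrences separated by a single space are both found, whereas a pattern that consumes the space finds only the first. The input string and variable names are illustrative.

```python
import re

LOOKAROUND = re.compile(r"(?<!\S)stackoverflow(?!\S)")
CONSUMING = re.compile(r"(?:^|\s)stackoverflow(?:\s|$)")

text = "stackoverflow stackoverflow"

# The lookaround version leaves the separating space available
# for the second occurrence.
lookaround_hits = LOOKAROUND.findall(text)

# The consuming version takes the space as part of the first match,
# so the second occurrence has no leading space or anchor left.
consuming_hits = CONSUMING.findall(text)

print(len(lookaround_hits), len(consuming_hits))
```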

edited Jul 15, 2011 at 21:44

answered Jul 15, 2011 at 21:38

Alan Moore


2 Comments

anonymous-one Over a year ago

good explanation on why to use this. i would have picked this however the string being tested is ALWAYS a single line.

Alan Moore Over a year ago

@LawrenceDol, did you mean (?<=\S)...(?=\S)? Note that the uppercase \S matches any character that's NOT whitespace. So the negative lookarounds will match if there IS a whitespace character there, or if there's no character at all.

Andrew Clark · Accepted Answer · 2011-07-15 21:32:03Z

12

\b matches at word boundaries (without actually matching any characters), so the following should do what you want:

\bstackoverflow\b

answered Jul 15, 2011 at 21:32

Andrew Clark


1 Comment

Asclepius Over a year ago

For Python it helps to specify it as a raw string, e.g. mystr = r'\bstack overflow\b'
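A short Python sketch of why the raw string matters: in an ordinary string literal, `\b` is the backspace character, so the regex engine never sees a word-boundary assertion. The variable names are illustrative.

```python
import re

text = "i love stackoverflow"

# In a normal string literal, "\b" is the backspace character (0x08).
not_raw = "\bstackoverflow\b"
# In a raw string, the backslash reaches the regex engine intact.
raw = r"\bstackoverflow\b"

print(len("\b"), len(r"\b"))
print(re.search(not_raw, text))
print(re.search(raw, text))
```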
