347

How can I match a space character in a PHP regular expression?

I mean like "gavin schulz", the space in between the two words. I am using a regular expression to make sure that I only allow letters, number and a space. But I'm not sure how to find the space. This is what I have right now:

$newtag = preg_replace("/[^a-zA-Z0-9s|]/", "", $tag);
2
  • 3
    Hmm... there is also no question about matching an 'a' or a 'b'... ;) Commented Feb 18, 2009 at 2:16
  • 1
    you should see the regex examples Commented Apr 23, 2013 at 10:36

9 Answers

492

If you're looking for a space, that would be " " (one space).

If you're looking for one or more, it's "  *" (that's two spaces and an asterisk) or " +" (one space and a plus).

If you're looking for common spacing, use "[ X]" or "[ X][ X]*" or "[ X]+" where X is the physical tab character (and each is preceded by a single space in all those examples).

These will work in every* regex engine I've ever seen (some of which don't even have the one-or-more "+" character, ugh).

If you know you'll be using one of the more modern regex engines, "\s" and its variations are the way to go. In addition, I believe word boundaries match start and end of lines as well, important when you're looking for words that may appear without preceding or following spaces.

For PHP specifically, this page may help.

From your edit, it appears you want to remove all invalid characters. The start of this is (note the space inside the regex):

$newtag = preg_replace ("/[^a-zA-Z0-9 ]/", "", $tag);
#                                    ^ space here

If you also want trickery to ensure there's only one space between each word and none at the start or end, that's a little more complicated (and probably another question) but the basic idea would be:

$newtag = preg_replace ("/ +/", " ", $newtag); # convert all multispaces to space
$newtag = preg_replace ("/^ /", "", $newtag);  # remove space from start
$newtag = preg_replace ("/ $/", "", $newtag);  # and end
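
The steps above can be combined into one small function. This is only a sketch: the name clean_tag is made up for illustration, and trim() stands in for the two anchored replacements, assuming that stripping every leading and trailing space is acceptable.

```php
<?php
// Hypothetical helper (name chosen for illustration, not from the answer).
function clean_tag(string $tag): string
{
    // Drop everything except ASCII letters, digits, and the space character.
    $clean = preg_replace('/[^a-zA-Z0-9 ]/', '', $tag);
    // Collapse each run of spaces into a single space.
    $clean = preg_replace('/ +/', ' ', $clean);
    // Strip any spaces left at either end.
    return trim($clean, ' ');
}

echo clean_tag("  gavin   schulz!! "), "\n"; // gavin schulz
```

Each step works on the result of the previous one, which is what lets the three replacements build on each other.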

5 Comments

His original regex seemed to want to replace the " " character. You are negating the space, therefore his space won't be "deleted" as intended.
Quoting: "only allow letters, number and a space", Gavin's original RE was wrong (which is why he was asking the question). My RE deletes everything that isn't one of those.
Why does the space have to be at the end of the match pattern instead of, say, in the middle?
@warren, it doesn't. The 'space here' comment wasn't stating where the space went, rather it was stating that there was a space there (in case the reader didn't realise).
@Mike, no, that's not the case. The intent here is to replace all characters that are not in the set A-Za-z.... The caret inside the square brackets dictates that. Moving the caret outside the square brackets changes its meaning to matching characters in the set at the start of the string.
102

Cheat Sheet

Here is a small cheat sheet of everything you need to know about whitespace in regular expressions:

[[:blank:]]

Space or tab only, not newline characters. It is the same as writing [ \t].

[[:space:]] & \s

[[:space:]] and \s match the same set of characters in current PCRE versions: any whitespace character, including spaces, tabs, and newlines.

\v

Matches vertical Unicode whitespace.

\h

Matches horizontal whitespace, including Unicode characters. It will also match spaces, tabs, non-breaking/mathematical/ideographic spaces.
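
To see how the classes above differ, the following sketch counts the matches of each one in a sample string that contains exactly one space, one tab, and one newline. The sample string and the labels are made up for illustration.

```php
<?php
// Sample with one space, one tab, and one newline.
$sample = "a b\tc\nd";

// Single-quoted strings keep the backslashes intact for the regex engine.
$patterns = [
    'literal space' => '/ /',
    '[[:blank:]]'   => '/[[:blank:]]/',
    '\s'            => '/\s/',
    '\h'            => '/\h/',
    '\v'            => '/\v/',
];

$counts = [];
foreach ($patterns as $label => $pattern) {
    // preg_match_all returns the number of matches found.
    $counts[$label] = preg_match_all($pattern, $sample);
}

print_r($counts);
```

Only the literal space pattern is limited to the space character itself; the tab is also counted by [[:blank:]], \h, and \s, and the newline by \s and \v.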

x (eXtended flag)

Ignores whitespace in the pattern itself. This is a flag, so you add it after the closing delimiter, like /hello/mx. (PHP has no g flag; preg_match_all and preg_replace already process every match.)

For example, if you write an expression like /hello world/x, it will match helloworld, but not hello world. The extended flag also allows comments in your regex.

Example

/hello world # this is a comment/x

If you need to match a literal space while the flag is on, escape it: "\ " (a backslash followed by a space).
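
A short sketch of the extended flag in PHP, using made-up sample strings. It shows that an unescaped space in the pattern is ignored, that an escaped one is matched, and that comments are allowed inside the pattern.

```php
<?php
// Under x, the unescaped space in the pattern is ignored.
$joined  = preg_match('/hello world/x', 'helloworld');   // 1
$spaced  = preg_match('/hello world/x', 'hello world');  // 0

// An escaped space is still matched literally.
$escaped = preg_match('/hello\ world/x', 'hello world'); // 1

// The flag also permits layout and comments inside the pattern.
$pattern = '/
    hello   # first word
    \ +     # one or more literal spaces
    world   # second word
/x';
$commented = preg_match($pattern, 'hello   world');      // 1

var_dump($joined, $spaced, $escaped, $commented);
```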

2 Comments

Not quite "everything": you also need to know that \s is a character class, thus may or may not need wrapping in [] or () depending on language/dialect.
What is the difference between \s and [ ] (i.e., a space in square brackets, or a space inside a character set)? Are they interchangeable? Can I use either to detect the space between two words?
60

To match exactly the space character, you can use the octal value \040 (Unicode characters displayed as octal) or the hexadecimal value \x20 (Unicode characters displayed as hex).

Here is the regex syntax reference: https://www.regular-expressions.info/nonprint.html.
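
As an illustration, both escapes can be tried against the name from the question. The surrounding [a-z]+ parts are only an assumption about what a match should look like.

```php
<?php
$tag = "gavin schulz";

// \x20 (hex) and \040 (octal) both denote the space character, code 32.
$hex = preg_match('/^[a-z]+\x20[a-z]+$/', $tag);             // 1
$oct = preg_match('/^[a-z]+\040[a-z]+$/', $tag);             // 1

// A tab is not a space, so the same pattern rejects it.
$tab = preg_match('/^[a-z]+\x20[a-z]+$/', "gavin\tschulz");  // 0

var_dump($hex, $oct, $tab);
```

The patterns are single-quoted so that PHP passes the escapes through to the regex engine unchanged.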

Comments

13

In Perl the switch is \s (whitespace).

3 Comments

This is incorrect - it gathers all whitespace, not just the space character.
But the question is tagged with PHP, not Perl.
@PeterMortensen PHP's preg functions use PCRE, which is Perl-compatible, so this will work in PHP.
5

I am using a regex to make sure that I only allow letters, number and a space

Then it is as simple as adding a space to what you've already got:

$newtag = preg_replace("/[^a-zA-Z0-9 ]/", "", $tag);

(note, I removed the s| which seemed unintentional? Certainly the s was redundant; you can restore the | if you need it)

If you specifically want *a* space, as in only a single one, you will need a more complex expression than this, and might want to consider a separate non-regex piece of logic.
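
One possible reading of "only a single space" is two alphanumeric words separated by exactly one space. Under that assumption, here is a sketch of both a regex check and a non-regex check; both function names are made up for illustration.

```php
<?php
// Hypothetical helper: regex version. The D flag keeps $ from matching
// just before a trailing newline.
function has_single_space(string $tag): bool
{
    return preg_match('/^[a-zA-Z0-9]+ [a-zA-Z0-9]+$/D', $tag) === 1;
}

// Hypothetical helper: non-regex version of the same check.
function has_single_space_plain(string $tag): bool
{
    $parts = explode(' ', $tag);
    // Exactly two pieces, each non-empty and alphanumeric.
    return count($parts) === 2
        && ctype_alnum($parts[0])
        && ctype_alnum($parts[1]);
}

var_dump(has_single_space("gavin schulz"));        // bool(true)
var_dump(has_single_space("gavin  schulz"));       // bool(false)
var_dump(has_single_space_plain("gavin schulz"));  // bool(true)
var_dump(has_single_space_plain(" gavin"));        // bool(false)
```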

Comments

4

It seems to me that using a regex in this case would be overkill. Why not just use strpos to find the space character? Also, there's nothing special about the space character in regular expressions; you should be able to search for it the same way you would search for any other character. That is, unless you enabled the extended (x) flag, which ignores whitespace in the pattern and would hardly be necessary in this case.
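
A minimal sketch of the strpos approach, using the name from the question as sample input:

```php
<?php
$tag = "gavin schulz";

// strpos returns the zero-based offset of the first space, or false if
// there is none. Compare with === because offset 0 is also falsy.
$pos = strpos($tag, ' ');

if ($pos === false) {
    echo "no space found\n";
} else {
    echo "first space at offset $pos\n"; // first space at offset 5
}
```

This only locates a space. It does not remove the other disallowed characters, which is what the preg_replace call in the question is for.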

Comments

3

Use it like this to allow whitespace. Note that \s matches any whitespace character (tabs and newlines included), not only a single space.

$newtag = preg_replace("/[^a-zA-Z0-9\s]/", "", $tag);

Comments

2

You can also use the \b for a word boundary. For the name I would use something like this:

[^\b]+\b[^\b]+(\b|$)

EDIT: Modified to be a Perl regex example

if( $fullname =~ /([^\b]+)\b[^\b]+([^\b]+)(\b|$)/ ) {
 $first_name = $1;
 $last_name = $2;
}

EDIT AGAIN Based on what you want:

$new_tag = preg_replace("/[\s\t]/","",$tag);

2 Comments

the word boundary matcher \b also matches hyphens
An escaped b inside a character class represents a backspace character, not a word-boundary.
1

I'm trying out [[:space:]] in an instance where it looks like bloggers in WordPress are using non-standard space characters. It looks like it will work.
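
For illustration, here is one way such characters might be normalized. This sketch makes assumptions beyond the answer: the input is UTF-8, the unexpected character is a no-break space (U+00A0), and it uses \h with the u flag rather than [[:space:]].

```php
<?php
// Sample text: "gavin" + U+00A0 (no-break space, UTF-8 bytes C2 A0) + "schulz".
$text = "gavin\xC2\xA0schulz";

// A literal space in the pattern does not match the no-break space.
$hasPlainSpace = preg_match('/ /', $text); // 0

// With the u flag the subject is treated as UTF-8, and \h matches
// horizontal whitespace, including U+00A0.
$normalized = preg_replace('/\h+/u', ' ', $text);

var_dump($hasPlainSpace, $normalized === "gavin schulz");
```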

2 Comments

What do you mean by "bloggers in WordPress"? Can you elaborate?
@PeterMortensen This was back when I developed and supported a bunch of WordPress blogs for a major publisher. The writers were writing posts with some unexpected space characters.
