4

I need to split a string by separators, some known to me and one unknown. For example, I know I want to split the string by "\n", "," and ".", but also by one separator that can be user-defined: it can be ";" or "hello" or pretty much anything.

I tried this:

"[\n|,|.|".$exp."]"

...but that didn't work as expected. As I understand it, | means "or", so this regex should say: split by "\n" or "," or "." or "hello". I think the problem is that if I try just [hello], it splits by every letter, not by the whole word. That's strange, because if I try just [\n], it only splits by "\n", not by "\" or "n".

Can someone please explain this to me? :)

1
  • Brackets define a list of characters: [ab] matches a or b, the same as a|b. Commented May 18, 2013 at 23:42

6 Answers

6

When you place a bunch of characters in a character class, as in [hello], this defines a token that matches one character that is either h, e, l or o. Also, | has no meaning inside of a character class - it's just matched as a normal character.
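The difference is easy to see by splitting a small test string both ways. This is only a minimal sketch; the sample strings and variable names are made up for illustration.

```php
<?php
// Character class: matches ONE character that is h, e, l or o.
$byClass = preg_split('/[hello]/', 'ahellob');

// Plain literal: matches the whole word "hello".
$byWord = preg_split('/hello/', 'ahellob');

// Inside a character class, | is just an ordinary character.
$byPipe = preg_split('/[a|b]/', 'x|y');

print_r($byClass); // six pieces: "a", four empty strings, "b"
print_r($byWord);  // two pieces: "a", "b"
print_r($byPipe);  // two pieces: "x", "y"
```

The empty strings in the first result come from the five adjacent letters each being treated as a separate delimiter.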

The correct solution isn't to use a character class. You meant to use parentheses, which form a group:

(\n|,|\.|".$exp.")

By the way, make sure that you escape any regex metacharacters that are in $exp. Basically, everything in the full list here needs to be escaped with backslashes: http://regular-expressions.info/reference.html In PHP, preg_quote() is a helper function that does this for you.

EDIT: Since you're no longer using a character class, the . has to be escaped as \. because outside a class it is a metacharacter meaning 'match any one character'. Almost forgot.
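Putting the pieces together, something like the following might work. It is a sketch, not a definitive implementation: the function name split_on_separators and its parameters are hypothetical, and the guard for an empty user separator is an added assumption (an empty alternative would match everywhere).

```php
<?php
// Hypothetical helper: split on "\n", "," and "." plus one user-defined separator.
function split_on_separators(string $input, string $userSep): array
{
    // Known separators, written as regex fragments (the dot is escaped).
    $alternatives = ['\n', ',', '\.'];

    // Assumption: an empty user separator is simply ignored.
    if ($userSep !== '') {
        // preg_quote() escapes regex metacharacters; the second argument
        // also escapes the "/" delimiter used for the pattern below.
        $alternatives[] = preg_quote($userSep, '/');
    }

    $pattern = '/(?:' . implode('|', $alternatives) . ')/';
    return preg_split($pattern, $input);
}

print_r(split_on_separators("a\nb,c.dhelloe", 'hello'));
```

Because the user-supplied string goes through preg_quote(), a separator such as "+" or ";" is treated as literal text rather than as regex syntax.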


1 Comment

Got it, thank you and everyone else who answered :)
1

\n is actually only one character, a newline (the \ before the n indicates an escape sequence), so that's why it works and hello doesn't.

Also, keep in mind that allowing arbitrary input into a regular expression can be a security risk, depending on what your regular expression is being used for, so be very careful and make sure you sanitize your input to that regular expression.
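To illustrate why unescaped input is a problem, here is a small sketch comparing the same user-supplied separator with and without preg_quote(). The separator "a.c" and the input string are made-up examples.

```php
<?php
$userSep = 'a.c';        // hypothetical user input containing a metacharacter
$input   = '1a.c2abc3';

// Unescaped: the dot matches any character, so "abc" is also a separator.
$unescaped = preg_split('/' . $userSep . '/', $input);

// Escaped: only the literal text "a.c" is a separator.
$escaped = preg_split('/' . preg_quote($userSep, '/') . '/', $input);

print_r($unescaped); // "1", "2", "3"
print_r($escaped);   // "1", "2abc3"
```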


1

Try using this regex:

preg_split('#[\n,.]|'.preg_quote($exp, '#').'#', ...);

Note the single quotes: they keep PHP from replacing \n with a newline, so the regex engine receives the \n escape itself.


1

Drop the [ and ] as these define a character class. \n counts as a single character in a double-quoted string. Just using the string without the character class should work as you need:

preg_split("/\n|,|\.|$exp/", $input)


1

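Use preg_split(). Note that a character class like the one below only works when the user-defined separator is a single character.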

For example:

Input:

$exp = '#';
preg_split("/[,.\n$exp]/", "0\n1,2.3#4");

Output:

Array ( [0] => 0 [1] => 1 [2] => 2 [3] => 3 [4] => 4)


1

Here is a simple solution:

"(\n|,|\.|".$exp.")"

Or you can do it like this:

"([\n,.]|".$exp.")"

2 Comments

You have it backwards: you need to escape . when it's outside of a character class, not when it's inside one.
Didn't notice that. Fixed.
