I have this kind of string:

Blabla1 Blaabla2<br />  Blaabla3 Blaabla4

I'm trying to split it into words wherever there is a " " or a "<br />", using preg_split.

What I expect:

Blabla1
Blaabla2<br />
Blaabla3
Blaabla4

I tried this regex, (?:(<br\s))|\s, but I can't manage to exclude the "/>".

http://regexr.com/3aqs0

Thanks!

  • Are you wanting to retain the <br />? Your expected output shows it retained ... Commented Apr 15, 2015 at 17:17
  • Does the text have other HTML tags within it? Commented Apr 15, 2015 at 17:22
  • @hwnd Yes, I want to keep it :) Commented Apr 15, 2015 at 17:22
  • @JasonMcCreary Yes, like <strong>, <em> ... Commented Apr 15, 2015 at 17:23
  • It would probably be clearer if you converted all the <br/> to spaces, and then did the split on spaces. You don't need to do everything in one single regex. Commented Apr 15, 2015 at 17:45

2 Answers

One way you could do this:

$str = 'Blabla1 Blaabla2<br />  Blaabla3 Blaabla4';
$results = preg_split('~(?:<br[^>]*>\s*\K|\s+)~', $str);
print_r($results);

Output

Array
(
    [0] => Blabla1
    [1] => Blaabla2<br />  
    [2] => Blaabla3
    [3] => Blaabla4
)
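Since the pattern is fairly dense, here is a sketch of the same regex written with the x (extended) flag, so that each part can carry a comment. The layout and the comments are my own reading of the pattern, not something taken from the PHP manual:

```php
<?php
// Same pattern as above, spread out with the x flag so each part can be annotated.
$pattern = '~
    (?:
        <br[^>]*>   # a <br tag up to its closing >, e.g. <br>, <br/> or <br />
        \s*         # any whitespace directly after the tag
        \K          # reset the match start, so the tag and spaces stay in the piece
      |
        \s+         # otherwise split on a run of whitespace, which is discarded
    )
~x';

$str = 'Blabla1 Blaabla2<br />  Blaabla3 Blaabla4';
$results = preg_split($pattern, $str);
print_r($results);
```

Because of \K, the first alternative matches an empty string located just after the tag and its trailing whitespace. That is why the second element keeps both the <br /> and the two spaces that follow it.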

5 Comments

It works nicely, thanks! Could you explain how the whole pattern works, though? I'm interested :)
Are you just wanting to split on whitespace not inside HTML? And note this will split the <br> if there is a space preceding it as well. I am not clear what exactly you're trying to achieve here.
In fact, I'm using a text editor (CKEditor). When I save the textarea to the database, I need to keep all the tags (<br>, <strong>, <em>, <u>). Then, when I reload the page with my text editor, I need to split out each word together with its associated tags, because the editor is coupled with an audio player (JW Player) that underlines each word as playback advances...
Another question: what if my string contains two or more consecutive <br /> tags, like $str = 'Blabla1 Blaabla2<br /><br /> Blaabla3 Blaabla4'; ? How do I handle that?
You could do (?:(?:<br[^>]*>\s*)+\K|\s+)
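
For reference, a minimal sketch of that last suggestion applied to the string from the comment above. Using ~ as the delimiter is just carried over from the answer:

```php
<?php
// The extra (?: ... )+ group lets a run of consecutive <br> tags stay attached
// to the word in front of them before \K resets the match start.
$str = 'Blabla1 Blaabla2<br /><br /> Blaabla3 Blaabla4';
$results = preg_split('~(?:(?:<br[^>]*>\s*)+\K|\s+)~', $str);
print_r($results);
```

Both <br /> tags, plus the space after them, should end up in the same element as Blaabla2.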
1

If there is no other HTML in the string, it's fine to use a regex. Otherwise, there are better approaches.

Use <br(\s\/)?>\K|\s:

$matches = preg_split('/<br(\s\/)?>\K|\s/',$string);

This will also work for <br> (which is valid HTML too).

Consider the flag PREG_SPLIT_NO_EMPTY, because there are going to be empty elements using your example string:

preg_split('/<br(\s\/)?>\K|\s/',$string,-1,PREG_SPLIT_NO_EMPTY);
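
To see where the empty elements come from, here is a small sketch that runs the pattern with and without the flag. It assumes $string holds the example string from the question, and it passes -1 (the documented "no limit" value) as the third argument:

```php
<?php
// Assumption: $string is the example string from the question.
$string = 'Blabla1 Blaabla2<br />  Blaabla3 Blaabla4';
$pattern = '/<br(\s\/)?>\K|\s/';

// Without the flag, each of the two spaces after <br /> yields an empty element.
$withEmpty = preg_split($pattern, $string);

// With PREG_SPLIT_NO_EMPTY, those empty elements are dropped.
$withoutEmpty = preg_split($pattern, $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($withEmpty);
print_r($withoutEmpty);
```

The first call should give six elements, two of them empty strings. The second should give only the four words, with <br /> still attached to Blaabla2.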

Update: To keep the <br />, you need to reset the match start using \K. There is a good example of this in the language reference:

\K can be used to reset the match start since PHP 5.2.4. For example, the pattern foo\Kbar matches "foobar", but reports that it has matched "bar". The use of \K does not interfere with the setting of captured substrings. For example, when the pattern (foo)\Kbar matches "foobar", the first substring is still set to "foo".
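
The two cases from that quote can be checked directly with preg_match. This is only a small sketch, and the variable names are mine:

```php
<?php
// foo\Kbar has to match "foobar" in full, but only "bar" is reported as the match.
preg_match('/foo\Kbar/', 'foobar', $plain);

// \K does not affect capture groups: group 1 is still "foo".
preg_match('/(foo)\Kbar/', 'foobar', $grouped);

print_r($plain);   // $plain[0] is "bar"
print_r($grouped); // $grouped[0] is "bar", $grouped[1] is "foo"
```

The same thing happens in the split pattern: the <br /> is consumed before \K, so it is not part of the reported delimiter and stays in the preceding piece.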

5 Comments

He can also use PREG_SPLIT_NO_EMPTY to avoid empty elements, e.g. preg_split('/(?:(<br\s\/>))|\s/',$string,-1,PREG_SPLIT_NO_EMPTY);
Correct, I forgot the flags :) I'll add this.
Almost perfect :) ... How do I keep the <br /> after Blaabla2?
@Marc Works nicely! :) And now, what if my string contains two or more consecutive <br /> tags, like $str = 'Blabla1 Blaabla2<br /><br /> Blaabla3 Blaabla4'; ? How do I handle that?
I found a solution here: stackoverflow.com/questions/29671967/… . Thanks for everything!
