PHP explode string with tags using UTF8 between them

Question

In PHP I want to split a string on tags that have UTF-8 text between them. For example, given this text:

$content = "<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you";

Here I have two <heading></heading> tags with UTF-8 text between them. I want a simple array like this:

$arr[0] = "<heading>فهرست اول</heading>hi my name is mahdi  whats app";
$arr[1] = "<heading>فهرست دوم</heading>how are you";

The text between <heading> and </heading> is different each time. How can I build this array? In other words, how can I split the text by <heading>ANY TEXT</heading>?

Have you tried using regex? preg_split with /(?=<heading>.*?<\/heading>)/ as the pattern should work...
– user5147563, Commented Sep 18, 2017 at 14:56
This should answer your question: stackoverflow.com/questions/5696412/…
– Pauloscorps, Commented Sep 18, 2017 at 14:56
@Mahdi.Pishguy $arr = preg_split('/(?=<heading>.*?<\/heading>)/', $content) will split the string at each <heading> tag, whatever its contents, without removing the tag. This should work...
– user5147563, Commented Sep 18, 2017 at 15:00
@Soaku yes, that works fine, but I want to keep the tag together with the text between the tags; I don't want the heading removed.
– DolDurma, Commented Sep 18, 2017 at 15:07

score 2 · Accepted Answer · 2017-09-18 15:28:08Z

You can use preg_split to split the text by a regular expression, then array_filter to remove empty strings:

$arr = array_filter(preg_split('/(?=<heading>.*?<\/heading>)/', $content), 'strlen');

The tag is not removed because the pattern is a look-ahead: a zero-width group that matches a position without consuming any of the text it looked at.

For example:

<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you

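This should return the following once reindexed with array_values (array_filter itself preserves the original keys, so the raw keys here are 1 and 2 because the empty first piece at key 0 is filtered out):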

array(
  [0] => "<heading>فهرست اول</heading>hi my name is mahdi  whats app ",
  [1] => "<heading>فهرست دوم</heading>how are you"
)

You can check this regex online: https://regex101.com/r/ITi7Lh/1
Or, if you prefer, see how PHP parses it: (the link doesn't seem to work on SO, you have to manually paste it): https://en.functions-online.com/preg_split.html?command={"pattern":"\/(?=<heading>.*?<\\\/heading>)\/","subject":"<heading>\u0641\u0647\u0631\u0633\u062a \u0627\u0648\u0644<\/heading>hi my name is mahdi whats app <heading>\u0641\u0647\u0631\u0633\u062a \u062f\u0648\u0645<\/heading>how are you","limit":-1}
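
As a variation on the preg_split call above, here is a minimal sketch that passes PREG_SPLIT_NO_EMPTY instead of filtering afterwards, so the result comes back zero-indexed. The function name split_on_headings is hypothetical, and the s and u modifiers are additions the original pattern did not have (s lets a heading span line breaks, u treats the subject as UTF-8).

```php
<?php
// Hypothetical helper; the pattern is the one from the answer plus the "s" and "u" modifiers.
function split_on_headings(string $content): array
{
    // The look-ahead matches the empty position in front of each <heading>...</heading>,
    // so no characters are consumed. PREG_SPLIT_NO_EMPTY drops the empty piece that
    // would otherwise appear before the first heading.
    $parts = preg_split('/(?=<heading>.*?<\/heading>)/su', $content, -1, PREG_SPLIT_NO_EMPTY);

    // With the "u" modifier preg_split returns false on invalid UTF-8 input.
    return $parts === false ? [] : $parts;
}

$content = "<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you";
print_r(split_on_headings($content));
```

Because `<` is a single-byte ASCII character, the split points can never fall inside a multi-byte UTF-8 sequence, with or without the u modifier.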

edited Sep 18, 2017 at 15:28

answered Sep 18, 2017 at 15:09

user5147563

11 Comments

DolDurma Over a year ago

I'm sorry, sir, I get this result with your code: Array ( [1] => فهرست اولhi my name is mahdi whats app [2] => فهرست دومhow are you )

DolDurma Over a year ago

I'm so sorry, sir, you are right. After viewing the page source I can see that the result is correct.

user5147563 Over a year ago

@Mahdi.Pishguy I was about to say this, but IDK why I didn't post that. Good to know that it helped. :)

user5147563 Over a year ago

@Mahdi.Pishguy if you don't use any other tags, strip_tags will remove the HTML markup without removing the contents. Otherwise, you could use a regex like `/<\/?heading.*?>/`
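
The two options in this comment can be compared side by side. This is only a rough sketch; the sample string and the variable names are made up for illustration.

```php
<?php
// Illustrative input: one heading plus an unrelated <b> tag.
$piece = "<heading>فهرست اول</heading>hi <b>my</b> name";

// strip_tags removes every tag, whether or not it is a real HTML element, and keeps the text.
$noTags = strip_tags($piece);

// The regex removes only the opening and closing heading tags and leaves <b> in place.
$noHeading = preg_replace('/<\/?heading.*?>/', '', $piece);

echo $noTags, "\n";
echo $noHeading, "\n";
```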

user5147563 Over a year ago

@Mahdi.Pishguy I'm happy to help :) Just wanted to note that even though a regex can be simple, it isn't always the best solution. If you use complex regexes, or use them too often, you might notice a slowdown. If an alternative exists, use it. Other people here have already offered some, so those might work better for you.

Andreas · Accepted Answer · 2017-09-18 17:01:56Z

You can use strpos and substr to do the same if the UTF-8 text is causing issues.

This loops until it can't find any more <heading> tags, then adds the final substring after the loop.

https://3v4l.org/UPfbb

$content = "<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you<heading>فهرست اول</heading>hi my name is mahdi  whats app2 <heading>فهرست دوم</heading>how are you2";

$arr = [];
$oldpos = 0;
$pos = strpos($content, "<heading>", 1); // offset 1 skips the heading at the very start

while ($pos !== false) {
    $arr[] = substr($content, $oldpos, $pos - $oldpos);
    $oldpos = $pos;
    $pos = strpos($content, "<heading>", $oldpos + 1); // search past the previous match so it is not found again
}
$arr[] = substr($content, $oldpos); // add the last piece, which has no heading tag after it
var_dump($arr);

Stuart · Accepted Answer · 2017-09-18 15:00:21Z

1

You can use preg_match, or in your case, preg_match_all:

$content = "<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you";

preg_match_all("'<heading>.*?<\/heading>'si", $content, $matches);
print_r($matches[0]);

gives:

Array
(
    [0] => <heading>فهرست اول</heading>
    [1] => <heading>فهرست دوم</heading>
)

answered Sep 18, 2017 at 15:00

Stuart

1 Comment

user5147563 Over a year ago

He also expects to get the text after the tag, up to the next one, not only what's inside it.

coderodour · Accepted Answer · 2017-09-18 16:41:53Z

You can try the following function; it should meet your needs. The idea is to split the string using <heading> as the delimiter. Each item in the resulting array is what you need, except that the <heading> tag has been stripped because it was the delimiter, so it has to be added back. The comments explain what the code is doing.

function get_what_mahdi_wants($in_string){

  $mahdis_strings_array = array();

  // Split string at occurrences of '<heading>'
  $mahdis_strings = explode('<heading>', $in_string);
  foreach($mahdis_strings as $mahdis_string){

    // if '<heading>' is found at start of string, empty array element will be created. Skip it.
    if($mahdis_string == ''){ continue; }

    // Add back string element with '<heading>' tag prepended since exploding on it stripped it.
    $mahdis_strings_array[] = '<heading>'.$mahdis_string;
  }
  return $mahdis_strings_array;
}
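
One caveat with the function above: if the input has text before the first <heading>, that leading text also gets a <heading> prepended. A possible variant that keeps leading text as-is is sketched below; the name split_keep_heading and the $tag parameter are hypothetical.

```php
<?php
// Hypothetical variant of the explode-and-prepend approach.
function split_keep_heading(string $text, string $tag = '<heading>'): array
{
    $pieces = explode($tag, $text);
    $result = [];

    // The first piece is whatever came before the first tag (possibly empty),
    // so it never had a tag in front of it.
    $lead = array_shift($pieces);
    if ($lead !== '') {
        $result[] = $lead;
    }

    // Every remaining piece was preceded by the tag; put it back.
    foreach ($pieces as $piece) {
        $result[] = $tag . $piece;
    }
    return $result;
}

print_r(split_keep_heading("<heading>فهرست اول</heading>hi <heading>فهرست دوم</heading>how are you"));
```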
