In PHP, I want to split a string on a tag whose contents are UTF-8 text. For example, given this text:

$content = "<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you";

Here I have two <heading></heading> tags with UTF-8 text between them. I want a simple array like this:

$arr[0] = "<heading>فهرست اول</heading>hi my name is mahdi  whats app";
$arr[1] = "<heading>فهرست دوم</heading>how are you";

The strings between <heading> and </heading> differ. How can I build this array? In other words, how can I split the text on <heading>ANY TEXT</heading>?

  • Have you tried using a regex? preg_split with /(?=<heading>.*?<\/heading>)/ as the pattern should work... Commented Sep 18, 2017 at 14:56
  • This should answer your question: stackoverflow.com/questions/5696412/… Commented Sep 18, 2017 at 14:56
  • @Soaku No, could you show me how to use this regular expression? Commented Sep 18, 2017 at 14:57
  • @Mahdi.Pishguy $arr = preg_split('/(?=<heading>.*?<\/heading>)/', $content) will split the string at each <heading> tag, whatever its contents, without removing it. This should work... Commented Sep 18, 2017 at 15:00
  • @Soaku Yes, that works fine, but I want to keep the text between the tags together with the tags themselves; I don't want the heading removed. Commented Sep 18, 2017 at 15:07

4 Answers

2

You can use preg_split to split the text by a regular expression, then array_filter to remove empty strings and array_values to reindex the result from 0:

$arr = array_values(array_filter(preg_split('/(?=<heading>.*?<\/heading>)/', $content), 'strlen'));

It won't remove the tag, because the tag sits inside a lookahead: a zero-width assertion that checks for a match without consuming the characters it matched.

For example:

<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you

This should return:

array(
  [0] => "<heading>فهرست اول</heading>hi my name is mahdi  whats app ",
  [1] => "<heading>فهرست دوم</heading>how are you"
)
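
A minimal sketch of the same split, assuming the input is held in $content as in the question. It swaps array_filter for preg_split's PREG_SPLIT_NO_EMPTY flag, which also drops the empty leading piece and leaves the result indexed from 0. The variable name $parts is only for illustration.

```php
<?php
$content = "<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you";

// The lookahead matches the empty position in front of each heading block,
// so the split consumes nothing and the tags stay inside the pieces.
// PREG_SPLIT_NO_EMPTY drops the empty piece before the first <heading>.
$parts = preg_split('/(?=<heading>.*?<\/heading>)/', $content, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

Because nothing is consumed, joining the pieces back together with implode('', $parts) reproduces the original string.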

You can check this regex online: https://regex101.com/r/ITi7Lh/1
Or, if you prefer, see how PHP parses it: (the link doesn't seem to work on SO, you have to manually paste it): https://en.functions-online.com/preg_split.html?command={"pattern":"\/(?=<heading>.*?<\\\/heading>)\/","subject":"<heading>\u0641\u0647\u0631\u0633\u062a \u0627\u0648\u0644<\/heading>hi my name is mahdi whats app <heading>\u0641\u0647\u0631\u0633\u062a \u062f\u0648\u0645<\/heading>how are you","limit":-1}


11 Comments

I'm sorry, sir, I get this result with your code: Array ( [1] => فهرست اولhi my name is mahdi whats app [2] => فهرست دومhow are you )
I'm so sorry, sir, you are right. After viewing the page source I can see the result is correct.
@Mahdi.Pishguy I was about to say this, but I don't know why I didn't post it. Good to know that it helped. :)
@Mahdi.Pishguy If you don't use any other tags, strip_tags will remove the HTML markup without removing the contents. Otherwise, you could use a regex like /<\/?heading.*?>/
@Mahdi.Pishguy I'm happy to help :) I just wanted to note that even though a regex may be simple, it isn't always the best solution. If you use complex regexes, or use them too often, you might notice a slowdown. If an alternative exists, use it. Other people here have already offered some, so they might work better for you.
2

You can use strpos and substr to do the same if UTF-8 is causing issues.

This loops until it can't find any more <heading> tags, then adds the last substring after the loop.

https://3v4l.org/UPfbb

$content = "<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you<heading>فهرست اول</heading>hi my name is mahdi  whats app2 <heading>فهرست دوم</heading>how are you2";

$arr = [];
$oldpos = 0;
$pos = strpos($content, "<heading>", 1); // offset 1 skips the heading at the very start

while ($pos !== false) {
    $arr[] = substr($content, $oldpos, $pos - $oldpos);
    $oldpos = $pos;
    $pos = strpos($content, "<heading>", $oldpos + 1); // previous position + 1, so the same tag is not found again
}
$arr[] = substr($content, $oldpos); // add the last chunk, which has no heading tag after it
var_dump($arr);
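
The loop above can be wrapped in a reusable function. This is only a sketch: the name split_before_tag and its parameters are hypothetical, not part of the original answer. strpos and substr count bytes rather than characters, but that is safe here because the ASCII bytes of "<heading>" never occur inside a multi-byte UTF-8 sequence.

```php
<?php
// Hypothetical helper: splits $text into chunks that each begin at $tag.
// Any text before the first tag becomes its own leading chunk.
function split_before_tag(string $text, string $tag = '<heading>'): array
{
    $chunks = [];
    $start = 0;
    $length = strlen($text);
    // Search from $start + 1 so the tag at the current chunk start is skipped.
    while ($start < $length && ($pos = strpos($text, $tag, $start + 1)) !== false) {
        $chunks[] = substr($text, $start, $pos - $start);
        $start = $pos;
    }
    if ($start < $length) {
        $chunks[] = substr($text, $start); // remainder after the last tag
    }
    return $chunks;
}

$content = "<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you";
$chunks = split_before_tag($content);
var_dump($chunks);
```

An empty input yields an empty array, since no chunk is ever added.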


1

You can use preg_match, or in your case, preg_match_all:

$content = "<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you";

preg_match_all("'<heading>.*?<\/heading>'si", $content, $matches);
print_r($matches[0]);

gives:

Array
(
    [0] => <heading>فهرست اول</heading>
    [1] => <heading>فهرست دوم</heading>
)

1 Comment

He also expects to get what follows the tag, up to the next one, not only what's inside it.
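
One way to address the comment above is to extend the pattern so that each match also takes the text following the closing tag. This is a sketch of a variation, not the answerer's code: the lazy .*? after </heading> stops at a lookahead for the next <heading> or the end of the string (\z), and the u modifier assumes the input is valid UTF-8.

```php
<?php
$content = "<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you";

// s: let . match newlines; u: treat pattern and subject as UTF-8.
// Each match is a heading block plus everything up to the next <heading>
// (or the end of the string), without consuming that next tag.
preg_match_all('/<heading>.*?<\/heading>.*?(?=<heading>|\z)/su', $content, $matches);

print_r($matches[0]);
```

Text that appears before the first <heading> is not matched by this pattern, so it would be left out of the result.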
1

You can try the following function; it should meet your needs. The idea is to split the string using <heading> as the delimiter. Each item in the resulting array is then what you need, except that the <heading> tag has been stripped, because it was the delimiter, so you need to add it back. Comments in the code explain each step.

function get_what_mahdi_wants($in_string){

  $mahdis_strings_array = array();

  // Split string at occurrences of '<heading>'
  $mahdis_strings = explode('<heading>', $in_string);
  foreach($mahdis_strings as $mahdis_string){

    // if '<heading>' is found at start of string, empty array element will be created. Skip it.
    if($mahdis_string == ''){ continue; }

    // Add back string element with '<heading>' tag prepended since exploding on it stripped it.
    $mahdis_strings_array[] = '<heading>'.$mahdis_string;
  }
  return $mahdis_strings_array;
}
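
One edge case with the function above: if the input has text before the first <heading>, that text also gets a <heading> tag prepended. The sketch below is a hypothetical variant (the name split_on_heading is made up for illustration) that keeps such leading text as-is and only restores the tag on pieces that actually followed one.

```php
<?php
// Hypothetical variant of the explode() approach.
function split_on_heading(string $text): array
{
    $pieces = explode('<heading>', $text);
    // The first piece is whatever precedes the first tag, so it never had one.
    $lead = array_shift($pieces);
    $result = ($lead === '') ? [] : [$lead];
    foreach ($pieces as $piece) {
        // Restore the delimiter that explode() removed.
        $result[] = '<heading>' . $piece;
    }
    return $result;
}

$content = "<heading>فهرست اول</heading>hi my name is mahdi  whats app <heading>فهرست دوم</heading>how are you";
$result = split_on_heading($content);
print_r($result);
```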
