Can I safely use explode() on a multi-byte string, specifically UTF8? Or do I need to use mb_split()?

If mb_split(), then why?

  • Either one works, usually, so long as the delimiter you're exploding on is a valid UTF-8 sequence. The exception is a delimiter followed by combining marks (e.g. diacritics): then neither is safe, because both will split the base character from its mark. Commented Jan 22, 2019 at 22:20
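
To illustrate the combining-mark caveat from the comment above, here is a minimal sketch. The sample string and the lookahead pattern are assumptions chosen for the demonstration, not something from the original thread; the file is assumed to be saved as UTF-8 and run on PHP 7.0 or later (for the "\u{...}" escape).

```php
<?php
// "café" written as a plain "e" followed by U+0301 COMBINING ACUTE ACCENT.
$text = "cafe\u{0301} de nuit";

// explode() matches the bare "e" and strands the accent at the start
// of the next piece.
$naive = explode("e", $text);

// One possible workaround (an assumption, not the only option): refuse to
// match a delimiter that is immediately followed by a combining mark.
$aware = preg_split('/e(?!\p{M})/u', $text);

var_dump($naive);
var_dump($aware);
```

With this input the naive split yields three pieces, the second of which begins with an orphaned accent, while the lookahead version only splits at the "e" in "de".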

1 Answer

A multi-byte string is still just a string, and explode() will happily split it on whatever delimiter you provide. For a valid UTF-8 string and a valid UTF-8 delimiter, explode() and mb_split() behave identically as long as the delimiter contains no regular-expression metacharacters. If you are concerned about a particular situation, consider using this test script:

<?php

$test = array(
        "ὕβρις",
        "путин бандит",
        "Дерипаска бандит",
        "Трамп наша сука"
);
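// explode() matches the delimiter literally; mb_split() treats it as a
// regular expression. This one has no metacharacters, so both should agree.
$delimiter = "д";

// mb_split() interprets its arguments using the mbregex encoding.
mb_regex_encoding("UTF-8");

foreach ($test as $t) {
        $explode = explode($delimiter, $t);
        echo "explode: " . implode("\t", $explode) . "\n";

        $split = mb_split($delimiter, $t);
        echo "split  : " . implode("\t", $split) . "\n\n";

        // Strict comparison: same pieces, in the same order.
        if ($explode !== $split) {
                throw new Exception($t . " splits differently!");
        }
}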


echo "script complete\n";

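It's worth pointing out that explode() and mb_split() take nearly the same parameter list (a delimiter, the string, and an optional limit), with no parameter for language or character encoding. The difference is in the first argument: mb_split() interprets it as a regular-expression pattern and uses the encoding set by mb_regex_encoding(), whereas explode() takes it literally. You should also realize that the encoding of your strings depends on where and how you obtain the delimiter and the string to be split. They might come from a text or CSV file, a form submission in a browser, an API call via JavaScript, or you may define them right in your PHP script as I have here, in which case they carry the encoding of the source file.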
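
A short sketch of where the two functions part ways. The sample string and the "." delimiter are assumptions picked to make the regex behaviour visible; it assumes the mbstring extension is loaded with regex support and the file is saved as UTF-8.

```php
<?php
mb_regex_encoding('UTF-8');

$text = "α.β.γ";

// explode() looks for a literal dot.
$literal = explode(".", $text);

// mb_split() reads "." as the regex wildcard, so every character
// counts as a delimiter and no letters survive in the result.
$asRegex = mb_split(".", $text);

// Escaping the dot makes mb_split() match it literally again.
$escaped = mb_split("\\.", $text);

var_dump($literal === ["α", "β", "γ"]);
var_dump($asRegex === $literal);
var_dump($escaped === $literal);
```

If the delimiter comes from user input, passing it through explode() avoids having to escape it at all.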

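Both functions look for occurrences of the delimiter in the string and split at each one. With explode() the search is byte for byte, which is still safe for UTF-8: the encoding is self-synchronizing, so the bytes of one valid character never appear inside or across other characters.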
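
The byte-level view can be checked directly. This is a minimal sketch with an assumed sample string ("один два"); it relies on the source file being saved as UTF-8 and on the mbstring extension for mb_check_encoding().

```php
<?php
$delimiter = "д";          // U+0434, two bytes in UTF-8
$text      = "один два";

// Show the raw bytes explode() will search for.
echo bin2hex($delimiter), "\n";   // d0b4

$parts = explode($delimiter, $text);

// Every piece should still be valid UTF-8, because the delimiter bytes
// can only match at a character boundary.
foreach ($parts as $part) {
    var_dump(mb_check_encoding($part, 'UTF-8'));
}
```

This guarantee is specific to UTF-8. Encodings that are not self-synchronizing, such as Shift_JIS, do not offer it.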

3 Comments

What if you want to turn a string with no delimiter into an array of characters? PHP's explode() rejects an empty delimiter (it returns false with a warning before PHP 8 and throws a ValueError from PHP 8 on), and the delimiter argument is mandatory. PHP's preg_split() can split a string without a delimiter, though.
mb_str_split() (per character, PHP 7.4+) and str_split() (per byte)
Could you maybe select more polite example strings?
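
For the no-delimiter case raised in the comments above, a minimal sketch of the three options mentioned. The sample word is an assumption; the file is assumed to be saved as UTF-8, and mb_str_split() needs PHP 7.4 or later with mbstring.

```php
<?php
$word = "привет";

// str_split() cuts per byte, so each two-byte Cyrillic letter is torn in half.
$bytes = str_split($word);

// preg_split() with an empty pattern and the /u modifier splits
// between code points.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

// mb_str_split() does the same with an explicit encoding.
$viaMb = mb_str_split($word, 1, 'UTF-8');

echo count($bytes), "\n";   // 12
echo count($chars), "\n";   // 6
var_dump($chars === $viaMb);
```

Both character-level variants split per code point, so a base letter followed by a combining mark still ends up in two array elements.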
