Can I safely use explode() on a multi-byte string, specifically UTF8? Or do I need to use mb_split()?
If mb_split(), then why?
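A multi-byte string is still just a string, and explode() will happily split it on whatever delimiter you provide. As long as both the string and the delimiter are valid UTF-8, I'd expect the two functions to behave identically for plain-text delimiters, because explode() compares bytes and the UTF-8 encoding of one character never appears inside the encoding of another. If you are concerned about a particular situation, consider using this test script: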
<?php
$test = array(
"ὕβρις",
"путин бандит",
"Дерипаска бандит",
"Трамп наша сука"
);
$delimiter = "д";
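foreach ($test as $t) {
    // Split the same string with both functions and print the pieces.
    $explode = explode($delimiter, $t);
    echo "explode: " . implode("\t", $explode) . "\n";
    $split = mb_split($delimiter, $t);
    echo "split  : " . implode("\t", $split) . "\n\n";
    // Strict comparison, so numeric-looking pieces are not treated as equal by accident.
    if ($explode !== $split) {
        throw new Exception($t . " splits differently!");
    }
}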
echo "script complete\n";
It's worth pointing out that explode() and mb_split() take the same shape of parameter list (delimiter, string, optional limit), and neither has a parameter for language or character encoding. They are not fully interchangeable, though: mb_split() treats its first argument as a regular expression pattern and takes its encoding from mb_regex_encoding(). You should also realize that how your strings are encoded in PHP depends on where and how you obtain your delimiter and the string to be exploded/split. Your strings might come from a text or CSV file, a form submission in a browser, an API call via JavaScript, or you may define those strings right in your PHP script as I have here.
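Because mb_split() treats its first argument as a regular expression, a delimiter containing a regex metacharacter is where the two functions visibly diverge. Here is a minimal sketch of that, assuming the mbstring extension is loaded; the variable names are only for illustration:

```php
<?php
// Assumption: the mbstring extension is loaded.
mb_regex_encoding('UTF-8');

$input = "a.b.c";

// explode() treats "." as a literal one-byte delimiter.
$literal = explode(".", $input);      // ["a", "b", "c"]

// mb_split() treats "." as the regex "any character", so the pieces differ.
$asRegex = mb_split(".", $input);

// Escaping the metacharacter makes mb_split() match a literal dot again.
$escaped = mb_split("\\.", $input);   // ["a", "b", "c"]

var_dump($literal === $asRegex);  // false
var_dump($literal === $escaped);  // true
```

So if the delimiter comes from user input, it is probably safer to use explode(), or to escape the delimiter before handing it to mb_split().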
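I might be wrong, but I believe that both functions work by looking for instances of the delimiter in the string and splitting at each one: explode() searches for the delimiter as a literal byte sequence, whereas mb_split() matches it as a regular expression in the current mb_regex_encoding().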
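To illustrate the byte-wise matching, here is a rough sketch, assuming the script file is saved as UTF-8 and mbstring is available. The $allValid helper is hypothetical, written only for this example. With a complete UTF-8 character as the delimiter the pieces stay valid; with a lone lead byte, explode() still matches and cuts characters in half:

```php
<?php
// Assumption: this file is saved as UTF-8 and the mbstring extension is loaded.
$word = "бандит";

// A complete UTF-8 character as delimiter: the pieces stay valid UTF-8.
$pieces = explode("д", $word);   // ["бан", "ит"]

// A lone lead byte (0xD0) is not a valid UTF-8 delimiter; explode() still
// matches it byte for byte and splits in the middle of characters.
$broken = explode("\xD0", $word);

// Hypothetical helper: true only if every piece is valid UTF-8.
$allValid = function (array $parts): bool {
    foreach ($parts as $part) {
        if (!mb_check_encoding($part, 'UTF-8')) {
            return false;
        }
    }
    return true;
};

var_dump($allValid($pieces)); // true
var_dump($allValid($broken)); // false
```

In other words, explode() itself is not the risk; a delimiter or input that is not valid UTF-8 is.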
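explode() rejects an empty-string delimiter: it returns false (with a warning) before PHP 8 and throws a ValueError from PHP 8.0 onward. The delimiter argument is mandatory. PHP's preg_split() can split a string into individual characters with an empty pattern, though; add the u modifier (for example '//u') so that it splits UTF-8 text between characters rather than between bytes.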
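A small sketch of the empty-delimiter case, again assuming a UTF-8 source file. The try/catch is there only so the same snippet runs on both PHP 7 and PHP 8:

```php
<?php
// Assumption: this file is saved as UTF-8.
$word = "бандит";

// explode() with an empty delimiter: false before PHP 8, ValueError from 8.0.
try {
    $result = @explode("", $word);
} catch (\ValueError $e) {
    $result = false;
}

// An empty pattern with the u modifier splits between UTF-8 characters.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

// Without the u modifier the split happens between bytes instead.
$bytes = preg_split('//', $word, -1, PREG_SPLIT_NO_EMPTY);

echo count($chars), " characters, ", count($bytes), " bytes\n"; // 6 characters, 12 bytes
```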