
For dummies: in PHP, what is the difference between single-byte strings and multi-byte strings, and in which situations should we use one or the other?

For single-byte strings (e.g. US-ASCII, ISO 8859 family, etc.) use substr and for multi-byte strings (e.g. UTF-8, UTF-16, etc.) use mb_substr:

// singlebyte strings
$result = substr($myStr, 0, 5);
// multibyte strings
$result = mb_substr($myStr, 0, 5);

For instance, if I plan to develop something to be used in China, do I need to adopt any special measures because of its special characters? Isn't UTF-8 encoding good enough?

  • PHP doesn't understand UTF-8 on its own; you need to tell it that your string is UTF-8 (as Chinese text usually is), then use the mb_* functions to work on it. Note that the mb_* functions allow several bytes per character, while the non-mb_* functions assume 1 byte per character. Commented May 23, 2014 at 5:54
  • Damn! Thanks, but that's a complex explanation, man. Commented May 23, 2014 at 6:06
  • You can always use the mb_ functions, regardless of the character set, and be on the safe side. Commented May 23, 2014 at 6:17

1 Answer


The single-byte function strlen returns the number of bytes in a string, while mb_strlen returns the number of characters.

A character can take more than 1 byte (in UTF-8, for example).
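The byte count versus character count distinction can be sketched as follows. This is a minimal illustration, assuming the source file is saved as UTF-8 and the mbstring extension is loaded; the variable names and sample strings are hypothetical and not from the question.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is enabled.
$ascii   = 'hello';    // 5 characters, 1 byte each
$chinese = '你好世界';  // 4 characters, 3 bytes each in UTF-8

echo strlen($ascii), "\n";                // 5  (bytes)
echo mb_strlen($ascii, 'UTF-8'), "\n";    // 5  (characters)
echo strlen($chinese), "\n";              // 12 (bytes)
echo mb_strlen($chinese, 'UTF-8'), "\n";  // 4  (characters)
```

For pure ASCII text both functions agree, which is why the difference often goes unnoticed until non-Latin text shows up.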

For your example:

$myStr = '៘៥឴ឨឆ';
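// substr counts bytes: each character here is 3 bytes in UTF-8, so this cuts the second one in half
$result = substr($myStr, 0, 5);
// mb_substr counts characters: this returns all 5 characters intact
$result = mb_substr($myStr, 0, 5, mb_detect_encoding($myStr));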

In this example, substr returns an invalid string because each character takes more than one byte, so the 5-byte cut lands in the middle of a character. mb_substr returns the correct data.
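One way to see the damage is to check whether each result is still valid UTF-8. The sketch below assumes a UTF-8 source file and the mbstring extension, uses a hypothetical Chinese sample string instead of the one above, and passes the encoding explicitly as 'UTF-8' rather than relying on detection.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is enabled.
$text = '你好世界'; // 4 characters, 3 bytes each in UTF-8

$byByte = substr($text, 0, 5);             // first 5 bytes: one whole character plus 2 stray bytes
$byChar = mb_substr($text, 0, 2, 'UTF-8'); // first 2 characters

var_dump(mb_check_encoding($byByte, 'UTF-8')); // bool(false): cut mid-character
var_dump(mb_check_encoding($byChar, 'UTF-8')); // bool(true)
var_dump($byChar === '你好');                   // bool(true)
```

A string truncated mid-character typically shows up as a replacement character or garbage when it is displayed or stored, so the mb_ variant is the safer choice whenever the input may contain non-ASCII text.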
