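Your string is probably UTF-8, where "characters" and "bytes" are not the same thing. The std::string class treats every char as one character, so size(), length(), substr() and operator[] all count and index bytes. Any character outside ASCII takes 2 to 4 bytes in UTF-8, which is why the results are wrong.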
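To make the mismatch concrete, here is a minimal sketch. The helper names `sample` and `count_code_points` are made up for illustration, and the counting trick assumes the input is already valid UTF-8:

```cpp
#include <cstddef>
#include <string>

// "héllo" written with explicit UTF-8 byte escapes so the example does not
// depend on the source file's encoding. 'é' is the two bytes 0xC3 0xA9.
inline std::string sample() { return "h\xC3\xA9llo"; }

// Counts characters (code points) by ignoring continuation bytes, which
// always have the bit pattern 10xxxxxx. Assumes valid UTF-8 input.
inline std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) ++n;  // count only bytes that start a character
    }
    return n;
}
```

With this, `sample().size()` is 6 while `count_code_points(sample())` is 5, and `sample().substr(0, 2)` cuts the 'é' in half, leaving a dangling 0xC3 byte.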
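Your options are to convert the string to UTF-16 and use a wstring (or, more portably, a std::u16string) instead, where you can (generally) assume that characters are all two bytes each, or you can use a library like ICU or UTF8-CPP to operate on UTF-8 strings directly, doing things like "get the 3rd character" rather than "get the 3rd byte". Two caveats on the UTF-16 route: wchar_t is 16 bits on Windows but typically 32 bits on Linux and macOS, and characters outside the Basic Multilingual Plane (most emoji, for example) take two 16-bit units even in UTF-16.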
Or, if you want to go minimalist, you could just code up a (relatively) simple function to get the byte offset and length of a particular character by reusing the internals of one of the UTF-8 string-length functions from one of the libraries listed above or found via Google. Basically you have to inspect the first byte of each character and, depending on which of its high bits are set, skip the 1-3 continuation bytes that follow it to reach the start of the next character (plain ASCII bytes have none to skip).
Here's one in PHP that could easily be translated:

    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        $value = ord($str[$i]);
        if ($value > 127) {
            // Lead byte of a multi-byte character: skip its continuation bytes.
            if ($value >= 192 && $value <= 223)
                $i++;            // 2-byte sequence
            elseif ($value >= 224 && $value <= 239)
                $i = $i + 2;     // 3-byte sequence
            elseif ($value >= 240 && $value <= 247)
                $i = $i + 3;     // 4-byte sequence
            else
                die('Not a UTF-8 compatible string');
        }
        $count++;
    }

http://www.php.net/manual/en/function.strlen.php#25715
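
For what it's worth, a rough C++ translation of that loop, reshaped to return the byte offset and length of the n-th character, might look like the sketch below. The names `Utf8Span`, `utf8_sequence_length` and `utf8_char_at` are invented for this example, and it uses the same lead-byte ranges as the PHP code, so it does not check that the continuation bytes themselves are well-formed:

```cpp
#include <cstddef>
#include <string>

// Byte range of one character inside a UTF-8 string.
// length == 0 means "not found" (index past the end, or malformed input).
struct Utf8Span {
    std::size_t offset;  // byte index of the lead byte
    std::size_t length;  // number of bytes in the sequence (1-4), 0 if not found
};

// Sequence length implied by a lead byte, or 0 if the byte cannot start a
// sequence. Same ranges as the PHP loop (192-223, 224-239, 240-247).
inline std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;                   // 0xxxxxxx: ASCII
    if (lead >= 0xC0 && lead <= 0xDF) return 2;  // 110xxxxx
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  // 1110xxxx
    if (lead >= 0xF0 && lead <= 0xF7) return 4;  // 11110xxx
    return 0;                                    // continuation byte or invalid
}

// Locates the character at 0-based position `index` by walking lead bytes.
inline Utf8Span utf8_char_at(const std::string& s, std::size_t index) {
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
        // Stop on a bad lead byte or a sequence truncated by the end of the string.
        if (len == 0 || i + len > s.size()) return Utf8Span{0, 0};
        if (index == 0) return Utf8Span{i, len};
        --index;
        i += len;  // jump to the next lead byte
    }
    return Utf8Span{0, 0};  // fewer characters than requested
}
```

The result plugs straight into substr: given `Utf8Span c = utf8_char_at(s, 1);`, the call `s.substr(c.offset, c.length)` yields the whole second character rather than a single byte of it. Lookup is linear in the string length, which is the price of keeping the data in UTF-8.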