Let's say I have an array of bytes:
var myArr = new byte[] { 0x61, 0x62, 0xc4, 0x85, 0xc4, 0x87 };
It has 6 elements, but it is the UTF-8 encoding of abąć, which has 4 letters. Typically you would call
Encoding.UTF8.GetString(myArr);
to convert it to a string. But let's assume that myArr is actually bigger (there are more bytes at the end) and that I know, before the conversion, that I only want the first 4 letters. How can I efficiently convert this array to a string? It would also be preferable to get the index in myArr of the last byte consumed (the byte corresponding to the end of the converted string).
Example:
// 3 more bytes at the end of formerly defined myArr
var myArr = new byte[] { 0x61, 0x62, 0xc4, 0x85, 0xc4, 0x87, 0x01, 0x02, 0x03 };
var str = MyConvert(myArr, 4); // read 4 utf8 letters
// str is "abąć"
// possibly I want to know that MyConvert stopped at index 6 in myArr
The resulting string str object should have str.Length == 4.
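One possible shape for such a function, as a minimal sketch rather than a definitive answer: walk the bytes, count each UTF-8 lead byte as one letter, skip its continuation bytes (those of the form 10xxxxxx), and decode only the prefix found this way. The class name Utf8Prefix and the out parameter endIndex are assumptions made for illustration. The sketch also assumes the input is well-formed UTF-8 and that "letter" means a Unicode code point, so a code point outside the Basic Multilingual Plane would contribute 2 to str.Length.

```csharp
using System;
using System.Text;

static class Utf8Prefix
{
    // Hypothetical helper: decodes the first `count` code points of well-formed
    // UTF-8 and reports the index just past the last byte that was consumed.
    public static string MyConvert(byte[] bytes, int count, out int endIndex)
    {
        int i = 0;
        int seen = 0;
        while (i < bytes.Length && seen < count)
        {
            i++; // consume the lead byte of the next code point
            // Skip continuation bytes, which always match the bit pattern 10xxxxxx.
            while (i < bytes.Length && (bytes[i] & 0xC0) == 0x80)
            {
                i++;
            }
            seen++;
        }
        endIndex = i;
        // Decode only the prefix; the trailing bytes are never touched.
        return Encoding.UTF8.GetString(bytes, 0, endIndex);
    }
}
```

With the 9-byte array from the example, MyConvert(myArr, 4, out int end) should yield "abąć" with end == 6. The prefix is scanned twice (once to find the boundary, once to decode), but nothing is allocated apart from the resulting string.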
Encoding.UTF8.GetString(myArr): regarding code length, it doesn't get any more efficient than that. What's your question? What do you mean by the last sentence?

Do you want up to 4 char values (UTF-16 code units) or up to 4 Unicode code points? Suppose the byte array is made up entirely of characters that need surrogate pairs in UTF-16: do you want 8 chars or 4 in that case?