I have a very large UTF-8 byte[]. I want to truncate it to its first 1024 bytes and then convert those bytes to a string.

Encoding.UTF8.GetString(byte[], int, int) does that for me: it takes only the first 1024 bytes and returns the converted string.

But if the last character in that range is a multi-byte UTF-8 character (for example, a 2-byte one) whose first byte falls inside the 1024 bytes and whose remaining byte falls outside, the converted string shows ? for that character.
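A minimal sketch that reproduces this on a small input, assuming the default Encoding.UTF8 instance (which substitutes the replacement character U+FFFD for bytes it cannot decode; many consoles and fonts display it as ?). The class and method names are hypothetical, and a 4-byte cut stands in for the 1024-byte one:

```csharp
using System;
using System.Text;

public static class TruncationRepro
{
    // Hypothetical helper: decodes only the first byteCount bytes,
    // the same way the question does with a 1024-byte limit.
    public static string DecodeTruncated(byte[] buffer, int byteCount)
    {
        return Encoding.UTF8.GetString(buffer, 0, byteCount);
    }
}
```

For example, "abc\u00E9" encodes to the five bytes 61 62 63 C3 A9. Calling TruncationRepro.DecodeTruncated(bytes, 4) cuts the last character in half, so the result is "abc" followed by U+FFFD rather than just "abc".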

Is there any way to keep this ? out of the converted string?

1 Answer

That's what the Decoder class is for. It allows you to stream byte data into char data, while maintaining enough state to handle partial code-points correctly:

Encoding.UTF8.GetDecoder().GetChars(buffer, 0, 1024, charBuffer, 0)

Of course, when the code-point is split in the middle, the Decoder is left with a "partial char" in its state, but that doesn't concern you in your case (and is desirable in all the other use cases :)).
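A fuller sketch of that call, using only the char[] and byte[] overloads. The class name, method name, and maxBytes parameter are hypothetical; the char buffer is sized with Encoding.UTF8.GetMaxCharCount, which is a worst-case bound rather than the exact length:

```csharp
using System;
using System.Text;

public static class Utf8Truncation
{
    // Hypothetical helper: decodes at most maxBytes bytes of buffer and
    // leaves out a trailing character that was cut in half by the limit.
    public static string DecodePrefix(byte[] buffer, int maxBytes)
    {
        int byteCount = Math.Min(buffer.Length, maxBytes);

        // A fresh Decoder keeps an incomplete trailing sequence in its
        // internal state instead of emitting a replacement character.
        Decoder decoder = Encoding.UTF8.GetDecoder();

        // Worst-case number of chars the decoder could produce.
        char[] charBuffer = new char[Encoding.UTF8.GetMaxCharCount(byteCount)];

        // This overload does not flush, so the partial character is not written.
        int charCount = decoder.GetChars(buffer, 0, byteCount, charBuffer, 0);

        return new string(charBuffer, 0, charCount);
    }
}
```

With the bytes of "abc\u00E9" (61 62 63 C3 A9) and maxBytes set to 4, this returns "abc": the lone C3 byte stays in the decoder's state and is discarded along with the decoder. Bytes that are invalid in the middle of the buffer, as opposed to merely cut off at the end, would still be replaced.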

2 Comments

I don't know how to deal with pointers. Any help, or an alternative to your solution?
@pratik03 No pointers involved - just use the char[] (and byte[]) overload instead of the char* (and byte*) overload.
