UTF8 byte[] to string conversion

Question

I have a very large UTF-8 byte[]. I want to truncate it to 1024 bytes and then convert it to a string.

Encoding.UTF8.GetString(byte[], int, int) does that for me: it takes the first 1024 bytes and returns the decoded string.

But if the last character in that range is a multi-byte UTF-8 character (say, 2 bytes) whose first byte falls inside the 1024-byte range while its second byte falls outside, the converted string shows ? for that character.

Is there any way to keep this ? out of the converted string?

Luaan · Accepted Answer · 2016-04-20 09:20:21Z

6

That's what the Decoder class is for. It allows you to stream byte data into char data, while maintaining enough state to handle partial code-points correctly:

Encoding.UTF8.GetDecoder().GetChars(buffer, 0, 1024, charBuffer, 0)

Of course, when the code-point is split in the middle, the Decoder is left with a "partial char" in its state, but that doesn't concern you in your case (and is desirable in all the other use cases :)).
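
To make the one-liner above concrete, here is a minimal sketch of how it might be wrapped up. The helper name DecodePrefix and the sample data are hypothetical, chosen only for illustration; the calls themselves (Encoding.UTF8.GetDecoder, Decoder.GetChars with the byte[]/char[] overload, Encoding.GetMaxCharCount) are standard .NET APIs. It assumes the input starts on a character boundary and that a trailing partial character can simply be discarded.

```csharp
using System;
using System.Text;

static class Utf8Truncation
{
    // Hypothetical helper: decodes at most maxBytes bytes of UTF-8.
    // A multi-byte sequence cut off at the end stays buffered inside the
    // Decoder instead of being emitted, so it never reaches the string.
    public static string DecodePrefix(byte[] utf8, int maxBytes)
    {
        int byteCount = Math.Min(utf8.Length, maxBytes);

        // Upper bound on the number of chars byteCount bytes can produce.
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(byteCount)];

        // A fresh Decoder per call, so no state leaks between inputs.
        Decoder decoder = Encoding.UTF8.GetDecoder();
        int charCount = decoder.GetChars(utf8, 0, byteCount, chars, 0);

        return new string(chars, 0, charCount);
    }

    static void Main()
    {
        // "ab" followed by U+00E9, which UTF-8 encodes as two bytes.
        byte[] data = Encoding.UTF8.GetBytes("ab\u00E9");

        // Cutting at 3 bytes splits the last character in the middle.
        Console.WriteLine(DecodePrefix(data, 3));

        // For comparison: the call from the question on the same input.
        Console.WriteLine(Encoding.UTF8.GetString(data, 0, 3));
    }
}
```

In the question's scenario the call would be DecodePrefix(buffer, 1024). The Decoder is thrown away after one call here, which is why the leftover partial character is simply lost; when streaming, you would keep the same Decoder and feed it the next chunk.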


2 Comments

pratik03 Over a year ago

I don't know how to deal with pointers. Any help, or an alternative to your solution?

Luaan Over a year ago

@pratik03 No pointers involved - just use the char[] (and byte[]) overload instead of the char* (and byte*) overload.
