
I am trying to convert a string to bytes and back again. I have seen the previous question on this site about converting a string to a byte array, but my problem is different.

Here is my code

byte[] btest = new byte[2];
btest[0] = 0xFF;
btest[1] = 0xAA;
UTF8Encoding enc = new UTF8Encoding();
string str = enc.GetString(btest); // here I get str = '��'

// I started with a byte array of size 2 with the above contents.
// Now I try to convert the string back to a byte array.
byte[] bst = enc.GetBytes(str); // this returns a byte array of size 6
// with contents {239,191,189,239,191,189}

// Here I try to restore the values in btest by indexing into the string.
btest[0] = Convert.ToByte(str[0]); // this line throws an exception
// Exception: Value was either too large or too small for an unsigned byte.
btest[1] = Convert.ToByte(str[1]);

Shouldn't GetBytes return a byte array of size 2? What am I doing wrong? I want bst[0] to contain the same value that I assigned to btest[0].

Thanks

  • Please try to be more clear in your Question titles. :) Commented Dec 24, 2013 at 10:51

2 Answers


Your original byte input is not valid UTF-8: it doesn't represent any Unicode code point. As a result, each invalid byte is decoded to the replacement character � (U+FFFD). From then on, that is a character like any other, so converting it back to bytes does not reproduce your initial invalid byte sequence. Instead you get the proper three-byte UTF-8 sequence for that code point, twice.

The character cannot be represented as a single byte, hence Convert.ToByte throws an OverflowException.
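A minimal sketch of the behaviour described above, using the bytes from the question. The class name `ReplacementCharDemo` and the helper `RoundTrip` are made up for illustration; only `Encoding.UTF8`, `BitConverter.ToString` and `Convert.ToByte` come from the framework.

```csharp
using System;
using System.Text;

static class ReplacementCharDemo
{
    // Hypothetical helper: decodes bytes as UTF-8, then re-encodes the string.
    public static byte[] RoundTrip(byte[] input)
    {
        string decoded = Encoding.UTF8.GetString(input);
        return Encoding.UTF8.GetBytes(decoded);
    }

    static void Main()
    {
        byte[] invalid = { 0xFF, 0xAA };
        string str = Encoding.UTF8.GetString(invalid);

        // Each invalid byte became one replacement character, U+FFFD.
        Console.WriteLine(str.Length);                    // 2
        Console.WriteLine(((int)str[0]).ToString("X4"));  // FFFD

        // Re-encoding yields the UTF-8 form of U+FFFD twice, not the input.
        Console.WriteLine(BitConverter.ToString(RoundTrip(invalid))); // EF-BF-BD-EF-BF-BD

        try
        {
            // 0xFFFD is 65533, which is larger than byte.MaxValue (255).
            Convert.ToByte(str[0]);
        }
        catch (OverflowException)
        {
            Console.WriteLine("U+FFFD does not fit in a byte");
        }
    }
}
```

The six bytes printed here are the same values as the {239,191,189,239,191,189} reported in the question, shown in hexadecimal.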

If you were to change your original input to a valid byte sequence, say:

btest[0] = 0xDF;
btest[1] = 0xBF;

You will see that the enc.GetBytes(str) call actually results in a two-byte array again.
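As a rough check of that claim, the sketch below decodes the valid sequence and encodes it again. The class name `ValidSequenceDemo` and the helper `RoundTrip` are hypothetical names chosen for this example.

```csharp
using System;
using System.Text;

static class ValidSequenceDemo
{
    // Hypothetical helper: decodes bytes as UTF-8, then re-encodes the string.
    public static byte[] RoundTrip(byte[] input)
    {
        return Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(input));
    }

    static void Main()
    {
        byte[] valid = { 0xDF, 0xBF };
        string str = Encoding.UTF8.GetString(valid);

        // The two bytes together encode a single character, U+07FF.
        Console.WriteLine(str.Length);                    // 1
        Console.WriteLine(((int)str[0]).ToString("X4"));  // 07FF

        // Re-encoding gives back the original two bytes.
        Console.WriteLine(BitConverter.ToString(RoundTrip(valid))); // DF-BF
    }
}
```

Note that the string now holds one character rather than two, and its value (0x07FF) is still above 255, so Convert.ToByte(str[0]) would throw here as well. Only the GetBytes round trip restores the original array.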


The byte sequence 0xFF 0xAA is invalid in UTF-8, so each byte is converted to the replacement character � (U+FFFD).


8 Comments

But when I convert it back, why don't I get the original bytes?
@singh: The output is correct. str equals ��, and each of those characters is encoded as 3 bytes in UTF-8.
Actually, I am passing the string constructed here to C++, but I don't get the correct value there, as explained by David Heffernan in his answer: stackoverflow.com/questions/20145911/c-sharp-to-c-array/…
On the other hand, if I use kunal's answer to the same question, I get the correct value.
@singh: You got 2 answers that explain the behaviour. If you compute 1 + 1, the result is 2, not 5, even if you really want it to be 5.