I'm trying to figure out a very basic issue: I have a byte array containing 8 bytes of data. I would like to convert it to a string and then, on the "other side", convert it back to the exact same byte array. For some reason, this is not working. The value I'm using is 22.22 (a double), which is represented as 8 bytes with the values 184, 30, 133, 235, 81, 56, 54, 64. I would expect the single item in byteArrayList (one item, for simplicity), which in my case contains 8 bytes, to be identical to the result, but they are not identical: the result contains 14 bytes! This is the result: 239, 191, 189, 30, 239, 191, 189, 239, 191, 189, 81, 56, 54, 64
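
As a quick check, here is a sketch (written as C# 9 top-level statements and assuming a little-endian machine, which covers x86/x64 and the usual ARM configurations) showing that `BitConverter.GetBytes` yields exactly the eight bytes listed above for 22.22, and that `BitConverter.ToDouble` turns them back into the same double when no text encoding is involved:

```csharp
using System;

// BitConverter uses the machine's byte order; on little-endian hardware the
// least significant byte of the IEEE 754 value comes first.
double value = 22.22;
byte[] bytes = BitConverter.GetBytes(value);

Console.WriteLine(string.Join(", ", bytes)); // 184, 30, 133, 235, 81, 56, 54, 64 (little-endian)
Console.WriteLine(BitConverter.IsLittleEndian);

// Converting the raw bytes straight back is lossless: no string is involved.
double roundTripped = BitConverter.ToDouble(bytes, 0);
Console.WriteLine(roundTripped == value); // True
```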

It appears as if both the single item in byteArrayList and the result contain a control character (\u001e, i.e. byte 30). What am I missing here?
Here is my code:

using System.Collections.Generic;
using System.Text;
List<byte[]> byteArrayList = new List<byte[]>();
// ... Here I populate byteArrayList with 1 item, 8 bytes long
UTF8Encoding encoding = new UTF8Encoding();
StringBuilder SB = new StringBuilder();
foreach (byte[] byteArrayItem in byteArrayList) // for simplicity, currently contains one item
{
    // Decode each byte array as if it were UTF-8 text
    SB.Append(encoding.GetString(byteArrayItem));
}
string items = SB.ToString();
byte[] result = Encoding.UTF8.GetBytes(items); // I expected result to be identical to the item in byteArrayList, but it is not: it contains 14 bytes!
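
A self-contained reproduction of the round trip above, with the eight bytes of 22.22 hard-coded in place of the elided population step (an assumption for illustration), shows the growth from 8 to 14 bytes:

```csharp
using System;
using System.Text;

// The 8 bytes from the question (22.22 as a little-endian double).
byte[] original = { 184, 30, 133, 235, 81, 56, 54, 64 };

// Decode as UTF-8: bytes that do not form valid UTF-8 sequences are
// replaced with U+FFFD by the default (replacement) decoder fallback.
string text = Encoding.UTF8.GetString(original);

// Re-encode: each U+FFFD becomes the 3-byte sequence 239, 191, 189.
byte[] result = Encoding.UTF8.GetBytes(text);

Console.WriteLine(original.Length); // 8
Console.WriteLine(result.Length);   // 14
Console.WriteLine(string.Join(", ", result));
```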
For the record: you don't start with a "utf-8 byte array", you start with a byte array with some arbitrary data that's not (necessarily) a UTF-8 encoded string. That's a very important distinction. Commented Jan 23, 2023 at 13:33

1 Answer


Not every sequence of bytes is valid UTF-8, so you can't use UTF-8 this way. You need to use an encoding that can handle an arbitrary sequence of bytes, such as Base64 or Hex.
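
A minimal sketch of the two suggested encodings using the built-in `Convert` methods. The Base64 methods exist in every .NET version; `Convert.ToHexString` and `Convert.FromHexString` assume .NET 5 or later:

```csharp
using System;
using System.Linq;

byte[] original = BitConverter.GetBytes(22.22);

// Base64: 4 ASCII characters per 3 input bytes (with padding), lossless for any byte values.
string base64 = Convert.ToBase64String(original);
byte[] fromBase64 = Convert.FromBase64String(base64);

// Hex: 2 characters per byte, also lossless (requires .NET 5+).
string hex = Convert.ToHexString(original);
byte[] fromHex = Convert.FromHexString(hex);

Console.WriteLine(base64);
Console.WriteLine(hex);
Console.WriteLine(original.SequenceEqual(fromBase64)); // True
Console.WriteLine(original.SequenceEqual(fromHex));    // True
```

If several arrays are concatenated into one string, as in the question's loop, the receiver needs a way to split them again: either a separator character that cannot appear in the encoding, or a fixed length per item (8 bytes always encode to 12 Base64 or 16 hex characters).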

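The sequence 239, 191, 189 (EF BF BD) is the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER, the character the decoder substitutes when it hits a decoding error. In your data, 184 and 133 are continuation bytes with no lead byte, and 235 starts a 3-byte sequence that 81 does not continue, so each of those three bytes becomes 3 bytes on the way back, which is why 8 bytes grow to 14.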
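
Since the default `UTF8Encoding` silently substitutes U+FFFD, one way to detect the problem instead of corrupting data is the `UTF8Encoding(bool encoderShouldEmitUTF8Identifier, bool throwOnInvalidBytes)` constructor. A sketch with the question's bytes:

```csharp
using System;
using System.Text;

byte[] original = { 184, 30, 133, 235, 81, 56, 54, 64 };

// U+FFFD REPLACEMENT CHARACTER encodes to EF BF BD (239, 191, 189) in UTF-8.
byte[] replacement = Encoding.UTF8.GetBytes("\uFFFD");
Console.WriteLine(string.Join(", ", replacement)); // 239, 191, 189

// A strict decoder throws instead of inserting replacement characters.
var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
bool threw = false;
try
{
    strictUtf8.GetString(original);
}
catch (DecoderFallbackException)
{
    // The first byte (184) is already invalid UTF-8.
    threw = true;
}
Console.WriteLine(threw); // True
```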


3 Comments

I replaced the UTF-8 encoder with ASCII, but a question mark (?) is still inserted when calling the encoding.GetString method.
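
The question mark is the default replacement that `Encoding.ASCII` uses for any byte above 127. A small sketch with the question's bytes shows that the ASCII round trip keeps the length at 8 but loses 184, 133 and 235:

```csharp
using System;
using System.Text;

byte[] original = { 184, 30, 133, 235, 81, 56, 54, 64 };

// ASCII only defines 0-127; the default decoder fallback maps every other byte to '?'.
string text = Encoding.ASCII.GetString(original);
byte[] result = Encoding.ASCII.GetBytes(text);

Console.WriteLine(string.Join(", ", result)); // 63, 30, 63, 63, 81, 56, 54, 64
```
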
ASCII cannot represent arbitrary byte sequences either: it only defines values 0 to 127, so 184, for example, is not a valid ASCII byte. You will need an encoding that can represent arbitrary sequences of bytes, such as Base64 or Hex. Alternatively, you could string-encode your double as "22.22" (0x32322e3232).
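
A sketch of the last suggestion: format the double as text with the invariant culture (an assumption, so the decimal separator is always '.'), send the ASCII bytes, and parse them back. The payload is the five bytes 0x32 0x32 0x2E 0x32 0x32:

```csharp
using System;
using System.Globalization;
using System.Text;

double value = 22.22;

// "R" is the round-trip format: the string parses back to the same double.
string text = value.ToString("R", CultureInfo.InvariantCulture);
byte[] payload = Encoding.ASCII.GetBytes(text); // plain ASCII, safe in any text channel

// On the other side: bytes -> string -> double.
double parsed = double.Parse(Encoding.ASCII.GetString(payload), CultureInfo.InvariantCulture);

Console.WriteLine(text);                           // 22.22
Console.WriteLine(BitConverter.ToString(payload)); // 32-32-2E-32-32
Console.WriteLine(parsed == value);                // True
```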
